| Graduate student: | 林宇凡 Lin, Yu-Fan |
|---|---|
| Thesis title: | 一個透過社群與類別關係之影響傳遞的重要新聞擷取方法 The Retrieval of Important News Stories by Influence Propagation among Communities and Categories |
| Advisor: | 高宏宇 Kao, Hung-Yu |
| Degree: | Master |
| Department: | College of Electrical Engineering and Computer Science - Department of Computer Science and Information Engineering |
| Year of publication: | 2011 |
| Academic year of graduation: | 99 |
| Language: | English |
| Number of pages: | 70 |
| Keywords (Chinese): | 新聞粹取 (news story distillation), 部落格 (blog), 突發性資訊 (bursty information), 影響傳遞 (influence propagation) |
| Keywords (English): | News story distillation, Blog, Information Bursty, Influence propagation |
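Today, keeping up with each day's news has become an important part of our lives. Newspapers and news websites exist to help readers do so, and news editors select the important stories to place on the front page. However, the volume of news published each day is often huge, and only a few important stories make the front page. How to use appropriate information and methods so that a computer can judge which stories are important, as a news editor would, and thereby help readers understand the day's important news, is therefore a significant problem. The goal of our research is to have a system automatically select and rank the important news stories of the day for each category. In this thesis, we base our study on professional criteria for judging newsworthiness and apply these criteria to the relations of mutual influence among communities, news stories, and categories. Previous work on identifying important news within each category only used a classifier to assign stories to categories, without further investigation. We argue that the relation between news stories and categories is more than simple classification: even stories that belong to the same category should be distinguishable by how important they are to that category. Our observations further show that news stories in different categories also influence one another through the behavior of blog communities. We therefore propose three key features, category relevance, bloggers' attention, and bursty influence, and build a model of influence propagation among communities, news stories, and categories. We then propose a Cross-Category Social Influence Propagation (C-SIP) approach that combines the influence propagation relations in the model to find and rank the stories likely to be important for each category. Finally, we validate our approach on the data of the TREC 2010 Blog Track. The experiments show that our approach performs better across the different categories and outperforms the best-performing participant in the TREC 2010 Blog Track by 9.94%.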
Nowadays, people receive information about news stories not only from newspapers but also from online news websites. They search for important news stories in order to know what has happened that day. However, it is hard to browse all the news stories published on a given day, so it is necessary to identify which news stories are more newsworthy on that day. In this paper, we investigate how to automatically identify the importance of news stories for different categories on a specific day by utilizing the influence propagation among communities and categories. In particular, we build an influence propagation model which consists of three features: category relevance, bloggers' attention, and bursty influence. Based on this influence propagation model, we propose a Cross-Category Social Influence Propagation (C-SIP) approach for scoring the importance of news stories on a specific day. For the category relevance feature, we measure the strength of the relation between the categories and the news stories, and we use it to indicate the importance of the news stories for the categories. For the bloggers' attention feature, we treat the blogosphere as a set of specific communities in which users express their attention to various news stories, and we use the knowledge of the bloggers in the blogosphere to indicate the importance of the news stories on a specific day. Furthermore, we use the bursty influence feature to enhance our scoring approach. We evaluate our approach using the judgments of the Story Ranking Task in the TREC 2010 Blog Track. The experiments show that our approach performs strongly in the retrieval of important news stories and achieves a 9.94% improvement over the best-performing participating system in the TREC 2010 Blog Track.
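As a rough illustration of how three per-story features could feed a single ranking, the following minimal sketch combines them with a weighted sum. This is not the C-SIP approach itself: the abstract does not give its formulas, so the weighted sum is a simple stand-in for the influence propagation step, and the names `Story`, `importance`, `rank_stories`, the weights, and the feature values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Story:
    headline: str
    category_relevance: float  # assumed to be normalized to [0, 1]
    bloggers_attention: float  # assumed to be normalized to [0, 1]
    bursty_influence: float    # assumed to be normalized to [0, 1]


def importance(story, weights=(0.4, 0.4, 0.2)):
    """Weighted sum of the three features (illustrative weights only)."""
    w_rel, w_att, w_burst = weights
    return (w_rel * story.category_relevance
            + w_att * story.bloggers_attention
            + w_burst * story.bursty_influence)


def rank_stories(stories, weights=(0.4, 0.4, 0.2)):
    """Return the stories sorted from most to least important."""
    return sorted(stories, key=lambda s: importance(s, weights), reverse=True)


# Made-up feature values for one category on one day.
stories = [
    Story("A", 0.9, 0.2, 0.1),
    Story("B", 0.5, 0.8, 0.6),
    Story("C", 0.1, 0.1, 0.0),
]
ranked = rank_stories(stories)
print([s.headline for s in ranked])
```

In this toy setup a story with moderate category relevance but high blogger attention can outrank a story that is only highly relevant to its category, which is the kind of distinction within a category that the abstract argues plain classification cannot make.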