
Author: 彭智暉 (Peng, Chih-Hui)
Thesis title: 基於主題偵測與情緒分析之多階層式社群偵測研究
Multi-Level Community Detection Based on Topic Identification and Sentiment Analysis
Advisor: 郭耀煌 (Kuo, Yaw-Huang)
Degree: Master
Department: Department of Computer Science and Information Engineering, College of Electrical Engineering and Computer Science
Year of publication: 2011
Academic year of graduation: 99 (ROC calendar)
Language: English
Number of pages: 92
Keywords (Chinese): 隱藏社群偵測, 主題辨識, 意見探勘, 情緒分析
Keywords (English): hidden community detection, topic identification, opinion mining, sentiment analysis
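  • With the rise of social networks and micro-blogs, finding users who share the same attributes and forming them into communities has become a meaningful topic. Traditional community detection research relies on explicit relationships and cannot capture implicit information, which can only be obtained by analyzing user behavior; this gave rise to research on hidden community detection. However, earlier hidden community detection research had to define certain attributes in advance (e.g., a commonly discussed topic) before the corresponding community could be detected, which makes it unsuitable for untitled micro-blog threads, where the topic can only be obtained by identifying it from the thread content. In addition, earlier research did not consider what users think and simply assigned users who participated in the same thread to the same community, even though users who share the same view are usually more similar to one another.
    To this end, this thesis applies topic identification to the untitled threads in which the target user participated and uses user opinions to represent what users think about each thread. A hidden community consists of the users who hold the same opinion as the target user; that is, users in the same community hold the same opinion on a given topic. The communities detected by the proposed framework are therefore called opinion-consistent hidden communities. The framework has four main steps: (1) collect information on the posts in which the target user participated on the micro-blog and build the dataset; (2) use Wikipedia to help identify the topics of untitled threads; unlike Schonhofen and Huynh, who use the category of the appropriate Wikipedia article as the thread topic, this thesis uses the title of the appropriate article, and the experimental results indicate that the proposed method performs well; (3) most sentiment analysis research uses two-polarity sentiment analysis, which cannot capture the degree of a user's sentiment, so this thesis uses a multi-aspect sentiment analysis model and a defined sentiment similarity to detect the group of users whose opinions are consistent with the target user's, to make the users within a community more similar, and to merge similar topics; (4) use different scales of analysis to construct multi-level opinion-consistent hidden communities, in which the users at different levels attend to different characteristics.
    The experiments mainly include: (1) demonstrating the performance of topic identification, which outperforms the methods of Schonhofen and Huynh in time complexity by factors of 1.8 and 26, respectively; (2) using a quantitative measure to evaluate the multi-dimension sentiment analysis model against semantic-orientation Pointwise Mutual Information (PMI): compared with semantic-orientation PMI, the proposed model shows its smallest improvement, 9.7%, at an opinion-consistency threshold of 0.1 and its largest improvement, 78.1%, at a threshold of 0.9; (3) using a quantitative measure to evaluate the two methods on multi-level opinion-consistent hidden communities: compared with semantic-orientation PMI, the proposed model shows its smallest improvement, 3.3%, at a threshold of 0.1 and its largest improvement, 48.4%, at a threshold of 0.7.
    The methods and models proposed in this thesis are: (1) an opinion-consistent hidden community detection framework, (2) a topic identification method, (3) a multi-aspect sentiment analysis model, and (4) a multi-level community detection model. With these methods and models, the group of users whose opinions are consistent with the target user's can be obtained.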

    With the growth of social networks and the rise of micro-blogs, detecting users who share the same attributes and grouping them into communities has become a meaningful problem. Most traditional community detection research relies on explicit relationships and therefore cannot capture implicit, or hidden, information about users. Such hidden information can only be obtained by analyzing user behavior, which gave rise to research on hidden community detection. However, previous hidden community detection approaches must predefine certain attributes (e.g., the topic under discussion) and are therefore unsuitable for untitled micro-blog posts, whose topics have to be identified by analyzing the post content. Moreover, previous approaches do not consider what users think: they simply assign all users who participated in the same post to the same community, even though users who hold the same view are usually more similar to one another.
    For these reasons, this thesis uses topic identification to determine the topics of the untitled posts in which the target user participated, and uses user opinions to represent what users think about each post. A hidden community is then composed of the users who hold the same opinion as the target user; that is, the users in a community share the same opinion on a topic. The communities detected by the proposed framework are called Opinion-Consistent Hidden Communities (OCHC). The OCHC framework consists of four major steps. First, information about the posts in which the target user participated on the micro-blog is collected to create the dataset. Second, Wikipedia is used to help identify the topics of untitled posts. Unlike the methods of Schonhofen and Huynh, which use the category of the appropriate Wikipedia article as the topic of a post, this thesis uses the title of the appropriate article; the experimental results show that the proposed method performs well. Third, most sentiment analysis research uses polarity sentiment analysis, which cannot capture the degree of a user's sentiment. This thesis therefore uses a multi-dimension sentiment analysis (MDSA) model together with a sentiment similarity criterion to detect the users whose opinions are consistent with the target user's, which makes the users within a community more similar to one another; posts on similar topics are then combined. Fourth, multi-level OCHC user sets are constructed at different scopes of analysis, and the users at different levels focus on different features.
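    As a rough illustration of the third step, the sketch below selects the users whose sentiment vector is close to the target user's. This record does not give the MDSA dimensions or the similarity formula, so the three-component vectors, the use of cosine similarity, and the names `cosine_similarity` and `opinion_consistent_users` are assumptions made for this example only, not the thesis's actual model.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def opinion_consistent_users(target, sentiments, threshold):
    """Return the users (other than target) whose similarity to target is >= threshold."""
    target_vec = sentiments[target]
    return sorted(
        user
        for user, vec in sentiments.items()
        if user != target and cosine_similarity(target_vec, vec) >= threshold
    )


# Hypothetical sentiment vectors for one topic (values invented for illustration).
sentiments = {
    "target": [0.8, 0.1, 0.1],
    "alice": [0.7, 0.2, 0.1],
    "bob": [0.1, 0.1, 0.9],
}

strict = opinion_consistent_users("target", sentiments, threshold=0.9)
loose = opinion_consistent_users("target", sentiments, threshold=0.1)
print(strict, loose)
```

    A higher threshold admits only users whose sentiment closely matches the target user's, so the resulting community is smaller but more homogeneous; a lower threshold admits more users.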
    The experiments cover three parts. First, the performance of topic identification is evaluated: the proposed method is 1.8 times faster than Schonhofen's method and 26 times faster than Huynh's method. Second, a quantitative measure is used to compare the proposed MDSA model with the Semantic Orientation Pointwise Mutual Information (SOPMI) model. Compared with SOPMI, the MDSA model improves performance by at least 9.7% (at a threshold of 0.1) and by at most 78.1% (at a threshold of 0.9). Third, a quantitative measure is used to compare the MDSA and SOPMI models on multi-level OCHC. Compared with SOPMI, the MDSA model improves performance by at least 3.3% (at a threshold of 0.1) and by at most 48.4% (at a threshold of 0.7).
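    For context on the SOPMI baseline, the sketch below computes a semantic orientation score in the general form described by Turney [Tur02]: the PMI of a phrase with a positive reference word minus its PMI with a negative reference word, which reduces to a single log ratio of co-occurrence counts. The function name `so_pmi`, the count arguments, and the example counts are hypothetical, and this record does not say how the thesis configured its SOPMI baseline.

```python
import math


def so_pmi(near_pos, near_neg, hits_pos, hits_neg, smoothing=0.01):
    """Semantic orientation in bits: PMI(phrase, pos word) - PMI(phrase, neg word).

    near_pos / near_neg: co-occurrence counts of the phrase with the positive /
    negative reference word; hits_pos / hits_neg: counts of the reference words alone.
    The smoothing term avoids division by zero when a co-occurrence count is 0.
    """
    return math.log2(
        ((near_pos + smoothing) * hits_neg) / ((near_neg + smoothing) * hits_pos)
    )


# Invented counts: the phrase co-occurs 8 times with the positive word and
# 2 times with the negative word; both reference words occur 100 times.
score = so_pmi(near_pos=8, near_neg=2, hits_pos=100, hits_neg=100, smoothing=0)
print(score)
```

    A positive score indicates that the phrase leans toward the positive reference word and a negative score toward the negative one. A single signed score of this kind conveys polarity along one axis only, which is the limitation the MDSA model is meant to address.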
    The methods and models proposed in this thesis are: (1) an OCHC detection framework, (2) a topic identification approach, (3) an MDSA model, and (4) a multi-level OCHC model. Together, they make it possible to obtain the set of users whose opinions are consistent with the target user's.

    List of Tables
    List of Figures
    Chapter 1 Introduction
      1.1 Motivation
      1.2 Contributions
      1.3 Organization
    Chapter 2 Background and Related Work
      2.1 Hidden Community Detection
      2.2 Topic Identification
      2.3 Opinion Analysis
    Chapter 3 Opinion-Consistent Hidden Community Detection
      3.1 Community Structure
      3.2 Data Collection
      3.3 Topic Identification
        3.3.1 Noise Filtering
        3.3.2 Document Generation
        3.3.3 Corpus Collection and Identification
      3.4 Opinion Analysis and Topic Combination
        3.4.1 Comment Filtering
        3.4.2 Sentiment Analysis
        3.4.3 Sentiment Similarity Evaluation
        3.4.4 Similar Topic Combination
      3.5 Community Detection
        3.5.1 Opinion-Consistent Hidden Community Lv1
        3.5.2 Opinion-Consistent Hidden Community Lv2
        3.5.3 Opinion-Consistent Hidden Community Lv3
    Chapter 4 Experiments
      4.1 Performance Evaluation for Topic Identification
      4.2 Quantitative Measure for Sentiment Analysis Methods
        4.2.1 Polarity Sentiment Analysis
        4.2.2 Multi-Dimension Sentiment Analysis
      4.3 Quantitative Measure for Multi-Level Community Detection
    Chapter 5 Conclusions and Future Works
    References

    [Cha99] S. Chakrabarti, M. van den Berg, and B. Dom, “Focused crawling: a new approach to topic-specific Web resource discovery,” Computer Networks, vol. 31, pp. 1623-1640, 1999.
    [Fil08] M. Filippone, F. Camastra, F. Masulli, and S. Rovetta, “A survey of kernel and spectral methods for clustering,” Pattern Recognition, vol. 41, pp. 176-190, 2008.
    [For10] S. Fortunato, “Community detection in graphs,” Physics Reports, vol. 486, pp. 75-174, 2010.
    [Gir02] M. Girvan and M. E. J. Newman, “Community Structure in Social and Biological Networks,” Proceedings of the National Academy of Sciences, vol. 99, pp. 7821-7826, 2002.
    [Huy09] D. T. Huynh, T. H. Cao, P. H. T. Pham, and T. N. Hoang, “Using Hyperlink Texts to Improve Quality of Identifying Document Topics Based on Wikipedia,” International Conference on Knowledge and Systems Engineering (KSE '09), pp. 249-254, 2009.
    [Joa98] T. Joachims, “Text Categorization with Support Vector Machines: Learning with Many Relevant Features,” In Proceedings of the 10th European Conference on Machine Learning, vol. 1398, pp. 137-142, 1998.
    [Lai09] H. F. Lai, “Identify Implicit Social Network by RST/FL Framework,” International Conference on Advances in Social Network Analysis and Mining (ASONAM '09), pp. 362-363, 2009.
    [Lia09] W. Liang, C. Leckie, K. Ramamohanarao, and J. Bezdek, “Automatically Determining the Number of Clusters in Unlabeled Data Sets,” IEEE Transactions on Knowledge and Data Engineering, vol. 21, pp. 335-350, 2009.
    [Lux07] U. von Luxburg, “A tutorial on spectral clustering,” Statistics and Computing, vol. 17, pp. 395-416, 2007.
    [New04] M. E. J. Newman and M. Girvan, “Finding and evaluating community structure in networks,” Physical Review E, vol. 69, no. 026113, 2004.
    [Pan02] B. Pang, L. Lee, and S. Vaithyanathan, “Thumbs up?: sentiment classification using machine learning techniques,” In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP '02), vol. 10, pp. 79-86, 2002.
    [Por09] M. A. Porter, J.-P. Onnela, and P. J. Mucha, “Communities in Networks,” Notices of the American Mathematical Society, vol. 56, pp. 1082-1097, 2009.
    [San09] A. G. Evsukoff, B. S. L. P. de Lima, and N. F. F. Ebecken, “Potential collaboration discovery using document clustering and community structure detection,” In Proceedings of the 1st ACM international workshop on Complex networks meet information & knowledge management (CNIKM '09), pp. 39-46, 2009.
    [Sch06] P. Schonhofen, “Identifying Document Topics Using the Wikipedia Category Network,” In Proceedings of the 2006 IEEE/WIC/ACM International Conference on Web Intelligence (WI '06), pp. 456-462, 2006.
    [Sco91] J. Scott, “Social Network Analysis: A Handbook,” SAGE, London, 1991.
    [Sha09] S. K. Shandilya and S. Jain, “Opinion Extraction & Classification of Reviews from Web Documents,” In IEEE International Advance Computing Conference (IACC '09), pp. 924-927, 2009.
    [Smi09] M. Smith, C. Giraud-Carrier, and N. Purser, “Implicit affinity networks and social capital,” Information Technology and Management, vol. 10, pp. 123-134, 2009.
    [Sto63] P. J. Stone and E. B. Hunt, “A computer approach to content analysis: studies using the General Inquirer system,” In Proceedings of the spring joint computer conference (AFIPS '63), pp. 241-256, 1963.
    [Tiu01] S. Tiun, R. Abdullah, and T. E. Kong, “Automatic Topic Identification Using Ontology Hierarchy,” In Proceedings of the Second International Conference on Computational Linguistics and Intelligent Text Processing (CICLing '01), pp. 444-453, 2001.
    [Tur02] P. D. Turney, “Thumbs Up or Thumbs Down? Semantic Orientation Applied to Unsupervised Classification of Reviews,” In Proceedings of the 40th Annual Meeting on Association for Computational Linguistics, pp. 417-424, 2002.
    [Tur03] P. D. Turney and M. L. Littman, “Measuring praise and criticism: Inference of semantic orientation from association,” ACM Transactions on Information Systems, vol. 21, pp. 315-346, 2003.
    [Yon09] S.-H. Yoon, J.-H. Shin, S.-W. Kim, and S. Park, “Extraction of a latent blog community based on subject,” In Proceedings of the 18th ACM conference on Information and knowledge management (CIKM '09), pp. 1529-1532, 2009.

    Full-text download: on campus, available from 2012-09-02
    off campus, available from 2013-09-02