| Graduate student: | 戴延任 Tai, Yen-Jen |
|---|---|
| Thesis title: | 透過標籤傳遞的自動化專屬領域情緒字典之建立 Automatic Domain-Specific Sentiment Lexicon Generation with Label Propagation |
| Advisor: | 高宏宇 Kao, Hung-Yu |
| Degree: | Master |
| Department: | College of Electrical Engineering and Computer Science, Department of Computer Science and Information Engineering |
| Year of publication: | 2013 |
| Academic year of graduation: | 101 |
| Language: | English |
| Number of pages: | 64 |
| Keywords (Chinese): | 情緒字典, 情感分析, 推特 |
| Keywords (English): | Sentiment Lexicon, Sentiment Analysis, Twitter |

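Today, the advantages of social media have driven explosive growth in the volume of public opinion, and the resulting research on sentiment analysis has attracted wide attention. Current sentiment analysis research falls into two main approaches: lexicon-based methods and methods that build models from labeled data. The second approach faces the difficulty of having to collect large amounts of manually labeled data, while the first relies on an existing sentiment lexicon to determine whether an opinion is positive or negative. Although many well-accepted lexicons are available, no single lexicon covers the meanings of words across all domains, so building a domain-specific sentiment lexicon has become an important task.
In our research, we propose a method for building a domain-specific sentiment lexicon. First, we compute the semantic similarity between words in a dataset that has no manual labels. Next, we turn the relations between words into a word association graph. Finally, a graph-based semi-supervised propagation method spreads the pre-labeled values to words whose polarity is unknown. We use a Twitter dataset to build a sentiment lexicon for the financial domain, and our evaluation confirms that the method not only outperforms other methods but also performs better than general-purpose sentiment lexicons.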
The advance of social media has led to explosive growth in opinion data, and sentiment analysis has attracted a great deal of attention as a result. Sentiment analysis applications currently follow two main approaches: the lexicon-based approach and the machine-learning approach. However, both face the challenge of obtaining large amounts of human-labeled training data and corpora. The lexicon-based approach requires a sentiment lexicon (sentiment dictionary) to determine opinion polarity. Many benchmark sentiment lexicons exist, but they cannot cover the meanings of all domain-specific words. Automatic generation of a domain-specific sentiment lexicon has therefore become an important task.
In this paper, we propose a framework that automatically generates a sentiment lexicon. First, we determine the semantic similarity between pairs of words across the entire unlabeled corpus. We then treat the words as nodes and the similarities as weighted edges to construct word graphs. Finally, a graph-based semi-supervised label propagation method assigns polarity to unlabeled words through the proposed propagation process. Experiments on microblog data from Twitter show that our approach achieves better performance than the baseline approaches and general-purpose sentiment dictionaries.
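The three steps described above (word similarity, word graph, label propagation) can be illustrated with a minimal Python sketch. It is not the thesis implementation: the abstract does not name the similarity measure or the propagation rule, so the sketch assumes positive pointwise mutual information over tweet-level co-occurrence as the edge weight, and a simple update in which each unlabeled word takes the weighted average of its neighbors' scores while seed words stay fixed. The function names, the toy corpus, and the seed words are all hypothetical.

```python
import math
from collections import Counter, defaultdict
from itertools import combinations


def build_word_graph(docs, min_sim=0.0):
    """Build {word: {neighbor: weight}} from tokenized documents.

    Assumption: edge weight is the PMI of two words co-occurring in the
    same document; only pairs with PMI above min_sim (keep it >= 0) get an edge.
    """
    n_docs = len(docs)
    word_df = Counter()  # number of documents containing each word
    pair_df = Counter()  # number of documents containing each word pair
    for doc in docs:
        words = sorted(set(doc))
        word_df.update(words)
        pair_df.update(combinations(words, 2))
    graph = defaultdict(dict)
    for (a, b), n_ab in pair_df.items():
        pmi = math.log(n_ab * n_docs / (word_df[a] * word_df[b]))
        if pmi > min_sim:
            graph[a][b] = pmi
            graph[b][a] = pmi
    return graph


def propagate(graph, seeds, iterations=50):
    """Spread seed polarities over the graph.

    Assumption: each unlabeled word is repeatedly set to the weighted mean
    of its neighbors' scores; seed words are clamped to their given values.
    """
    scores = {word: seeds.get(word, 0.0) for word in graph}
    for _ in range(iterations):
        updated = {}
        for word, neighbors in graph.items():
            if word in seeds:
                updated[word] = seeds[word]
                continue
            total = sum(neighbors.values())
            updated[word] = sum(w * scores[n] for n, w in neighbors.items()) / total
        scores = updated
    return scores


# Hypothetical toy corpus of tokenized finance tweets.
docs = [
    ["stock", "surge", "bullish"],
    ["stock", "surge", "rally"],
    ["rally", "bullish", "profit"],
    ["stock", "plunge", "bearish"],
    ["stock", "plunge", "selloff"],
    ["selloff", "bearish", "loss"],
]
seeds = {"bullish": 1.0, "bearish": -1.0}  # hand-labeled seed words

graph = build_word_graph(docs)
scores = propagate(graph, seeds)
for word, score in sorted(scores.items(), key=lambda item: item[1]):
    print(f"{word:8s} {score:+.3f}")
```

In this toy setting, words that co-occur with the positive seed end up with positive scores and words near the negative seed with negative scores, while a word such as "stock" that appears equally on both sides stays close to zero. A real corpus would need tokenization, frequency thresholds, and a convergence check in place of the fixed iteration count.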