| Graduate student: | 李冠霖 Li, Guan-Lin |
|---|---|
| Thesis title: | 未知評分短文潛在意見面向探勘 Latent Aspect Mining for Short and Unrated Review |
| Advisor: | 高宏宇 Kao, Hung-Yu |
| Degree: | Master |
| Department: | College of Electrical Engineering and Computer Science - Department of Computer Science and Information Engineering |
| Year of publication: | 2017 |
| Academic year of graduation: | 105 |
| Language: | English |
| Number of pages: | 50 |
| Keywords (Chinese): | 短文本、主題模型、潛在面向分析 |
| Keywords (English): | Short Text, Topic Model, Latent Aspect Rating Analysis |
With the growth of the Internet, more and more review websites such as TripAdvisor and Amazon have emerged. The sheer volume of text, however, makes it hard for users to grasp an author's opinion quickly. Ratings have therefore become essential information on most of these sites, and some sites even let reviewers score individual aspects so that readers can understand a review at a glance. Not all sites, however, provide complete aspect rating information. Latent Aspect Rating Analysis (e.g., LARAM and SACM) has been proposed to infer aspects and aspect ratings from review text. In recent years, with the rapid growth of social media, user habits have changed and the proportion of short texts among reviews has increased. Accurately predicting aspect ratings from such sparse data has become a major problem: it is difficult to use a topic model for aspect identification on short text, and aspect ratings inferred from overly sparse text do not match the ground truth well. Consequently, there are few successful cases of Latent Aspect Rating Analysis on short text. One of them is Aspect Identification and Rating (AIR). AIR assumes that the higher a review's score, the more likely positive-polarity words are to occur, and the lower the score, the more likely negative-polarity words are. Under this assumption, AIR adds a sentiment distribution to the aspect distribution of the topic model and infers aspect ratings from the proportion of sentiment words sampled in a review together with the influence of the overall rating. However, AIR relies too heavily on the overall rating: if a user's aspect ratings differ greatly from the overall rating, or if the overall rating is missing, its predictions become inaccurate.
In this thesis, we propose a unified generative model named RAIR, based on the structure of AIR, together with two methods for predicting the overall rating. Our method generates a word rating distribution from the training data and uses it to predict the overall rating of unrated test data. We then sample words into different aspects and sentiments to infer the latent aspect ratings. Experimental results on a real-world dataset without known overall ratings show that our method performs better than AIR run with predicted overall ratings.
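The abstract states that the overall rating of an unrated review is predicted from a rating distribution learned on training text, but it does not give the estimator. The following is a minimal sketch of one plausible reading, not the RAIR procedure itself: it estimates P(rating | word) from rated reviews with add-one smoothing and averages the expected rating over the known words of an unrated review. The 5-star scale, the whitespace tokenizer, the function names, and the toy reviews are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

RATINGS = (1, 2, 3, 4, 5)  # assumed 5-star scale, not stated in the abstract


def fit_word_rating_distribution(reviews, smoothing=1.0):
    """Estimate P(rating | word) from (text, overall_rating) pairs.

    Each word is counted once per review; add-`smoothing` smoothing keeps
    every rating at a non-zero probability.
    """
    counts = defaultdict(Counter)
    for text, rating in reviews:
        for word in set(text.lower().split()):
            counts[word][rating] += 1
    dist = {}
    for word, c in counts.items():
        total = sum(c.values()) + smoothing * len(RATINGS)
        dist[word] = {r: (c[r] + smoothing) / total for r in RATINGS}
    return dist


def predict_overall_rating(text, dist):
    """Average the expected rating of each known word; None if none is known."""
    expected = [
        sum(r * p for r, p in dist[w].items())
        for w in text.lower().split()
        if w in dist
    ]
    if not expected:
        return None
    return sum(expected) / len(expected)


# Toy training data (hypothetical, for illustration only).
train = [
    ("great room friendly staff", 5),
    ("great location", 4),
    ("dirty room rude staff", 1),
    ("dirty bathroom", 2),
]
dist = fit_word_rating_distribution(train)
print(predict_overall_rating("great staff", dist))
print(predict_overall_rating("dirty staff", dist))
```

In this sketch a short review containing "great" receives a higher predicted overall rating than one containing "dirty". In an AIR-style pipeline, such a predicted value would stand in for the missing overall rating before the aspect ratings are inferred.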