| Graduate student: | 陳奕廷 Chen, Yi-Ting |
|---|---|
| Thesis title: | 3C產品網路中文評論文章之摘要生成系統 Abstract Generation System for Chinese Articles and Reviews of 3C Products |
| Advisor: | 王宗一 Wang, Tzone I |
| Degree: | Master |
| Department: | College of Engineering - Department of Engineering Science |
| Year of publication: | 2019 |
| Academic year of graduation: | 107 (ROC calendar) |
| Language: | Chinese |
| Number of pages: | 58 |
| Keywords (Chinese): | 摘要提取 (summary extraction), 意見探勘 (opinion mining), 情感分析 (sentiment analysis), 數位媒體 (digital media) |
| Keywords (English): | Opinion mining, Text summarization, Sentiment analysis, Digital media |
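Digital media such as Facebook, PTT, and news sites are where many people look for reference information. When consumers want to buy a product, they usually search online for reviews of it first and decide whether to buy only after weighing what they find. However, the articles and product reviews on digital media come from many different authors, their layout is messy, and they follow no fixed structure, so consumers must spend considerable time collecting and filtering this material and remembering the strengths and weaknesses of the product mentioned in each review. How to integrate these product-related online articles and reviews and present the useful information to consumers as a summary is therefore a meaningful problem. Many similar studies exist, but they concentrate on opinion mining and sentiment analysis, and they mainly target English reviews.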
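This study targets Chinese product-related articles and reviews. The goal is a system in which a consumer only needs to enter a product name and the system outputs summary information about that product, helping the consumer make a purchase decision in a short time.
The system includes a crawler that periodically collects online review articles about 3C (computer, communication, and consumer electronics) products and stores them in a database. The user enters a product name through the system interface, and the system retrieves from the database the 10 articles most relevant to that name. From these articles, an association rule mining algorithm extracts the important product features. Once the features are obtained, the corresponding opinion words are needed: the study uses a word segmentation tool to tag parts of speech, selects the adjectives, verbs, and idioms as candidate opinion words, and then filters the candidates by computing their distance from and association with the product features. Sentences containing both a product feature and an opinion word are treated as product-related sentences. Because many sentences express the same concept, the study computes sentence similarity and keeps the least similar sentences for ranking. Sentences are ranked by the number of keywords each contains, and the top 20 form the system summary.
Five experiments were designed: the first four focus on parameter selection, and the last compares different summarization methods. According to the experimental results, the proposed method performs better than TextRank and Luhn's method.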
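As a rough illustration of the feature and opinion-word steps described in this abstract, the following Python sketch assumes the sentences have already been segmented and POS-tagged. The tag names, the `min_support` and `window` parameters, and all function names are hypothetical: the abstract names neither the segmentation tool nor the thresholds used. The sketch covers only the single-item pass of an Apriori-style search and leaves out idioms and the association score.

```python
from collections import Counter

# Hypothetical POS tags; a real segmenter's tag set will differ.
NOUN, ADJ, VERB = "n", "a", "v"


def frequent_features(sentences, min_support=0.3):
    """Return nouns that occur in at least min_support of the sentences.

    sentences: list of sentences, each a list of (word, pos) pairs.
    """
    counts = Counter()
    for sent in sentences:
        # Count each noun once per sentence: support is per transaction.
        counts.update({w for w, pos in sent if pos == NOUN})
    threshold = min_support * len(sentences)
    return {w for w, c in counts.items() if c >= threshold}


def opinion_words(sentences, features, window=2):
    """Collect adjectives and verbs within `window` tokens of a feature."""
    found = set()
    for sent in sentences:
        feature_positions = [i for i, (w, _) in enumerate(sent) if w in features]
        for i, (w, pos) in enumerate(sent):
            near = any(abs(i - j) <= window for j in feature_positions)
            if pos in (ADJ, VERB) and near:
                found.add(w)
    return found


def related_sentences(sentences, features, opinions):
    """Keep sentences that mention both a feature and an opinion word."""
    return [
        s for s in sentences
        if any(w in features for w, _ in s) and any(w in opinions for w, _ in s)
    ]
```

Calling `frequent_features`, then `opinion_words`, then `related_sentences` in that order yields the candidate sentences that a later ranking step would work on.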
The Internet has become popular and convenient. People write product review articles on digital media platforms such as Facebook, PTT, Mobile01, and Apple Daily News, and most people read many of these articles and reviews before buying a product. However, the number of articles and reviews available on the Internet is overwhelming, and a prospective buyer can become confused. Most people spend a lot of time reading articles and reviews about a product and trying to organize the useful information before deciding whether to buy it. It is therefore essential to summarize the available data quickly and precisely and to provide customers with useful information.
Many researchers have investigated this task, but most studies have focused on English reviews. This work focuses on Chinese articles and reviews in digital media and proposes a system designed to summarize them. When a user is interested in a product, the system extracts the product's features and opinion words from review articles and uses them to identify sentences highly related to the product. From these sentences, the approach selects the 20 most important to form the product summary, which is presented to the user. This work conducts several experiments to compare the effectiveness of TextRank, Luhn’s method, and the proposed approach. Among them, the proposed approach exhibits the best performance.
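The final selection step can be sketched as follows. According to the Chinese abstract, sentences that express the same concept are filtered by similarity and the remainder are ranked by keyword count, with the top 20 kept. The abstract does not say which similarity measure or threshold is used, so the Jaccard similarity, the greedy filtering, the `max_sim` value, and the function names below are assumptions made for illustration only.

```python
def jaccard(a, b):
    """Jaccard similarity between two token sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def select_summary(sentences, keywords, top_n=20, max_sim=0.5):
    """Drop near-duplicate sentences, then rank the rest by keyword count.

    sentences: list of token lists (already word-segmented).
    keywords:  set of product features and opinion words.
    """
    kept = []
    for tokens in sentences:
        token_set = set(tokens)
        # Keep a sentence only if it is not too similar to any kept one.
        if all(jaccard(token_set, set(k)) <= max_sim for k in kept):
            kept.append(tokens)
    # Rank by how many tokens are keywords; sorted() is stable, so ties
    # keep their original order.
    ranked = sorted(
        kept,
        key=lambda toks: sum(w in keywords for w in toks),
        reverse=True,
    )
    return ranked[:top_n]
```

A greedy filter like this depends on input order: the first of two similar sentences survives. A clustering-based filter would avoid that, at the cost of more computation.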
Websites
Survey by the Korean company Trendmonitor: https://www.smartm.com.tw/article/35343537cea3?fbclid=IwAR0IHMa0v5CGPOyqCzWbxw6am8mh9CpO6dusyKHBqPT5QDcv234mfJzMfU0
AYTM research survey: https://group.dailyview.tw/article/detail/37?fbclid=IwAR08sbF4EgwarGLkn7ykG1q3YDsAa8gISFeVMJz40z00y3H1Jgy_CFfG67I
Fan and Fuel (2017) research survey: http://www.halcyon.com/pub/journals/21ps03-vidmar
English
Agrawal, R., & Srikant, R. (1994). Fast algorithms for mining association rules. In Proceedings of the 20th International Conference on Very Large Data Bases (VLDB '94) (pp. 487-499).
Bahdanau, D., Cho, K., & Bengio, Y. (2014). Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473.
Balahur, A., Kabadjov, M., Steinberger, J., Steinberger, R., & Montoyo, A. (2012). Challenges and solutions in the opinion summarization of user-generated content. Journal of Intelligent Information Systems, 39(2), 375-398.
Barrios, F., López, F., Argerich, L., & Wachenchauzer, R. (2016). Variations of the similarity function of TextRank for automated summarization. arXiv preprint arXiv:1602.03606.
Bengio, Y., Ducharme, R., Vincent, P., & Jauvin, C. (2003). A neural probabilistic language model. Journal of Machine Learning Research, 3(Feb), 1137-1155.
Erkan, G., & Radev, D. R. (2004). LexRank: Graph-based lexical centrality as salience in text summarization. Journal of Artificial Intelligence Research, 22, 457-479.
Zucker, H. G. (1978). The variable nature of news media influence. Annals of the International Communication Association, 2(1), 225-240.
Hu, M., & Liu, B. (2004). Mining and summarizing customer reviews. In Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining (pp. 168-177). ACM.
Lin, C. Y. (2004). ROUGE: A package for automatic evaluation of summaries. In Text Summarization Branches Out (pp. 74-81).
Liu, L., Song, W., Wang, H., Li, C., & Lu, J. (2014). A novel feature-based method for sentiment analysis of Chinese product reviews. China Communications, 11(3), 154-164.
Ma, S., Sun, X., Lin, J., & Wang, H. (2018). Autoencoder as assistant supervisor: Improving text representation for Chinese social media text summarization. arXiv preprint arXiv:1805.04869.
Mangold, W. G., & Faulds, D. J. (2009). Social media: The new hybrid element of the promotion mix. Business Horizons, 52(4), 357-365.
Mihalcea, R., & Tarau, P. (2004). TextRank: Bringing order into text. In Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing (pp. 404-411).
Mikolov, T., Chen, K., Corrado, G., & Dean, J. (2013). Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781.
Mukherjee, S., & Bhattacharyya, P. (2013). Sentiment analysis: A literature survey. arXiv preprint arXiv:1304.4520.
Mutz, D. C. (1989). The influence of perceptions of media influence: Third person effects and the public expression of opinions. International Journal of Public Opinion Research, 1(1), 3-23.
Naik, S. S., & Gaonkar, M. N. (2017). Extractive text summarization by feature-based sentence extraction using rule-based concept. In 2017 2nd IEEE International Conference on Recent Trends in Electronics, Information & Communication Technology (RTEICT) (pp. 1364-1368). IEEE.
Nallapati, R., Zhou, B., Gulcehre, C., & Xiang, B. (2016). Abstractive text summarization using sequence-to-sequence RNNs and beyond. arXiv preprint arXiv:1602.06023.
Osatuyi, B. (2013). Information sharing on social media sites. Computers in Human Behavior, 29(6), 2622-2631.
Page, L., Brin, S., Motwani, R., & Winograd, T. (1999). The PageRank citation ranking: Bringing order to the web. Stanford InfoLab.
Chen, P.-Y., Wu, S.-Y., & Yoon, J. (2004). The impact of online recommendations and consumer feedback on sales. ICIS 2004 Proceedings, 58.
Popescu, A. M., & Etzioni, O. (2007). Extracting product features and opinions from reviews. In Natural language processing and text mining (pp. 9-28). Springer, London.
Saggion, H., & Funk, A. (2010). Interpreting SentiWordNet for opinion classification. In Proceedings of the Seventh Conference on International Language Resources and Evaluation (LREC'10).
Stoyanov, V., & Cardie, C. (2006). Toward opinion summarization: Linking the sources. In Proceedings of the Workshop on Sentiment and Subjectivity in Text (pp. 9-14).
Tang, D., Qin, B., Feng, X., & Liu, T. (2015). Effective LSTMs for target-dependent sentiment classification. arXiv preprint arXiv:1512.01100.
Tang, D., Qin, B., & Liu, T. (2015). Document modeling with gated recurrent neural network for sentiment classification. In Proceedings of the 2015 conference on empirical methods in natural language processing (pp. 1422-1432).
Terrana, D., Augello, A., & Pilato, G. (2014). Automatic unsupervised polarity detection on a Twitter data stream. In 2014 IEEE International Conference on Semantic Computing (pp. 128-134). IEEE.
Turney, P. D. (2002, July). Thumbs up or thumbs down?: semantic orientation applied to unsupervised classification of reviews. In Proceedings of the 40th annual meeting on association for computational linguistics (pp. 417-424). Association for Computational Linguistics.
Wang, B., & Wang, H. (2008). Bootstrapping both product features and opinion words from Chinese customer reviews with cross-inducing. In Proceedings of the Third International Joint Conference on Natural Language Processing: Volume-I.
Yu, L. C., Wang, J., Lai, K. R., & Zhang, X. (2017). Refining word embeddings using intensity scores for sentiment analysis. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 26(3), 671-681.