
Graduate student: 陳奕廷
Chen, Yi-Ting
Thesis title: 3C產品網路中文評論文章之摘要生成系統
Abstract Generation System for Chinese Articles and Reviews of 3C Products
Advisor: 王宗一
Wang, Tzone I
Degree: Master
Department: College of Engineering - Department of Engineering Science
Year of publication: 2019
Graduation academic year: 107 (ROC calendar)
Language: Chinese
Number of pages: 58
Chinese keywords: 摘要提取, 意見探勘, 情感分析, 數位媒體
English keywords: Opinion mining, Text summarization, Sentiment analysis, Digital media
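    Digital media such as Facebook, PTT, and news websites are where many people look for reference information. When consumers want to buy a product, they usually search online for reviews of it first and decide whether to buy only after weighing what they find. However, the articles and product reviews on digital media come from many different authors, their layout is messy, and their structure follows no fixed format, so consumers must spend a great deal of time collecting and filtering this material while keeping track of the strengths and weaknesses that each review mentions. How to integrate these product-related online articles and reviews and present the useful information to consumers as a summary is therefore a meaningful problem. Many related studies exist, but they concentrate on opinion mining and sentiment analysis, and they mainly target English reviews.
    This study targets Chinese product-related articles and reviews. Its goal is a system in which a consumer only enters a product name and receives summary information about that product, which helps the consumer reach a purchase decision in a short time. In the system designed here, a crawler periodically collects online 3C review articles and stores them in a database. The user enters a product name through the system interface, and the system retrieves from the database the 10 articles most relevant to that name. An association-based algorithm is applied to these articles to obtain the product's important features. Each feature then needs corresponding opinion words: a word segmentation tool tags parts of speech; adjectives, verbs, and idiomatic expressions are selected as candidate opinion words; and the candidates are filtered by computing their distance from, and association with, the product features. Sentences that contain both a product feature and an opinion word are treated as product-related sentences. Because many sentences express the same idea, the system computes sentence similarity and keeps the least similar sentences for ranking. Sentences are ranked by the number of keywords each contains, and the top 20 form the system summary. Five experiments were designed: the first four concern parameter selection, and the last compares different summarization methods. According to the experimental results, the proposed method performs better than TextRank and Luhn's method.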
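    The feature and opinion-word steps described above can be illustrated with a minimal sketch. It is not the thesis's implementation: it assumes an Apriori-style frequent-itemset count for the association step, uses token distance alone as the filter for opinion words, and takes already segmented and tagged input. The function names, the tag labels ("n", "a", "v", "d"), and the thresholds are all hypothetical, and the English tokens stand in for the output of a Chinese word segmenter.

```python
from collections import Counter
from itertools import combinations


def frequent_features(sentences, min_support=0.3):
    """Return noun itemsets (size 1 and 2) whose support meets min_support.

    sentences: list of sentences, each a list of (word, pos) tuples.
    Support is the fraction of sentences that contain every word of the itemset.
    """
    # One "transaction" per sentence: the set of nouns it contains.
    transactions = [{w for w, pos in s if pos == "n"} for s in sentences]
    n = len(transactions)
    if n == 0:
        return set()
    single_counts = Counter()
    for t in transactions:
        single_counts.update((w,) for w in t)
    frequent = {item for item, c in single_counts.items() if c / n >= min_support}
    # Apriori pruning: a pair can only be frequent if both of its words are.
    singles = sorted(w for (w,) in frequent)
    pair_counts = Counter()
    for t in transactions:
        present = [w for w in singles if w in t]
        pair_counts.update(combinations(present, 2))
    frequent |= {p for p, c in pair_counts.items() if c / n >= min_support}
    return frequent


def opinion_candidates(sentence, feature_words, max_distance=3, opinion_tags=("a", "v")):
    """Pair each feature word with adjectives/verbs within max_distance tokens."""
    pairs = []
    for i, (feature, _) in enumerate(sentence):
        if feature not in feature_words:
            continue
        for j, (word, pos) in enumerate(sentence):
            if j != i and pos in opinion_tags and abs(i - j) <= max_distance:
                pairs.append((feature, word))
    return pairs


# Toy, hand-tagged input (hypothetical data).
tagged = [
    [("screen", "n"), ("very", "d"), ("sharp", "a")],
    [("battery", "n"), ("life", "n"), ("good", "a")],
    [("screen", "n"), ("looks", "v"), ("bright", "a")],
    [("battery", "n"), ("life", "n"), ("average", "a")],
]
features = frequent_features(tagged, min_support=0.5)
feature_words = {w for itemset in features for w in itemset}
pairs = opinion_candidates(tagged[0], feature_words)
```

    A real system would add the association measure between features and opinion words that the abstract mentions; this sketch leaves it out because the abstract does not define it.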

    The Internet has become popular and convenient. People write product reviews on digital media platforms such as Facebook, PTT, Mobile01, and Apple Daily News. Most people read many articles and reviews on digital media before buying a product. However, the number of product articles and reviews available on the Internet is overwhelming, and a prospective buyer can become confused. Most people spend a lot of time reading articles and reviews about a product and trying to organize the useful information before deciding whether to buy it. It is therefore essential to summarize the available data quickly and precisely and to provide customers with useful information.
    Many researchers have investigated this task, but most studies have focused on English reviews. This work focuses on Chinese articles and reviews in digital media and proposes a system designed to summarize them. When a user is interested in a product, the system extracts the product's features and opinion words from review articles and uses them to identify sentences highly related to the product. From these sentences, the approach selects the 20 most important to form the product summary, which is presented to the user. This work conducts several experiments to compare the effectiveness of TextRank, Luhn's method, and the proposed approach. Among them, the proposed approach exhibits the best performance.
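    The sentence selection step can be sketched as follows, under stated assumptions: near-duplicate sentences are dropped first, and the remaining sentences are ranked by how many keywords (feature and opinion words) they contain. The abstract does not name the similarity measure, so Jaccard similarity over token sets is used here as a stand-in. The function names, the `max_similarity` threshold, and the toy data are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity between two token collections."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def summarize(sentences, keywords, top_n=20, max_similarity=0.5):
    """Drop near-duplicate sentences, then return the top_n by keyword count.

    sentences: list of token lists; keywords: set of feature and opinion words.
    """
    kept = []
    for tokens in sentences:
        # Keep a sentence only if it is not too similar to any sentence already kept.
        if all(jaccard(tokens, other) <= max_similarity for other in kept):
            kept.append(tokens)
    # Score = number of tokens that are keywords. sorted() is stable,
    # so sentences with equal scores stay in their original order.
    ranked = sorted(kept, key=lambda t: sum(tok in keywords for tok in t), reverse=True)
    return ranked[:top_n]


# Toy input (hypothetical data); tokens stand in for segmented Chinese words.
candidates = [
    ["camera", "is", "sharp", "and", "fast"],
    ["camera", "is", "sharp", "and", "quick"],  # near-duplicate of the first
    ["battery", "lasts", "long"],
    ["the", "box", "is", "blue"],
]
keywords = {"camera", "sharp", "fast", "battery", "long"}
summary = summarize(candidates, keywords, top_n=2)
```

    With `top_n=20`, the same function yields a summary of the length the abstract reports. Filtering before ranking follows the order given in the abstract, though the thesis may combine the two steps differently.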

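    Abstract (Chinese)
    Extended Abstract
    Acknowledgments
    Table of Contents
    List of Tables
    List of Figures
    Chapter 1 Introduction
      1.1 Research Background and Motivation
      1.2 Research Objectives
      1.3 Research Methods
      1.4 Research Contributions
    Chapter 2 Literature Review
      2.1 Product Review Extraction
      2.2 Summary Extraction
      2.3 Digital Media
      2.4 Text Processing Techniques
    Chapter 3 System Design and Architecture
      3.1 System Architecture
      3.2 Retrieving Product-Related Articles and Data Preprocessing
      3.3 Product Feature Extraction
      3.4 Opinion Word Extraction
      3.5 Context
      3.6 Sentence Score
      3.7 Summary Extraction
    Chapter 4 Experimental Design and Results
      4.1 Experimental Design
      4.2 Evaluation Tools
      4.3 Experimental Results and Analysis
      4.4 System Prototype
    Chapter 5 Conclusion and Future Work
      5.1 Conclusion
      5.2 Future Work
    References
    Appendix 1: Manual summary, TextRank, Luhn, and this study's summary results for mate 20 pro
    Appendix 2: Manual summary, TextRank, Luhn, and this study's summary results for Google pixel 3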


    Released to the public on 2024-08-17