| Graduate student: | 陳冠熙 Chen, Kwan-Hsi |
|---|---|
| Thesis title: | 利用使用者評論及產品概述網頁擷取產品特色與評價 (Extraction of Product Feature and Opinion from Customer Reviews and Product Overview Pages) |
| Advisor: | 盧文祥 Lu, Wen-Hsiang |
| Degree: | Master |
| Department: | College of Electrical Engineering and Computer Science, Department of Computer Science and Information Engineering |
| Year of publication: | 2007 |
| Academic year of graduation: | 95 (ROC calendar) |
| Language: | Chinese |
| Pages: | 67 |
| Chinese keywords: | 產品、評論、特徵、意見 (product, review, feature, opinion) |
| English keywords: | product, review, feature, opinion |
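With the rapid growth of the web, more and more users publish product reviews online, and the web has become the largest source of customer reviews; integrating these reviews has therefore become increasingly important. Users read a customer review mainly to learn about a product's features and the opinions related to them. In this thesis we therefore propose unsupervised methods for extracting features and opinions from product reviews, using web resources to assist the extraction. To make reviews easier to read, we combine each feature in a customer review with its related opinion and present the result to users as feature-opinion pairs. We also use the feature-opinion pairs found in customer reviews to rank products of the same type, helping users choose among them. The main parts of this work are introduced below.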
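This thesis proposes a product feature extraction method based on web resources. In previous research, feature extraction was limited to customer reviews, yet many features cannot be extracted from customer reviews alone. Therefore, in addition to extracting features from customer reviews, we also use product overview pages as a source of features. Integrating the features obtained from these two sources gives better results.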
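Opinions fall into two classes: known and unknown. The known opinions used in this thesis come from a manually compiled list of adjective opinion words obtained from the General Inquirer website. Because opinions are not restricted to any particular part of speech, and because the opinion word list cannot enumerate every adjective opinion, this thesis also handles unknown opinions that the list does not contain. The method for extracting them relies on the observation that opinion words in customer reviews frequently occur together with features.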
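In customer reviews, a feature is not necessarily related to the opinions near it, so after obtaining features and opinions we must verify the relation between each feature and the opinions around it. This thesis uses syntactic structure to verify that relation, determining whether a feature and an opinion form a related Feature-Opinion pair in the customer review.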
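To give users a basis for choosing products, we quantify customer reviews and compute how well each product is rated for users' reference. Feature-Opinion pairs reduce each review to a set of Feature-Opinion pairs; the number of times each pair occurs in the reviews quantifies those reviews, and from the quantified Feature-Opinion pairs we compute the probability that each product is recommended in its customer reviews.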
As the web grows, more and more users post product reviews online. Since the web has become the largest source of customer reviews, integrating these reviews has become important. When writing or reading customer reviews, users focus mainly on a product's features and the opinions related to them. In this thesis, we propose several unsupervised, web-based methods to extract product features and opinions from customer reviews and product overview pages. After validating the relation between each feature and its nearby opinions, we present them to users as feature-opinion pairs. We also use feature-opinion pairs to rank products, and the resulting ranking helps users choose products that meet their needs. The main parts of this work are presented below.
We propose a web-based feature extraction method. In previous research, customer reviews were the only source for feature extraction, yet many infrequent features in customer reviews cannot be extracted this way. We therefore extract frequent features from customer reviews and, in addition, infrequent features from product overview pages. Combining the features from these two sources yields better feature extraction results.
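The abstract does not give the extraction details, so the sketch below assumes one simple reading of the two-source idea. A candidate noun counts as a frequent feature if it appears in at least a fraction `min_support` of review sentences. A term listed on a product overview page counts as an infrequent feature if it occurs in the reviews at all, even too rarely to pass the threshold. The function names and the `min_support` parameter are hypothetical, and candidate nouns are assumed to be already POS-tagged upstream.

```python
from collections import Counter


def frequent_features(review_nouns, min_support=0.01):
    """Frequent features: candidate nouns occurring in at least `min_support`
    of the review sentences. `review_nouns` holds one set of candidate nouns
    per sentence (assumed to come from an upstream POS tagger)."""
    counts = Counter()
    for nouns in review_nouns:
        counts.update(set(nouns))  # count each term once per sentence
    n = len(review_nouns)
    return {term for term, c in counts.items() if n and c / n >= min_support}


def overview_features(overview_terms, review_vocab):
    """Infrequent features (hypothetical rule): terms from product overview
    pages that occur at least once in the reviews."""
    return {term for term in overview_terms if term in review_vocab}


def extract_features(review_nouns, overview_terms, min_support=0.01):
    """Union of the features found in the two sources."""
    review_vocab = set().union(*review_nouns)
    return (frequent_features(review_nouns, min_support)
            | overview_features(overview_terms, review_vocab))
```

For example, with the sentences `[{"battery", "screen"}, {"battery"}, {"battery", "lens"}, {"screen"}]`, overview terms `["lens", "zoom"]` and `min_support=0.5`, "battery" (3 of 4 sentences) and "screen" (2 of 4) pass the threshold, while "lens" (1 of 4) is recovered only because the overview page lists it; "zoom" never occurs in the reviews and is dropped.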
We classify opinions into two types: known and unknown. Known opinions come from a manually compiled list of adjective opinion words obtained from the General Inquirer website. However, opinions are not restricted to adjectives and can take other parts of speech, and the opinion word list is incomplete, so we also extract opinions that are not in the list. Based on the observation that features and opinions frequently co-occur in customer reviews, we propose a method for extracting these unknown opinions.
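The co-occurrence idea can be sketched as follows, under stated assumptions rather than as the thesis's actual procedure: a word becomes an unknown-opinion candidate if it is not in the known list, carries an adjective or adverb tag (Penn Treebank `JJ*`/`RB*`), and appears within `window` tokens of a known feature in at least `min_count` sentences. The tag set, `window`, `min_count`, and the function name are all hypothetical choices.

```python
from collections import Counter

# Assumption: opinion candidates are adjectives or adverbs (Penn Treebank
# tag prefixes JJ* and RB*); other parts of speech could be added here.
OPINION_TAG_PREFIXES = ("JJ", "RB")


def unknown_opinions(sentences, features, known_opinions, window=3, min_count=2):
    """Return words outside the known opinion list that appear within
    `window` tokens of a feature in at least `min_count` sentences.
    Each sentence is a list of (word, POS tag) pairs; `features` and
    `known_opinions` are assumed to be lowercase sets."""
    counts = Counter()
    for tokens in sentences:
        words = [w.lower() for w, _ in tokens]
        found = set()  # count each candidate once per sentence
        for i, w in enumerate(words):
            if w not in features:
                continue
            lo, hi = max(0, i - window), min(len(words), i + window + 1)
            for j in range(lo, hi):
                cand, tag = words[j], tokens[j][1]
                if (cand not in features and cand not in known_opinions
                        and tag.startswith(OPINION_TAG_PREFIXES)):
                    found.add(cand)
        counts.update(found)
    return {w for w, c in counts.items() if c >= min_count}
```

Requiring the same word to recur near features in several sentences is what separates a likely opinion word from an adjective that happens to sit next to a feature once; lowering `min_count` trades precision for recall.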
In customer reviews, a feature is not always related to a nearby opinion; whether they are related depends on their semantic relation. To identify a feature and its related opinions in customer reviews, it is therefore necessary to validate the relation between a feature and each nearby opinion. In this thesis, we use the POS tag structure as a clue to decide whether a feature and a nearby opinion form a Feature-Opinion pair in customer reviews.
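One way to use POS tag structure as a validation clue is to map the tags spanning a feature and an opinion onto coarse classes and accept the pair only if the resulting sequence matches a known pattern. The sketch below assumes Penn Treebank tags; the coarse classes, the three patterns in `VALID_PATTERNS`, and the function names are hypothetical illustrations, not the patterns used in the thesis.

```python
# Coarse tag classes built from Penn Treebank tag prefixes.
COARSE_TAGS = {"NN": "N", "VB": "V", "JJ": "J", "RB": "R", "DT": "D"}

# Hypothetical tag patterns spanning a feature and its opinion (either order).
VALID_PATTERNS = {
    "JN",    # "sharp screen": opinion directly before the feature
    "NVJ",   # "screen is sharp"
    "NVRJ",  # "screen is very sharp"
}


def coarse(tag):
    """Map a Penn Treebank tag to a coarse class, or "O" for any other tag."""
    for prefix, label in COARSE_TAGS.items():
        if tag.startswith(prefix):
            return label
    return "O"


def is_feature_opinion_pair(tagged, f_idx, o_idx):
    """True if the coarse tag sequence from the feature to the opinion
    (inclusive, in sentence order) matches one of VALID_PATTERNS."""
    lo, hi = sorted((f_idx, o_idx))
    pattern = "".join(coarse(tag) for _, tag in tagged[lo:hi + 1])
    return pattern in VALID_PATTERNS
```

In "the screen is dim but the battery is great", the pairs (screen, dim) and (battery, great) both produce `NVJ` and are accepted, while (screen, great) spans the whole clause boundary and is rejected even though both words lie in the same sentence.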
To help users choose appropriate products, we transform customer reviews into Feature-Opinion pair vectors. Using these vectors, we calculate the suggestion probability of each product across all of its customer reviews and rank the products by these probabilities. When considering a particular product, users can also consult its feature-opinion pairs.
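The abstract does not give the formula for the suggestion probability. As a hedged illustration only, the sketch below counts each review's (feature, polarity) pairs, sums them over a product's reviews, and scores the product by the Laplace-smoothed share of positive pairs; products are then sorted by that score. The `orientation` mapping from opinion word to +1/-1, the smoothing constant `alpha`, and all function names are assumptions.

```python
from collections import Counter


def review_vector(pairs, orientation):
    """Count a review's (feature, polarity) pairs. `orientation` maps an
    opinion word to +1 or -1 (assumed given); unknown opinions are skipped."""
    vec = Counter()
    for feature, opinion in pairs:
        polarity = orientation.get(opinion, 0)
        if polarity:
            vec[(feature, polarity)] += 1
    return vec


def suggestion_probability(reviews, orientation, alpha=1.0):
    """Hypothetical scoring rule: Laplace-smoothed share of positive pairs
    over all of a product's reviews."""
    total = Counter()
    for pairs in reviews:
        total.update(review_vector(pairs, orientation))
    pos = sum(c for (_, p), c in total.items() if p > 0)
    neg = sum(c for (_, p), c in total.items() if p < 0)
    return (pos + alpha) / (pos + neg + 2 * alpha)


def rank_products(products, orientation):
    """Return product names sorted by descending suggestion probability."""
    scores = {name: suggestion_probability(revs, orientation)
              for name, revs in products.items()}
    return sorted(scores, key=scores.get, reverse=True)
```

With `alpha=1.0`, a product with 2 positive and 1 negative pair scores (2 + 1) / (3 + 2) = 0.6, and one with 1 positive and 2 negative pairs scores 0.4; the smoothing keeps a product with very few reviews from jumping straight to 0 or 1.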