
Student: Yang, Wen-Teng (楊文籐)
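Title: Measuring Semantic Relatedness by Wikipedia Revision Information
Chinese title: 一個基於維基百科歷史修訂資訊之字詞語意關聯度評量方法 (A method for measuring the semantic relatedness of words based on Wikipedia revision history information)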
Advisor: Kao, Hung-Yu (高宏宇)
Degree: Master
Department: Department of Computer Science and Information Engineering, College of Electrical Engineering and Computer Science
Year of publication: 2010
Academic year of graduation: 98 (Republic of China calendar, i.e., the 2009-2010 academic year)
Language: English
Number of pages: 54
Keywords: Concept association, Semantic relatedness, HITS algorithm, Wikipedia knowledge
    Wikipedia is an accurate, widely used, broad-coverage encyclopedia on the Web that is written collaboratively by its readers. It is also an invaluable resource for research, because properties such as dense linking, live updates, URL-based identification of concepts, and a complete revision history make it easy to extract large amounts of structured knowledge. In this thesis, we represent the meaning of a word by its Wikipedia article. Each Wikipedia article represents an individual concept, and its content in turn contains other concepts in the form of hyperlinks to other articles. Measuring the relatedness between two articles therefore also measures the semantic relatedness between the two corresponding words. To this end, we propose the Editor-Contribution-based Rank (ECR) algorithm, which ranks all concepts appearing in the content of an article's revisions and takes the ranked concepts as a vector representing that article. ECR ranks these concepts according to the relationships between concepts and editors. We distinguish four types of relationship, depending on whether an editor's action is an addition or a deletion and whether the edited concept is appropriate or inappropriate for the article. Experiments show that our method gives better results as Wikipedia develops, and that it achieves a 4.4% improvement over previous methods that compute the relatedness between two articles.
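The abstract describes ECR only at a high level, so the following is a minimal Python sketch of the general idea under stated assumptions, not the thesis's algorithm. It runs plain HITS (Kleinberg's hubs-and-authorities iteration) on a weighted editor-concept bipartite graph, with editors as hubs and concepts as authorities, and then compares two articles' authority vectors. Everything here is an assumption: a concept is treated as "appropriate" if it appears in the article's latest revision; the four per-type weights in the hypothetical `TYPE_WEIGHTS` table are placeholders, not values from the thesis; and cosine similarity stands in for whatever relatedness measure the thesis uses. The names `build_weights`, `hits_rank`, and `cosine`, and the toy edit histories, are made up for illustration.

```python
import math
from collections import defaultdict

# HYPOTHETICAL edge weights for the four editor-concept relationship types,
# keyed by (action, concept_is_appropriate). Placeholders, not the thesis's values.
TYPE_WEIGHTS = {
    ("add", True): 1.0,      # added a concept that is appropriate
    ("delete", False): 0.5,  # deleted a concept that is inappropriate
    ("add", False): 0.1,     # added a concept that is inappropriate
    ("delete", True): 0.1,   # deleted a concept that is appropriate
}


def build_weights(edits, final_concepts):
    """Sum per-type weights into one edge weight per (editor, concept) pair.

    edits: iterable of (editor, concept, action), action in {"add", "delete"}.
    ASSUMPTION: a concept is 'appropriate' iff it is in the latest revision.
    """
    weights = defaultdict(float)
    for editor, concept, action in edits:
        appropriate = concept in final_concepts
        weights[(editor, concept)] += TYPE_WEIGHTS[(action, appropriate)]
    return dict(weights)


def _unit(scores):
    """Scale a score dict to unit L2 norm (returned unchanged if all zeros)."""
    norm = math.sqrt(sum(v * v for v in scores.values()))
    return {k: v / norm for k, v in scores.items()} if norm else scores


def hits_rank(weights, iterations=50):
    """Plain HITS on a weighted bipartite graph; returns (authority, hub)."""
    editors = {e for e, _ in weights}
    concepts = {c for _, c in weights}
    hub = _unit({e: 1.0 for e in editors})
    auth = _unit({c: 1.0 for c in concepts})
    for _ in range(iterations):
        # Authority update: a concept is credited by the editors linked to it.
        auth = dict.fromkeys(concepts, 0.0)
        for (e, c), w in weights.items():
            auth[c] += w * hub[e]
        auth = _unit(auth)
        # Hub update: an editor is credited by the concepts it is linked to.
        hub = dict.fromkeys(editors, 0.0)
        for (e, c), w in weights.items():
            hub[e] += w * auth[c]
        hub = _unit(hub)
    return auth, hub


def cosine(u, v):
    """Cosine similarity of two sparse concept vectors stored as dicts."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


# Usage with toy, made-up edit histories for two articles.
edits_a = [("alice", "Engine", "add"), ("alice", "Wheel", "add"),
           ("bob", "Wheel", "add"), ("bob", "Spam", "add"),
           ("carol", "Spam", "delete")]
edits_b = [("dave", "Engine", "add"), ("dave", "Fuel", "add"),
           ("erin", "Fuel", "add")]
auth_a, hub_a = hits_rank(build_weights(edits_a, {"Engine", "Wheel"}))
auth_b, hub_b = hits_rank(build_weights(edits_b, {"Engine", "Fuel"}))
print(sorted(auth_a, key=auth_a.get, reverse=True))  # ['Wheel', 'Engine', 'Spam']
print(round(cosine(auth_a, auth_b), 3))
```

On this toy history, the appropriate concepts Wheel and Engine outrank Spam, which one editor added and another removed. Note that plain HITS can push the scores of concepts in a small, weakly connected part of the graph toward zero, so a real edit history may need damping or per-component handling; this is a property of the sketch, not a claim about how ECR behaves.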

    Chinese Abstract
    Abstract
    Acknowledgements
    Introduction
    Related Work
    Method
    Experiments
    Conclusion
    1. INTRODUCTION
        1.1 BACKGROUND
        1.2 MOTIVATION
        1.3 OUR METHOD
    2. RELATED WORK
        2.1 NON-WIKIPEDIA-BASED APPROACHES
            2.1.1 Corpus-based approaches
            2.1.2 Lexical-based approaches
            2.1.3 Search Engine-based approaches
        2.2 WIKIPEDIA-BASED APPROACHES
            2.2.1 WikiRelate
            2.2.2 Explicit Semantic Analysis
            2.2.3 Wikipedia Link-based Measure
            2.2.4 Path Frequency Inversed Backward link Frequency
            2.2.5 Link co-occurrence Analysis
    3. METHOD
        3.1 PRE-EVALUATION
            3.1.1 Time factor
            3.1.2 Editor factor
        3.2 EDITOR-CONTRIBUTION-BASED RANKING (ECR) ALGORITHM
            3.2.1 HITS algorithm
            3.2.2 Concept Classification
            3.2.3 Contribution Relationship
            3.2.4 Iterative Calculation
            3.2.5 ECR analysis in distinct cases
            3.2.6 Measure relatedness between words
        3.3 CONCEPT EXTENSION FROM EDITORS
    4. EXPERIMENTS
        4.1 DATASET
        4.2 EVALUATION CRITERIA
        4.3 PRELIMINARY OBSERVATION OF CONCEPTS AND RELATIONSHIPS WITH DIFFERENT PARAMETER α
        4.4 COMPARISON OF RESULTS OF RANK CORRELATION COEFFICIENT
            4.4.1 Correlation coefficients of ECR of different parameter α
            4.4.2 Correlation coefficients of baselines and previous methods
            4.4.3 Correlation coefficients of ECR with different period ranges
        4.5 VALIDITY OF THE EVALUATION
        4.6 EVALUATION ON EXTENSION VECTOR
    5. CONCLUSION
    6. REFERENCES

    Full text: available on campus from 2011-08-27; available off campus from 2012-08-27.