
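Graduate student: Chen, Hung-Yu (陳弘宇)
Thesis title: A Search Engine-based Mutually Reinforcing Approach on Measuring Semantics Relatedness of Biomedical Terms
Thesis title (Chinese): 一個以搜尋引擎為基礎的生醫字詞語意相關度量測之互生方法
Advisor: Kao, Hung-Yu (高宏宇)
Degree: Master
Department: College of Electrical Engineering and Computer Science, Institute of Medical Informatics
Year of publication: 2011
Academic year of graduation: 99
Language: English
Pages: 59
Chinese keywords: 語意關係 (semantic relatedness), 詞彙樣式 (lexical pattern), HITS演算法 (HITS algorithm), 搜尋引擎 (search engine)
English keywords: Semantic relatedness, Lexical pattern, HITS Algorithm, Search engine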
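    Abstract (translated from the Chinese version): Assessing the degree of semantic relatedness between two biomedical terms is an important task for information retrieval, natural language processing, and literature mining in the biomedical domain. When two terms appear in the same sentence, the sentence usually describes a relation between them, and these relations reveal how closely the terms are related. In most previous studies, researchers used information supplied by search engines to observe the forms in which biomedical terms occur together in sentences and built lexical patterns from those forms. Lexical patterns do capture characteristics of how two terms are related, but the relation itself does not indicate how strong the relatedness is. In this study we therefore propose the Mutually Reinforcing Lexical Pattern Ranking (ReLPR) algorithm, which learns influential lexical patterns from highly related biomedical synonym pairs and uses these patterns to assess the relatedness of biomedical terms. The idea behind ReLPR is to determine the influence of each lexical pattern from the relationship between lexical patterns and the containers that hold them: the patterns in a container determine whether that container offers important information about the relatedness of terms, so the patterns within a container carry influence of their own. Experiments on two biomedical datasets give correlation coefficients of 0.803 to 0.838, a clear improvement over previous methods in assessing the semantic relatedness of biomedical terms.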
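As a rough illustration of what a lexical pattern is, the sketch below replaces the two query terms in a search snippet with the placeholders X and Y and keeps the words between them. The function name `extract_lexical_pattern`, the placeholder convention, the `max_gap` limit, and the example snippet are all assumptions made for illustration; the extraction procedure used in the thesis may differ.

```python
import re


def extract_lexical_pattern(snippet, term_a, term_b, max_gap=5):
    """Return a pattern such as 'X also known as Y', or None.

    Hypothetical helper: terms are matched case-insensitively, and a
    pattern is kept only if at most `max_gap` words separate the terms.
    """
    # Lower-case the text first so the upper-case placeholders are unambiguous.
    text = snippet.lower()
    text = re.sub(re.escape(term_a.lower()), " X ", text)
    text = re.sub(re.escape(term_b.lower()), " Y ", text)
    tokens = re.findall(r"[A-Za-z0-9]+", text)

    if "X" not in tokens or "Y" not in tokens:
        return None  # the two terms do not co-occur in this snippet

    # Use the first occurrence of each placeholder, in either order.
    i, j = tokens.index("X"), tokens.index("Y")
    lo, hi = min(i, j), max(i, j)
    if hi - lo - 1 > max_gap:
        return None  # terms are too far apart to form a useful pattern
    return " ".join(tokens[lo:hi + 1])


snippet = "Heart attack, also known as myocardial infarction, occurs when blood flow stops."
pattern = extract_lexical_pattern(snippet, "heart attack", "myocardial infarction")
```

Patterns collected this way from many snippets of known synonym pairs would form the candidate set that a ranking step then has to weigh.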

    Identifying the semantic relatedness of two biomedical terms is an important task for information retrieval, natural language processing, and text mining in the biomedical field. When two terms co-occur in a sentence, the sentence usually expresses a semantic relation between them, and these relations show how the two terms are associated. Previous studies have made extensive use of search engine results to analyze how biomedical terms co-occur and to turn those co-occurrences into lexical patterns. Lexical patterns capture the characteristics of the relation between two terms, but they do not indicate how strongly the terms are related. In this work, we therefore propose the Mutually Reinforcing Lexical Pattern Ranking (ReLPR) algorithm, which learns lexical patterns from synonym pairs in the biomedical field. ReLPR uses lexical patterns and the containers that hold them to assess the influence of each pattern found in search engine results; the patterns in a container in turn determine how much that container contributes to measuring semantic relatedness. The ReLPR algorithm achieved an average correlation coefficient of 0.82 across the evaluation datasets, performing significantly better than previous methods.
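The keywords name the HITS algorithm, so the mutual reinforcement between pattern containers and lexical patterns can be pictured as a hub and authority iteration on a bipartite graph. The sketch below is a generic HITS-style iteration and should not be read as the ReLPR update rules themselves; the function `rank_patterns`, the container ids, and the toy data are assumptions made for illustration.

```python
import math


def rank_patterns(containers, iterations=50):
    """HITS-style scores for a bipartite container/pattern graph.

    `containers` maps a (hypothetical) container id to the set of lexical
    patterns it holds. Containers act as hubs, patterns as authorities.
    """
    patterns = {p for held in containers.values() for p in held}
    pattern_score = {p: 1.0 for p in patterns}
    container_score = {c: 1.0 for c in containers}

    for _ in range(iterations):
        # A pattern is influential if influential containers hold it.
        for p in patterns:
            pattern_score[p] = sum(
                container_score[c] for c, held in containers.items() if p in held
            )
        # A container is informative if it holds influential patterns.
        for c, held in containers.items():
            container_score[c] = sum(pattern_score[p] for p in held)
        # L2-normalise both score vectors so the values stay bounded.
        for scores in (pattern_score, container_score):
            norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
            for key in scores:
                scores[key] /= norm
    return pattern_score, container_score


# Toy data (assumed): three containers holding overlapping patterns.
toy = {
    "c1": {"X also known as Y", "X or Y"},
    "c2": {"X also known as Y", "X and Y"},
    "c3": {"X also known as Y"},
}
pattern_score, container_score = rank_patterns(toy)
best = max(pattern_score, key=pattern_score.get)
```

In this toy graph the pattern held by every container ends up with the highest score, which is the intuition behind ranking patterns by the containers that carry them.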

    CONTENT
    CHINESE ABSTRACT
    ABSTRACT
    ACKNOWLEDGEMENTS
    FIGURE LISTING
    TABLE LISTING
    1. INTRODUCTION
        1.1 Background
        1.2 Motivation
        1.3 Our approach
        1.4 Paper structure
    2. RELATED WORK
        2.1 Related search
            2.1.1 Ontology-based approach
            2.1.2 Corpus-based approach
            2.1.3 Search engine-based approach
        2.2 Knowledge resources
            2.2.1 Yahoo! search BOSS
            2.2.2 MedicineNet.com
            2.2.3 Synonyms.net
    3. METHOD
        3.1 Acquisition of synonym pairs
        3.2 Crawl concept pair from search engine
        3.3 Extracting Lexical Pattern from Snippets
        3.4 ReLPR: Mutually Reinforcing Lexical Pattern Ranking algorithm
        3.5 Measuring Semantic Relatedness
    4. EXPERIMENTS
        4.1 Dataset
        4.2 Evaluation criterions
        4.3 Description of comparing baseline method
        4.4 Comparison of results of rank correlation coefficient
            4.4.1 Analysis of training set
            4.4.2 Compare with baseline method
            4.4.3 Compare with other method
    5. CONCLUSIONS
    6. REFERENCES


    On-campus access: public from 2012-08-30
    Off-campus access: public from 2012-08-30