Graduate student: | 郭育妏 Guo, Yu-Wen |
---|---|
Thesis title: | 一個用於MeSH本體論之以族譜為基礎的自我擴展演算法 Genealogical-based Method for Ontology Self-extension in MeSH |
Advisor: | 高宏宇 Kao, Hung-Yu |
Degree: | Master |
Department: | College of Electrical Engineering and Computer Science - Institute of Medical Informatics |
Year of publication: | 2013 |
Academic year of graduation: | 101 (ROC calendar) |
Language: | English |
Number of pages: | 46 |
Chinese keywords: | 本體論自我擴展、MeSH本體論、基於族譜的演算法 |
English keywords: | Ontology Self-extension, MeSH Ontology, Genealogical-based method |
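Over the past decade, the emergence of ontologies has had a profound impact on biological annotation in the life sciences. The MeSH ontology is a vocabulary used to annotate biomedical literature, which in turn supports users' literature searches in PubMed. With the explosive growth of information in recent years, many new terms and concepts keep emerging. Techniques that automatically extend existing terms can help biocurators maintain and update ontologies in a more systematic way. However, most related techniques extend terms by applying lexical pattern rules to a text corpus, which makes it hard to obtain good extension results for the narrower, more specific terms in an ontology. In this study, we therefore design a genealogy-based algorithm for ontology self-extension. Based on the breadth and depth of a term within the ontology's genealogy, we propose a sibling term generation method and a pruning filter. Experimental results show an average precision of 0.50 across categories, with the best precision, 0.83, in the Organisms category, a clear improvement over our baseline method. We further find that using ontologies from other domains for term extension works well.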
During the last decade, the advent of ontologies for biomedical annotation has had a deep impact on the life sciences. MeSH is a well-known ontology used to index journal articles in PubMed, which improves literature searching on multi-domain topics. With the explosive growth of data in recent years, new terms and concepts keep emerging and displacing older ones. Automatically extending sets of existing terms will enable biocurators to systematically improve text-based ontologies level by level. However, most related techniques, which apply symbolic patterns to a literature corpus, tend to work on the more general parts of an ontology rather than the specific ones. Therefore, in this work, we present a novel method that uses genealogical information from the ontology itself to find suitable siblings for ontology extension. Our approach consists of a sibling generation stage and a pruning strategy, both based on the breadth and depth dimensions of the ontology. On average, the genealogical-based method achieves a precision of 0.5, with the best precision, 0.83, in the category “Organisms”. We also achieve an average precision of 0.69 on 229 new terms in the 2013 version of MeSH. Furthermore, we find that there is an opportunity to extend an ontology across multiple domains by employing knowledge from ontologies of different domains.
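To make the breadth and depth notions concrete, the following is a minimal sketch, not the thesis's algorithm. It assumes terms are addressed by dot-separated tree numbers in the style MeSH uses, and shows how siblings (breadth) and level (depth) can be read off such numbers, plus the usual precision ratio. The `TOY_TREE` data and all function names are hypothetical and chosen only for illustration.

```python
# Hypothetical toy hierarchy keyed by dot-separated tree numbers.
# The entries are illustrative only and are not real MeSH data.
TOY_TREE = {
    "X01": "root term",
    "X01.100": "child a",
    "X01.200": "child b",
    "X01.200.010": "grandchild b1",
    "X01.200.020": "grandchild b2",
    "X01.200.030": "grandchild b3",
}


def parent_of(tree_number):
    """Return the parent tree number, or None for a top-level node."""
    head, sep, _ = tree_number.rpartition(".")
    return head if sep else None


def depth_of(tree_number):
    """Depth dimension: number of levels from the top (top level = 1)."""
    return tree_number.count(".") + 1


def siblings_of(tree_number, tree):
    """Breadth dimension: other nodes that share the same parent."""
    parent = parent_of(tree_number)
    return sorted(
        t for t in tree if t != tree_number and parent_of(t) == parent
    )


def precision(candidates, gold):
    """Fraction of proposed candidates that appear in the gold set."""
    if not candidates:
        return 0.0
    hits = sum(1 for c in candidates if c in gold)
    return hits / len(candidates)
```

With this toy data, `siblings_of("X01.200.010", TOY_TREE)` returns `["X01.200.020", "X01.200.030"]`, `depth_of("X01.200.010")` returns `3`, and `precision(["a", "b", "c", "d"], {"a", "b"})` returns `0.5`. How the thesis actually generates and prunes sibling candidates is not specified in the abstract, so those steps are left out here.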