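Graduate student: 余旭鈞 (Yu, Hsu-Chun)
Dissertation title: Discovering Protein Functions from Biomedical Literature (自生醫文獻中萃取出蛋白質功能註解之資訊)
Advisor: 蔣榮先 (Chiang, Jung-Hsien)
Degree: Doctoral
Department: Department of Computer Science and Information Engineering, College of Electrical Engineering and Computer Science
Year of publication: 2005
Academic year of graduation: 93 (Republic of China calendar; the 2004-2005 academic year)
Language: English
Number of pages: 100
Chinese keywords: protein function annotation, text mining
English keywords: bioinformatics, information extraction, text mining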
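     With the rapid progress of genomics research, the related literature has grown explosively, and it has become increasingly difficult for biomedical researchers to find information about the genes or proteins they study. Protein function is among the most basic and important information these researchers need. How to exploit the vast biomedical literature and convert its knowledge of protein functions into an easily accessible database format, and thereby annotate protein functions, has therefore become an important problem, and text mining is an indispensable key technology for solving it.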
      
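     In this dissertation, we propose a text mining procedure for extracting protein functions from biomedical literature, and we have developed a text mining system, MeKE, for this problem. Within this procedure, we propose two different recognition models to identify the correct protein functions.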
      
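     The first recognition model uses sentence alignment-based classification to recognize the protein functions described in a document. A Naive Bayes classifier classifies the sentences, and sentence alignment is used to obtain the features the classifier needs. These features are key phrases rather than single keywords, so that they capture the context of a sentence more precisely.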
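A minimal sketch of the classification step, assuming sentences have already been labeled as describing a protein function ("function") or not ("other"). The dissertation derives its key-phrase features through sentence alignment; here, as a hypothetical stand-in, every two-word phrase of a sentence serves as a feature, and a multinomial Naive Bayes classifier with add-one smoothing scores each class. The names `phrase_features` and `NaiveBayesSentenceClassifier` and the toy training sentences are illustrative only.

```python
from __future__ import annotations

import math
from collections import Counter, defaultdict


def phrase_features(sentence: str, n: int = 2) -> list[str]:
    """Hypothetical stand-in for alignment-derived key phrases:
    every contiguous n-word phrase of the lowercased sentence."""
    words = sentence.lower().split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


class NaiveBayesSentenceClassifier:
    """Multinomial Naive Bayes over phrase features, add-one (Laplace) smoothing."""

    def __init__(self) -> None:
        self.class_counts: Counter[str] = Counter()  # sentences per class
        self.feature_counts: dict[str, Counter[str]] = defaultdict(Counter)
        self.vocabulary: set[str] = set()

    def fit(self, sentences, labels) -> NaiveBayesSentenceClassifier:
        for sentence, label in zip(sentences, labels):
            feats = phrase_features(sentence)
            self.class_counts[label] += 1
            self.feature_counts[label].update(feats)
            self.vocabulary.update(feats)
        return self

    def log_posterior(self, sentence: str, label: str) -> float:
        """Unnormalized log P(label | sentence) = log prior + sum of log likelihoods."""
        total = sum(self.class_counts.values())
        score = math.log(self.class_counts[label] / total)
        counts = self.feature_counts[label]
        denom = sum(counts.values()) + len(self.vocabulary)
        for feat in phrase_features(sentence):
            score += math.log((counts[feat] + 1) / denom)
        return score

    def predict(self, sentence: str) -> str:
        return max(self.class_counts, key=lambda c: self.log_posterior(sentence, c))


# Toy training data (hypothetical).
train = [
    "Rad51 is involved in DNA repair",
    "The protein is involved in DNA repair and recombination",
    "Rad51 was purified from yeast extracts",
    "The protein was purified by chromatography",
]
labels = ["function", "function", "other", "other"]
clf = NaiveBayesSentenceClassifier().fit(train, labels)
print(clf.predict("Brca2 is involved in DNA repair"))  # function
```

Using phrases rather than single words lets a feature such as "involved in" carry the surrounding context that one word alone would lose, which is the motivation the abstract gives for key phrases.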
      
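     The second recognition model uses sentence pattern mining and sentence pattern matching to recognize the protein functions described in a document. Sentence pattern mining extracts the wording and writing formats that recur across documents; for the problem studied here, these patterns are the formats authors commonly use to describe protein functions. Sentence pattern matching compares sentences phrase by phrase rather than word by word, so that recognition is fault-tolerant.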
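The second model can be sketched in the same hedged way. The code below assumes each sentence has already been chunked into phrases, with the protein name and the GO term replaced by the hypothetical placeholders `<PROTEIN>` and `<GO>`. Mining counts contiguous phrase sequences that contain both placeholders and keeps those seen in at least `min_support` sentences; matching accepts a sentence when the pattern's phrases occur in order with at most `max_gap` extra phrases between neighbors, one simple way to realize phrase-level, fault-tolerant matching. The dissertation's own mining and matching algorithms may differ.

```python
from __future__ import annotations

from collections import Counter

# Hypothetical placeholders for the indexed protein name and GO term.
PROTEIN, GO = "<PROTEIN>", "<GO>"


def mine_patterns(phrase_sentences, min_len=3, max_len=4, min_support=2):
    """Return contiguous phrase n-grams that contain both placeholders and
    occur in at least `min_support` sentences."""
    support: Counter[tuple[str, ...]] = Counter()
    for phrases in phrase_sentences:
        seen = set()
        for n in range(min_len, max_len + 1):
            for i in range(len(phrases) - n + 1):
                gram = tuple(phrases[i:i + n])
                if PROTEIN in gram and GO in gram:
                    seen.add(gram)
        support.update(seen)  # count each pattern at most once per sentence
    return [pattern for pattern, count in support.items() if count >= min_support]


def matches(pattern, phrases, max_gap=1):
    """True if the pattern's phrases occur in order in `phrases`, with at most
    `max_gap` extra phrases between consecutive pattern elements."""
    def search(p_idx, s_idx, first):
        if p_idx == len(pattern):
            return True
        # The first element may start anywhere; later ones must follow closely.
        limit = len(phrases) if first else min(len(phrases), s_idx + max_gap + 1)
        for j in range(s_idx, limit):
            if phrases[j] == pattern[p_idx] and search(p_idx + 1, j + 1, False):
                return True
        return False

    return search(0, 0, True)


# Toy phrase-chunked training sentences (hypothetical).
training = [
    [PROTEIN, "is involved in", GO],
    ["we show that", PROTEIN, "is involved in", GO],
    [PROTEIN, "is involved in", GO, "in yeast"],
    [PROTEIN, "was purified from", "yeast extracts"],
]
patterns = mine_patterns(training)
print(patterns)  # [('<PROTEIN>', 'is involved in', '<GO>')]
print(matches(patterns[0], [PROTEIN, "is involved in", "both", GO]))  # True
```

Because the unit of comparison is the phrase, the inserted word "both" in the last example costs one gap rather than breaking the match outright.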
      
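     We also took part in BioCreAtIvE, a competition related to this problem. In the competition, we combined the two recognition models above to gain the strengths of both: sentence classification can be fully automated, while sentence pattern matching achieves high precision. The combined method did achieve better performance than either method alone.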
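One plausible way to combine the two models is a cascade, sketched here as an assumption rather than as the competition system itself: accept a sentence whenever a mined pattern matches it, relying on the high precision of pattern matching, and otherwise defer to the fully automatic classifier's score. The function `hybrid_decision`, the `threshold` parameter, and the toy matcher and classifier are hypothetical.

```python
from __future__ import annotations

from typing import Callable, Sequence

# A matcher reports whether a pattern fits a phrase sequence; a classifier
# returns an estimated probability that the sentence describes a protein function.
Matcher = Callable[[Sequence[str]], bool]
Classifier = Callable[[Sequence[str]], float]


def hybrid_decision(
    phrases: Sequence[str],
    matchers: Sequence[Matcher],
    classifier: Classifier,
    threshold: float = 0.5,
) -> tuple[bool, str]:
    """Return (is_function_sentence, name_of_deciding_model)."""
    # High-precision path: any matching pattern accepts the sentence outright.
    for match in matchers:
        if match(phrases):
            return True, "pattern"
    # Fallback path: the automatic classifier decides the remaining sentences.
    return classifier(phrases) >= threshold, "classifier"


def toy_matcher(phrases: Sequence[str]) -> bool:
    """Hypothetical single pattern."""
    return "is involved in" in phrases


def toy_classifier(phrases: Sequence[str]) -> float:
    """Hypothetical classifier returning a fixed probability per cue phrase."""
    return 0.8 if "plays a role in" in phrases else 0.1


print(hybrid_decision(["<PROTEIN>", "is involved in", "<GO>"], [toy_matcher], toy_classifier))
# (True, 'pattern')
```

Trying patterns first keeps their precision for the sentences they cover, while the classifier extends coverage to sentences no pattern anticipates; raising `threshold` would make the fallback more conservative.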
      
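     The text mining procedure proposed in this dissertation can help gene database curators annotate protein functions efficiently, and can also help biologists and medical researchers quickly search the biomedical literature for information on protein functions.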

     With the rapid growth of articles on genomics research, it has become a challenge for biomedical researchers to keep up with this ever-increasing body of information and learn of the newest discoveries about the functions of the proteins they study. Text mining has therefore become essential for annotating protein functions from the vast biomedical literature and for transforming that knowledge into easily accessible database formats.

     In this dissertation, we propose a text mining procedure for extracting protein functions from biomedical literature, and we have developed a text mining system, MeKE, for this task. Within this procedure, we propose two recognition models to identify the correct protein functions. The first recognition model uses sentence alignment-based classification to recognize protein functions in text, and the second uses sentence pattern mining and sentence pattern matching.

     For the BioCreAtIvE competition, we combined the two recognition models to exploit both the low human effort required by the sentence classification approach and the high precision of the sentence pattern matching approach; this hybrid method can achieve higher performance than either approach used alone.
      
     The proposed text mining procedure can be used to help database curators annotate protein functions efficiently, and to help biologists and medical researchers rapidly search the biomedical literature for protein functions.

    Chinese Abstract
    Abstract
    Acknowledgments
    Table of Contents
    List of Figures
    List of Tables
    Chapter 1 Introduction
      1.1 Motivation
      1.2 Applications of Text Mining in the Bioinformatics Field
      1.3 Annotation of Protein Functions
        1.3.1 Gene Ontology
        1.3.2 Gene Ontology Annotation
      1.4 Problem Description
      1.5 Organization of Dissertation
    Chapter 2 Biomedical Text Mining
      2.1 Text Mining
      2.2 Recognition of Gene and Protein Names
      2.3 Extraction of Protein-Protein Interactions
      2.4 Related Competitions and Evaluations
    Chapter 3 The MeKE Text Mining System
      3.1 Overview of the MeKE System
      3.2 The Text Mining Procedure
        3.2.1 Preprocessing
        3.2.2 Indexing of Protein Names and GO Terms
        3.2.3 Recognition of GO Term Variants
        3.2.4 Extraction of Co-occurrence Sentences
    Chapter 4 Sentence Alignment-Based Classification
      4.1 Sentence Alignment
      4.2 Sentence Classification
      4.3 Experimental Results
        4.3.1 Data Set
        4.3.2 Evaluation of GO Term Indexing
        4.3.3 Evaluation of Protein Name Indexing
        4.3.4 Evaluation of Sentence Classification
      4.4 Concluding Remarks
    Chapter 5 Sentence Pattern Mining and Matching
      5.1 Phrase Parsing
      5.2 Sentence Pattern Mining
      5.3 Sentence Pattern Matching
      5.4 Experimental Results
        5.4.1 Data Set
        5.4.2 Evaluation of GO Term Indexing
        5.4.3 Evaluation of Protein Name Indexing
        5.4.4 Evaluation of Sentence Pattern Mining
      5.5 Concluding Remarks
    Chapter 6 The Hybrid Method in the BioCreAtIvE Competition
      6.1 The BioCreAtIvE Competition
      6.2 The Hybrid Method
        6.2.1 Sentence Detection and Indexing
        6.2.2 GO Variant Mining
        6.2.3 Sentence Transform
        6.2.4 Template Screening
      6.3 Results and Discussion
      6.4 Concluding Remarks
    Chapter 7 Conclusion and Future Work
    References
    Vita

    Full-text availability: on campus immediately; off campus from 2005-07-15.