Graduate student: | 劉詠熙 Liu, Yong-Xi |
---|---|
Thesis title: | 自動化應用PageRank由生醫文件中辨識蛋白質交互作用之句子 Automated PageRank-based Sentences Ranking to Identify Protein Relations from Literature |
Advisor: | 蔣榮先 Chiang, Jung-Hsien |
Degree: | Master |
Department: | College of Electrical Engineering and Computer Science, Department of Computer Science and Information Engineering |
Year of publication: | 2007 |
Academic year of graduation: | 95 |
Language: | Chinese |
Number of pages: | 62 |
Chinese keywords: | 蛋白質交互作用 (protein-protein interaction), 文件探勘 (text mining) |
Foreign-language keywords: | Protein-Protein Interaction, PageRank, Text Mining |
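With advances in information science, a large body of biomedical research results has been recorded in the literature. Databases of protein-protein interactions exist today, but the data they provide is limited, and the interaction information is extracted and verified manually from the biomedical literature, a process that is time-consuming and expensive. In this study we propose an automated process for extracting protein-protein interactions: hierarchical template matching finds the sentences that describe protein-protein interactions, link relations are built among these sentences, and a modified PageRank algorithm ranks them to identify the interaction relations between proteins. In this thesis we implement an automated text-mining system that, besides verifying the important protein relations recorded in KEGG pathways, also finds a large number of potential protein relations. Compared with existing similar systems and databases, the system provides richer and more plentiful information, such as evidence sentences for protein-protein interactions and the interaction relations themselves. The final experimental results also show that the system is quite capable of finding protein-protein interaction information.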
With the advance of information and computational techniques, a growing body of biomedical research has been reported in public literature databases such as PubMed. For protein-protein interactions, several databases verify and extract interaction data manually from the biomedical literature, but they offer limited information, and the manual process is time-consuming and labor-intensive. To improve the protein-protein interaction extraction process, we implemented an automated framework that combines hierarchical template-based sentence matching with PageRank-based sentence ranking. Using this framework, we extract interaction evidence sentences and the interaction relations they describe. In this research, we implement a text-mining system that identifies many important relations recorded in the KEGG pathway database and discovers a large number of novel relations that could potentially extend existing protein interaction and pathway databases.
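The abstract does not specify how sentences are linked or how PageRank is modified, so the following is only a minimal sketch of the general idea of graph-based sentence ranking, not the thesis's method. It assumes that two sentences are linked with a weight equal to the Jaccard overlap of their word sets, and it runs standard weighted PageRank with a damping factor of 0.85. The function names `tokenize`, `build_graph`, `pagerank`, and `rank_sentences` are hypothetical, and the example sentences are made up for illustration.

```python
import re


def tokenize(sentence):
    """Lowercase a sentence and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", sentence.lower()))


def build_graph(sentences):
    """Build a symmetric weight matrix between sentences.

    Assumption: the link weight is the Jaccard overlap of the two word sets.
    The thesis may link sentences differently.
    """
    tokens = [tokenize(s) for s in sentences]
    n = len(sentences)
    weights = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            union = tokens[i] | tokens[j]
            if union:
                w = len(tokens[i] & tokens[j]) / len(union)
                weights[i][j] = weights[j][i] = w
    return weights


def pagerank(weights, damping=0.85, iterations=50):
    """Standard weighted PageRank by power iteration (not the modified variant)."""
    n = len(weights)
    if n == 0:
        return []
    scores = [1.0 / n] * n
    out_sums = [sum(row) for row in weights]
    for _ in range(iterations):
        new_scores = []
        for i in range(n):
            rank = 0.0
            for j in range(n):
                if out_sums[j] > 0:
                    # Sentence j passes score to i in proportion to the link weight.
                    rank += scores[j] * weights[j][i] / out_sums[j]
                else:
                    # A sentence with no links spreads its score uniformly.
                    rank += scores[j] / n
            new_scores.append((1 - damping) / n + damping * rank)
        scores = new_scores
    return scores


def rank_sentences(sentences):
    """Return (score, sentence) pairs sorted from highest to lowest score."""
    scores = pagerank(build_graph(sentences))
    return sorted(zip(scores, sentences), reverse=True)


if __name__ == "__main__":
    # Made-up example sentences, for illustration only.
    examples = [
        "RAF1 phosphorylates MEK1 in the MAPK pathway.",
        "MEK1 is phosphorylated by RAF1.",
        "The weather was cloudy.",
    ]
    for score, sentence in rank_sentences(examples):
        print(f"{score:.3f}  {sentence}")
```

In this sketch, a sentence that shares vocabulary with many other candidate sentences accumulates a higher score, which is the intuition behind using PageRank to surface interaction evidence sentences. A real system would restrict the graph to sentences already matched by the interaction templates.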