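Graduate student: | 魏至軒 Wei, Chih-Hsuan |
---|---|
Thesis title: | 生醫名詞擷取與正規化 The recognition and normalization of biomedical and biological concepts |
Advisor: | 高宏宇 Kao, Hung-Yu |
Degree: | Doctoral (Ph.D.) |
Department: | College of Electrical Engineering and Computer Science, Department of Computer Science and Information Engineering |
Year of publication: | 2013 |
Academic year of graduation: | 101 (ROC calendar, i.e., 2012-2013) |
Language: | English |
Pages: | 119 |
Chinese keywords: | biomedical literature mining, biomedical concept recognition, biomedical concept normalization, biomedical literature curation |
English keywords: | Biomedical text mining, bioconcept mention recognition, bioconcept name normalization, Biocuration |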
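With the rapid growth of the biomedical literature, biomedical literature retrieval and mining have become important research topics that attract the attention of many researchers. How to extract, understand, analyze, and use the information contained in this large body of literature has therefore become an attractive and challenging research direction. Within it, the extraction of biomedical concepts (e.g., genes, diseases, and species) is one of the most important issues, and this study examines its two main tasks in depth: biomedical concept recognition and biomedical concept normalization.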
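For biomedical concept recognition, this thesis addresses sequence variant recognition, which has rarely been studied. We designed a multiple-component state extracting conditional random field (CRF) model that extracts a wide range of sequence variants at the protein and nucleic acid (DNA, RNA) levels. We defined 11 variant components (e.g., wild-type nucleotide, mutant nucleotide, and mutation position); the model recognizes where each component occurs and the pattern in which the components combine, which effectively improves the accuracy of mutation mention recognition.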
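To make the component idea concrete, the sketch below tokenizes a variant mention at letter/digit/symbol boundaries and assembles already-labeled tokens into an HGVS-like string. It is an illustration under stated assumptions, not the thesis's implementation: the labels `TYPE`, `WILD`, `POS`, and `MUT` are a hypothetical subset of the eleven components, the labels are supplied by hand rather than predicted by a model, and the output format covers only simple substitutions.

```python
import re
from typing import Dict, List

# Hypothetical subset of the variant components; "O" marks tokens outside any component.
COMPONENTS = ("TYPE", "WILD", "POS", "MUT")


def tokenize(mention: str) -> List[str]:
    """Split at letter/digit/symbol boundaries, e.g. 'V600E' -> ['V', '600', 'E']."""
    return re.findall(r"[A-Za-z]+|\d+|[^\sA-Za-z\d]", mention)


def group_components(tokens: List[str], labels: List[str]) -> Dict[str, str]:
    """Concatenate, in text order, the tokens that carry each component label."""
    if len(tokens) != len(labels):
        raise ValueError("tokens and labels must have the same length")
    parts: Dict[str, str] = {}
    for token, label in zip(tokens, labels):
        if label in COMPONENTS:
            parts[label] = parts.get(label, "") + token
    return parts


def assemble_variant(tokens: List[str], labels: List[str]) -> str:
    """Build an HGVS-like string for a simple substitution."""
    parts = group_components(tokens, labels)
    missing = [c for c in ("WILD", "POS", "MUT") if c not in parts]
    if missing:
        raise ValueError(f"incomplete variant, missing {missing}")
    seq_type = parts.get("TYPE", "p")  # assumption: no type token means protein level
    if seq_type == "p":
        return f"p.{parts['WILD']}{parts['POS']}{parts['MUT']}"
    # DNA/RNA-level substitutions put the position first, e.g. c.1799T>A
    return f"{seq_type}.{parts['POS']}{parts['WILD']}>{parts['MUT']}"


print(assemble_variant(tokenize("V600E"), ["WILD", "POS", "MUT"]))  # p.V600E
print(assemble_variant(tokenize("c.1799T>A"), ["TYPE", "O", "POS", "WILD", "O", "MUT"]))  # c.1799T>A
```

Splitting "V600E" into three tokens is what lets a per-token labeler assign wild type, position, and mutant separately, even though the mention is written as a single word.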
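For biomedical concept normalization, we addressed the hardest task, gene identifier assignment, and developed an ambiguity-aware inference network model that uses weak evidence linking gene names and articles to infer the Entrez Gene database identifiers to which a gene name may refer. Gene name ambiguity is severe: the same gene in different species has different database identifiers. To resolve the correspondence between species and gene identifiers, this study developed a species representation indicator, which computes how representative each indicator is within an article. Even when an article contains no species name, the method can still assign each gene its species, and it effectively improves the accuracy of gene name normalization.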
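The species-assignment idea can be illustrated with a much simpler indicator score. The sketch below is not the SRI formula from the thesis: it assumes a small hand-made lexicon (`INDICATORS`, hypothetical) that maps indicator words such as cell-line names or "patients" to NCBI Taxonomy IDs, scores each species by its share of indicator occurrences in the article, and returns the top species. It shows how an article can point to a species without naming it, for example "HeLa" and "patients" both indicating human (9606).

```python
from collections import Counter
from typing import Dict, List, Optional

# Hypothetical indicator lexicon: lower-cased term -> NCBI Taxonomy IDs it suggests.
INDICATORS: Dict[str, List[str]] = {
    "human": ["9606"],
    "patients": ["9606"],
    "hela": ["9606"],            # HeLa is a human cell line
    "mouse": ["10090"],
    "mice": ["10090"],
    "murine": ["10090"],
    "yeast": ["4932", "4896"],   # ambiguous: budding (S. cerevisiae) or fission (S. pombe)
}


def species_scores(tokens: List[str]) -> Dict[str, float]:
    """Share of indicator occurrences pointing to each species (sums to 1)."""
    counts = Counter(t.lower() for t in tokens if t.lower() in INDICATORS)
    total = sum(counts.values())
    scores: Dict[str, float] = {}
    for term, n in counts.items():
        taxids = INDICATORS[term]
        for taxid in taxids:
            # an indicator that suggests several species splits its weight evenly
            scores[taxid] = scores.get(taxid, 0.0) + n / (total * len(taxids))
    return scores


def infer_species(tokens: List[str], default: Optional[str] = None) -> Optional[str]:
    """Return the highest-scoring species, or `default` if no indicator occurs."""
    scores = species_scores(tokens)
    if not scores:
        return default
    return max(scores, key=lambda taxid: scores[taxid])


text = "BRCA1 expression was reduced in HeLa cells and in tumours from patients"
print(infer_species(text.split()))  # 9606
```

A real system would also need to decide how much weight a rare but highly specific indicator deserves relative to a frequent, generic one; the even split used here is only the simplest possible choice.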
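Finally, we integrated the automatic biomedical concept extraction methods of this study into the work of literature curators and developed a literature annotation and curation system, PubTator. Experimental results show that using PubTator improves curators' annotation quality and speeds up curation.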
In recent years, the amount of biological literature has increased rapidly, and biomedical literature retrieval and mining has become an important research issue that attracts many researchers. Extracting, analyzing, and mining this literature is therefore an attractive and challenging research problem. Bioconcept extraction (e.g., of genes, diseases, and species) is one of the most important issues in this area. In this study, we focus on two major topics within bioconcept extraction.
The first is bioconcept mention recognition. Here we focused on human sequence variation mention recognition, which has rarely been studied in the past. We developed a multiple-component state extracting conditional random field (CRF) model for extracting a wide range of sequence variants described at the protein, DNA, and RNA levels. We defined eleven variation components (e.g., wild type, position, and mutant) for variation mentions. Because the model recognizes the location of each component and how the components are assembled, recognition accuracy is substantially improved.
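A linear-chain CRF assigns each token a component label by finding the label sequence with the highest total score. The sketch below shows only that decoding step (the Viterbi algorithm) on hand-set scores; feature extraction and weight training are omitted, and the label names and numbers are hypothetical. With `transitions` empty, each token simply takes its locally best label; a positive `POS -> MUT` transition score lets sequence context override a weak local preference for the last token.

```python
from typing import Dict, List, Sequence, Tuple


def viterbi(
    emissions: Sequence[Dict[str, float]],
    transitions: Dict[Tuple[str, str], float],
    labels: Sequence[str],
) -> List[str]:
    """Return the highest-scoring label sequence of a linear-chain CRF.

    emissions[t][y]      score of label y at token t (weighted feature sum)
    transitions[(a, b)]  score of label b following label a; missing pairs score 0
    """
    if not emissions:
        return []
    best = [{y: emissions[0][y] for y in labels}]  # best path score ending in y
    back: List[Dict[str, str]] = [{}]               # back-pointers for traceback
    for t in range(1, len(emissions)):
        best.append({})
        back.append({})
        for y in labels:
            prev, score = max(
                ((p, best[t - 1][p] + transitions.get((p, y), 0.0)) for p in labels),
                key=lambda pair: pair[1],
            )
            best[t][y] = score + emissions[t][y]
            back[t][y] = prev
    path = [max(labels, key=lambda y: best[-1][y])]
    for t in range(len(emissions) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]


LABELS = ["WILD", "POS", "MUT", "O"]
# Hypothetical scores for the tokens of "V600E": ["V", "600", "E"]
EMISSIONS = [
    {"WILD": 2.0, "POS": 0.0, "MUT": 1.5, "O": 0.5},
    {"WILD": 0.0, "POS": 3.0, "MUT": 0.0, "O": 0.5},
    {"WILD": 1.5, "POS": 0.0, "MUT": 1.4, "O": 0.5},
]
TRANSITIONS = {("WILD", "POS"): 1.0, ("POS", "MUT"): 1.0, ("POS", "WILD"): -2.0}

print(viterbi(EMISSIONS, {}, LABELS))           # ['WILD', 'POS', 'WILD']
print(viterbi(EMISSIONS, TRANSITIONS, LABELS))  # ['WILD', 'POS', 'MUT']
```

Dynamic programming keeps decoding linear in sentence length (times the square of the label count), which is why exact decoding stays cheap even with many component labels.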
The second is bioconcept name normalization. We focused on gene normalization, one of the hardest tasks in biomedical text mining. We developed an ambiguity-aware inference network model that collects weak evidence linking gene mentions to articles in order to infer the likely Entrez Gene database identifiers for each gene mention. Gene name ambiguity is the most critical issue in gene normalization: the same gene in different species is assigned different identifiers. To resolve the mapping between species and gene identifiers, we developed the species representation indicator (SRI), which measures how representative each species indicator is of an article. Even when no species name appears in an article, this method can still assign appropriate species to genes, which further improves gene normalization performance.
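The evidence-combination step of an inference network can be sketched as follows. This is a simplified illustration, not the thesis's network: it assumes three hypothetical evidence beliefs per candidate identifier (name similarity, species agreement, and context overlap, each in [0, 1]), combines them with a weighted-sum belief operator of the kind used in INQUERY-style inference-network retrieval, and ranks the candidate Entrez Gene identifiers for one mention. The weights and belief values are made up; 7157 and 22059 are the Entrez Gene IDs of human TP53 and mouse Trp53, and `article_species` would come from a species-assignment step such as the SRI.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Candidate:
    gene_id: str            # Entrez Gene identifier
    taxid: str              # NCBI Taxonomy ID of the candidate gene's species
    name_match: float       # hypothetical belief from mention/synonym similarity
    context_overlap: float  # hypothetical belief from article/gene-record word overlap


def wsum(beliefs: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted-sum belief operator: normalized weighted average of parent beliefs."""
    return sum(w * beliefs[k] for k, w in weights.items()) / sum(weights.values())


def rank_candidates(
    candidates: List[Candidate], article_species: str, weights: Dict[str, float]
) -> List[str]:
    """Rank candidate identifiers for one gene mention by combined belief."""
    scored = []
    for c in candidates:
        beliefs = {
            "name": c.name_match,
            "species": 1.0 if c.taxid == article_species else 0.0,
            "context": c.context_overlap,
        }
        scored.append((wsum(beliefs, weights), c.gene_id))
    scored.sort(reverse=True)
    return [gene_id for _, gene_id in scored]


# The mention "p53" matches both genes equally well by name.
CANDS = [Candidate("7157", "9606", 0.9, 0.4), Candidate("22059", "10090", 0.9, 0.5)]
WEIGHTS = {"name": 1.0, "species": 2.0, "context": 1.0}
print(rank_candidates(CANDS, "10090", WEIGHTS))  # ['22059', '7157']
print(rank_candidates(CANDS, "9606", WEIGHTS))   # ['7157', '22059']
```

Because the name evidence is identical for both candidates, the species belief decides the ranking, which mirrors why resolving the species is central to gene normalization.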
Building on our studies of bioconcept mention recognition and name normalization, we developed a literature curation system, PubTator, which incorporates our named entity recognition and normalization methods. PubTator provides a user-friendly interface and an adaptable annotation environment that helps curators annotate bioconcepts and their relationships. Experimental results show that using PubTator effectively improves both curation performance and curation speed.