
Graduate student: Lin, Zi-Xiang (林子翔)
Thesis title: A Two-stage Semi-Supervised Learning System For Retrieving Gene Regulatory Information From PubMed Thesaurus (一個兩階段半監督式之基因轉錄調控關係擷取學習系統)
Advisor: Kao, Hung-Yu (高宏宇)
Degree: Master
Department: College of Electrical Engineering and Computer Science - Institute of Medical Informatics
Year of publication: 2008
Academic year of graduation: 96 (ROC calendar)
Language: English
Number of pages: 69
Keywords: gene, rank, arrangement, pattern, regulate, transcription, word
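  • In recent years, with the decoding of the human genome, a large number of studies on gene sequences have been published. Advances in biological techniques, such as DNA microarrays and high-throughput experiments in molecular biology, have further promoted the dissemination of large amounts of genetic data. Most of this knowledge is recorded in the literature, and extracting it efficiently and accurately has become an important problem. This large body of biomedical literature contains a great deal of gene regulatory information, which is important to biomedical researchers yet difficult to obtain. The main difficulty comes from the complexity of gene regulatory systems.
    For biomedical researchers, manually extracting these transcriptional regulatory relationships from such a large body of literature is very time-consuming and labor-intensive. Efficient processing of these large and rich resources is therefore necessary. Our goal is to develop an automated system for mining gene regulatory relationships that extracts sentences containing regulatory information from a large body of biomedical literature. Experiments verify that the system we built finds gene regulatory information in the biomedical literature effectively and accurately.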

    The decoding of human genome sequences and the improvement of biological techniques, such as DNA microarrays, have accelerated the accumulation of an overwhelming amount of biomedical knowledge recorded in text. These texts often contain the latest findings on proteins, genes and small molecules. This literature includes large amounts of gene-related data, such as the transcriptional regulation between regulators and their target genes.
    For researchers, deriving this information from such a large body of literature takes considerable effort. We therefore develop a precise and efficient system for retrieving regulatory relationships from the literature. It automatically learns the writing styles of regulatory sentences and applies a two-stage method to identify the correct sentences. Our system achieves higher precision than existing systems in the top portion of the ranked sentences.

    CHINESE ABSTRACT
    ABSTRACT
    CONTENT
    FIGURE LISTING
    TABLE LISTING
    1. INTRODUCTION
    1.1 Motivation
    1.2 Method
    2. RELATED WORK
    2.1 Bio-related resources
    2.1.1 PubMed
    2.1.2 HUGO Gene Nomenclature Committee
    2.1.3 Sequential Retrieval System
    2.1.4 Medical Subject Headings (MeSH)
    2.2 Information retrieval techniques
    2.2.1 Natural Language Processing (NLP)
    2.2.2 Pattern Match
    2.3 Related work
    3. OUR PROPOSED METHOD
    3.1 Preliminary analysis
    3.2 Our proposed method
    3.2.1 Two-stage method
    3.2.2 Semi-supervised system
    3.3 The system flow
    3.3.1 Preprocessing
    3.3.2 Identification of TF and gene names
    3.3.3 Learning Module
    3.3.4 Extraction Module
    4. EXPERIMENT
    4.1 Data sets
    4.2 Extraction accuracy of our method
    4.2.1 Detail description
    4.2.2 Result analysis
    4.3 Relations between the number of PMIDs used in the WSRR Establishment and the number of retrieved sentences
    4.3.1 Detail description
    4.3.2 Result analysis
    4.4 Semi-supervised experiment in PubMed
    4.4.1 Detail description
    4.4.2 Result analysis
    4.5 Apply Data1&2 in WSRR Establishment and test in PubMed
    4.5.1 Detail description
    4.5.2 Result analysis
    4.6 System introduction
    5. CONCLUSION AND FUTURE WORK
    5.1 Conclusion
    5.2 Future work
    6. REFERENCES
    7. APPENDIX


    Full text, on campus: public from 2009-08-12
    Full text, off campus: public from 2009-08-12