| Graduate student: | 林子翔 Lin, Zi-Xiang |
|---|---|
| Thesis title: | 一個兩階段半監督式之基因轉錄調控關係擷取學習系統 A Two-stage Semi-Supervised Learning System For Retrieving Gene Regulatory Information From PubMed Thesaurus |
| Advisor: | 高宏宇 Kao, Hung-Yu |
| Degree: | 碩士 Master |
| Department: | 電機資訊學院 - 醫學資訊研究所 Institute of Medical Informatics |
| Year of publication: | 2008 |
| Academic year of graduation: | 96 |
| Language: | English |
| Number of pages: | 69 |
| Keywords (Chinese): | 調控、基因、轉錄、排序、編排、文字、樣版 |
| Keywords (English): | gene, rank, arrangement, pattern, regulate, transcription, word |
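With the decoding of the human genome in recent years, a large number of studies on gene sequences have been published. Advances in biological techniques, such as DNA microarrays and high-throughput experiments in molecular biology, have further accelerated the spread of large amounts of genetic data. Most of this knowledge is recorded in the literature, and extracting it efficiently and accurately has become an important problem. The biomedical literature contains a great deal of gene regulatory information, which is important to biomedical researchers yet difficult to obtain. The main difficulty stems from the complexity of gene regulatory systems.
For biomedical researchers, manually extracting these transcriptional regulatory relationships from such a large body of literature is very time-consuming and labor-intensive. Processing these vast and rich resources efficiently is therefore necessary. Our goal is to develop an automated system for mining gene regulatory relationships that can extract, from a large body of biomedical literature, the sentences that contain regulatory relationship information. Experimental validation shows that the system we built can find gene regulatory information in a large body of biomedical literature effectively and accurately.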
The decoding of human genome sequences and the improvement of biological techniques, such as DNA microarrays, have accelerated the accumulation of an overwhelming amount of biomedical knowledge recorded in texts. These texts often contain the latest findings on proteins, genes, and small molecules. This literature includes huge amounts of gene-related data, such as the transcriptional regulation between regulators and their target genes.
For researchers, deriving this information from such a large body of literature takes considerable effort. We therefore develop a precise and efficient system for retrieving regulatory relationships from the literature. It automatically learns the writing styles of regulatory sentences and applies a two-stage method to identify the correct sentences. Our system achieves a higher precision than existing systems in the top portion of the ranked sentences.