National Cheng Kung University Electronic Theses and Dissertations System


Graduate student:	林子翔 Lin, Zi-Xiang
Thesis title:	一個兩階段半監督式之基因轉錄調控關係擷取學習系統 A Two-stage Semi-Supervised Learning System For Retrieving Gene Regulatory Information From PubMed Thesaurus
Advisor:	高宏宇 Kao, Hung-Yu
Degree:	Master
Department:	College of Electrical Engineering and Computer Science - Institute of Medical Informatics
Year of publication:	2008
Academic year of graduation:	96 (ROC calendar)
Language:	English
Number of pages:	69
Keywords (Chinese):	調控、基因、轉錄、排序、編排、文字、樣版
Keywords (English):	gene, rank, arrangement, pattern, regulate, transcription, word
Usage:	Views: 232, Downloads: 2

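With the recent decoding of the human genome, a large body of research on gene sequences has been published. Advances in biological techniques, such as DNA microarrays and high-throughput experiments in molecular biology, have further accelerated the spread of gene data. Most of this knowledge is recorded in the literature, and extracting it efficiently and accurately has become an important problem. The vast biomedical literature contains a great deal of information on gene regulation, which is important to biomedical researchers yet hard to obtain. The main difficulty stems from the complexity of gene regulatory systems.
For biomedical researchers, manually extracting transcriptional regulatory relationships from such a large volume of literature is very time-consuming and labor-intensive, so these large and rich resources must be processed efficiently. Our goal is to develop an automated system for mining gene regulatory relationships that extracts, from a large body of biomedical literature, the sentences containing regulatory-relationship information. Experiments show that the system finds gene regulatory information in a large body of biomedical literature effectively and accurately.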

The decoding of human genome sequences and the improvement of biological techniques, such as DNA microarrays, have accelerated the accumulation of an overwhelming amount of biomedical knowledge recorded in text. These texts often contain the latest findings on proteins, genes, and small molecules. This literature includes huge amounts of gene-related data, such as the transcriptional regulation between regulators and their target genes.
For researchers, deriving this information from such a vast literature takes a great deal of effort. We therefore develop a precise and efficient system for retrieving regulatory relationships from the literature. It automatically learns the writing styles of regulatory sentences and applies a two-stage method to find the correct sentences. Our system achieves a better precision rate than existing systems among the top-ranked sentences.

ABSTRACT (IN CHINESE)	IV
ABSTRACT	V
CONTENT	VI
FIGURE LISTING	VIII
TABLE LISTING	X
INTRODUCTION	1
1	Motivation	1
2	Method	3
RELATED WORK	5
1	Bio-related resources	5
1.1	PubMed	5
1.2	HUGO Gene Nomenclature Committee	6
1.3	Sequential Retrieval System	7
1.4	Medical Subject Headings (MeSH)	7
2	Information retrieval techniques	9
2.1	Natural Language Processing (NLP)	9
2.2	Pattern Match	9
3	Related work	9
OUR PROPOSED METHOD	14
1	Preliminary analysis	14
2	Our proposed method	16
2.1	Two-stage method	16
2.2	Semi-supervised system	24
3	The system flow	26
3.1	Preprocessing	26
3.2	Identification of TF and gene names	28
3.3	Learning Module	29
3.4	Extraction Module	34
EXPERIMENT	38
1	Data sets	38
2	Extraction accuracy of our method	39
2.1	Detail description	39
2.2	Result analysis
3	Relations between the number of PMIDs used in the WSRR Establishment and the number of retrieved sentences	42
3.1	Detail description	42
3.2	Result analysis	43
4	Semi-supervised experiment in PubMed	44
4.1	Detail description	44
4.2	Result analysis	44
5	Apply Data1&2 in WSRR Establishment and test in PubMed	48
5.1	Detail description	48
5.2	Result analysis	49
6	System introduction	52
CONCLUSION AND FUTURE WORK	54
1	Conclusion	54
2	Future work	54
REFERENCES	55
APPENDIX	57
                                    

[1] Bodenreider, O., The Unified Medical Language System (UMLS): integrating biomedical terminology. Nucleic Acids Res, 2004. 32(Database issue): p. D267-70.
[2] Chater, K., M. Berlyn, and B. Bachmann, Genetic nomenclature guide. Bacteria. Trends Genet, 1995: p. 5-8.
[3] Cherry, J.M., Genetic nomenclature guide. Saccharomyces cerevisiae. Trends Genet, 1995: p. 11-2.
[4] Friedman, C., et al., GENIES: a natural-language processing system for the extraction of molecular pathways from journal articles. Bioinformatics, 2001. 17 Suppl 1: p. S74-82.
[5] Griffith, O.L., et al., ORegAnno: an open-access community-driven resource for regulatory annotation. Nucleic Acids Res, 2008. 36(Database issue): p. D107-13.
[6] http://srs.ebi.ac.uk/.
[7] http://www.gene.ucl.ac.uk/nomenclature/aboutHGNC.html.
[8] http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?DB=pubmed.
[9] http://www.nlm.nih.gov/bsd/licensee/elements_descriptions.html.
[10] http://www.nlm.nih.gov/mesh/.
[11] http://www.nlm.nih.gov/mesh/MBrowser.html.
[12] Hu, Z.Z., et al., Literature mining and database annotation of protein phosphorylation using a rule-based system. Bioinformatics, 2005. 21(11): p. 2759-65.
[13] Huang, M., et al., Discovering patterns to extract protein-protein interactions from full texts. Bioinformatics, 2004. 20(18): p. 3604-12.
[14] Muller, H.M., E.E. Kenny, and P.W. Sternberg, Textpresso: an ontology-based information retrieval and extraction system for biological literature. PLoS Biol, 2004. 2(11): p. e309.
[15] Ono, T., et al., Automated extraction of information on protein-protein interactions from the biological literature. Bioinformatics, 2001. 17(2): p. 155-61.
[16] Pan, H., et al., Dragon TF Association Miner: a system for exploring transcription factor associations through text-mining. Nucleic Acids Res, 2004. 32(Web Server issue): p. W230-4.
[17] Perez-Iratxeta, C., P. Bork, and M.A. Andrade, Association of genes to genetically inherited diseases using data mining. Nat Genet, 2002. 31(3): p. 316-9.
[18] Podowski, R.M., et al., AZuRE, a scalable system for automated term disambiguation of gene and protein names. Proc IEEE Comput Syst Bioinform Conf, 2004: p. 415-24.
[19] Pustejovsky, J., et al., Robust relational parsing over biomedical literature: extracting inhibit relations. Pac Symp Biocomput, 2002: p. 362-73.
[20] Raychaudhuri, S. and R.B. Altman, A literature-based method for assessing the functional coherence of a gene group. Bioinformatics, 2003. 19(3): p. 396-401.
[21] Shatkay, H., et al., Genes, themes and microarrays: using information retrieval for large-scale gene analysis. Proc Int Conf Intell Syst Mol Biol, 2000. 8: p. 317-28.
[22] Tanabe, L. and W.J. Wilbur, Tagging gene and protein names in biomedical text. Bioinformatics, 2002. 18(8): p. 1124-32.
[23] Temkin, J.M. and M.R. Gilder, Extraction of protein interaction information from unstructured text using a context-free grammar. Bioinformatics, 2003. 19(16): p. 2046-53.
[24] Wren, J.D., et al., Knowledge discovery by automated identification and ranking of implicit relationships. Bioinformatics, 2004. 20(3): p. 389-98.
[25] Wren, J.D. and H.R. Garner, Shared relationship analysis: ranking set cohesion and commonalities within a literature-derived relationship network. Bioinformatics, 2004. 20(2): p. 191-8.
[26] Yakushiji, A., et al., Event extraction from biomedical papers using a full parser. Pac Symp Biocomput, 2001: p. 408-19.
[27] Yu, H., et al., Automatic extraction of gene and protein synonyms from MEDLINE and journal articles. Proc AMIA Symp, 2002: p. 919-23.
[28] Yu, H., et al., Automatically identifying gene/protein terms in MEDLINE abstracts. J Biomed Inform, 2002. 35(5-6): p. 322-30.
[29] Yu, H., G. Hripcsak, and C. Friedman, Mapping abbreviations to full forms in biomedical articles. J Am Med Inform Assoc, 2002. 9(3): p. 262-72.

Made public on 2009-08-12
