National Cheng Kung University Theses and Dissertations System

Graduate student:	林冠甫 Lin, Guan-Fu
Thesis title:	利用文獻中共同出現加強網絡來驗證與探勘基因群組相關性 The Evaluation and Mining of Gene Relationship by Constructing the Augmented Co-occurrence Network from Literatures
Advisor:	高宏宇 Kao, Hung-Yu
Degree:	Master
Department:	College of Electrical Engineering and Computer Science - Department of Computer Science and Information Engineering
Year of publication:	2007
Academic year of graduation:	95 (ROC calendar)
Language:	English
Number of pages:	44
Keywords (from Chinese):	patterns, bioinformatics, text mining
Keywords (English):	Bioinformatics, Text mining, Patterns

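In recent years, mining the relationships among groups of genes has been a very popular research topic. Discovering genes that share the same function helps in understanding organisms. The scientific literature is a good resource for uncovering implicit relationships among genes, since it offers rich information drawn from researchers' results. However, advances in technology have made the biology literature grow very quickly, so reviewing such a large number of papers by hand has become a very difficult task. In addition, the literature contains much unnecessary information, which makes it hard to find accurate knowledge about gene relationships. We therefore want to use automated techniques to analyze large volumes of scientific literature and to filter out unnecessary information in order to obtain correct gene relationships.
In this thesis, we analyze the scientific literature stored in PubMed and tag the gene names that appear in titles and abstracts. We then build a co-occurrence network from the gene names that occur together. In this process, we use different kinds of patterns to filter out unnecessary information and obtain more accurate relationships. We also use a quantitative measure to assess the relatedness of a gene group. Our experiments show that the networks derived from patterns achieve stable and significant results and can usefully evaluate gene relationships.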

In recent years, mining the relationships among a set of genes has become a popular research topic. Finding genes that share common functions helps us understand organisms. The scientific literature is a good source for finding hidden relationships among genes, because it offers rich information from the results of scientists' research. However, the biological literature is growing very quickly as techniques improve, and manually surveying such a large number of biological papers has become a very difficult task. Besides, the literature also contains much irrelevant information, which makes it hard to extract accurate knowledge about gene relationships. We therefore want to use automatic techniques to analyze this large body of literature and filter out irrelevant information in order to extract accurate gene relationships.
In this thesis, we analyze PubMed records to tag the gene names mentioned in titles and abstracts, and we establish a gene co-occurrence network based on how frequently genes co-occur. We use different kinds of patterns to filter out irrelevant information and extract accurate relationships. We also use a quantitative method to assess the relatedness of gene sets. Our experiments show that the derived networks attain stable and prominent performance and are useful for evaluating gene relationships.
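The co-occurrence idea described in the abstract can be illustrated with a minimal sketch. It assumes that gene names have already been tagged in each record, counts how many records mention each pair of genes, and then scores a gene group by the fraction of its pairs that are linked in the network. The function names, the placeholder gene symbols, and the scoring rule are all assumptions made for illustration; this record does not give the thesis's actual cohesion score or its filtering patterns, so the sketch should not be read as the author's method.

```python
from collections import Counter
from itertools import combinations


def build_cooccurrence_network(tagged_records):
    """Count, for each unordered gene pair, the number of records mentioning both."""
    edge_counts = Counter()
    for genes in tagged_records:
        # Deduplicate and sort so each pair is counted once per record,
        # always under the same (alphabetical) key.
        for pair in combinations(sorted(set(genes)), 2):
            edge_counts[pair] += 1
    return edge_counts


def group_cohesion(edge_counts, gene_group):
    """Hypothetical score: fraction of gene pairs in the group that share an edge."""
    pairs = list(combinations(sorted(set(gene_group)), 2))
    if not pairs:
        return 0.0
    linked = sum(1 for pair in pairs if edge_counts.get(pair, 0) > 0)
    return linked / len(pairs)


# Made-up records: each list holds the (placeholder) gene names tagged
# in one title-plus-abstract. These are not real PubMed data.
records = [
    ["GENE_A", "GENE_C"],
    ["GENE_A", "GENE_B", "GENE_C"],
    ["GENE_D", "GENE_E"],
    ["GENE_C", "GENE_A"],
]

network = build_cooccurrence_network(records)
tight_group = group_cohesion(network, ["GENE_A", "GENE_B", "GENE_C"])
loose_group = group_cohesion(network, ["GENE_A", "GENE_D", "GENE_E"])
print(network[("GENE_A", "GENE_C")], tight_group, loose_group)
```

In this toy data, the group whose members are often mentioned together scores higher than the group that mixes genes from unrelated records. A pattern-based filter of the kind the abstract mentions would sit before the counting step and discard co-mentions that do not match an accepted sentence pattern.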

CONTENT
CHINESE ABSTRACT	IV
ABSTRACT	V
CONTENT	VI
TABLE LISTING	X
INTRODUCTION	1
1. MOTIVATION	1
2. METHOD	3
3. STRUCTURE	4
RELATED WORK	5
1. DATA RESOURCE	5
1.1. PubMed	5
1.2. Gene Ontology (GO)	6
1.3. The Genetic Association Database (GAD)	7
1.4. HUGO Gene Nomenclature Committee (HGNC)	7
2. RELATED RESEARCH	7
METHOD AND SYSTEM	10
1. USING THE PATTERN TO FILTER IRRELEVANT INFORMATION	10
2. CALCULATING THE COHESION SCORE OF A GENE GROUP	11
2.1. Method refinement 1 – Pattern cohesion score (PCS)	13
2.2. Method refinement 2 – Accumulation cohesion score (ACS)	14
3. SYSTEM STRUCTURE	16
4. TAGGING GENE NAMES	17
5. ESTABLISH GENE CO-OCCURRENCE NETWORK	18
6. QUERY RELATED SUB-NETWORK AND CALCULATE COHESION OF GENE GROUP	19
EXPERIMENT	21
1. OVERVIEW OF EXPERIMENT	21
2. EXPERIMENT 1	21
2.1. Evaluate the performance of the system	21
2.2. Result of the experiment 1	22
3. EXPERIMENT 2	23
3.1. Evaluate pattern co-occurrence network	23
3.2. Result of the experiment 2	24
4. EXPERIMENT 3	27
4.1. Compare pattern co-occurrence network and random network	27
4.2. Result of the experiment 3	28
5. EXPERIMENT 4	30
5.1. Evaluate precision and recall rate in three networks	30
5.2. Result of the experiment 4	31
6. EXPERIMENT 5	35
6.1. The evaluation of the refinement method 1	35
6.2. Result of the experiment 5 with the first refinement method	35
6.3. The evaluation of the refinement method 2	38
6.4. Result of the experiment 5 with the second refinement method	38
7. EXPERIMENT 6	40
7.1. Shared relationship	40
7.2. Result of the experiment 6	41
CONCLUSION AND FUTURE WORK	42
REFERENCES	43
                                    


Made public on 2008-08-06