
Student: Lin, Guan-Fu (林冠甫)
Thesis title: The Evaluation and Mining of Gene Relationship by Constructing the Augmented Co-occurrence Network from Literatures (利用文獻中共同出現加強網絡來驗證與探勘基因群組相關性)
Advisor: Kao, Hung-Yu (高宏宇)
Degree: Master
Department: College of Electrical Engineering and Computer Science, Department of Computer Science and Information Engineering
Year of publication: 2007
Academic year of graduation: 95 (ROC calendar, i.e., 2006-2007)
Language: English
Number of pages: 44
Keywords: Bioinformatics, Text mining, Patterns

    In recent years, mining the relationships among a set of genes has become a popular research issue. Finding genes that share common functions helps us understand organisms. The scientific literature is a good source for discovering hidden relationships between genes, because it records the rich findings of scientists' research. However, advances in research techniques have made the number of biological publications grow very fast, and manually surveying such a large number of papers has become a very difficult task. Besides, these papers contain much irrelevant information, which makes it hard to extract accurate knowledge about gene relationships from them. Therefore, we use automatic techniques to analyze a large number of publications and filter out irrelevant information in order to obtain accurate gene relationships.
    In this thesis, we analyze PubMed records, tag the gene names mentioned in their titles and abstracts, and build a gene co-occurrence network based on how frequently gene names co-occur. We use different kinds of patterns to filter out irrelevant information and obtain more accurate relationships. We also use a quantitative method to assess the relationships within a gene set. Our experiments show that the pattern-derived networks achieve stable and notable performance and are useful for evaluating gene relationships.
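The method paragraph above can be illustrated with a minimal Python sketch. It covers three steps: tagging gene names, building a co-occurrence network with pattern-based filtering, and scoring a gene group. The sketch rests on several assumptions, none of which is the thesis's actual implementation:
- `GENE_LEXICON` is a tiny hypothetical gene dictionary, standing in for a real nomenclature resource such as HGNC.
- `RELATION_PATTERN` is a hypothetical keyword regex, standing in for the thesis's unspecified pattern sets.
- Co-occurrence is counted within sentences.
- `cohesion` is a plain edge-density stand-in, not the thesis's cohesion scores (PCS/ACS).

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical gene lexicon (assumption); a real system would load a full
# nomenclature resource such as HGNC instead of this illustrative set.
GENE_LEXICON = {"BRCA1", "BRCA2", "TP53", "MDM2", "EGFR"}

# Hypothetical relation patterns (assumption); the thesis's patterns are not given here.
RELATION_PATTERN = re.compile(
    r"\b(interacts? with|binds?( to)?|activates?|inhibits?|phosphorylates?)\b",
    re.IGNORECASE,
)


def tag_genes(sentence):
    """Case-sensitive dictionary lookup: return lexicon genes found as tokens."""
    tokens = re.findall(r"[A-Za-z0-9-]+", sentence)
    return {t for t in tokens if t in GENE_LEXICON}


def build_network(records, use_patterns=True):
    """Count gene pairs co-occurring in a sentence of a title+abstract string.

    With use_patterns=True, a pair is counted only if its sentence also matches
    RELATION_PATTERN, a stand-in for pattern-based filtering of irrelevant text.
    """
    edges = Counter()
    for text in records:
        for sentence in re.split(r"(?<=[.!?])\s+", text):
            genes = tag_genes(sentence)
            if len(genes) < 2:
                continue
            if use_patterns and not RELATION_PATTERN.search(sentence):
                continue
            for a, b in combinations(sorted(genes), 2):
                edges[(a, b)] += 1
    return edges


def cohesion(edges, group, min_count=1):
    """Fraction of gene pairs in `group` linked by >= min_count co-occurrences.

    A simple edge-density stand-in (assumption), not the thesis's cohesion score.
    """
    pairs = list(combinations(sorted(group), 2))
    if not pairs:
        return 0.0
    linked = sum(1 for p in pairs if edges.get(p, 0) >= min_count)
    return linked / len(pairs)


records = [
    "BRCA1 interacts with BRCA2 in DNA repair. TP53 and EGFR were both measured.",
    "MDM2 binds TP53 and inhibits its activity.",
]
net = build_network(records)
print(dict(net))  # {('BRCA1', 'BRCA2'): 1, ('MDM2', 'TP53'): 1}
raw = build_network(records, use_patterns=False)
print(("EGFR", "TP53") in raw)  # True: kept only without pattern filtering
print(cohesion(net, {"BRCA1", "BRCA2", "TP53"}))  # 0.3333333333333333
```

Turning off the pattern check keeps pairs such as EGFR-TP53 that merely appear in the same sentence. This is the kind of irrelevant co-occurrence that the pattern filtering described above is meant to remove.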

    CONTENT
    Chinese Abstract
    ABSTRACT
    CONTENT
    TABLE LISTING
    1. INTRODUCTION
       1.1. MOTIVATION
       1.2. METHOD
       1.3. STRUCTURE
    2. RELATED WORK
       2.1. DATA RESOURCE
          2.1.1. PubMed
          2.1.2. Gene Ontology (GO)
          2.1.3. The Genetic Association Database (GAD)
          2.1.4. HUGO Gene Nomenclature Committee (HGNC)
       2.2. RELATED RESEARCH
    3. METHOD AND SYSTEM
       3.1. USING THE PATTERN TO FILTER IRRELEVANT INFORMATION
       3.2. CALCULATING THE COHESION SCORE OF A GENE GROUP
          3.2.1. Method refinement 1: Pattern cohesion score (PCS)
          3.2.2. Method refinement 2: Accumulation cohesion score (ACS)
       3.3. SYSTEM STRUCTURE
       3.4. TAGGING GENE NAMES
       3.5. ESTABLISH GENE CO-OCCURRENCE NETWORK
       3.6. QUERY RELATED SUB-NETWORK AND CALCULATE COHESION OF GENE GROUP
    4. EXPERIMENT
       4.1. OVERVIEW OF EXPERIMENT
       4.2. EXPERIMENT 1
          4.2.1. Evaluate the performance of the system
          4.2.2. Result of experiment 1
       4.3. EXPERIMENT 2
          4.3.1. Evaluate pattern co-occurrence network
          4.3.2. Result of experiment 2
       4.4. EXPERIMENT 3
          4.4.1. Compare pattern co-occurrence network and random network
          4.4.2. Result of experiment 3
       4.5. EXPERIMENT 4
          4.5.1. Evaluate precision and recall rate in three networks
          4.5.2. Result of experiment 4
       4.6. EXPERIMENT 5
          4.6.1. The evaluation of refinement method 1
          4.6.2. Result of experiment 5 with refinement method 1
          4.6.3. The evaluation of refinement method 2
          4.6.4. Result of experiment 5 with refinement method 2
       4.7. EXPERIMENT 6
          4.7.1. Shared relationship
          4.7.2. Result of experiment 6
    5. CONCLUSION AND FUTURE WORK
    REFERENCES


    Full-text availability: on campus from 2008-08-06; off campus from 2008-08-06.