| Graduate student: | 林冠甫 Lin, Guan-Fu |
|---|---|
| Thesis title: | 利用文獻中共同出現加強網絡來驗證與探勘基因群組相關性 The Evaluation and Mining of Gene Relationship by Constructing the Augmented Co-occurrence Network from Literatures |
| Advisor: | 高宏宇 Kao, Hung-Yu |
| Degree: | Master |
| Department: | College of Electrical Engineering and Computer Science - Department of Computer Science and Information Engineering |
| Year of publication: | 2007 |
| Academic year of graduation: | 95 (ROC calendar) |
| Language: | English |
| Number of pages: | 44 |
| Keywords (Chinese): | 樣版 (patterns), 生物資訊學 (bioinformatics), 文字探勘 (text mining) |
| Keywords (English): | Bioinformatics, Text mining, Patterns |
In recent years, mining the relationships among a set of genes has become a popular research topic. Finding genes that share common functions helps in understanding organisms. The scientific literature is a good source for uncovering hidden relationships between genes, because it records rich information from researchers' results. However, advances in technology have made the biological literature grow very quickly, so manually surveying such a large number of papers has become very difficult. In addition, the literature contains much irrelevant information, which makes it hard to extract accurate knowledge about gene relationships. We therefore want to use automated techniques to analyze large volumes of literature and filter out irrelevant information in order to obtain accurate gene relationships.
In this thesis, we analyze PubMed records and tag the gene names mentioned in their titles and abstracts. We then build a co-occurrence network of genes based on how frequently the gene names co-occur. During this process, we use different kinds of patterns to filter out irrelevant information and obtain more accurate relationships. We also use a quantitative method to assess the relatedness of gene sets. Our experiments show that the pattern-derived networks achieve stable and notable results and are useful for evaluating gene relationships.
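The pipeline described above can be illustrated with a minimal sketch. It is not the thesis's implementation: the gene lexicon, the relation-keyword pattern, the choice of the sentence as the co-occurrence unit, and the `cohesion` score (fraction of linked gene pairs) are all assumptions made for illustration, and the names `GENE_LEXICON`, `RELATION_PATTERN`, `tag_genes`, `build_network`, and `cohesion` are hypothetical.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical gene lexicon (assumption); a real system would use a
# gene-name tagger or a curated nomenclature dictionary.
GENE_LEXICON = {"BRCA1", "BRCA2", "TP53", "MDM2"}

# Hypothetical filtering pattern (assumption): keep a sentence only if it
# contains a relation keyword.
RELATION_PATTERN = re.compile(
    r"\b(interacts?|binds?|regulates?|activates?|inhibits?)\b", re.IGNORECASE
)


def tag_genes(text):
    """Return the set of lexicon gene names that appear as tokens in text."""
    tokens = set(re.findall(r"[A-Za-z0-9]+", text))
    return tokens & GENE_LEXICON


def build_network(records, use_pattern=False):
    """Count gene-pair co-occurrences per sentence (unit is an assumption).

    When use_pattern is True, sentences without a relation keyword are
    skipped, which mimics pattern-based filtering of irrelevant text.
    """
    edges = Counter()
    for record in records:
        # Naive sentence split on terminal punctuation followed by whitespace.
        for sentence in re.split(r"(?<=[.!?])\s+", record):
            if use_pattern and not RELATION_PATTERN.search(sentence):
                continue
            genes = sorted(tag_genes(sentence))
            for pair in combinations(genes, 2):
                edges[pair] += 1
    return edges


def cohesion(gene_set, edges):
    """Fraction of gene pairs in gene_set linked by at least one edge."""
    pairs = list(combinations(sorted(gene_set), 2))
    if not pairs:
        return 0.0
    linked = sum(1 for pair in pairs if edges[pair] > 0)
    return linked / len(pairs)


# Toy records standing in for PubMed titles and abstracts.
records = [
    "MDM2 binds TP53. BRCA1 and BRCA2 were sequenced.",
    "TP53 regulates MDM2 expression.",
]
plain = build_network(records)
filtered = build_network(records, use_pattern=True)
```

In this toy input, the sentence that merely lists BRCA1 and BRCA2 contributes an edge to the plain network but is dropped by the pattern filter, which is the kind of noise reduction the abstract attributes to patterns.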
[1]. J.-H. Chiang and H.-C. Yu, Literature Extraction of
Protein Functions Using Sentence Pattern Mining. IEEE
Transactions on knowledge and data engineering, 2005.
Vol. 17, No 8: p. 1088-1098.
[2]. J.-H. Chiang and H.-C. Yu, MekE: discovering the
functions of gene products from biomedical literature
via sentence alignment. Bioinformatics, 2003. Vol.
19, No.11:p. 1417-1422
[3]. H. N. Chua, W.-K. Sung and L. Wong, Exploiting
indirect neighbours and topological weight to predict
protein function from protein-protein interactions.
Bioinformatics, 2006. Vol.22, No.13: p. 1623-1630.
[4]. J. Ding, D. Berleant, D. Nettletion, and E. Wurtele,
Minig Medline: Abstracts, Sentences or Phrases? Pac.
Symp. Biocomp. 2002
[5]. C. Friedman, P. Kra, H. Yu, M. Krauthammer and A.
Rzhetsky, Genies: a natural-language processing
system for the extraction of molecular pathways from
journal articles, Bioinformatics, 2001. Vol. 17, pp.
S74-S82.
[6]. M. Huang, X. Zhu, Y. Hao, D. G. Payan, K. Qu, and M.
Li, Discovering patterns to extract protein-protein
interactions from full texts, Bioinformatics, 2004.
Vol. 20, No.18: pp. 3604-3612.
[7]. R. Homayouni, K. Heinrich, L. Wei and M. W. Berry,
Gene clustering by Latent Semantic Indexing of
MEDLINE abstracts. Bioinformatics, 2005. Vol.21, No.
1: p. 104-115.
[8]. R. Jelier , G. Jenster, L. C. J. Dorssers, C. C. van
der Eijk, E. M. van Mulligen, B. Mons and J. A. Kors,
Co-occurrence based meta-analysis of scientific
texts: retrieving biological relationships between
genes. Bioinformatics, 2005. Vol. 21, No. 9: p. 2049-
2058.
[9]. T.-K. Jessen, A. Lgreid, J. Komorowski and E.
Hovig, A literature network of human genes for high-
throughput analysis of gene expression. Nature
Genetics, 2001. Vol. 28: p. 21-28.
[10]. T.-K. Jessen, A. Lgreid, J. Komorowski and E.
Hovig, Pubgen: Discovering and visualizing gene-gene
relations. In Currents in computational Molecular
Biology, 2000. p. 48-49.
[11]. P. Kemmeren, T. T. J. P. Kockelkorn, T. Bjima, R.
Donders and F. C. P. Holstege, Predicting gene
function through systematic analysis and quality
assessment of high-throughput data. Bioinformatics,
2005. Vol. 21, No.8: p. 1644-1652.
[12]. A. Koike, Y. Niwa and T. Takagi, Automatic
extraction of gene/protein biological functions from
biomedical text. Bioinformatics, 2005. Vol. 21, No.
7: p. 1227-1236.
[13]. T. Ono, H. Hishigaki, A. Tanigami, and T. Takagi,
Automated extraction of information on protein–
protein interactions from the biological literature,
Bioinformatics, 2001. Vol. 17 No. 2: pp.155-161.
[14]. S. Raychaudhuri and R. B. Altman, A literature-based
method for assessing the functional coherence of a
gene group. Bioinformatics, 2003. Vol. 19, No. 3: p.
396-401.
[15]. S. Raychaudhuri, J. T. Chang, P. D. Sutphin and R.B.
Altman, Associating genes with gene ontology codes
using a maximum entropy analysis of biomedical
literature. Genome Research, 2002a Vol. 12, p. 203-214
[16]. S. Raychaudhuri, H. Schtze and R.B. Altman, Using
text analysis to identify functionally coherent gene
groups. Genome Research, 2002b Vol.12, p1582-1590
[17]. T. C. Rindflesch, L. Tanabe, J. N. Weinstein and L.
Hunter, EDGAR: extraction of grugs, genes and
relations from the biomedical literature. Pacific
Symposium on Biocomputing, 2000 p514-525.
[18]. B. J. Stapley and G. Benoit, Biobiloiometrics:
information retrieval and visualization from co-
occurrences of gene names in Medline abstracts.
Pacific Symposium on Biocomputing, 2000 p529-540.
[19]. L. Tanabe and W. J. Wilbur, Tagging gene and protein
names in biomedical text. Bioinformatics, 2002 Vol.
18, No.8: p. 1124-1132
[20]. C. C. van der Eijk, E. M. van Mulligen, J. A. Kors,
B. Mons and J. van den Berg, Constructing an
associative concept space for literature-based
discovery. JASIST, 2004. Vol. 55. p. 436-444.
[21]. J. D. Wren and H. R. Garner, Heuristics for
identification of acronym – definition patterns
within text: towards an automated construction of
comprehensive acronym – dictionaries. Methods Inf.
Med., 2002 Vol. 41, p. 426-434
[22]. J. D. Wren and H. R. Garner, Shared relationship
analysis: ranking set cohesion and commonalities
within a literature-derived relationship network.
Bioinformatics, 2004. Vol. 20, No.2: p. 191-198.
[23]. J. D. Wren, R. Bekeredjian, J. A. Stewart, R. V.
Shohet and H. R. Garner, Knowledge discovery by
automated identification and ranking of implicit
relationships. Bioinformatics, 2004. Vol. 20, No3: p.
389-398.
[24]. J. D. Wren, Using fuzzy set theory and scale-free
network properties to relation MEDLINE terms. Soft
Computing, 2006. Vol. 10, No.4 : p. 374-381
[25]. http://geneticassociationdb.nih.gov/cgi-bin/index.cgi
[26]. http://www.gene.ucl.ac.uk/nomenclature/data/gdlw_index.html
[27]. http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=pubmed
[28]. http://www.geneontology.org/
[29]. http://www.ncbi.nlm.nih.gov/sites/entrez?db=OMIM