| Graduate student: | 林宗達 Lin, Tsung-Ta |
|---|---|
| Thesis title: | 利用文獻探勘預測ESTs功能相關性 Utilizing Text Mining to Predict Functional Relationships of ESTs |
| Advisor: | 王惠嘉 Wang, Hei-Chia |
| Degree: | Master |
| Department: | College of Management - Institute of Information Management |
| Year of publication: | 2005 |
| Academic year of graduation: | 93 |
| Language: | Chinese |
| Number of pages: | 53 |
| Chinese keywords: | 功能群組, 功能相關性, 文獻探勘, 序列註解 |
| English keywords: | Function Group, Sequence Annotation, Functional Relationship, Text Mining |
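In the post-genomic era, biological researchers often want more information about biological sequences, in particular about the functions of unknown Expressed Sequence Tags (ESTs) produced by experiments and about the functional relationships among ESTs. Because biomedical literature is abundant and easy to obtain, it can serve as the main source of supporting information. However, as biotechnology advances, the volume of data in biological databases is growing rapidly, and automatically extracting useful information from large collections of literature has become an important problem in bioinformatics.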
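Many existing methods try to identify functional relationships between genes through the biomedical literature associated with those genes. However, such studies consider only the associations among the documents of different genes and do not analyze the content of those documents further. This study therefore uses text-mining techniques to extract, from the literature related to each EST, a keyword list that describes the EST's possible functions. The functional relationship between ESTs is then defined by combining a relevance score computed from each EST's Direct Reference and Related Document with the similarity between keyword lists, and function groups are built from a large set of ESTs in this way. Next, the ESTs in each function group are compared by sequence alignment against a database of known biochemical pathways to infer the pathway to which each EST belongs. Some ESTs have no match in the pathway database, possibly because pathway databases such as KEGG are curated manually and are therefore incomplete. For these ESTs, the results of the proposed method can be used to infer the likely pathways of the unknown ESTs in the same function group, thereby achieving pathway prediction.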
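The grouping and prediction steps described above can be illustrated with a small sketch. It is not the thesis's implementation: the abstract does not give the weighting or scoring formulas, so the sketch assumes TF-IDF keyword weights, cosine similarity, a fixed similarity threshold, and a majority vote within each group. It also omits the Direct Reference and Related Document relevance score. All function names, EST identifiers, tokens, and pathway labels are hypothetical.

```python
import math
from collections import Counter, defaultdict


def tfidf_vectors(tokens_by_est):
    """Build a weighted keyword vector per EST (assumed TF-IDF weighting)."""
    n = len(tokens_by_est)
    doc_freq = Counter()
    for tokens in tokens_by_est.values():
        doc_freq.update(set(tokens))
    return {
        est: {t: c * math.log(n / doc_freq[t]) for t, c in Counter(tokens).items()}
        for est, tokens in tokens_by_est.items()
    }


def cosine(u, v):
    """Cosine similarity between two sparse keyword vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def function_groups(vectors, threshold):
    """Link ESTs whose similarity meets the threshold; each connected
    component is treated as one function group (an assumed grouping rule)."""
    parent = {est: est for est in vectors}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    ests = sorted(vectors)
    for i, a in enumerate(ests):
        for b in ests[i + 1:]:
            if cosine(vectors[a], vectors[b]) >= threshold:
                parent[find(a)] = find(b)

    groups = defaultdict(set)
    for est in ests:
        groups[find(est)].add(est)
    return list(groups.values())


def predict_pathways(groups, known_pathway):
    """Assign each unannotated EST the most common known pathway in its
    group; ties are broken arbitrarily."""
    predicted = {}
    for group in groups:
        votes = Counter(known_pathway[e] for e in group if e in known_pathway)
        if not votes:
            continue  # no annotated member, so nothing to propagate
        best = votes.most_common(1)[0][0]
        for est in group:
            if est not in known_pathway:
                predicted[est] = best
    return predicted


# Hypothetical toy data: tokens stand in for keywords mined from literature.
tokens_by_est = {
    "EST1": ["apoptosis", "caspase", "protease", "cascade"],
    "EST2": ["apoptosis", "caspase", "cytochrome"],
    "EST3": ["glycolysis", "glucose", "kinase"],
    "EST4": ["glycolysis", "glucose", "pyruvate"],
}
# Pathways assumed to come from sequence alignment against a pathway database.
known_pathway = {"EST1": "pathway_A", "EST3": "pathway_B"}

vectors = tfidf_vectors(tokens_by_est)
groups = function_groups(vectors, threshold=0.2)
predicted = predict_pathways(groups, known_pathway)
print(groups)
print(predicted)
```

In this toy run, the ESTs without a known pathway inherit the label of the annotated EST they share keywords with. The threshold value is arbitrary and would need tuning on real data.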