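Graduate student: Hsu, Huai-Jen (許懷仁)
Thesis title: A Biomedical Text Mining System in Genes' Information Discovery (生物醫學文件探勘系統之架構設計與實作)
Advisor: Chiang, Jung-Hsien (蔣榮先)
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Computer Science and Information Engineering
Year of publication: 2002
Academic year of graduation: 90 (ROC calendar)
Language: Chinese
Number of pages: 61
Chinese keywords: 基因關聯性預測 (gene relation prediction), 文件探勘 (text mining)
English keywords: text mining, gene-gene relation prediction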
  • Since the human genome sequence was decoded, research on the relations between genes and diseases has drawn growing attention, and the related literature has grown with it. Search engines let medical researchers retrieve the information they need quickly, but they also leave researchers facing information overload. We therefore propose a system architecture for gene information discovery in medical documents, with the aim of giving medical researchers a fast and precise information service.
    The system has two core modules. The first supports querying and browsing gene-related information: for a queried gene it presents information on biological functions, diseases, and related genes, and users decide the scope and content of what is shown by defining a "domain-specific lexicon". The second predicts the relation between a pair of genes, classifying it as "positive", "cooperative", or "negative". This module uses the biological-function keywords, relational keywords, and nouns in a document to learn how medical texts word their descriptions of relations between genes. The learned "sentence expression patterns", which capture the wording and term distribution of such descriptions, then serve as the basis for later relation prediction.
    We implemented the system architecture proposed in this thesis. A comparison with "PubMed" illustrates the advantages and characteristics of the first module, and a comparison with a decision-tree method verifies the performance of the "sentence expression pattern" in relation prediction. We hope the system will be of effective help to biomedical researchers.
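
    The abstract does not give the structure of a "sentence expression pattern" or its learning procedure (the thesis covers these in Chapter 4), so the following is only a rough sketch of the general idea under stated assumptions. It maps a sentence to a sequence of gene mentions and lexicon categories, then scores each relation label by simple term counts. The lexicon entries, gene names, training sentences, and the count-based scoring are all hypothetical stand-ins, not the author's method.

```python
import re
from collections import Counter, defaultdict

# Hypothetical mini-lexicon (illustrative entries only, not from the thesis).
# FUNC = biological-function keyword, REL = relational keyword, NOUN = other noun.
LEXICON = {
    "activates": "REL", "inhibits": "REL", "binds": "REL",
    "apoptosis": "FUNC", "proliferation": "FUNC",
    "pathway": "NOUN", "complex": "NOUN",
}

GENES = {"genea", "geneb"}  # hypothetical gene names, lower-cased

# Hypothetical labelled sentences, one per relation type named in the abstract.
EXAMPLES = [
    ("GeneA activates GeneB in the proliferation pathway.", "positive"),
    ("GeneA inhibits GeneB and blocks apoptosis.", "negative"),
    ("GeneA binds GeneB to form a complex.", "cooperative"),
]


def to_pattern(sentence, genes=GENES):
    """Reduce a sentence to its gene mentions and lexicon terms, in order."""
    pattern = []
    for token in re.findall(r"[a-z0-9]+", sentence.lower()):
        if token in genes:
            pattern.append("GENE")
        elif token in LEXICON:
            pattern.append(f"{LEXICON[token]}:{token}")
    return tuple(pattern)


def learn(examples, genes=GENES):
    """Count how often each non-gene pattern element occurs under each label."""
    counts = defaultdict(Counter)
    for sentence, label in examples:
        for element in to_pattern(sentence, genes):
            if element != "GENE":
                counts[label][element] += 1
    return counts


def predict(sentence, counts, genes=GENES):
    """Return the label whose learned elements best match, or None if none match."""
    pattern = to_pattern(sentence, genes)
    scores = {
        label: sum(c[e] for e in pattern if e != "GENE")
        for label, c in counts.items()
    }
    if not scores:
        return None
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


if __name__ == "__main__":
    model = learn(EXAMPLES)
    print(to_pattern("GeneA activates GeneB."))
    print(predict("GeneB inhibits GeneA.", model))
```

    A sentence with no lexicon terms yields no prediction here. A real system would need a far richer lexicon, gene-name recognition, and a matching scheme that uses the order of the pattern elements.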

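    Table of Contents
    Chapter 1 Introduction
        1.1 System Overview
        1.2 Motivation
        1.3 Solution
        1.4 Data Mining and Text Mining
        1.5 Thesis Organization
    Chapter 2 Literature Review
        2.1 Methods of Information Classification
        2.2 Information Systems That Analyze Medical Documents
    Chapter 3 Browsing Gene-Related Information and Predicting Relations
        3.1 System Architecture
        3.2 Relations Between Genes
        3.3 Introduction to the "Domain-Specific Lexicon"
        3.4 Querying and Browsing Gene-Related Information
        3.5 Predicting Relations Between Genes
    Chapter 4 Method for Predicting Relations Between Genes
        4.1 Definition of the "Sentence Expression Pattern"
        4.2 Uses of the Domain-Specific Lexicon
        4.3 Learning the "Sentence Expression Pattern"
            4.3.1 Elements of the "Sentence Expression Pattern"
            4.3.2 Structure of the "Sentence Expression Pattern"
            4.3.3 How the "Sentence Expression Pattern" Is Learned
            4.3.4 Properties of the "Sentence Expression Pattern"
        4.4 Using the "Sentence Expression Pattern" for Relation Prediction
    Chapter 5 Experimental Design and Analysis
        5.1 Dataset and Document Preprocessing
            5.1.1 Data Source
            5.1.2 Query String Syntax
            5.1.3 Document Format
            5.1.4 Data Preprocessing
        5.2 Comparison of the System and "PubMed" in Query Results and Functionality
        5.3 Performance of the "Sentence Expression Pattern" in Relation Prediction
    Chapter 6 Conclusions and Future Work
        6.1 Conclusions
        6.2 Future Work
    References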

    Full-text availability: on campus, available immediately; off campus, available from 2002-08-15