
Graduate student: Hsieh, Chih-Hsin (謝志欣)
Thesis title: Retrieving Cancer Biomarker Related Evidence Sentences from Biomedical Literature (自生醫文獻中擷取出癌症生物標誌相關之驗證句)
Advisor: Chiang, Jung-Hsien (蔣榮先)
Degree: Master
Department: College of Electrical Engineering and Computer Science, Department of Computer Science and Information Engineering
Year of publication: 2008
Academic year of graduation: 96 (ROC calendar)
Language: Chinese
Number of pages: 39
Keywords (Chinese): 癌症生物標誌, 文件探勘
Keywords (English): cancer biomarker, text mining
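  • In recent years, advances in bioinformatics, information technology, and the Internet have led to rapid growth in the biomedical literature. This literature records a large body of research results; once it has been analyzed by computer, useful information can be extracted from it and even integrated into knowledge. Cancer biomarker research still relies mainly on biotechnology, such as PCR or mass spectrometry, with new cancer biomarkers sought by analyzing experimental results. This process, however, remains very costly. This study therefore proposes a framework for retrieving cancer biomarker related evidence sentences from biomedical literature. In the implementation, ABNER serves as the protein name recognizer, resources collected from the Unified Medical Language System (UMLS) Knowledge Source Server are used to recognize cancer disease names, and keywords related to cancer biomarkers are collected. These elements are combined through co-occurrence relations to find evidence sentences related to cancer biomarkers. The final case study verifies that the system can indeed provide evidence sentences for some known cancer biomarkers, such as PSA, AMACR, and PSMA. We hope this text mining system will be helpful to biomedical researchers who study cancer biomarkers.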

    In recent years, with the rapid growth of the biomedical literature, it has become a challenge for researchers to keep up with the dramatically increasing volume of information and to understand and discover the newest knowledge about cancer biomarkers. In this research, we provide a method to extract useful information from biomedical literature and then retrieve cancer biomarker related evidence sentences. We use ABNER to recognize protein name entities, a lexicon to label cancer names, and a bag of keywords as the features of a cancer biomarker. In addition, the co-occurrence of protein names, disease names, and keywords is used as the basis for identifying related evidence sentences. After the retrieved sentences are ranked, the system demonstrates significant improvement in experimental results for biomarkers such as AMACR, PSA, and PCA3. The proposed text mining procedure can be a useful tool for biologists who are interested in researching cancer related biomarkers.
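
The co-occurrence idea described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the small lexicons below are hypothetical stand-ins for ABNER's protein tagging and the UMLS-derived cancer name lexicon, the function names are invented for illustration, and the ranking score (the number of distinct keywords matched) is an assumption, since this record does not state the actual ranking formula.

```python
import re

# Hypothetical toy lexicons. The thesis uses ABNER for protein names and
# UMLS-derived resources for cancer names; neither is reproduced here.
PROTEINS = {"psa", "amacr", "pca3"}
CANCERS = {"prostate cancer", "hepatocellular carcinoma"}
KEYWORDS = {"biomarker", "marker", "overexpressed", "diagnosis"}


def split_sentences(text):
    """Naive sentence splitter: break after ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def find_terms(sentence, lexicon):
    """Return the lexicon terms that occur as whole words in the sentence."""
    lowered = sentence.lower()
    return {t for t in lexicon
            if re.search(r"\b" + re.escape(t) + r"\b", lowered)}


def evidence_sentences(text):
    """Keep sentences in which a protein name, a cancer name, and a keyword
    co-occur, ranked by the number of distinct keywords (assumed score)."""
    hits = []
    for sentence in split_sentences(text):
        proteins = find_terms(sentence, PROTEINS)
        cancers = find_terms(sentence, CANCERS)
        keywords = find_terms(sentence, KEYWORDS)
        if proteins and cancers and keywords:
            hits.append((len(keywords), sentence, proteins, cancers))
    # Highest score first; list.sort is stable, so ties keep document order.
    hits.sort(key=lambda h: h[0], reverse=True)
    return hits


if __name__ == "__main__":
    sample = ("AMACR is a biomarker for the diagnosis of prostate cancer. "
              "PSA is widely measured. "
              "PSA is a marker of prostate cancer.")
    for score, sentence, proteins, cancers in evidence_sentences(sample):
        print(score, sentence)
```

In this sketch a sentence that mentions a protein but no cancer name is discarded, which mirrors the requirement that all three kinds of terms appear together in the same sentence.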

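    Chapter 1 Introduction
    1.1 Preface
    1.2 Research Motivation
    1.3 Approach
    1.4 Thesis Organization
    Chapter 2 Related Work
    2.1 Bioinformatics
    2.1.1 PubMed
    2.1.2 Unified Medical Language System Knowledge Source Server
    2.2 Cancer Biomarkers
    2.3 Document Processing and Related Analysis Techniques
    2.4 Information Retrieval
    Chapter 3 Cancer Biomarker Related Evidence Sentence Retrieval System
    3.1 System Architecture
    3.2 Document Preprocessing
    3.3 Name Recognition
    3.3.1 Recognition of Cancer Biomarker Names
    3.3.2 Labeling of Cancer Disease Names
    3.4 Keywords
    3.5 Merging of Protein Names
    3.6 The Concept of Co-occurrence
    3.7 Ranking of Cancer Biomarker Sentences
    Chapter 4 System Overview and Case Analysis
    4.1 System Functions and Page Overview
    4.2 Display by Category
    4.3 Protein Queries Entered by the User
    4.4 Case Study
    4.4.1 Comparison of This System with PubMed
    4.4.2 Analysis of Sentence Ranking
    4.4.3 Evaluation of System Performance
    4.4.4 Example Query Results for Prostate Cancer Biomarkers
    4.4.5 Example Query Results for Liver Cancer Biomarkers
    Chapter 5 Conclusions and Future Work
    5.1 Conclusions
    5.2 Future Work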


    Full text availability, on campus: public from 2009-01-28
    Full text availability, off campus: public from 2010-01-28