| Graduate student: | 謝志欣 Hsieh, Chih-Hsin |
|---|---|
| Thesis title: | 自生醫文獻中擷取出癌症生物標誌相關之驗證句 Retrieving Cancer Biomarker Related Evidence Sentences from Biomedical Literature |
| Advisor: | 蔣榮先 Chiang, Jung-Hsien |
| Degree: | Master |
| Department: | College of Electrical Engineering and Computer Science - Department of Computer Science and Information Engineering |
| Publication year: | 2008 |
| Academic year of graduation: | 96 |
| Language: | Chinese |
| Pages: | 39 |
| Keywords (Chinese): | 癌症生物標誌, 文件探勘 |
| Keywords (English): | cancer biomarker, text mining |
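In recent years, advances in bioinformatics, information technology, and the Internet have driven rapid growth in the biomedical literature. These documents record a large body of research results; once analyzed by computer, they can yield useful information and can even be integrated into knowledge. Research on cancer biomarkers still relies mainly on biotechnology such as PCR or mass spectrometry: new cancer biomarkers are identified by analyzing experimental results, but this process remains costly. This study therefore proposes a framework for retrieving cancer biomarker related evidence sentences from biomedical literature. In the implementation, ABNER serves as the protein name recognizer, resources collected from the Unified Medical Language System (UMLS) Knowledge Source Server are used to recognize cancer disease names, and keywords related to cancer biomarkers are collected. These are combined through co-occurrence relations to find evidence sentences related to cancer biomarkers. The final case study confirms that the system can indeed provide evidence sentences for known cancer biomarkers such as PSA, AMACR, and PSMA. We hope this text mining system will be helpful to biomedical scientists who study cancer biomarkers.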
In recent years, the rapid growth of the biomedical literature has made it a challenge for researchers to keep up with the dramatically increasing volume of information and to discover the newest knowledge about cancer biomarkers. In this research, we provide a method that extracts useful information from biomedical literature and then retrieves cancer biomarker related evidence sentences. We use ABNER to recognize protein named entities, a lexicon to label cancer names, and a bag of keywords as the features of cancer biomarkers. In addition, the co-occurrence of protein names, disease names, and keywords is used as the basis for identifying related evidence sentences. After ranking the retrieved sentences, the system demonstrates significant improvement in experimental results for biomarkers such as AMACR, PSA, and PCA3. The proposed text mining procedure can be a useful tool for biologists who are interested in researching cancer related biomarkers.
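The co-occurrence idea described above can be illustrated with a minimal Python sketch. It is not the thesis implementation: the three small term sets are hypothetical stand-ins for ABNER's protein tagging, the UMLS-derived cancer lexicon, and the biomarker keyword list, and ranking by the number of matched keywords is an assumed scoring rule, since the abstract does not state how sentences are ranked.

```python
import re

# Hypothetical stand-ins for the real resources (ABNER output, UMLS lexicon, keyword list).
PROTEINS = {"PSA", "AMACR", "PSMA", "PCA3"}
CANCERS = {"prostate cancer", "bladder cancer", "hepatocellular carcinoma"}
KEYWORDS = {"biomarker", "marker", "overexpressed", "diagnostic", "prognostic"}


def find_terms(sentence, terms, ignore_case=True):
    """Return the lexicon terms that occur in the sentence as whole words."""
    flags = re.IGNORECASE if ignore_case else 0
    return {
        term
        for term in terms
        if re.search(r"\b" + re.escape(term) + r"\b", sentence, flags)
    }


def evidence_sentences(sentences):
    """Keep sentences in which a protein, a cancer name, and a keyword co-occur.

    Sentences are ranked by the number of distinct keywords matched
    (an assumed score, chosen only for illustration).
    """
    hits = []
    for sentence in sentences:
        proteins = find_terms(sentence, PROTEINS, ignore_case=False)
        cancers = find_terms(sentence, CANCERS)
        keywords = find_terms(sentence, KEYWORDS)
        if proteins and cancers and keywords:
            hits.append((len(keywords), sentence))
    # sort is stable, so ties keep their original order
    hits.sort(key=lambda hit: hit[0], reverse=True)
    return [sentence for _, sentence in hits]
```

Requiring all three term types in the same sentence trades recall for precision: a sentence that names a protein and a cancer but no biomarker keyword is dropped, even if it is relevant.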