| Graduate student: | 陳疇丞 Chen, Chou-Cheng |
|---|---|
| Thesis title: | 文章探勘和資料探勘在癌症研究之運用 Application of text mining and data mining in cancer research |
| Advisor: | 何中良 Ho, Chung-Liang |
| Degree: | 博士 Doctor |
| Department: | 醫學院 - 基礎醫學研究所 Institute of Basic Medical Sciences |
| Year of publication: | 2017 |
| Academic year of graduation: | 105 |
| Language: | English |
| Number of pages: | 93 |
| Keywords (Chinese): | 文章探勘 、資料探勘 、幹細胞 、大腸癌 、肝癌 |
| Keywords (English): | data mining, text mining, stem cell, colorectal cancer, liver cancer, cancer stem cell, TCGA |
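This study used text mining and data mining to screen for genes related to liver cancer and colorectal cancer. Through EST data mining and literature review, we identified fifty genes of unknown function, and experiments showed that ZNF496, RMI2 and U41 may be WNT target genes associated with liver cancer. For known genes, I used two text-mining tools that I wrote, PubstractHelper and StemTextSearch, together with the existing data-mining resources GeneCards and NCBI GEO, to identify twenty candidate genes; experiments showed that IGF2BP1 may be an oncofetal stem cell-like marker in the blood circulation for recurrent liver cancer. I also mined TCGA data and found twenty-three genes possibly related to colorectal cancer; we finally selected three genes for which antibodies could be purchased and used immunohistochemical staining to test whether they are expressed in colorectal cancer. This study shows that text mining and data mining can help scientists narrow down candidate genes that may be related to cancer.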
This study aimed to use text mining and data mining to select candidate genes associated with liver and colorectal cancer. Fifty candidate genes of unknown function were selected by data mining the EST library, and experiments indicated that ZNF496, RMI2 and U41 may be WNT target genes associated with liver cancer. Twenty known candidate genes were selected by text mining with PubstractHelper and StemTextSearch and by data mining with GeneCards and NCBI GEO. Experiments indicated that IGF2BP1 may be an oncofetal circulating cancer stem cell-like marker associated with the recurrence of hepatocellular carcinoma. Twenty-three candidate genes were selected by data mining TCGA (The Cancer Genome Atlas) data, and three of them, for which antibodies were available, were examined by immunohistochemistry (IHC) to determine whether they are expressed in colorectal cancer. This study shows that text mining and data mining can help scientists narrow down candidate genes associated with cancer.