| Graduate student: | 張綾娟 Chang, Ling-Chuan |
|---|---|
| Thesis title: | 使用基於 Gene Ontology 架構之神經網路透過基因表現數據對腦部膠質瘤進行可解釋的分類 A Gene Ontology Based Architecture Neural Network for Interpretable Classification of Glioma Types from Gene Expression Data |
| Advisor: | 賀保羅 Paul Horton |
| Degree: | Master |
| Department: | College of Electrical Engineering and Computer Science - Department of Computer Science and Information Engineering |
| Year of publication: | 2021 |
| Graduation academic year: | 109 |
| Language: | English |
| Number of pages: | 39 |
| Chinese keywords: | 可解釋性模型 (interpretable model), 基因表現 (gene expression), 基因本體論 (Gene Ontology) |
| English keywords: | Interpretable neural network model, Gene expression, Gene Ontology |
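In recent years, deep learning has become a popular and highly effective technique in many fields, including image classification, natural language processing, and speech processing. The most important problem for deep learning, however, is that its models lack interpretability, particularly in biomedicine, a field where interpretability is especially needed. Our goal is therefore to build an interpretable model and apply it to predicting glioma types.
Glioma is a primary brain cancer that originates in glial cells and accounts for about 80% of primary brain cancers. In cancer prediction research, some studies use overexpressed genes, biological pathways, or combinations of different data types to make predictions and interpret models. However, organisms are highly complex, and overexpressed genes combined with biological pathways alone cannot provide comprehensive information.
In this thesis, we propose the BP-Gene model, which uses gene expression together with a biological pathway database, Gene Ontology, as the model architecture to predict normal tissue and different glioma types. Gene Ontology integrates biological functions with genes and classifies the functions hierarchically. Combining these classified functions with gene expression not only defines the meaning of the model's neurons but also lets the model learn different information level by level.
Experimental results show that the BP-Gene model outperforms other methods, and the important features it learned agree with biological analyses reported in the literature. We also found that overexpressed genes are not the only genes that affect the disease.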
In recent years, deep learning has been a popular and effective technology in many fields, such as image classification, natural language processing, and speech processing. However, the most important problem in deep learning is the model's lack of interpretability, especially in the biomedical field, where interpretability is required. Our goal is therefore to build an interpretable model and apply it to the prediction of glioma types.
Gliomas are a type of primary brain cancer originating from glial cells, and about 80% of primary brain cancers are gliomas. Among studies on cancer prediction, some use over-representation analysis of gene expression, biological pathways, or combinations of different data types to make predictions and interpret models. However, using only differentially expressed genes and biological pathways is not enough to obtain comprehensive information, because biological relationships are complex and interlocking.
In this thesis, we propose the BP-Gene model, which uses gene expression and a biological knowledge database, Gene Ontology (GO), as the model architecture to discriminate between normal tissue and different types of gliomas. GO integrates biological functions and genes and classifies the biological functions hierarchically. Combining these well-categorized biological functions with gene expression not only assigns meaning to the neural units in the network but also allows the model to learn different information hierarchically.
The experimental results show that the BP-Gene model performs better than the methods it was compared against, and that the important biological factors learned by the model are consistent with biological analysis results reported in the literature. In addition, we found that differentially expressed genes are not the only genes that may affect the disease.
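
The idea of using an ontology as the network architecture can be illustrated with a small sketch. The code below is not the thesis's implementation; it only shows one common way to make hidden units correspond to ontology terms, by masking a dense layer so that a gene feeds a term only if it is annotated to it, and a term feeds another term only if the ontology links them. The gene names, term names, annotations, weights, and expression values are all hypothetical placeholders, and the `tanh` activation is an assumption.

```python
import math

# Hypothetical toy ontology: placeholder names, not real GO annotations.
GENE_TO_TERMS = {
    "gene_a": ["term_1"],
    "gene_b": ["term_1", "term_2"],
    "gene_c": ["term_2"],
}
TERM_TO_PARENTS = {
    "term_1": ["root"],
    "term_2": ["root"],
}


def build_mask(children, parents, edges):
    """Return mask[i][j] = 1.0 if children[i] is linked to parents[j], else 0.0."""
    column = {name: j for j, name in enumerate(parents)}
    mask = [[0.0] * len(parents) for _ in children]
    for i, child in enumerate(children):
        for parent in edges.get(child, []):
            mask[i][column[parent]] = 1.0
    return mask


def masked_layer(inputs, weights, mask, bias):
    """Dense tanh layer whose weights are zeroed wherever the mask is 0."""
    outputs = []
    for j in range(len(bias)):
        total = bias[j]
        for i, value in enumerate(inputs):
            total += value * weights[i][j] * mask[i][j]
        outputs.append(math.tanh(total))
    return outputs


genes = ["gene_a", "gene_b", "gene_c"]
terms = ["term_1", "term_2"]
roots = ["root"]

gene_mask = build_mask(genes, terms, GENE_TO_TERMS)
term_mask = build_mask(terms, roots, TERM_TO_PARENTS)

# Illustrative all-ones weights and zero biases; a real model would learn these.
gene_weights = [[1.0, 1.0] for _ in genes]
term_weights = [[1.0] for _ in terms]

expression = [0.5, -0.2, 0.1]  # made-up expression values for the three genes
term_activations = masked_layer(expression, gene_weights, gene_mask, [0.0, 0.0])
root_activation = masked_layer(term_activations, term_weights, term_mask, [0.0])

print(dict(zip(terms, term_activations)))
print(root_activation)
```

Because each hidden unit receives input only from the genes or terms linked to it, its activation can be read as a score for that term, which is the sense in which such an architecture assigns meaning to its neurons.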