| Graduate student: | 張凱奇 Abdollahi, Sina |
|---|---|
| Thesis title: | 基於深度學習模型分類癌症基因組學 Cancer Genomics Classification Using Deep Learning Models |
| Advisor: | 蔣榮先 Chiang, Jung-Hsien |
| Degree: | Doctor |
| Department: | College of Electrical Engineering and Computer Science - Department of Computer Science and Information Engineering |
| Year of publication: | 2021 |
| Academic year of graduation: | 109 |
| Language: | English |
| Number of pages: | 69 |
| Keywords: | Gene Pathogenicity Score, Protein-Protein Interactions, Mutational Patterns, Mutational Accumulation, Deep Learning, Matrix Factorization |
Identifying cancer-associated genes and protein-protein interactions (PPIs) is essential. Prioritizing genes and PPIs across cancer types and diseases helps detect disease-associated and tumorigenesis pathways, facilitates drug and treatment discovery, and improves our understanding of cancer biology. Genomic variant data describe the mutations and variants in the coding and non-coding regions of a patient's genes. The mutated amino acid sequence of a protein encodes useful information about the protein's function and structure.

We considered four ways to extract features from genomic variant data: the pathogenicity of variants and genes, PPIs, mutational patterns, and mutational accumulation. These yield informative features for identifying cancer-associated genes and interactions. Extracting gene pathogenicity scores required handling genes whose pathogenicity is of uncertain significance. To address this challenge, we proposed two collaborative filtering (CF)-based models, built on matrix factorization, to predict the pathogenicity scores of such genes. We also represented PPIs in four different ways and designed four deep learning architectures based on these representations. Furthermore, we analyzed the mutational patterns of 20 cancer types. We predicted disease biomarkers and cancer-associated genes using gene pathogenicity scores, the five different PPI features, and mutational patterns.

In addition, we designed a mutational accumulation scoring system to identify genes associated with tumors in effusions (lymphoma).
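The collaborative filtering idea mentioned in the abstract can be illustrated with a minimal matrix factorization sketch. This is not the thesis's actual model. It assumes a hypothetical score matrix whose rows are samples and whose columns are genes, with `NaN` marking entries of uncertain significance. The function names `factorize` and `impute`, the rank, and all hyperparameters are illustrative assumptions.

```python
import numpy as np


def factorize(scores, rank=2, lr=0.01, reg=0.02, epochs=500, seed=0):
    """Fit scores ~ P @ Q.T on the observed (non-NaN) entries using SGD.

    `scores` is a hypothetical samples-by-genes matrix of pathogenicity
    scores; the layout is an assumption made for illustration only.
    """
    rng = np.random.default_rng(seed)
    n_rows, n_cols = scores.shape
    P = 0.1 * rng.standard_normal((n_rows, rank))  # row (sample) factors
    Q = 0.1 * rng.standard_normal((n_cols, rank))  # column (gene) factors
    observed = np.argwhere(~np.isnan(scores))      # (row, col) index pairs
    for _ in range(epochs):
        rng.shuffle(observed)                      # shuffle index pairs in place
        for i, j in observed:
            err = scores[i, j] - P[i] @ Q[j]
            p_old = P[i].copy()
            # Gradient step on squared error with L2 regularization.
            P[i] += lr * (err * Q[j] - reg * P[i])
            Q[j] += lr * (err * p_old - reg * Q[j])
    return P, Q


def impute(scores, **kwargs):
    """Fill only the missing entries with the low-rank prediction."""
    P, Q = factorize(scores, **kwargs)
    filled = scores.copy()
    missing = np.isnan(scores)
    filled[missing] = (P @ Q.T)[missing]
    return filled
```

Calling `impute` on a small matrix with a few `NaN` entries returns a matrix of the same shape in which the known scores are untouched and the uncertain ones are replaced by low-rank estimates. Plain SGD is used here only for brevity. Other solvers, such as alternating least squares, would fit the same sketch.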