
Graduate student: Lee, Chih-Jung (李芷榕)
Thesis title: Using Deletion Structural Variants and Immune Response Gene Expression Profiles to Predict Clinical Outcome of Cancer Patients (利用遺傳基因結構變異與免疫基因表現量預測癌症病人之預後)
Advisor: Chiang, Jung-Hsien (蔣榮先)
Co-advisor: Lin, Peng-Chan (林鵬展)
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Computer Science and Information Engineering
Year of publication: 2019
Academic year of graduation: 107 (ROC calendar; 2018-2019)
Language: English
Number of pages: 59
Keywords: NGS, WGS, RNA-seq, structural deletion variants, immune gene expression, cancer susceptibility, machine learning, survival analysis
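  • Cancer has long been one of the leading causes of death, and many studies have confirmed that it is closely tied to structural variation in germline genes. Next-generation sequencing now makes it possible to obtain human whole-genome sequence data quickly, which has greatly eased the study of germline structural variants, and many studies identify such variants as one cause of cancer. Germline structural variants can therefore be used to investigate cancer susceptibility: a structural change in the genotype can produce an abnormal phenotype, including abnormal expression of immune genes within the tumor, and that abnormal expression lets us analyze the relationship between germline variation and immune gene expression. This study examines the association between structural variants and cancer and the differences between cancer patients and non-cancer samples. It integrates germline structural variant data with RNA-seq data to analyze how the expression of transcribed products changes for genes that carry structural variants in cancer patients, and it examines the association between abnormal gene expression and clinical prognosis.
    By analyzing whole-genome structural variant sequencing data from healthy Taiwanese individuals and cancer patients, we identify structural variants that distinguish cancer from non-cancer samples. The system has three parts. The first is structural variant detection: we use the tool PopDel to jointly call structural variants in the sequence data and filter out those with low coverage. The second is structural variant selection: we first use machine learning to select cancer-associated structural variants, then combine them with RNA-seq data to select variants associated with immune gene expression in the tumor. The last part is prognosis analysis: we combine patients' clinical data to analyze the prognosis associated with structural variants in different genes, predict cancer susceptibility and the likelihood of recurrence, and examine how structural variants differ between patients with different prognoses and how genotype relates to phenotype in cancer patients.
    In the experiments, we use whole-genome sequence data from 192 cancer patients at National Cheng Kung University Hospital and 499 non-cancer participants from Taiwan Biobank. The cancer patients comprise 120 with colorectal cancer, 35 with ovarian cancer, 29 with endometrial cancer, and 8 with breast cancer. We select cancer-associated structural variants and confirm that they can distinguish cancer patients from non-cancer samples, then select structural variants associated with immune gene expression. Finally, we select 65 candidate structural variants and use the Cox proportional hazards model to pick out the prognosis-related variants that are statistically significant in the model; all of these are correlated with tumor marker genes. Five of them are associated with better prognosis in cancer patients and three with poorer prognosis.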

    Cancer has ranked among the leading causes of death in Taiwan for many years. Its causes are complex, and cancer susceptibility reflects genetic, lifestyle, and environmental components. Some genetic variants are related to cancer susceptibility. In recent years, Next Generation Sequencing (NGS) has matured to the point where human Whole Genome Sequence (WGS) data can be obtained rapidly and accurately. Structural deletion variants are structural DNA variations that affect phenotypes through loss of biological function. We can therefore explore the causes of cancer from the human genome and examine the differences between cancer patients and non-cancer people. In our research, structural deletion variants are associated with cancer susceptibility and can be used to distinguish cancer patients from non-cancer people.
    We design a system that aims to find germline structural deletions that distinguish cancer from non-cancer samples and are associated with prognosis. The system has three parts. First, in the structural deletion detection part, germline whole genome sequence data from cancer patients and non-cancer samples are passed to PopDel, which detects structural deletions in the germline DNA sequence data. We then filter the deletions and keep those with higher coverage. Second, in the structural deletion selection part, we use a machine learning approach, an attention weighted model, to select cancer associated deletions. We combine these with gene expression profiles from cancer tissue RNA-seq data and with patients' clinical information to find immune and prognosis associated deletions. Last, in the prognosis part, we apply survival-SVM to select candidate deletions associated with recurrence, and we select the significant prognostic deletions through survival analysis.
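    The abstract does not say which statistic links a deletion to immune gene expression. As one plausible choice, the sketch below computes a point-biserial correlation between a binary carrier flag (1 if the patient carries the deletion, 0 otherwise) and a tumor expression value. The function name `point_biserial`, the argument names, and the toy values are illustrative assumptions, not taken from the thesis.

```python
import math


def point_biserial(carrier, expression):
    """Point-biserial correlation between a 0/1 deletion flag and expression.

    carrier:    list of 0/1 flags, one per patient (hypothetical encoding)
    expression: list of normalized expression values for one immune gene
    """
    if len(carrier) != len(expression):
        raise ValueError("inputs must have the same length")
    with_del = [e for c, e in zip(carrier, expression) if c == 1]
    without_del = [e for c, e in zip(carrier, expression) if c == 0]
    n1, n0, n = len(with_del), len(without_del), len(expression)
    if n1 == 0 or n0 == 0:
        raise ValueError("both carriers and non-carriers are required")
    mean_all = sum(expression) / n
    # Population standard deviation of expression over all samples.
    sd = math.sqrt(sum((e - mean_all) ** 2 for e in expression) / n)
    if sd == 0:
        raise ValueError("expression is constant")
    mean_with = sum(with_del) / n1
    mean_without = sum(without_del) / n0
    # Standard point-biserial formula; equals Pearson's r for a 0/1 variable.
    return (mean_with - mean_without) / sd * math.sqrt(n1 * n0 / (n * n))


# Toy data only: two carriers with lower expression than two non-carriers.
r = point_biserial([1, 1, 0, 0], [1.0, 2.0, 3.0, 4.0])
print(round(r, 3))
```

    A negative value means carriers show lower expression of the gene than non-carriers. In practice a significance test and multiple-testing correction would be needed before calling a deletion immune associated.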
    We conduct our experiments on whole genome sequence data from 192 NCKUH cancer patients and 499 Taiwan Biobank non-cancer samples. The 192 cancer patients cover four cancer types: 8 breast cancer, 120 colorectal cancer, 29 endometrial cancer, and 35 ovarian cancer. PopDel detects 14,772 deletions, of which 2,919 remain after the low-coverage deletions are filtered out. We select the 671 cancer associated deletions with the highest weights in the attention weighted model and show that cancer and non-cancer samples can be separated. We then choose 160 immune associated deletions with the immune correlation model, which indicates that some structural deletions are related to immune gene expression. Finally, we pick 65 candidate deletions that are correlated with recurrence and use the Cox proportional hazards model to select 8 prognostic deletions. These prognostic deletions are correlated with some tumor marker genes in cancer tissue. Of the 8, 5 are associated with better prognosis and 3 with poor prognosis.
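    For readers unfamiliar with the Cox model, its standard form is given here as a reminder of how "better" and "poor" prognosis are usually read from it. The encoding of each deletion as a binary covariate x (1 for carriers, 0 for non-carriers) is an assumption; the abstract does not state how deletions enter the model.

```latex
\[
  h(t \mid x) = h_0(t)\,\exp(\beta x), \qquad x \in \{0, 1\}
\]
\[
  \mathrm{HR} = \frac{h(t \mid x = 1)}{h(t \mid x = 0)} = e^{\beta}
\]
```

    Here h_0(t) is the baseline hazard and HR is the hazard ratio for carriers relative to non-carriers. Under this encoding, a significant coefficient with β < 0 (HR < 1) corresponds to a lower hazard of the event, that is, better prognosis, while β > 0 (HR > 1) corresponds to poorer prognosis.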

    Chinese Abstract
    Abstract
    Acknowledgements
    Contents
    List of Tables
    List of Figures
    Chapter 1 Introduction
      1.1 Background
      1.2 Motivation
      1.3 Research Objectives
      1.4 Thesis Organization
    Chapter 2 Related Work
      2.1 Germline Structural Variations Associated with Cancer Risk
      2.2 Immune Response Correlation with Germline Deletion
      2.3 Machine Learning in Bioinformatics Data
      2.4 Machine Learning for Survival Analysis
    Chapter 3 Structural Deletion Detection and Selection
      3.1 Overview
      3.2 Structural Deletion Detection
        3.2.1 PopDel - Population-wide Deletion Calling
        3.2.2 Deletion Filtering
      3.3 Structural Deletion Selection
        3.3.1 Cancer Associated Deletion Selection Model
        3.3.2 Immune Expression Correlation Model
          3.3.2.1 Gene Expression Value Normalization
          3.3.2.2 Deletion and Immune Expression Correlation
        3.3.3 Candidate Deletions Selection Model
    Chapter 4 Prognosis Analysis
      4.1 Candidate Deletions Clustering
      4.2 Prognostic Deletions Selection
      4.3 Family Cancer History Discussion
    Chapter 5 Experiments
      5.1 Experimental Design
      5.2 Structural Deletion Detection
        5.2.1 Data Description
        5.2.2 Filtering
      5.3 Structural Deletion Selection
        5.3.1 Cancer Associated Deletion Selection Model
          5.3.1.1 Attention Weighted Model Training
          5.3.1.2 Evaluation
        5.3.2 Immune Expression Correlation Model
        5.3.3 Candidate Deletions Selection Model
      5.4 Prognosis Analysis
        5.4.1 Candidate Deletions
        5.4.2 Prognostic Deletions
        5.4.3 Relationship Between Deletion and Immune Gene Expression
    Chapter 6 Discussion
      6.1 Rare Genes Analysis
      6.2 Family Cancer History Analysis
      6.3 Pathway Enrichment of Cancer Associated Genes
    Chapter 7 Conclusions and Future Works
      7.1 Conclusions
      7.2 Future Works
    Reference


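    On campus: public from 2021-08-30
    Off campus: not public
    The electronic thesis has not yet been authorized for public release; for the print copy, please check the library catalog.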