| Graduate student: | 郭泓億 Kuo, Hong-Yi |
|---|---|
| Thesis title: | 以綜合性計算方法藉由遺傳拷貝數變異探討癌症易患性 A Comprehensive Computational Method for Cancer Risk Predisposition with Germline Copy Number Variation in Population Scale |
| Advisor: | 蔣榮先 Chiang, Jung-Hsien |
| Co-advisor: | 楊士德 Yang, Hsih-Te |
| Degree: | Master |
| Department: | College of Electrical Engineering and Computer Science - Department of Computer Science and Information Engineering |
| Year of publication: | 2017 |
| Academic year of graduation: | 105 (ROC calendar) |
| Language: | English |
| Number of pages: | 36 |
| Keywords (translated from Chinese): | next-generation sequencing, whole-genome sequencing, cancer, family heredity, copy number variation, sequence variation analysis, cancer risk |
| Keywords (English): | NGS, WGS, family hereditary, CNV, sequence analysis, cancer predisposition |
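Abstract (translated from the Chinese original)

In Taiwan, cancer has been the leading cause of death among the top ten since 1982 (ROC year 71). As medical technology has advanced, researchers have tried to understand cancer through DNA. Next-generation sequencing now makes it possible to obtain whole-genome sequence data quickly and in large volume, and these sequences carry an individual's complete genetic information. Apart from environmental factors, cancer is generally regarded as the accumulation of variations in the DNA sequence. This study therefore examines whether cancer patients and healthy individuals differ fundamentally in their genomic sequences in a way that leads to cancer, and also explores the mechanism by which cancer is inherited within families.

The human genome consists of 23 pairs of chromosomes containing about 3 billion DNA base pairs, arranged as sequences of four bases: thymine (T), adenine (A), cytosine (C), and guanine (G). A portion of these base pairs makes up roughly 20,000 to 25,000 genes. Most studies of human genetic variation have focused on single nucleotide polymorphisms, that is, substitutions of individual bases in DNA. In contrast to the analysis of small-scale variation, large-scale structural variation has drawn increasing attention in recent years. Copy number variation is a common type of structural variation; it can readily span millions of bases and alter large portions of a chromosome. It is not only a form of inherited, accumulated variation; many studies have also reported that it is highly correlated with disease.

This study designed a copy number comprehensive analysis system (CNCAS) to identify key genes affected by copy number variation. We first used next-generation sequencing to obtain human whole-genome sequences from cancer patients and healthy individuals, with the Taiwanese population as the target group; the cancer types include colorectal, endometrial, and ovarian cancer. We then called large-scale variation, namely copy number variation, from the sequences and, taking the gene as the unit, determined how each gene was affected by variation. Using statistical and algorithmic methods combined with several copy number variation databases, we investigated whether the two groups differ in their sequences and identified the genes that differ. We also examined clinical data and built a complete whole-genome sequence analysis platform. Finally, we identified several genes associated with high cancer risk and found specific characteristics in the sequence variation of cancer patients. We hope to apply the platform in the clinic to assist physicians in treatment decisions for cancer patients and to reduce the risk of developing cancer and of cancer progression.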
Abstract (English original)

Cancer has been the leading cause of death in Taiwan since 1982. As medical technology has advanced, researchers have tried to understand cancer through DNA. Next-generation sequencing (NGS) now makes it possible to obtain large amounts of human whole-genome sequencing (WGS) data quickly, and WGS data contain an individual's complete genetic information. Apart from environmental factors, cancer is generally considered to result from the accumulation of mutations in the DNA sequence. This research therefore explores whether any fundamental difference exists between cancer patients and healthy people, and whether such differences cause mutations to accumulate with age and lead to cancer. We also explore the relation between cancer predisposition and family cancer history.

The human genome, encoded as DNA in 23 chromosome pairs, contains about three billion base pairs. There are four types of bases: thymine (T), adenine (A), cytosine (C), and guanine (G). A portion of these base pairs makes up about 20,000 to 25,000 genes. Most research on human heredity has concentrated on single nucleotide polymorphisms (SNPs), which are variations at a single nucleotide in DNA. In contrast to such small-scale mutations, large-scale structural variation has recently attracted more attention. Copy number variation (CNV) is a common type of structural variation whose affected span can reach millions of base pairs, producing large-scale changes in chromosomes. It is not only a type of inherited, cumulative variation; many studies have also indicated that it is highly correlated with disease.

This research designed a copy number comprehensive analysis system (CNCAS) to identify crucial genes affected by copy number variation. First, we used NGS to obtain WGS data from cancer patients and healthy people, targeting the Taiwanese population. The cancer types include colorectal, ovarian, and endometrial cancer. We then called large-scale variations (CNVs) from the WGS data and used the gene as the unit of analysis to judge its mutation status. We applied several methods and combined several CNV databases to explore whether the two groups differ in their sequences. We mapped the mutations to genes and identified the genes that differ most between the groups (the "winner" genes). We also investigated clinical data to establish a complete WGS analysis platform. Finally, we found several genes associated with cancer risk and certain patterns in the sequences of cancer patients. We hope to apply the platform to clinical therapy, where it could assist physicians in making treatment decisions for cancer patients and reduce the threat of cancer.
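The gene-level comparison described in the abstract can be illustrated with a minimal sketch. It is not the CNCAS implementation: the abstract does not name the statistical test, so Fisher's exact test is assumed here, and the gene names, coordinates, and the `Interval`, `genes_hit`, and `gene_association` helpers are all hypothetical. The sketch marks a gene as affected in a sample when any CNV interval overlaps it, then compares the number of affected cases and controls per gene.

```python
from math import comb
from typing import Dict, List, NamedTuple, Set


class Interval(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # exclusive


def overlaps(a: Interval, b: Interval) -> bool:
    """True when two half-open intervals on the same chromosome intersect."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def genes_hit(cnvs: List[Interval], genes: Dict[str, Interval]) -> Set[str]:
    """Names of genes overlapped by at least one CNV in a single sample."""
    return {name for name, g in genes.items()
            if any(overlaps(c, g) for c in cnvs)}


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    n, row1, col1 = a + b + c + d, a + b, a + c

    def prob(x: int) -> float:
        # Hypergeometric probability of x affected samples in the first row.
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    p_obs = prob(a)
    lo, hi = max(0, row1 - (n - col1)), min(row1, col1)
    probs = [prob(x) for x in range(lo, hi + 1)]
    # Sum every table at least as extreme as the observed one.
    return sum(p for p in probs if p <= p_obs * (1 + 1e-9))


def gene_association(case_cnvs: List[List[Interval]],
                     control_cnvs: List[List[Interval]],
                     genes: Dict[str, Interval]) -> Dict[str, float]:
    """Per-gene p-value comparing CNV-affected cases against controls."""
    case_hits = [genes_hit(s, genes) for s in case_cnvs]
    ctrl_hits = [genes_hit(s, genes) for s in control_cnvs]
    pvals = {}
    for name in genes:
        a = sum(name in h for h in case_hits)   # affected cases
        c = sum(name in h for h in ctrl_hits)   # affected controls
        b, d = len(case_hits) - a, len(ctrl_hits) - c
        pvals[name] = fisher_exact_two_sided(a, b, c, d)
    return pvals


# Hypothetical toy data: four cases with a CNV over GENE_A, four clean controls.
genes = {"GENE_A": Interval("chr1", 100, 200),
         "GENE_B": Interval("chr1", 500, 600)}
cases = [[Interval("chr1", 150, 250)] for _ in range(4)]
controls = [[] for _ in range(4)]
pvals = gene_association(cases, controls, genes)
```

In a real cohort the per-gene p-values would need multiple-testing correction, and the linear scan over genes would typically be replaced by an interval index.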