
Graduate student: 呂強尼
Lu, Chiang-Ni
Thesis title: 次世代定序技術應用-遺傳疾病資訊檢索
Next Generation Sequencing for Genetic disorder information retrieval
Advisor: 黃吉川
HUANG, CHI-CHUAN
Degree: Master
Department: College of Engineering - Department of Engineering Science
Year of publication: 2014
Academic year of graduation: 102 (ROC calendar)
Language: Chinese
Pages: 86
Keywords (Chinese): 次世代定序、遺傳疾病、預防醫學
Keywords (English): Next-generation sequencing, genetic diseases, preventive medicine
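  • Abstract (translated from the Chinese)
    With improvements in living conditions worldwide and rapid advances in medical technology, quality of life has generally risen and average life expectancy has lengthened. For many developed and developing countries, however, this has been accompanied by an aging population structure and a gradual increase in the number of patients with chronic diseases, which has become a major concern for national healthcare spending and the social welfare burden. With the rise of genomics, scientists have begun to explore, under the heading of "personal genomics", the interrelationships among genes, diet, and disease. The hope is to decode this system in order to achieve lifestyle management, disease prevention, and even disease prediction. For example, personal genetic testing can indicate whether an individual is susceptible to metabolic syndrome, and diet can then be adjusted for prevention.

    Next-generation sequencing lets us examine our genetic code clearly and precisely. Combined with accumulated medical knowledge, it could allow disease to be guarded against, or even treated in advance, before it strikes, which is the ultimate goal of medicine. In modern medicine, genetic testing occupies an increasingly important position. In recent years, cancer treatment has also often relied on multi-gene testing to predict treatment response. With the development of next-generation sequencing, the number of genes tested has grown from a single gene to large numbers of genes.

    This thesis designs a processing workflow for next-generation sequencing data and demonstrates the feasibility of, and an approach to, using public databases to interpret genomic data. For disease analysis, it demonstrates retrieval methods for single-gene disorders, polygenic disorders, and mitochondrial variant disorders.

    Keywords: next-generation sequencing, genetic diseases, preventive medicine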

    Next Generation Sequencing for Genetic disorder information retrieval

    Chiang-Ni Lu
    Chi-Chuan Hwang
    Department of Engineering Science National Cheng Kung University

    SUMMARY
    With improvements in the living environment and advances in medical technology, quality of life has generally improved and average life expectancy has increased. For many developed and developing countries, this has been accompanied by population aging, resulting in a growing number of patients with chronic diseases, which has become a major concern for healthcare spending and the social welfare burden.

    In modern medicine, genetic testing occupies an increasingly important position. In recent years, many cancer treatments have also used multi-gene testing to predict treatment response. With the development of next-generation sequencing, the number of genes tested has grown from a single gene to a large number of genes.

    In this paper, we design a data processing workflow for next-generation sequencing data and demonstrate the feasibility of using public databases to interpret genomic data. For disease analysis, we show retrieval methods for single-gene disorders, multifactorial and polygenic (complex) disorders, and mitochondrial diseases.

    Key words: Next-generation sequencing, genetic diseases, preventive medicine

    Introduction

    Genetic testing, also known as DNA testing, allows the genetic diagnosis of vulnerabilities to inherited diseases, and can also be used to determine a child's parentage (genetic mother and father) or in general a person's ancestry. In addition to studying chromosomes to the level of individual genes, genetic testing in a broader sense includes biochemical tests for the possible presence of genetic diseases, or mutant forms of genes associated with increased risk of developing genetic disorders. Genetic testing identifies changes in chromosomes, genes, or proteins. Most of the time, testing is used to find changes that are associated with inherited disorders. The results of a genetic test can confirm or rule out a suspected genetic condition or help determine a person's chance of developing or passing on a genetic disorder.

    In the past, genetic testing mainly targeted single-gene genetic diseases. However, most of these are rare diseases, with a probability of occurrence of less than one in a million. Currently, the new mainstream of preventive genetic testing aims to screen for multifactorial and polygenic (complex) disorders, which are caused by abnormalities in multiple genes.

    This study is grounded in preventive medicine. We use a next-generation sequencing machine for genetic testing and read the genetic code with next-generation sequencing technology. Sequencing multiple genes at once makes the assessment of some complex diseases simpler.

    In this paper, we design a data processing workflow for next-generation sequencing data and demonstrate the feasibility of using public databases to interpret genomic data. For disease analysis, we show retrieval methods for single-gene disorders, multifactorial and polygenic (complex) disorders, and mitochondrial diseases.

    Materials and Methods

    This study was carried out in cooperation with Genetech Biotech (GTBio), the agent of Illumina in Taiwan. We obtained three whole-genome sequences from GTBio. The data come from one family and were sequenced to a depth of about 42x. We refer to the samples by the codes NA12878 (female, mother), NA12891 (male, father), and NA12892 (female, daughter).

    We used the OMIM, KEGG, GWAS, MalaCards, and GeneCards databases to describe the characteristics of genetic diseases within the human genome.
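
    To make the sequencing depth of about 42x concrete, the following is a minimal sketch of the usual mean-coverage arithmetic (total sequenced bases divided by genome size). The genome size of 3.1 Gb and the read length of 100 bp are illustrative assumptions, not values reported in the thesis, and the function names are hypothetical.

```python
import math

# Assumed values for illustration only; neither is taken from the thesis.
HUMAN_GENOME_BP = 3_100_000_000  # approximate haploid human genome size
READ_LENGTH_BP = 100             # hypothetical short-read length


def mean_depth(n_reads, read_length=READ_LENGTH_BP, genome_size=HUMAN_GENOME_BP):
    """Mean depth = total sequenced bases / genome size."""
    return n_reads * read_length / genome_size


def reads_needed(depth, read_length=READ_LENGTH_BP, genome_size=HUMAN_GENOME_BP):
    """Smallest number of reads whose total bases reach the requested mean depth."""
    return math.ceil(depth * genome_size / read_length)


if __name__ == "__main__":
    n = reads_needed(42)
    print(f"reads needed for 42x: {n:,}")
    print(f"depth from that many reads: {mean_depth(n):.1f}x")
```

    Under these assumptions, a 42x genome corresponds to roughly 130 Gb of sequenced bases per sample. Mean depth ignores unmappable regions and duplicate reads, so the effective depth is usually lower.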

    Results and Discussion

    In this paper, we design a data processing workflow for next-generation sequencing data and demonstrate the feasibility of using public databases to interpret genomic data. For disease analysis, we show retrieval methods for single-gene disorders, multifactorial and polygenic (complex) disorders, and mitochondrial diseases.

    Currently, analysis of genetic variation in polygenic diseases yields only a rough estimate of an individual's risk of common diseases. As more variants are discovered, the estimates can be adjusted to improve accuracy.
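
    One common way such a rough polygenic risk estimate is formed is a log-additive score: each risk variant contributes its allele count times the logarithm of a published odds ratio. The sketch below illustrates that general idea only. It is not the scoring method used in the thesis, and the variant names and odds ratios are hypothetical placeholders, not results from any database.

```python
import math


def log_odds_score(genotypes, odds_ratios):
    """Sum over variants of (risk-allele count) * ln(odds ratio).

    genotypes:   variant id -> number of risk alleles carried (0, 1, or 2)
    odds_ratios: variant id -> per-allele odds ratio from an association study
    Variants missing from `genotypes` are treated as carrying 0 risk alleles.
    """
    return sum(genotypes.get(v, 0) * math.log(or_) for v, or_ in odds_ratios.items())


def relative_odds(genotypes, odds_ratios):
    """Odds relative to a person carrying no risk alleles at the listed variants."""
    return math.exp(log_odds_score(genotypes, odds_ratios))


if __name__ == "__main__":
    # Hypothetical variants and effect sizes, for illustration only.
    example_ors = {"varA": 1.2, "varB": 1.5, "varC": 0.8}
    example_genotypes = {"varA": 2, "varB": 1, "varC": 0}
    print(f"relative odds: {relative_odds(example_genotypes, example_ors):.2f}")
```

    The model assumes the variants act independently and multiplicatively, which is one reason such estimates remain rough. Adding newly discovered variants only requires extending the odds-ratio table.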

    Most diseases do not result from a single factor. In addition to individual differences in gene expression, environmental factors, exposure to harmful substances, and living and eating habits are likely to interact with gene expression and change susceptibility to disease. Therefore, accumulating haplotype data to build a haplotype database for the Chinese population is a basic step toward further explaining this variation.

    Table 1. Distribution of NA12891 small variants (SNVs, deletions, insertions) by functional category
                                  SNVs  Deletions  Insertions
    Total Number               3495306     189164      189941
    Number in Genes            1585193      86955       86894
    Number in Exons              47933       1933        2079
    Number in Coding Regions     21604        221         249
    Splice Site Regions           3102        209         198
    Stop Gained                     76          0           1
    Stop Lost                       10          0           0
    Frameshift                       0         95         111
    Non-synonymous               10217        126         138
    Synonymous                   11301          0           0
    Mature miRNA                    38          8           2
    UTR Region                   26329       1712        1830
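
    The counts in Table 1 are easier to compare as fractions of the total. The sketch below copies the table into a dictionary (the names `TABLE1`, `fraction_of_total`, and `coding_snv_breakdown` are illustrative, not from the thesis) and also checks how the SNV consequence classes add up against the coding-region count.

```python
# Counts copied from Table 1 (NA12891), ordered as (SNVs, deletions, insertions).
TABLE1 = {
    "Total Number": (3495306, 189164, 189941),
    "Number in Genes": (1585193, 86955, 86894),
    "Number in Exons": (47933, 1933, 2079),
    "Number in Coding Regions": (21604, 221, 249),
    "Splice Site Regions": (3102, 209, 198),
    "Stop Gained": (76, 0, 1),
    "Stop Lost": (10, 0, 0),
    "Frameshift": (0, 95, 111),
    "Non-synonymous": (10217, 126, 138),
    "Synonymous": (11301, 0, 0),
    "Mature miRNA": (38, 8, 2),
    "UTR Region": (26329, 1712, 1830),
}
KINDS = ("SNVs", "Deletions", "Insertions")


def fraction_of_total(row):
    """Return the share of all variants of each kind that fall in `row`."""
    totals = TABLE1["Total Number"]
    return tuple(n / t for n, t in zip(TABLE1[row], totals))


def coding_snv_breakdown():
    """Sum the four SNV consequence classes and return (sum, coding-region count)."""
    classes = ("Stop Gained", "Stop Lost", "Non-synonymous", "Synonymous")
    return sum(TABLE1[c][0] for c in classes), TABLE1["Number in Coding Regions"][0]


if __name__ == "__main__":
    for row in ("Number in Genes", "Number in Exons", "Number in Coding Regions"):
        shares = ", ".join(f"{k}: {f:.2%}" for k, f in zip(KINDS, fraction_of_total(row)))
        print(f"{row:<26}{shares}")
    print("SNV classes sum vs coding count:", coding_snv_breakdown())
```

    For SNVs, the four consequence classes sum to the coding-region count, and about 45% of SNVs fall within genes while well under 1% fall in coding regions.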

    Table 2. Distribution of structural variation in NA12891
    SV Type              Count  % in Genes
    CNV                     61      37.70%
    Tandem Duplication     142      32.39%
    Inversion              287      35.89%
    Deletion             54702      10.53%
    Insertion            16756      38.91%
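
    Table 2 reports only percentages for the structural variants located in genes. As a small worked example, the approximate absolute counts can be recovered by multiplying each count by its percentage and rounding. The results are estimates reconstructed from rounded percentages, not figures reported in the thesis, and the names used are illustrative.

```python
# Values copied from Table 2 (NA12891): SV type -> (count, percent located in genes).
TABLE2 = {
    "CNV": (61, 37.70),
    "Tandem Duplication": (142, 32.39),
    "Inversion": (287, 35.89),
    "Deletion": (54702, 10.53),
    "Insertion": (16756, 38.91),
}


def estimated_in_genes(table=TABLE2):
    """Estimate how many SVs of each type lie in genes: round(count * percent / 100)."""
    return {sv: round(count * pct / 100) for sv, (count, pct) in table.items()}


def overall_percent_in_genes(table=TABLE2):
    """Percent of all SVs (all types pooled) estimated to lie in genes."""
    total = sum(count for count, _ in table.values())
    return 100 * sum(estimated_in_genes(table).values()) / total


if __name__ == "__main__":
    for sv, n in estimated_in_genes().items():
        print(f"{sv:<20}{n:>7}")
    print(f"all SV types pooled: {overall_percent_in_genes():.2f}% in genes")
```

    Because deletions dominate the total count, the pooled percentage is pulled toward the deletion row and sits well below the per-type percentages of the other four SV types.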

    Conclusion
    1. We show retrieval methods for single-gene disorders, multifactorial and polygenic (complex) disorders, and mitochondrial diseases, using Huntington's disease, familial hypercholesterolemia, gastric cancer, and Leber's optic atrophy as examples.
    2. We use next-generation sequencing to test a large number of genes.
    3. We design a data processing workflow for next-generation sequencing data.

    Table of Contents
    Chinese Abstract
    Abstract
    Acknowledgements
    Table of Contents
    List of Tables
    List of Figures
    Chapter 1 Introduction
      1.1 Foreword
      1.2 Research Objectives
      1.3 Literature Review
        1.3.1 Sequencing of the Human Genome
        1.3.2 The HapMap Project: Finding Associations Between Genes and Disease
        1.3.3 The 1000 Genomes Project: Building a High-Resolution Map of Genetic Variation
        1.3.4 Application of NGS to the Detection of Polygenic Diseases
        1.3.5 Information Provided by Disease Risk Prediction and Its Significance
    Chapter 2 Materials and Methods
        2.1.1 Research Method and Whole-Genome Analysis Workflow
        2.1.2 SNP Array Testing
        2.1.3 Single-Gene Deep Sequencing
      2.2 Variation in Genetic Material and Genetic Diseases
        2.2.1 Types and Origins of Genetic Variation
        2.2.2 Genetic Variation and Disease
        2.2.3 Cancer
        2.2.4 Hereditary Diseases
      2.3 Survey and Introduction of Public Databases
        2.3.1 OMIM
        2.3.2 KEGG
        2.3.3 GWAS
        2.3.4 MalaCards & GeneCards
        2.3.5 Features and Differences of the Public Databases
      2.4 Genome Mapping and Variant Detection
        2.4.1 Sequencing Experiments and Sequencing Platforms
        2.4.2 NGS-Based Variant Detection Methods
        2.4.3 Data Output Statistics
        2.4.4 Consensus Sequence Assembly
        2.4.5 SNP Detection and Distribution in the Genome
        2.4.6 InDel Detection and Distribution in the Genome
        2.4.7 Structural Variation Detection and Distribution in the Genome
        2.4.8 Variant Filtering and Variant Annotation
    Chapter 3 Disease Detection Methods and Discussion
      3.1 Causes and Classification of Genetic Diseases
        3.1.1 Single-Gene Disorders
        3.1.2 Chromosomal Disorders
        3.1.3 Polygenic Disorders
        3.1.4 Mitochondrial Variant Genetic Disorders
      3.2 Case Analyses
        3.2.1 Procedure
        3.2.2 Huntington's Disease
        3.2.3 Familial Hypercholesterolemia
        3.2.4 Gastric Cancer
        3.2.5 Leber's Optic Neuropathy
    Chapter 4 Conclusions and Future Work
      4.1 Conclusions
      4.2 Future Work
    References


    Full-text availability: on campus, open from 2017-02-14
    off campus, open from 2017-02-14