| Graduate student: | 藍國倫 Lan, Kuo-Lun |
|---|---|
| Thesis title: | 以圖形處理器為基礎從次世代基因定序資料中偵測病毒基因體缺失之方法 GPU-based Identification of Viral Genomic Deletions from Next-Generation Sequencing Data |
| Advisor: | 曾新穆 Tseng, Shin-Mu |
| Degree: | Master |
| Department: | College of Electrical Engineering and Computer Science - Department of Computer Science and Information Engineering |
| Year of publication: | 2013 |
| Academic year of graduation: | 101 |
| Language: | Chinese |
| Number of pages: | 49 |
| Chinese keywords: | 基因缺陷偵測, 圖形處理器, CUDA, 結構變異, 次世代基因定序 |
| English keywords: | gene deletion identification, GPU: Graphics Processing Units, CUDA: Compute Unified Device Architecture, SVs: structural variations, NGS: Next Generation Sequencing |
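Hepatitis B infection is closely linked to the growing number of liver cancer patients; in particular, deletions in specific regions of the genome raise the incidence of liver cancer. Over the past decade, more and more next-generation sequencing software has been designed to detect structural variations in genes, such as deletions, insertions, and fusions. However, these tools look for structural variations in human genes, not in viruses. We therefore propose a GPU-based method for finding deleted segments and their positions in viral genes. We use the CUDA architecture to parallelize the sequence pattern growth process that finds the deleted segments. We evaluate the proposed method on both simulated data and real patient data: i) five simulated datasets generated with the NGS read simulator wgsim, and ii) data from a hepatitis B patient. Our experiments show that the method is reliable and efficient at finding deletions in viral genes. It is not only faster than other tools but also highly accurate. The proposed method is therefore highly effective for detecting deleted segments in viral genes.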
Hepatitis B virus (HBV) infection is strongly associated with the development of hepatocellular carcinoma (HCC), and deletions in specific regions of the viral genome are associated with a higher incidence of HCC. Over the past decade, a growing number of next-generation sequencing (NGS) tools have been designed to find structural variations such as insertions, deletions, and fusions. However, these tools target human DNA rather than viral genomes. In this work, we propose a graphics processing unit (GPU) based tool for identifying deletions in viral genomic DNA. We use the Compute Unified Device Architecture (CUDA) to parallelize the pattern growth procedure used to find deletions. We evaluate our approach on both synthetic and real data: i) five synthetic HBV datasets generated by wgsim and ii) a real NGS dataset from a patient infected with HBV. Based on a reciprocal test, we show that our approach identifies deletions in viral genomic DNA reliably and efficiently. It is much faster than other tools and locates deletion positions with high accuracy. Our approach therefore performs well for finding deletions in viral data.
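To make the pattern growth idea concrete, the following is a minimal CPU-only sketch in Python, not the thesis implementation. It grows a read's prefix one base at a time until the prefix no longer occurs in the reference, then looks for the remaining suffix further downstream; the gap between the two matches is reported as a candidate deletion. The function names, the `min_anchor` parameter, and the toy sequences are all hypothetical and chosen only for illustration. Each read is handled independently, which is the kind of work that could be spread across GPU threads, although the abstract does not state how the CUDA version partitions the computation.

```python
from typing import List, Optional, Tuple


def find_all(reference: str, pattern: str) -> List[int]:
    """Return every start index at which pattern occurs in reference."""
    hits = []
    start = reference.find(pattern)
    while start != -1:
        hits.append(start)
        start = reference.find(pattern, start + 1)
    return hits


def grow_prefix(reference: str, read: str) -> Optional[Tuple[int, int]]:
    """Grow the read prefix base by base while it still occurs in the reference.

    Returns (start, length) for the longest prefix that has exactly one
    occurrence, or None if no prefix is unique.
    """
    best = None
    for length in range(1, len(read) + 1):
        hits = find_all(reference, read[:length])
        if not hits:
            break  # the prefix can no longer be extended
        if len(hits) == 1:
            best = (hits[0], length)
    return best


def detect_deletion(reference: str, read: str,
                    min_anchor: int = 4) -> Optional[Tuple[int, int]]:
    """Return (begin, end) so that reference[begin:end] is the candidate deletion.

    min_anchor is a hypothetical threshold: both halves of the split read
    must be at least this long to be trusted.
    """
    anchor = grow_prefix(reference, read)
    if anchor is None:
        return None
    start, length = anchor
    if length == len(read):
        return None  # the whole read maps contiguously, so no deletion
    suffix = read[length:]
    if length < min_anchor or len(suffix) < min_anchor:
        return None
    left_end = start + length  # first reference base after the prefix match
    # Search downstream, skipping at least one base so the gap is non-empty.
    pos = reference.find(suffix, left_end + 1)
    if pos == -1:
        return None
    return (left_end, pos)


# Toy example (made-up sequences, not HBV data).
left, deleted, right = "ACGTTGCAAG", "GCTTACCG", "TATCGGCATT"
reference = left + deleted + right
split_read = left[4:] + right[:6]   # read sequenced from a genome lacking `deleted`
intact_read = reference[3:15]       # read that matches the reference directly
```

Calling `detect_deletion(reference, split_read)` on the toy data yields the half-open interval covering `deleted`, while `intact_read` yields `None`. A real tool would also have to handle sequencing errors, reverse-complement reads, and repeated regions, all of which this sketch ignores.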
On campus: publicly available from 2018-08-27