| 研究生: |
唐文鴻 Tang, Wen-Hung |
|---|---|
| 論文名稱: |
利用FPGA加速SMEM Seeding 用於DNA序列比對 Accelerate the SMEM Seeding by using FPGA for DNA Squence Alignment |
| 指導教授: |
黃吉川
Hwang, Chi-Chuan |
| 學位類別: |
碩士 Master |
| 系所名稱: |
工學院 - 工程科學系 Department of Engineering Science |
| 論文出版年: | 2020 |
| 畢業學年度: | 108 |
| 語文別: | 中文 |
| 論文頁數: | 48 |
| 中文關鍵詞: | 序列比對 、FPGA 、SMEM Seeing 、BWA |
| 外文關鍵詞: | sequence alignment, FPGA, SMEM Seeing, BWA |
| 相關次數: | 點閱:74 下載:0 |
| 分享至: |
| 查詢本校圖書館目錄 查詢臺灣博碩士論文知識加值系統 勘誤回報 |
近年來,利用FPGA加速基因比對算法已形成一個趨勢。而許多現代的比對演算法都透過所謂種子和擴展方式進行,BWA正是其中之一。本文利用FPGA加速BWA中的種子階段,這個階段的目標很簡單,犧牲少許的準確性,以更快地找出目標。
該階段使用 SMEM Seeding演算法產生種子。在本文中,我們提出了一種系統架構,其中包含解碼模組、比對模組、資料要求模組、資料接收模組、控制模組、PCIe控制器、記憶體控制器。其中比對模組實現了SMEM演算法中的 Backward search比對方法。完整系統在Xilinx V7 NetFPGA-SUME開發板上實現。
與CPU執行的效能進行比較,我們將SMEM算法提高了15倍。我們將進一步分析由於頻繁的存取DDR3 SDRAM而導致的設計性能瓶頸,並討論將來值得改善的部分。
In recent years, there has been a trend to use FPGA to accelerate DNA alignment algorithms. Many modern algorithms align through a method called the Seed-and-Extend paradigm, and BWA is one of them. In this paper, we use FPGA to accelerate the seeding phase of BWA. The goal of this phase is simple, sacrificing a little accuracy in order to find the target faster.
This phase uses the SMEM Seeding algorithm to generate seeds. In this paper, we propose a system architecture which contains decode module, align module, data request module, data receivie module, control module, PCIe controller, and memory controller. The align module implements the backward search method in the SMEM algorithm. The complete system is implemented on the Xilinx V7-NetFPGA SUME Development board.
Our implementation of SMEM algorithm is 15x faster, compared to software-only execution.We will further analyze the design performance bottleneck caused by frequent access to DDR3 SDRAM and discuss the parts that deserve improvement in the future.
[1] Heng Li and Richard Durbin. Fast and accurate short read alignment with burrows{wheeler transform. Bioinformatics, 25(14):1754{1760, 2009.
[2] S. B. Needleman and C. D. Wunsch, “A general method applicable to the search for similarities in the amino acid sequence of two proteins,” Journal of molecular biology, vol. 48, no. 3, pp. 443–453, 1970.
[3] T. F. Smith, M. S. Waterman et al., “Identification of common molecular subsequences,” Journal of molecular biology, vol.147, no. 1, pp. 195–197, 1981.
[4] N. Homer, B. Merriman, and S. F. Nelson, “Bfast: an alignment tool for large scale genome resequencing,” PloS one, vol. 4, no. 11, p. e7767, 2009.
[5] S. F. Altschul, T. L. Madden, A. A. Sch¨affer, J. Zhang, Z. Zhang, W. Miller, and D. J. Lipman, “Gapped blast and psiblast: a new generation of protein database search programs,”Nucleicacids research, vol. 25, no. 17, pp. 3389–3402, 1997.
[6] S. Salamat and T. Rosing, “FPGA Acceleration of Sequence Alignment:A Survey,” arXiv preprint arXiv:2002.02394v2, 2020.
[7] I. H. G. S. Consortium et al., “Initial sequencing and analysis of the human genome,” nature, vol. 409, no. 6822, p.860, 2001.
[8] Y. Turakhia, G. Bejerano, and W. J. Dally, “Darwin: A genomics co-processor provides up to 15,000 x acceleration on long read assembly,” in ACM SIGPLAN Notices, vol. 53, no. 2. ACM,2018, pp.199–213.
[9] E. D. Pleasance, R. K. Cheetham, P. J. Stephens, D. J. McBride,S. J. Humphray, C. D. Greenman, I. Varela, M.-L. Lin, G. R.Ordo ́nez, G. R. Bignell ̃ et al., “A comprehensive catalogue of somatic mutations from a human cancer genome,” Nature, vol.463, no. 7278, p. 191, 2010.
[10] A. Lacour, A. Espinosa, E. Louwersheimer, S. Heilmann, I. Hernandez, S. Wolfsgruber, V. Fern ́ andez, H. Wagner, ́ M. Rosende-Roca, A. Mauleon ́ et al., “Genome-wide significant risk factors for alzheimers disease: role in progression to dementia due to alzheimer’s disease among subjects with mild cognitive impairment,” Molecular psychiatry, vol. 22, no. 1, p.153, 2017.
[11] Y. Cho, C.-H. Lee, E.-G. Jeong, M.-H. Kim, J. H. Hong, Y. Ko, B. Lee, G. Yun, B. J. Kim, J. Jung et al., “Prevalence of rare genetic variations and their implications in ngs-data interpretation,” Scientific reports, vol. 7, no. 1, p. 9810, 2017.
[12] N. Krumm, T. N. Turner, C. Baker, L. Vives, K. Mohajeri, K. Witherspoon, A. Raja, B. P. Coe, H. A. Stessman, Z.-X.He et al., “Excess of rare, inherited truncating mutations in autism,” Nature genetics, vol. 47, no. 6, p. 582, 2015.
[13] O. Spichenok, Z. M. Budimlija, A. A. Mitchell, A. Jenny, L. Kovacevic, D. Marjanovic, T. Caragine, M. Prinz, and E. Wurmbach, “Prediction of eye and skin color in diverse populations using seven snps,” Forensic Science International: Genetics, vol. 5, no. 5, pp. 472–478, 2011.
[14] T. C. Sequencing, R. H. Waterson, E. S. Lander, R. K. Wilson, A. Consortium et al., “Initial sequence of the chimpanzee genome and comparison with the human genome,” Nature, vol. 437, no. 7055, p. 69, 2005.
[15] C. Y. McLean, P. L. Reno, A. A. Pollen, A. I. Bassan, T. D. Capellini, C. Guenther, V. B. Indjeian, X. Lim, D. B. Menke, B. T. Schaar et al., “Human-specific loss of regulatory dna and the evolution of human-specific traits,” Nature, vol. 471, no. 7337, p. 216, 2011.
[16] M. A. Hamburg and F. S. Collins, “The path to personalized medicine,” New England Journal of Medicine, vol. 363, no. 4, pp. 301–304, 2010.
[17] A. Al Kawam, S. Khatri, and A. Datta, “A survey of software and hardware approaches to performing read alignment in next generation sequencing,” IEEE/ACM transactions on computational biology and bioinformatics, vol. 14, no. 6, pp. 1202–1213, 2016.
[18] F. Sanger, S. Nicklen, and A. R. Coulson, “Dna sequencing with chain-terminating inhibitors,” Proceedings of the national academy of sciences, vol. 74, no. 12, pp. 5463–5467, 1977.
[19] M. L. Metzker, “Sequencing technologiesthe next generation,”Nature reviews genetics, vol. 11, no. 1, p. 31, 2010.
[20] S. Kumar, K. K. Krishnani, B. Bhushan, and M. P. Brahmane,“Metagenomics: retrospect and prospects in high throughput age,” Biotechnology research international, vol. 2015, 2015.
[21] D. Fujiki, A. Subramaniyan, T. Zhang, Y. Zeng, R. Das, D. Blaauw, and S. Narayanasamy, “Genax: a genome sequencing accelerator,” in Proceedings of the 45th Annual International Symposium on Computer Architecture. IEEE Press, 2018, pp. 69–82.
[22] E. E. Schadt, S. Turner, and A. Kasarskis, “A window into third-generation sequencing,” Human molecular genetics, vol. 19, no. R2, pp. R227–R240, 2010.
[23] D. Gordon, J. Huddleston, M. J. Chaisson, C. M. Hill, Z. N. Kronenberg, K. M. Munson, M. Malig, A. Raja, I. Fiddes, L. W. Hillier et al., “Long-read sequence assembly of the gorilla genome,” Science, vol. 352, no. 6281, p. aae0344, 2016.
[24] S. Goodwin, J. Gurtowski, S. Ethe-Sayers, P. Deshpande, M. C. Schatz, and W. R. McCombie, “Oxford nanopore sequencing, hybrid error correction, and de novo assembly of a eukaryotic genome,” Genome research, vol. 25, no. 11, pp. 1750–1756, 2015.
[25] http://devblog.dnanexus.com
[26] Z. Ning, A. J. Cox, and J. C. Mullikin, “Ssaha: a fast search method for large dna databases,” Genome research, vol. 11, no. 10, pp. 1725–1729, 2001.
[27] Ben Langmead, Cole Trapnell, Mihai Pop, Steven L Salzberg, et al. Ultrafast and memory-ecient alignment of short dna sequences to the human genome.Genome biol, 10(3):R25, 2009.
[28] Ruiqiang Li, Chang Yu, Yingrui Li, Tak-Wah Lam, Siu-Ming Yiu, Karsten Kristiansen, and Jun Wang. Soap2: an improved ultrafast tool for short read alignment. Bioinformatics, 25(15):1966{1967, 2009.
[29] Michael Burrows and David J Wheeler. A block-sorting losslessdata compression algorithm. 1994.
[30] P. Ferragina and G. Manzini, “Opportunistic data structures with applications,” in Proceedings 41st Annual Symposium on Foundations of Computer Science. IEEE, 2000, pp. 390–398.
[31] P. Ferragina and G. Manzini (2000). Opportunistic Data Structures with Applications. In Proceedings of FOCS, pages 390 – 398.
[32] H. Li. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. arXiv preprint arXiv:1303.3997, 2013.
[33] 7 Series FPGAs Configurable Logic Block User Guide UG474 (v1.8) September 27, 2016
[34] AMBA® AXI™ and ACE™ Protocol Specification
[35] Vivado Design Suite Vivado AXI Referenc UG1037 (v4.0) July 15, 2017
[36] H. Li, “Exploring single-sample SNP and INDEL calling with whole- genome De Novo assembly,” Bioinformatics, vol. 28, no. 14, pp. 1838– 1844, Jul. 2012.