簡易檢索 / 詳目顯示

研究生: 唐文鴻
Tang, Wen-Hung
論文名稱: 利用FPGA加速SMEM Seeding 用於DNA序列比對
Accelerate the SMEM Seeding by using FPGA for DNA Squence Alignment
指導教授: 黃吉川
Hwang, Chi-Chuan
學位類別: 碩士
Master
系所名稱: 工學院 - 工程科學系
Department of Engineering Science
論文出版年: 2020
畢業學年度: 108
語文別: 中文
論文頁數: 48
中文關鍵詞: 序列比對FPGASMEM SeeingBWA
外文關鍵詞: sequence alignment, FPGA, SMEM Seeing, BWA
相關次數: 點閱:74下載:0
分享至:
查詢本校圖書館目錄 查詢臺灣博碩士論文知識加值系統 勘誤回報
  • 近年來,利用FPGA加速基因比對算法已形成一個趨勢。而許多現代的比對演算法都透過所謂種子和擴展方式進行,BWA正是其中之一。本文利用FPGA加速BWA中的種子階段,這個階段的目標很簡單,犧牲少許的準確性,以更快地找出目標。
    該階段使用 SMEM Seeding演算法產生種子。在本文中,我們提出了一種系統架構,其中包含解碼模組、比對模組、資料要求模組、資料接收模組、控制模組、PCIe控制器、記憶體控制器。其中比對模組實現了SMEM演算法中的 Backward search比對方法。完整系統在Xilinx V7 NetFPGA-SUME開發板上實現。
    與CPU執行的效能進行比較,我們將SMEM算法提高了15倍。我們將進一步分析由於頻繁的存取DDR3 SDRAM而導致的設計性能瓶頸,並討論將來值得改善的部分。

    In recent years, there has been a trend to use FPGA to accelerate DNA alignment algorithms. Many modern algorithms align through a method called the Seed-and-Extend paradigm, and BWA is one of them. In this paper, we use FPGA to accelerate the seeding phase of BWA. The goal of this phase is simple, sacrificing a little accuracy in order to find the target faster.
    This phase uses the SMEM Seeding algorithm to generate seeds. In this paper, we propose a system architecture which contains decode module, align module, data request module, data receivie module, control module, PCIe controller, and memory controller. The align module implements the backward search method in the SMEM algorithm. The complete system is implemented on the Xilinx V7-NetFPGA SUME Development board.
    Our implementation of SMEM algorithm is 15x faster, compared to software-only execution.We will further analyze the design performance bottleneck caused by frequent access to DDR3 SDRAM and discuss the parts that deserve improvement in the future.

    摘要 I Extended Abstract II 致謝 VI 第1章 緒論 1 1-1 研究背景 1 1-2 研究動機 3 1-3 研究目的 3 1-4 本文架構 4 第2章 相關研究 5 2-1 次世代定序 5 2-2 DNA序列分析 6 2-3 Burrows-Wheeler Transform 搭配 FM-index演算法 8 2-3-1 Suffix Tree與Suffix array 9 2-3-2 BWT 11 2-3-3 FM -index與backward search 12 2-4 Burrows-Wheeler Aligner 18 2-5 BWA-MEM 18 2-6 FPGA 20 2-7 PCI Express 21 2-8 AXI協議 23 第3章 SMEM Seeding演算法 27 3-1 FMD-index 27 第4章 FPGA實現 30 4-1 BWA軟體資料前處理 31 4-2 系統架構 33 4-2-1 PCIE與Memory Interface 34 4-2-2 Processing Element(PE) 35 4-2-2-1 比對模組 36 4-2-2-2 解碼模組 36 4-2-2-3 控制模組 37 4-2-2-4 資料要求模組 39 4-2-2-5 資料接模組 40 第5章 結果與討論 41 第6章 未來展望 43 參考文獻 44

    [1] Heng Li and Richard Durbin. Fast and accurate short read alignment with burrows{wheeler transform. Bioinformatics, 25(14):1754{1760, 2009.
    [2] S. B. Needleman and C. D. Wunsch, “A general method applicable to the search for similarities in the amino acid sequence of two proteins,” Journal of molecular biology, vol. 48, no. 3, pp. 443–453, 1970.
    [3] T. F. Smith, M. S. Waterman et al., “Identification of common molecular subsequences,” Journal of molecular biology, vol.147, no. 1, pp. 195–197, 1981.
    [4] N. Homer, B. Merriman, and S. F. Nelson, “Bfast: an alignment tool for large scale genome resequencing,” PloS one, vol. 4, no. 11, p. e7767, 2009.
    [5] S. F. Altschul, T. L. Madden, A. A. Sch¨affer, J. Zhang, Z. Zhang, W. Miller, and D. J. Lipman, “Gapped blast and psiblast: a new generation of protein database search programs,”Nucleicacids research, vol. 25, no. 17, pp. 3389–3402, 1997.
    [6] S. Salamat and T. Rosing, “FPGA Acceleration of Sequence Alignment:A Survey,” arXiv preprint arXiv:2002.02394v2, 2020.
    [7] I. H. G. S. Consortium et al., “Initial sequencing and analysis of the human genome,” nature, vol. 409, no. 6822, p.860, 2001.
    [8] Y. Turakhia, G. Bejerano, and W. J. Dally, “Darwin: A genomics co-processor provides up to 15,000 x acceleration on long read assembly,” in ACM SIGPLAN Notices, vol. 53, no. 2. ACM,2018, pp.199–213.
    [9] E. D. Pleasance, R. K. Cheetham, P. J. Stephens, D. J. McBride,S. J. Humphray, C. D. Greenman, I. Varela, M.-L. Lin, G. R.Ordo ́nez, G. R. Bignell ̃ et al., “A comprehensive catalogue of somatic mutations from a human cancer genome,” Nature, vol.463, no. 7278, p. 191, 2010.
    [10] A. Lacour, A. Espinosa, E. Louwersheimer, S. Heilmann, I. Hernandez, S. Wolfsgruber, V. Fern ́ andez, H. Wagner, ́ M. Rosende-Roca, A. Mauleon ́ et al., “Genome-wide significant risk factors for alzheimers disease: role in progression to dementia due to alzheimer’s disease among subjects with mild cognitive impairment,” Molecular psychiatry, vol. 22, no. 1, p.153, 2017.
    [11] Y. Cho, C.-H. Lee, E.-G. Jeong, M.-H. Kim, J. H. Hong, Y. Ko, B. Lee, G. Yun, B. J. Kim, J. Jung et al., “Prevalence of rare genetic variations and their implications in ngs-data interpretation,” Scientific reports, vol. 7, no. 1, p. 9810, 2017.
    [12] N. Krumm, T. N. Turner, C. Baker, L. Vives, K. Mohajeri, K. Witherspoon, A. Raja, B. P. Coe, H. A. Stessman, Z.-X.He et al., “Excess of rare, inherited truncating mutations in autism,” Nature genetics, vol. 47, no. 6, p. 582, 2015.
    [13] O. Spichenok, Z. M. Budimlija, A. A. Mitchell, A. Jenny, L. Kovacevic, D. Marjanovic, T. Caragine, M. Prinz, and E. Wurmbach, “Prediction of eye and skin color in diverse populations using seven snps,” Forensic Science International: Genetics, vol. 5, no. 5, pp. 472–478, 2011.
    [14] T. C. Sequencing, R. H. Waterson, E. S. Lander, R. K. Wilson, A. Consortium et al., “Initial sequence of the chimpanzee genome and comparison with the human genome,” Nature, vol. 437, no. 7055, p. 69, 2005.
    [15] C. Y. McLean, P. L. Reno, A. A. Pollen, A. I. Bassan, T. D. Capellini, C. Guenther, V. B. Indjeian, X. Lim, D. B. Menke, B. T. Schaar et al., “Human-specific loss of regulatory dna and the evolution of human-specific traits,” Nature, vol. 471, no. 7337, p. 216, 2011.
    [16] M. A. Hamburg and F. S. Collins, “The path to personalized medicine,” New England Journal of Medicine, vol. 363, no. 4, pp. 301–304, 2010.
    [17] A. Al Kawam, S. Khatri, and A. Datta, “A survey of software and hardware approaches to performing read alignment in next generation sequencing,” IEEE/ACM transactions on computational biology and bioinformatics, vol. 14, no. 6, pp. 1202–1213, 2016.
    [18] F. Sanger, S. Nicklen, and A. R. Coulson, “Dna sequencing with chain-terminating inhibitors,” Proceedings of the national academy of sciences, vol. 74, no. 12, pp. 5463–5467, 1977.
    [19] M. L. Metzker, “Sequencing technologiesthe next generation,”Nature reviews genetics, vol. 11, no. 1, p. 31, 2010.
    [20] S. Kumar, K. K. Krishnani, B. Bhushan, and M. P. Brahmane,“Metagenomics: retrospect and prospects in high throughput age,” Biotechnology research international, vol. 2015, 2015.
    [21] D. Fujiki, A. Subramaniyan, T. Zhang, Y. Zeng, R. Das, D. Blaauw, and S. Narayanasamy, “Genax: a genome sequencing accelerator,” in Proceedings of the 45th Annual International Symposium on Computer Architecture. IEEE Press, 2018, pp. 69–82.
    [22] E. E. Schadt, S. Turner, and A. Kasarskis, “A window into third-generation sequencing,” Human molecular genetics, vol. 19, no. R2, pp. R227–R240, 2010.
    [23] D. Gordon, J. Huddleston, M. J. Chaisson, C. M. Hill, Z. N. Kronenberg, K. M. Munson, M. Malig, A. Raja, I. Fiddes, L. W. Hillier et al., “Long-read sequence assembly of the gorilla genome,” Science, vol. 352, no. 6281, p. aae0344, 2016.
    [24] S. Goodwin, J. Gurtowski, S. Ethe-Sayers, P. Deshpande, M. C. Schatz, and W. R. McCombie, “Oxford nanopore sequencing, hybrid error correction, and de novo assembly of a eukaryotic genome,” Genome research, vol. 25, no. 11, pp. 1750–1756, 2015.
    [25] http://devblog.dnanexus.com
    [26] Z. Ning, A. J. Cox, and J. C. Mullikin, “Ssaha: a fast search method for large dna databases,” Genome research, vol. 11, no. 10, pp. 1725–1729, 2001.
    [27] Ben Langmead, Cole Trapnell, Mihai Pop, Steven L Salzberg, et al. Ultrafast and memory-ecient alignment of short dna sequences to the human genome.Genome biol, 10(3):R25, 2009.
    [28] Ruiqiang Li, Chang Yu, Yingrui Li, Tak-Wah Lam, Siu-Ming Yiu, Karsten Kristiansen, and Jun Wang. Soap2: an improved ultrafast tool for short read alignment. Bioinformatics, 25(15):1966{1967, 2009.
    [29] Michael Burrows and David J Wheeler. A block-sorting losslessdata compression algorithm. 1994.
    [30] P. Ferragina and G. Manzini, “Opportunistic data structures with applications,” in Proceedings 41st Annual Symposium on Foundations of Computer Science. IEEE, 2000, pp. 390–398.
    [31] P. Ferragina and G. Manzini (2000). Opportunistic Data Structures with Applications. In Proceedings of FOCS, pages 390 – 398.
    [32] H. Li. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. arXiv preprint arXiv:1303.3997, 2013.
    [33] 7 Series FPGAs Configurable Logic Block User Guide UG474 (v1.8) September 27, 2016
    [34] AMBA® AXI™ and ACE™ Protocol Specification
    [35] Vivado Design Suite Vivado AXI Referenc UG1037 (v4.0) July 15, 2017
    [36] H. Li, “Exploring single-sample SNP and INDEL calling with whole- genome De Novo assembly,” Bioinformatics, vol. 28, no. 14, pp. 1838– 1844, Jul. 2012.

    下載圖示
    2026-01-01公開
    QR CODE