研究生: |
潘致中 Pan, Chin-Chung |
---|---|
論文名稱: |
用於生物序列比對之高效能FPGA加速器 A High Performance FPGA-Based Accelerator for Biological Sequence Alignment |
指導教授: |
張燕光
Chang, Yeim-Kuan |
學位類別: |
碩士 Master |
系所名稱: |
電機資訊學院 - 資訊工程學系 Department of Computer Science and Information Engineering |
論文出版年: | 2011 |
畢業學年度: | 99 |
語文別: | 英文 |
論文頁數: | 65 |
中文關鍵詞: | 序列比對 、可程式規劃邏輯陣列 、動態規劃演算法 、硬體加速器 |
外文關鍵詞: | Sequence alignment, FPGA, dynamic programming, hardware accelerator |
相關次數: | 點閱:152 下載:3 |
分享至: |
查詢本校圖書館目錄 查詢臺灣博碩士論文知識加值系統 勘誤回報 |
近年來生物資訊領域遭遇到強大的挑戰,資料庫以指數的快速成長,在做搜尋時,必須花費相當多的時間。因此,我們開始研究各種可以加速搜尋的方法,來縮短搜尋的時間。而序列比對(Sequence Alignment)可以廣泛的利用在生物資訊領域當中,主要目的是尋找脫氧核糖核酸、核糖核酸或蛋白質序列,兩條或多條序列彼此之間是否有相關性,如果一條未知的序列與資料庫中一條已知的序列有相關性,並且從已知的序列知道這樣的排列結果為癌基因,則我們就可以藉由序列比對這樣的方式,判讀未知的序列也可能為癌基因。這在生物資訊領域中,是一項非常重要的技術。
傳統上,序列比對需要使用動態規劃演算法計算兩條序列的相關性,造成計算量的負擔變得相當沉重。因此,近年提出許多硬體加速的方法實作在各種平台,包含可程式規劃邏輯陣列(FPGA)、圖形處理器(GPU)和單元寬頻處理器(CBU)等。本論文提出了三個加速的方法,分別為PAD, UCE和SC;PAD方案運用平行處理的方式循序計算反對角線;UCE方案可以省略計算大部分不被使用的可能性,並且可以保持高準確率;最後SC方案將資料庫分群,以便快速尋找到相似Top10的序列。這些方法利用有效的管線化技術,將它們實作在FPGA上。效能上,PAD的比原始AD方法快兩倍,而UCE在省略百分之五十的情形下,又可以再加快兩倍的速度,因此PAD結合UCE就可以達到原始AD四倍的效能。此外,SC方法約可以省略百分之四十的計算量。我們提出的這三個方法約可以達到約八倍的效能,這將對序列比對有相當大的幫助。
In recent years, the bioinformatics field faces a challenge on executing sequence alignment because the sequence databases grow at an exponential rate. Therefore, it is desired to find a variety of ways to shorten the time to compute the sequence alignment. In bioinformatics field, biological sequence alignment is a general operation for DNA, RNA and protein sequences. The main purpose of sequence alignment is to find how sequences in the database are correlated with each other. If one unknown sequence is closely related to a known database sequence, then some useful information from the known sequence can be obtained which is a very important technique in bioinformatics field.
Traditionally, sequence alignment is based on dynamic programming technique that spends a lot of computing time. Therefore, a large number of hardware acceleration approaches have been proposed recently, including FPGA, Graphics Processing Unit (GPU) and Cell Broad Engine (CBE). In this thesis, we propose three acceleration schemes that are Parallel Antidiagonal (PAD), Unnecessary Computation Exclusion (UCE), and Sequence Clustering (SC). PAD scheme parallelizes the sequential antidiagonal algorithm by exploiting its parallelism. UCE scheme smartly avoids some unnecessary computations without losing the computation accuracy. SC scheme reduces the computation time of computing the top-10 most similar sequences from database to the input sequence by a clustering method. These proposed schemes are implemented by using efficient pipelined technique on FPGA architecture. The performance of PAD is twice as fast as AD algorithm on FPGA and UCE achieves a speedup of two by excluding fifty percent of dynamic programming table computations. Hence, PAD combined with UCE can be four times as fast as AD. Besides, exclusion computation ratio of SC is achieved approximately 40% where exclusion computation ratio is percentage and gotten by saving computation divide by original computation. The throughput of three schemes is attained eight times that of ordinary ones and these can be used to offer great assistance for biological sequence alignment.
[1] S.F. Altschul, W. Gish, W. Miller, E.W. Myers and D.J Lipman, “Basic Local Alignment Search Tool,” Journal of Molecular Biology, pp. 215, 403^410, 1990.
[2] S. Aluru, N. Futamura and K. Mehrotra, “Parallel Biological Sequence Comparison Using Prefix Computations,” ACM Journal of Parallel and Distributed Computing, vol. 63, pp. 264-272, 2003.
[3] D.A. Benson, K.M.Ilene, D.J. Lipman, J. Ostell and D.L. Wheeler, “GenBank,” OXFORD Journal of Nucleic Acids Research, vol. 35, pp. D21-D25, 2007.
[4] K. Benkrid, Liu Ying and A. Benkrid, “A Highly Parameterized and Efficient FPGA-Based Skeleton for Pairwise Biological Sequence Alignment,” IEEE Transactions on Very Large Scale Integration Systems, vol. 17, pp. 561-570, 2009.
[5] A. Boukerche, J.M. Correa and A. Melo, R.P. Jacobi, “A Hardware Accelerator for the Fast Retrieval of DIALIGN Biological Sequence Alignments in Linear Space,” IEEE Transactions on Computers, vol. 59, no. 6, pp. 808-821 JUNE 2010.
[6] S. R. Eddy, “Where did the BLOSUM62 alignment score matrix come from,” NPG Journal of Nature Biotechnology, 22, pp. 1035-1036, August 2004.
[7] D.F. Feng and R.F Doolittle, “Progressive sequence alignment as a prerequisite to correct phylogenetic trees,” Journal of Molecular Biology, vol., pp. 25, 351-360, 1987.
[8] O. Gotoh, “An Improved Algorithm for Matching Biological Sequences,” Journal of Molecular Biology, vol.162, pp.705-708, 1982.
[9] D.S Hirschberg, “A Linear Space Algorithm for Computing Maximal Common Subsequences,” ACM Communications, pp. 341–343, June 1975.
[10] X. Jiang, X. Liu, L. Xu, P. Zhang and N. Sun, "A Reconfigurable Accelerator for Smith-Waterman Algorithm," IEEE Transactions on Circuits and Systems II, vol. 54, no.12, pp.1077-1081, Dec. 2007.
[11] W. Liu, B. Schmidt, G. Voss, and W. Mueller-Wittig, “Streaming Algorithms for Biological Sequence Alignment on GPUs,” IEEE Transactions on Parallel and Distributed Systems, vol. 18, no. 9, pp. 1270-1281, June 2007.
[12] S. Lloyd and Q. Snell, “Sequence Alignment with Traceback on Reconfigurable Hardware,” IEEE Conference on Reconfigurable Computing and FPGAs - ReConFig ’08, pp. 259-264, Dec. 2008.
[13] B. Morgenstern, K. Frech, A. Dress and T. Werner, “Multiple DNA and Protein Sequence Alignment Based on Segment-to-Segment Comparison,” in proceedings of National Academy of Sciences, 93, pp. 12098-12103, 1996.
[14] B. Morgenstern, K. Frech, A. Dress and T. Werner, “DIALIGN: Finding Local Similarities by Multiple Sequence Alignment,” OXFORD Journal of Bioinformatics, vol.14, no.3, pp. 290-294, Mar. 1998.
[15] E.W. Myers and W. Miller, “Optimal Alignments in Linear Space,” OXFORD Journal of Bioinformatics, vol. 4, pp. 11-17, 1988.
[16] S. Needleman and C.Wunsch, “A General Method Applicable to the Search for Similarities in the Amino Acid Sequence of Two Proteins,” Journal of Molecular Biology, pp. 48:443-453, 1970.
[17] T. Oliver, B. Schmidt and D. Maskell, “Hyper Customized Processors for Bio-Sequence Database Scanning on FPGAs,” ACM/SIGDA Symposium on Field-Programmable Gate Arrays, pp. 229-237, 2005.
[18] S. Rajko and S. Aluru, “Space and Time Optimal Parallel Sequence Alignments,” IEEE Transactions on Parallel and Distributed Systems, vol. 15, no. 12, pp. 1070-1081, Dec. 2004.
[19] T. Smith and M. Waterman, “Identification of Common Molecular Subsequences,” Journal of Journal of Molecular Biology, pp. 147:195-197, 1981.
[20] A. Sarje and S. Aluru, “Parallel Biological Sequence Alignments on the Cell Broadband Engine,” in proceedings of IEEE International Symposium on Parallel and Distributed Processing Symp., pp. 1-11, 2008.
[21] O. Storaasli and D. Strenski, “Experiences on 64 and 150 FPGA Systems,” in Proceedings of the Fourth Annual Reconfigurable Systems Summer Institute, 2008.
[22] S. Sarkar, G.R. Kulkarni, P.P. Pande and A. Kalyanaraman, “Network-on-Chip Hardware Accelerators for Biological Sequence Alignment,” IEEE Transactions on Computers, pp. 29 – 41, 2010.
[23] N. Sebastiao, T. Dias, N. Roma, and P. Flores, “Integrated Accelerator Architecture for DNA Sequences Alignment with Enhanced Traceback Phase,” in Proceedings of IEEE Conference on High Performance Computing and Simulation, pp. 16-23, 2010.
[24] J. Thompson, D. Higgins and T. Gibson, “CLUSTALW: Improving the Sensitivity of Progressive Multiple Sequence Alignment through Sequence Weighting, Position-Specific Gap Penalties and Weight Matrix Choice,” Nucleic Acids Research, vol. 22, pp. 4673-4680, 1994.
[25] Y. Yamaguchi , T. Maruyama and A. Konagaya "High Speed Homology Search with FPGAs", Pacific Symposium on Biocomputing, pp. 271-282, 2002.
[26] C.W. Yu, K.H. Kwong, K.H. Lee and P.H.W. Leong, "A Smith-Waterman Systolic Cell", in Proceedings of the International Conference on Field Programmable Logic and Applications, pp.375-384, 2003.
[27] S. Yooseph et al., “The Sorcerer II Global Ocean Sampling Expedition: Expanding the Universe of Protein Families,” Public Library of Science Biology, vol. 5, no. 3, 2007.
[28] Y. Yamaguchi, H. K. Tsoi and W. Luk, “FPGA-Based Smith-Waterman Algorithm Analysis and Novel Design,” in proceedings of ACM Conference on Reconfigurable computing: architectures, tools and applications, pp.181-192, 2011.
[29] P. Zhang, G. Tan and G.R. Gao, “Implementation of the Smith- Waterman Algorithm on a Reconfigurable Supercomputing Platform”, in proceedings of ACM Conference on High-performance Reconfigurable Computing Technology and Applications, pp.39-48, 2007.
[30] National Institutes of Health, http://www.nih.gov/, 2011.
[31] NCBI FTP site, ftp://ftp.ncbi.nih.gov/blast/db/FASTA/, 2011.
[32] Uniref Database, http://www.ebi.ac.uk/uniref, 2011.
[33] Xilinx Virtex-5 Plarform FPGAs: Detailed description. http://www.xilinx.com/support/documentation/virtex-5.htm.