| 研究生: |
楊宜勳 Yang, Yi-Hsun |
|---|---|
| 論文名稱: |
基因序列比對之演算法設計與硬體實作 The Algorithm Design and Hardware Operation of Gene Sequence Alignment |
| 指導教授: |
黃吉川
Hwang, Chi-Chuan |
| 學位類別: |
碩士 Master |
| 系所名稱: |
工學院 - 工程科學系 Department of Engineering Science |
| 論文出版年: | 2017 |
| 畢業學年度: | 105 |
| 語文別: | 中文 |
| 論文頁數: | 49 |
| 中文關鍵詞: | 序列比對 、FPGA 、Smith-Waterman演算法 、心跳式陣列 |
| 外文關鍵詞: | sequence alignment, FPGA, Smith-Waterman, systolic array |
| 相關次數: | 點閱:151 下載:5 |
| 分享至: |
| 查詢本校圖書館目錄 查詢臺灣博碩士論文知識加值系統 勘誤回報 |
DNA分子經由DNA定序儀可讀出一串由A、T、C、G組成的長字串,此字串即為DNA序列,為了瞭解某些未知的DNA序列所帶有的訊息,常見的方法是從已知功能的序列資料庫中取出序列與該DNA序列進行比較,計算其相似度,這種相似度計算被稱為序列比對。透過序列比對可以知道兩條序列之間的相似度,並從已知功能的序列中推測未知功能的序列可能具備的特性。
Smith-Waterman演算法屬於一種動態規劃演算法,是生物資訊學研究中雙序列比對的核心算法之一,隨著DNA序列定序所需時間不斷縮短,越來越多的生物序列資料被建構成資料庫,序列比對研究對計算能力的需求也不斷增長,因此加快計算速度是序列比對研究重要課題。
本文利用vivado設計套件將Smith-Waterman演算法實作於FPGA上,並以UART傳輸介面與主機溝通,透過UART傳輸介面將欲比對的序列傳輸到FPGA序列比對系統內,再由系統中的控制器將兩條序列輸入到Smith-Waterman的模塊內進行序列比對,並搭配心跳式陣列的平行運算功能,提高演算法運算的速度,最後得到最大相似度分數。
DNA sequence is a string which composed of A、T、C、G, in order to understand some of the unknown DNA sequence with the message, common method is to compare the sequence which extracted from the database of the known function to the DNA sequence and calculate the similarity, called sequence alignment. The similarity between the two sequences can be known by sequence alignment and the characteristics of the sequence of unknown functions can be inferred from the sequence of known functions. In the thesis, we implement the Smith-Waterman algorithm on FPGA by vivado design suite, and communicate with the computer by UART, we transmit the sequence to the system in FPGA, then input two sequences to the Smith-Waterman module for alignment by controller in system, with the systolic array to compute in parallel, it can improve the speed of the algorithm operation, and finally get the maximum similarity score.
參考文獻
[1] Pennisi E. “The Birth of the Nucleus”. Science, 2004, 305(5685): 766-768.
[2] Wikipedia-Gene
https://en.wikipedia.org/wiki/Gene
[3] Downie A W. “Pneumococcal Transformation-A Backward View Fourth Griffith Memorial Lecture”. Microbiology, 1972, 73(1): 1-11.
[4] Watson J D, Crick F H C. “Molecular structure of nucleic acids”. Nature, 1953, 171(4356): 737-738.
[5] Sanger F, Nicklen S, Coulson A R. “DNA sequencing with chain-terminating inhibitors”. Proceedings of the national academy of sciences, 1977, 74(12): 5463-5467.
[6] Mardis E R. “The impact of next-generation sequencing technology on genetics.” Trends in genetics, 2008, 24(3): 133-141.
[7] McCarthy A. “Third generation DNA sequencing: pacific biosciences' single molecule real time technology”. Chemistry & biology, 2010, 17(7): 675-676.
[8] 李思元, 莊以光. “DNA 定序技術之演進與發展”. Journal of Biomedical & Laboratory Sciences, 2010, 22(2): 49-58.
[9] Wikipedia-Indel
https://en.wikipedia.org/wiki/Indel
[10] 宋亞珍, 南紅梅, 劉楓, 等. “同源性、一致性和相似性的辨析”. 中國科技術語, 2011, 13(2): 48-50.
[11] Väli Ü, Brandström M, Johansson M, et al. “Insertion-deletion polymorphisms (indels) as genetic markers in natural populations”. BMC genetics, 2008, 9(1): 8.
[12] Wikipedia- Edit distance
https://en.wikipedia.org/wiki/Edit_distance
[13] Masek W J, Paterson M S. “A faster algorithm computing string edit distances”. Journal of Computer and System sciences, 1980, 20(1): 18-31.
[14] Levenshtein V I. “Binary codes capable of correcting deletions, insertions, and reversals” Soviet physics doklady. 1966, 10(8): 707-710.
[15] Wikipedia- Gap penalty
https://en.wikipedia.org/wiki/Gap_penalty
[16] ompa M. “Lecture notes on biological sequence analysis”. University of Washington, Seattle, Technical report, 2000, 25-26.
[17] Hodgman C, French A, Westhead D. “Instant Notes in Bioinformatics.” Garland Science. 2009, 143–144.
[18] Gibbs A J, McIntyre G A. “The diagram, a method for comparing sequences”. The FEBS Journal, 1970, 16(1): 1-11.
[19] Pearson W R, Miller W. “Dynamic programming algorithms for biological sequence comparison”. Methods in enzymology, 1992, 210: 575-601.
[20] Needleman S B, Wunsch C D. “A general method applicable to the search for similarities in the amino acid sequence of two proteins”. Journal of molecular biology, 1970, 48(3): 443-453.
[21] Smith T F, Waterman M S. “Comparison of biosequences”. Advances in applied mathematics, 1981, 2(4): 482-489.
[22] Digilient Inc., “Nexys4 DDR™ FPGA Board Reference Manual”, April 11, 2016 : 9.
[23] Ciletti, Michael D. “Advanced digital design with the Verilog HDL”. Vol. 1. Upper Saddle River: Prentice Hall, 2003.
[24] 張廣淵. “儲存子系統”. 計算機組裝與維護教程. 新華書店, 2004.
[25] 俞莉瓊, 付宇卓. “有限狀態機的 Verilog 設計與研究”. 微電子學與電腦, 2004, 21(11): 146-148.
[26] Golson S. “State machine design techniques for Verilog and VHDL”. Synopsys Journal of High-Level Design, 1994, 9(1-48): 12.
[27] Kung H T. “Why systolic architectures?”. IEEE computer, 1982, 15(1): 37-46.
[28] 孫慶平. “現場可程式設計邏輯閘陣列(FPGA)技術及其應用”. 數位通信, 1996, 1.
校內:2021-08-30公開