簡易檢索 / 詳目顯示

研究生: 楊宜勳
Yang, Yi-Hsun
論文名稱: 基因序列比對之演算法設計與硬體實作
The Algorithm Design and Hardware Operation of Gene Sequence Alignment
指導教授: 黃吉川
Hwang, Chi-Chuan
學位類別: 碩士
Master
系所名稱: 工學院 - 工程科學系
Department of Engineering Science
論文出版年: 2017
畢業學年度: 105
語文別: 中文
論文頁數: 49
中文關鍵詞: 序列比對FPGASmith-Waterman演算法心跳式陣列
外文關鍵詞: sequence alignment, FPGA, Smith-Waterman, systolic array
相關次數: 點閱:151下載:5
分享至:
查詢本校圖書館目錄 查詢臺灣博碩士論文知識加值系統 勘誤回報
  • DNA分子經由DNA定序儀可讀出一串由A、T、C、G組成的長字串,此字串即為DNA序列,為了瞭解某些未知的DNA序列所帶有的訊息,常見的方法是從已知功能的序列資料庫中取出序列與該DNA序列進行比較,計算其相似度,這種相似度計算被稱為序列比對。透過序列比對可以知道兩條序列之間的相似度,並從已知功能的序列中推測未知功能的序列可能具備的特性。
    Smith-Waterman演算法屬於一種動態規劃演算法,是生物資訊學研究中雙序列比對的核心算法之一,隨著DNA序列定序所需時間不斷縮短,越來越多的生物序列資料被建構成資料庫,序列比對研究對計算能力的需求也不斷增長,因此加快計算速度是序列比對研究重要課題。
    本文利用vivado設計套件將Smith-Waterman演算法實作於FPGA上,並以UART傳輸介面與主機溝通,透過UART傳輸介面將欲比對的序列傳輸到FPGA序列比對系統內,再由系統中的控制器將兩條序列輸入到Smith-Waterman的模塊內進行序列比對,並搭配心跳式陣列的平行運算功能,提高演算法運算的速度,最後得到最大相似度分數。

    DNA sequence is a string which composed of A、T、C、G, in order to understand some of the unknown DNA sequence with the message, common method is to compare the sequence which extracted from the database of the known function to the DNA sequence and calculate the similarity, called sequence alignment. The similarity between the two sequences can be known by sequence alignment and the characteristics of the sequence of unknown functions can be inferred from the sequence of known functions. In the thesis, we implement the Smith-Waterman algorithm on FPGA by vivado design suite, and communicate with the computer by UART, we transmit the sequence to the system in FPGA, then input two sequences to the Smith-Waterman module for alignment by controller in system, with the systolic array to compute in parallel, it can improve the speed of the algorithm operation, and finally get the maximum similarity score.

    摘要 I Abstract II 誌謝 X 目錄 XI 表目錄 XIV 圖目錄 XV 第一章 緒論 1 1.1遺傳物質-DNA 1 1.2研究背景 3 1.3 研究動機與目的 6 1.4本文架構 7 第二章 序列比對相關探討 9 2.1序列比對概念 9 2.2 序列間的編輯距離 10 2.3相似度計算 11 2.4雙序列比對算法 14 2.4.1點陣法 14 2.4.2動態規劃法 15 2.4.2.1 全域性比對:Needleman-Wunsch演算法 16 2.4.2.2區域性比對:Smith-Waterman演算法 18 第三章 硬體實作架構 21 3.1 Vivado設計套件 21 3.2 UART模塊 23 3.3 S序列記憶體 25 3.4 參考序列記憶體 25 3.5 SW控制器 27 3.6 SW陣列 29 3.6.1 心跳式陣列[27] 30 3.6.2 處理單元 31 3.6.3 SW陣列架構 32 3.7 最大相似度模塊 33 第四章 FPGA驗證結果 36 4.1 FPGA實作合成結果 37 4.2 系統驗證 39 4.3 運行時間分析 41 第五章 結論與未來展望 44 5.1 結論 44 5.2 未來展望 44 參考文獻 46

    參考文獻
    [1] Pennisi E. “The Birth of the Nucleus”. Science, 2004, 305(5685): 766-768.
    [2] Wikipedia-Gene
    https://en.wikipedia.org/wiki/Gene
    [3] Downie A W. “Pneumococcal Transformation-A Backward View Fourth Griffith Memorial Lecture”. Microbiology, 1972, 73(1): 1-11.
    [4] Watson J D, Crick F H C. “Molecular structure of nucleic acids”. Nature, 1953, 171(4356): 737-738.
    [5] Sanger F, Nicklen S, Coulson A R. “DNA sequencing with chain-terminating inhibitors”. Proceedings of the national academy of sciences, 1977, 74(12): 5463-5467.
    [6] Mardis E R. “The impact of next-generation sequencing technology on genetics.” Trends in genetics, 2008, 24(3): 133-141.
    [7] McCarthy A. “Third generation DNA sequencing: pacific biosciences' single molecule real time technology”. Chemistry & biology, 2010, 17(7): 675-676.
    [8] 李思元, 莊以光. “DNA 定序技術之演進與發展”. Journal of Biomedical & Laboratory Sciences, 2010, 22(2): 49-58.

    [9] Wikipedia-Indel
    https://en.wikipedia.org/wiki/Indel
    [10] 宋亞珍, 南紅梅, 劉楓, 等. “同源性、一致性和相似性的辨析”. 中國科技術語, 2011, 13(2): 48-50.
    [11] Väli Ü, Brandström M, Johansson M, et al. “Insertion-deletion polymorphisms (indels) as genetic markers in natural populations”. BMC genetics, 2008, 9(1): 8.
    [12] Wikipedia- Edit distance
    https://en.wikipedia.org/wiki/Edit_distance
    [13] Masek W J, Paterson M S. “A faster algorithm computing string edit distances”. Journal of Computer and System sciences, 1980, 20(1): 18-31.
    [14] Levenshtein V I. “Binary codes capable of correcting deletions, insertions, and reversals” Soviet physics doklady. 1966, 10(8): 707-710.
    [15] Wikipedia- Gap penalty
    https://en.wikipedia.org/wiki/Gap_penalty
    [16] ompa M. “Lecture notes on biological sequence analysis”. University of Washington, Seattle, Technical report, 2000, 25-26.
    [17] Hodgman C, French A, Westhead D. “Instant Notes in Bioinformatics.” Garland Science. 2009, 143–144.
    [18] Gibbs A J, McIntyre G A. “The diagram, a method for comparing sequences”. The FEBS Journal, 1970, 16(1): 1-11.
    [19] Pearson W R, Miller W. “Dynamic programming algorithms for biological sequence comparison”. Methods in enzymology, 1992, 210: 575-601.
    [20] Needleman S B, Wunsch C D. “A general method applicable to the search for similarities in the amino acid sequence of two proteins”. Journal of molecular biology, 1970, 48(3): 443-453.
    [21] Smith T F, Waterman M S. “Comparison of biosequences”. Advances in applied mathematics, 1981, 2(4): 482-489.
    [22] Digilient Inc., “Nexys4 DDR™ FPGA Board Reference Manual”, April 11, 2016 : 9.
    [23] Ciletti, Michael D. “Advanced digital design with the Verilog HDL”. Vol. 1. Upper Saddle River: Prentice Hall, 2003.
    [24] 張廣淵. “儲存子系統”. 計算機組裝與維護教程. 新華書店, 2004.
    [25] 俞莉瓊, 付宇卓. “有限狀態機的 Verilog 設計與研究”. 微電子學與電腦, 2004, 21(11): 146-148.
    [26] Golson S. “State machine design techniques for Verilog and VHDL”. Synopsys Journal of High-Level Design, 1994, 9(1-48): 12.
    [27] Kung H T. “Why systolic architectures?”. IEEE computer, 1982, 15(1): 37-46.
    [28] 孫慶平. “現場可程式設計邏輯閘陣列(FPGA)技術及其應用”. 數位通信, 1996, 1.

    無法下載圖示 校內:2021-08-30公開
    校外:不公開
    電子論文尚未授權公開,紙本請查館藏目錄
    QR CODE