| 研究生: |
周育晨 Chou, Yu-Chen |
|---|---|
| 論文名稱: |
利用GPU去加速DNA序列Profile比對中的Pair Hidden Markov Model Forward Algorithm A GPU-Based Approach to accelerate the Pair Hidden Markov Model Forward Algorithm for DNA Sequence Profile Alignment |
| 指導教授: |
賀保羅
Horton, Paul |
| 學位類別: |
碩士 Master |
| 系所名稱: |
電機資訊學院 - 資訊工程學系 Department of Computer Science and Information Engineering |
| 論文出版年: | 2024 |
| 畢業學年度: | 112 |
| 語文別: | 英文 |
| 論文頁數: | 48 |
| 中文關鍵詞: | GPU加速 、基因序列 、成對隱性馬可夫模型向前演算法 |
| 外文關鍵詞: | GPU acceleration, DNA sequence, pair HMM forward algorithm |
| 相關次數: | 點閱:49 下載:9 |
| 分享至: |
| 查詢本校圖書館目錄 查詢臺灣博碩士論文知識加值系統 勘誤回報 |
DNA序列比對是生物訊息學中一項相當關鍵的任務,對於基因識別、進化研究及個體化治療更是至關重要,而Pair Hidden Markov Model (HMM) Forward Algorithm是一種常見且準確的序列比對的方法,能考慮序列之間的插入、刪除和匹配。然而,它的計算量相當地龐大,尤其是在大規模基因組的情況下更為明顯。為了解決計算量龐大的問題,我們提出了一種基於GPU的方法來加速DNA序列之Profile比對的Pair HMM Forward Algorithm,通過利用現代GPU的平行運算能力,我們目標為優化該演算法的性能和效率。我們的方法引入了動態區塊配置、反對角線平行運算與記憶體存儲,確保了GPU資源的高效利用和計算速度的提升。這些優化解決了傳統CPU版本中的瓶頸,如低效的計算和不理想的負載平衡。實驗結果表明,與基於CPU的方法相比,我們的方法在大規模DNA序列上表現出顯著的性能提升,使其非常適合大規模基因組序列比對任務。這一進步為處理不斷增長的基因組數據量提供了一個可擴展且高效的可考慮之解決方案。
DNA sequence alignment is a critical task in bioinformatics, essential for gene identification, evolutionary studies, and personalized medicine. The Pair Hidden Markov Model (HMM) Forward Algorithm is a common and accurate method for aligning sequences, accounting for insertions, deletions, and matches between sequences. However, its computational demands are substantial, especially with large-scale genomic data. In order to solve the main issue of substantial computations, we present a GPU-based approach to accelerate the Pair-HMM Forward Algorithm for DNA sequence profile alignment in this study. By utilizing the parallel computational capabilities of modern GPUs, we aim to optimize the performance and efficiency of this algorithm. Our method introduces dynamic block configuration, anti-diagonal parallel and anti-diagonal memory storage, ensuring efficient utilization of GPU resources and improved computational speed. These optimizations address the bottleneck of traditional CPU-based implementations, such as inefficient computation and suboptimal load balancing. Experimental results demonstrate significant performance gains over CPU-based methods in large-scale DNA sequence, making GPU-based approach highly suitable for large-scale genomic sequence alignment tasks. This advancement provides a considerable solution with scalability and efficiency to handle the growing size of genomic data in bioinformatics.
[1] Jay Shendure and Hanlee Ji. “Next-generation DNA sequencing”. Nature Biotechnology 16.10 (2008), pp. 1135–1145.
[2] Heena Satam et al. “Next-Generation Sequencing Technology: Current Trends and Advancements”. Biology (2023), pp. 1–25.
[3] Dahui Qin. “Next-generation sequencing and its clinical application”. Cancer BiolMed 16.1 (2019), pp. 5–9.
[4] Dimitrios H Roukos. “Next-generation sequencing and epigenome technologies: potential medical applications”. Expert Review of Medical Devices (2014), pp. 723 726.
[5] Byung-Jun Yoon. “Hidden Markov Models and their Applications in Biological Sequence Analysis”. Current Genomics (2009), pp. 402–415.
[6] Bjarne Knudsen and Michael M. Miyamoto. “Sequence Alignments and Pair Hidden Markov Models Using Evolutionary History”. Bioinformatics (2003), pp. 453–460.
[7] Khar Heng Choo, Joo Chuan Tong, and Louxin Zhang. “Recent Applications of Hidden Markov Models in Computational Biology”. Current Genomics(2004), pp. 85–96.
[8] Enliang Li et al. “Improved GPU Implementations of the Pair-HMM Forward Algorithm for DNA Sequence Alignment”. IEEE Transactions on Parallel and Distributed Systems (2021), pp. 299–306.
[9] Shanshan Ren, Koen Bertel, and Zaid Al-Ars. “Exploration of Alternative GPU Implementations of the Pair-HMMs Forward Algorithm”. IEEE Transactions on Biomedical Circuits and Systems (2016), pp. 902–909.
[10] Shanshan Ren, Koen Bertel, and Zaid Al-Ars. “Exploration of GPU acceleration for Pair-HMM algorithm and its application in the DNA alignment problem”. Journal of Parallel and Distributed Computing (2019), pp. 1–15.
[11] Xuhua Xia. Bioinformatics and the Cell: Modern Computational Approaches in Genomics, Proteomics and Transcriptomics, Second Edition. Springer, 2018.
[12] Dimitrios P Lyras and Dirk Metzler. “ReformAlign: improved multiple sequence alignments using a profile-based meta-alignment approach”. BMC Bioinformatics 15.265 (2014), pp. 1–18.
[13] Claudine Médigue et al. “Detecting and Analyzing DNA Sequencing Errors: Toward a Higher Quality of the Bacillus subtilis Genome Sequence”. Genome Research (1999), pp. 1117–1127.
[14] Nvidia Corp. CUDA C++ Best Practices Guide. Nvidia, 2024.
[15] Shanshan Ren, Koen Bertels, and Zaid Al-Ars. “Efficient Acceleration of the PairHMMs Forward Algorithm for GATK HaplotypeCaller on Graphics Processing Units”. Evolutionary Bioinformatics 14 (2018), pp. 1–12.
[16] Subho S. Banerjee et al. “On Accelerating Pair-HMM Computations in Programmable Hardware”. Journal of Computational Biology (2017), pp. 1–8.
[17] Sitao Huang et al. “Hardware Acceleration of the Pair-HMM Algorithm for DNA Variant Calling”. STARnet (2017), pp. 275–284.
[18] Emad Alamoudi et al. “DNA Profiling Methods and Tools: A Review”. Journal of Computational Science (2018), pp. 216–231.
[19] Sean R. Eddy. “Profile hidden Markov models”. Bioinformatics 14.9 (1998), pp. 755–763.