| Graduate student: | 盧宥霖 Lu, You-Lin |
|---|---|
| Thesis title: | EAGLE-GPU: Using Graphics Processing Units to accelerate computation of the statistical support of putative genome variants based on DNA sequencing data |
| Advisor: | 賀保羅 Paul Horton |
| Degree: | Master |
| Department: | College of Electrical Engineering and Computer Science - Department of Computer Science and Information Engineering |
| Year of publication: | 2022 |
| Academic year of graduation: | 110 |
| Language: | English |
| Number of pages: | 33 |
| Keywords (translated from Chinese): | genome, variant calling, GPU parallel computing |
| Keywords (English): | genomics, variant calling, GPU acceleration |
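Variant calling plays an important role in the analysis of genome sequence data. Many studies in the field have proposed workable solutions for this task. The predecessor of this work, EAGLE (Explicit Alternative Genome Likelihood Evaluator), aims to further improve the precision of the results with a statistical approach, by computing the likelihood of each putative variant that has been called. According to the data reported at the time, this technique does improve accuracy and is well suited to post-processing the output of the major variant callers. However, because of the time complexity of its algorithm, the time needed to run the program grows substantially as the sequenced read length increases. The high time complexity also caused performance problems when we tried to build higher-level applications on top of the original program. Because we believe the algorithm is highly parallelizable, in this study we investigate whether graphics processing units (GPUs) can effectively accelerate the process. Here we propose EAGLE-GPU, which reimplements the basic computing functions as GPU CUDA kernels and gains its speedup by processing the underlying probability calculations in parallel. For future extensibility, our current GPU implementation covers only the lowest-level computation. According to our experiments, the current implementation may not be suitable for shorter sequencing reads; if next-generation sequencing data is the main target, more specialized CUDA kernels need to be designed to increase parallelism. As the sequence length increases, however, the experimental data show that GPU acceleration greatly reduces the required running time, which is a clear benefit for third-generation sequencing, where sequence lengths exceed ten thousand. In addition, because the time spent in the basic computation is reduced directly, this work can also help in building more complex, higher-level applications in the future. Overall, based on these results, we recommend applying GPU acceleration where appropriate when encountering highly parallel datasets; for complex problems in the future, CUDA kernels can be designed around the characteristics of the problem, based on our results, to produce solutions quickly and effectively.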
Variant calling plays an important role in analyzing genome sequence data. A variety of studies in the field have proposed solutions to this task. The predecessor of this study, EAGLE (Explicit Alternative Genome Likelihood Evaluator), aimed to further improve the precision of variant callers' results by assessing the likelihood of the called variants. This technique was shown to enhance accuracy. However, because of the computational complexity of its algorithm, the execution time increases sharply as the length of the sequenced reads grows. The running time may also be an obstacle when building more advanced applications on top of EAGLE. In this research, we investigated whether graphics processing units (GPUs) could accelerate the process, since the computation is highly parallel. Here, we propose EAGLE-GPU, a revision of the original method that rewrites the fundamental computing functions as GPU CUDA kernels. Our experimental results demonstrate that although the current use of GPU parallelism might not be suitable for sequencing data with shorter reads, it is beneficial as read lengths grow, opening up opportunities for third-generation sequencing. In addition, by reducing the time spent on the computationally intensive operations, our work enables more complicated applications to be built on top of EAGLE-GPU. In conclusion, we recommend applying GPU acceleration when a highly parallel computation is to be applied to a dataset with long reads, and we suggest that more specialized CUDA kernels could be designed based on our results for future complex problems.
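The abstract does not spell out the underlying probability computation, so the following is only a minimal sketch under stated assumptions, written in plain Python rather than CUDA. It assumes a simplified per-base error model (a Phred quality `q` gives error probability `10^(-q/10)`, with a mismatch spread evenly over the three other bases) and uses hypothetical function names that are not taken from EAGLE or EAGLE-GPU. Its purpose is to illustrate why such a computation is data-parallel and why the available parallelism grows with read length: each per-base term depends on a single position, so the list of terms could be computed by independent threads and then summed in a reduction.

```python
import math


def phred_to_error(qual):
    """Convert a Phred quality score (assumed > 0) to an error probability."""
    return 10.0 ** (-qual / 10.0)


def base_log_prob(read_base, genome_base, qual):
    """log10 probability of observing read_base given genome_base.

    Assumed model: a match has probability 1 - err, a mismatch err / 3.
    """
    err = phred_to_error(qual)
    if read_base == genome_base:
        return math.log10(1.0 - err)
    return math.log10(err / 3.0)


def read_log_likelihood(read, quals, genome, pos):
    """log10 likelihood of a read placed at offset pos of a genome string.

    Every element of `terms` is independent of the others, which is the
    part a GPU kernel could evaluate with one thread per base.
    """
    terms = [
        base_log_prob(base, genome[pos + i], qual)
        for i, (base, qual) in enumerate(zip(read, quals))
    ]
    return sum(terms)  # on a GPU this would be a parallel reduction


def log_likelihood_ratio(read, quals, ref_genome, alt_genome, pos):
    """Positive values favor the alternative (variant) sequence."""
    alt = read_log_likelihood(read, quals, alt_genome, pos)
    ref = read_log_likelihood(read, quals, ref_genome, pos)
    return alt - ref
```

For example, calling `log_likelihood_ratio("ACTT", [30, 30, 30, 30], "GGACGTGG", "GGACTTGG", 2)` compares a read against a reference sequence and a hypothetical alternative that differs by one base; a positive result means the read supports the alternative. With only four bases there are just four independent terms, which is consistent with the abstract's observation that short reads may not offer enough parallel work to benefit from a GPU.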