National Cheng Kung University Theses and Dissertations System

Graduate Student:	王尊緯 Wang, Tsun-Wei
Thesis Title:	WebGPU for High-Performance Bioinformatics Computing: Browser-Based Optimization of the Pair-HMM Forward Algorithm
Advisor:	賀保羅 Horton, Paul
Degree:	Master's
Department:	College of Electrical Engineering and Computer Science - Department of Computer Science and Information Engineering
Year of Publication:	2025
Academic Year of Graduation:	113 (ROC calendar, i.e., 2024–25)
Language:	English
Number of Pages:	58
Chinese Keywords:	browser GPU computing, pair hidden Markov model, bioinformatics acceleration
English Keywords:	WebGPU, Pair-Hidden Markov Model, Bioinformatics Acceleration

As GPU acceleration gains traction in bioinformatics (Banerjee 2017, Liu 2021), conventional CUDA and OpenCL workflows still require vendor-specific driver installation and remain tied to particular hardware, limiting online teaching and front-end clinical analysis. Standardized in 2024, WebGPU unifies Vulkan, Direct3D 12 and Metal under a single JavaScript API (W3C 2024), offering three key advantages: no installation, cross-hardware portability and on-device data residency. Using the compute-intensive Forward algorithm for Pair-Hidden Markov Models (Pair-HMM Forward; Durbin 1998) as a case study, we assess the performance and feasibility of this emerging framework.
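
For readers unfamiliar with the recurrence being accelerated, a minimal CPU sketch of the Pair-HMM Forward algorithm follows, written in TypeScript to stay close to the browser setting. It assumes the three-state (M, X, Y) pair HMM of Durbin et al. (1998) with a single gap-open probability δ, gap-extend probability ε, end probability τ and a simple match/mismatch emission table. `PairHmmParams`, `logAdd` and `pairHmmForwardLog` are hypothetical names; this is neither the thesis's code nor Chou's (2024) implementation. It works in log space, one of the two numerical-stability strategies the thesis compares.

```typescript
// Hypothetical parameters for a three-state (M, X, Y) pair HMM in the style of
// Durbin et al. (1998); illustrative only, not the thesis's model.
interface PairHmmParams {
  delta: number;    // M -> X or M -> Y (gap open)
  epsilon: number;  // X -> X or Y -> Y (gap extend)
  tau: number;      // any state -> End
  match: number;    // emission p(a, b) when a === b
  mismatch: number; // emission p(a, b) when a !== b
  gap: number;      // emission q(a) for a residue aligned to a gap
}

// log(exp(a) + exp(b)) computed stably; -Infinity represents probability 0.
function logAdd(a: number, b: number): number {
  if (a === -Infinity) return b;
  if (b === -Infinity) return a;
  const hi = Math.max(a, b);
  const lo = Math.min(a, b);
  return hi + Math.log(1 + Math.exp(lo - hi));
}

// Forward algorithm in log space: log P(x, y | model), summed over all alignments.
function pairHmmForwardLog(x: string, y: string, p: PairHmmParams): number {
  const n = x.length;
  const m = y.length;
  const w = m + 1; // row width of the flattened (n+1) x (m+1) matrices
  const size = (n + 1) * w;
  const fM = new Float64Array(size);
  const fX = new Float64Array(size);
  const fY = new Float64Array(size);
  for (let k = 0; k < size; k++) {
    fM[k] = -Infinity;
    fX[k] = -Infinity;
    fY[k] = -Infinity;
  }
  fM[0] = 0; // f^M(0,0) = 1: the Begin state behaves like M

  const tMM = Math.log(1 - 2 * p.delta - p.tau);
  const tGM = Math.log(1 - p.epsilon - p.tau); // X -> M and Y -> M
  const tOpen = Math.log(p.delta);
  const tExt = Math.log(p.epsilon);
  const eMatch = Math.log(p.match);
  const eMis = Math.log(p.mismatch);
  const eGap = Math.log(p.gap);

  for (let i = 0; i <= n; i++) {
    for (let j = 0; j <= m; j++) {
      if (i === 0 && j === 0) continue;
      const k = i * w + j;
      if (i > 0 && j > 0) {
        const d = k - w - 1; // cell (i-1, j-1)
        const e = x.charAt(i - 1) === y.charAt(j - 1) ? eMatch : eMis;
        fM[k] = e + logAdd(tMM + fM[d], tGM + logAdd(fX[d], fY[d]));
      }
      if (i > 0) {
        const u = k - w; // cell (i-1, j)
        fX[k] = eGap + logAdd(tOpen + fM[u], tExt + fX[u]);
      }
      if (j > 0) {
        const l = k - 1; // cell (i, j-1)
        fY[k] = eGap + logAdd(tOpen + fM[l], tExt + fY[l]);
      }
    }
  }
  const last = size - 1;
  return Math.log(p.tau) + logAdd(fM[last], logAdd(fX[last], fY[last]));
}
```

With, say, `delta: 0.1, epsilon: 0.3, tau: 0.05, match: 0.2, mismatch: 0.2 / 12, gap: 0.25`, each matched pair contributes a factor of roughly 0.2 × 0.75 = 0.15 along the diagonal, so for two identical 1,000-base sequences a probability-space version underflows double precision, whereas `pairHmmForwardLog` still returns a finite negative log-likelihood.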

Building on the open-source C++/CUDA implementation by Chou Yu-Chen (Chou 2024), we first develop a WebGPU baseline. We then address its principal bottlenecks, namely frequent CPU–GPU round-trips and costly BindGroup reconstruction, by introducing (i) batched submission of the GPU work in a single CommandBuffer and (ii) Dynamic Uniform Offsets, which let one BindGroup be reused across dispatches instead of being rebuilt. The result is an optimized variant termed WebGPU-Optimized.
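
The two optimizations can be sketched as follows, under stated assumptions. The sketch assumes a wavefront schedule in which each compute dispatch updates one anti-diagonal of the DP matrix, a 16-byte uniform struct per dispatch, and a workgroup size of 64; `Wavefront`, `wavefronts`, `packDynamicUniforms` and that struct layout are hypothetical, not the thesis's code. The runnable part computes the schedule and packs every dispatch's uniforms into one buffer at 256-byte-aligned offsets (WebGPU's default `minUniformBufferOffsetAlignment`). The WebGPU calls that consume it appear only as comments because they need a browser `GPUDevice`.

```typescript
// One dispatch's uniforms: a single anti-diagonal (wavefront) of the DP matrix.
// Hypothetical layout, mirrored by a WGSL struct of four u32 fields (16 bytes).
interface Wavefront {
  diag: number;   // d = i + j
  iStart: number; // first row index on this diagonal
  iEnd: number;   // last row index on this diagonal (inclusive)
  cells: number;  // iEnd - iStart + 1, used to size the dispatch
}

// WebGPU's default minUniformBufferOffsetAlignment; real code should read
// device.limits.minUniformBufferOffsetAlignment instead of hard-coding it.
const DEFAULT_UNIFORM_ALIGNMENT = 256;
const WAVEFRONT_BYTES = 16;

function alignUp(value: number, alignment: number): number {
  return Math.ceil(value / alignment) * alignment;
}

// Anti-diagonals d = 1 .. n+m of the (n+1) x (m+1) matrix. Cell (0,0) is the
// initialised Begin state, so diagonal 0 needs no dispatch. Each cell depends
// only on diagonals d-1 and d-2, so diagonals must run in this order.
function wavefronts(n: number, m: number): Wavefront[] {
  const out: Wavefront[] = [];
  for (let d = 1; d <= n + m; d++) {
    const iStart = Math.max(0, d - m);
    const iEnd = Math.min(n, d);
    out.push({ diag: d, iStart: iStart, iEnd: iEnd, cells: iEnd - iStart + 1 });
  }
  return out;
}

// Pack all wavefront uniforms into ONE buffer (uploaded once) and return the
// dynamic offset each dispatch passes to setBindGroup.
function packDynamicUniforms(
  waves: Wavefront[],
  alignment: number
): { data: ArrayBuffer; offsets: number[] } {
  const stride = alignUp(WAVEFRONT_BYTES, alignment);
  const data = new ArrayBuffer(stride * waves.length);
  const u32 = new Uint32Array(data);
  const offsets: number[] = [];
  for (let k = 0; k < waves.length; k++) {
    const base = (k * stride) / 4; // byte offset -> u32 index
    u32[base] = waves[k].diag;
    u32[base + 1] = waves[k].iStart;
    u32[base + 2] = waves[k].iEnd;
    u32[base + 3] = waves[k].cells;
    offsets.push(k * stride);
  }
  return { data: data, offsets: offsets };
}

// Browser-side use (comments only; needs a GPUDevice):
//   layout entry:     buffer: { type: "uniform", hasDynamicOffset: true }
//   bind group entry: resource: { buffer: uniformBuf, size: WAVEFRONT_BYTES }
//   device.queue.writeBuffer(uniformBuf, 0, packed.data);   // one upload
//   const enc = device.createCommandEncoder();
//   const pass = enc.beginComputePass();
//   pass.setPipeline(pipeline);
//   waves.forEach((w, k) => {
//     pass.setBindGroup(0, bindGroup, [packed.offsets[k]]); // same BindGroup, new offset
//     pass.dispatchWorkgroups(Math.ceil(w.cells / 64));
//   });
//   pass.end();
//   device.queue.submit([enc.finish()]);                    // one submission
```

Because each dispatch changes only its dynamic offset, a single BindGroup serves every wavefront, and recording all dispatches into one CommandBuffer replaces one submission per step with a single `queue.submit`.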

Benchmarks on three devices—NVIDIA RTX 2070 Super, Apple M1 and Intel UHD 620—across sequence lengths from 10^2 to 10^5 show that the optimized version achieves speed-ups exceeding 100× in the best case and reaches over 84 % of CUDA’s throughput, while maintaining relative log-likelihood error below 10^-5 on all devices. Even without an NVIDIA GPU, our WebGPU implementation outperforms single-threaded C++ by one to two orders of magnitude.
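
The accuracy and throughput figures above reduce to two simple ratios, sketched below. The relative log-likelihood error is assumed here to be |LL_WebGPU − LL_ref| / |LL_ref| against a double-precision reference, and the fraction of CUDA throughput is assumed to be the CUDA runtime divided by the WebGPU runtime on the same workload; the abstract does not spell out either definition, and the function names and timings are hypothetical. `toF32` mimics storing a value as WGSL `f32` (WGSL has no 64-bit float type); that rounding alone costs at most 2^-24 ≈ 6 × 10^-8 relative error, leaving most of the 10^-5 budget for rounding accumulated across the recurrence.

```typescript
// |LL_gpu - LL_ref| / |LL_ref|, with LL_ref a double-precision reference result.
function relativeLogLikelihoodError(llGpu: number, llRef: number): number {
  if (llRef === 0) throw new Error("reference log-likelihood must be non-zero");
  return Math.abs(llGpu - llRef) / Math.abs(llRef);
}

// Round a double to the nearest f32, mimicking a value stored as WGSL f32.
function toF32(x: number): number {
  const buf = new Float32Array(1);
  buf[0] = x;
  return buf[0];
}

// Fraction of CUDA throughput on the same workload: runtime ratio CUDA / WebGPU.
function fractionOfCudaThroughput(cudaMs: number, webgpuMs: number): number {
  return cudaMs / webgpuMs;
}

const TOLERANCE = 1e-5; // error threshold stated in the abstract
```

For example, hypothetical runtimes of 84 ms (CUDA) and 100 ms (WebGPU) give `fractionOfCudaThroughput(84, 100)` of 0.84.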

These findings demonstrate that pure JavaScript and WGSL can execute the Pair-HMM Forward algorithm within seconds in a web browser. We contribute two browser-specific optimization strategies and provide detailed cross-hardware performance measurements, laying the groundwork for web-native genomic analysis tools and advancing the democratization and real-time execution of bioinformatics workloads.

Abstract (Chinese)
Abstract
Acknowledgements
Contents
List of Tables
List of Figures
Nomenclature
Introduction
1 Background
2 Motivation and Objectives
3 Methods and Key Results
4 Conclusions and Contributions
Related Work
1 High-Performance Computing Requirements in Bioinformatics
1.1 Next-Generation Sequencing and Its Computational Challenges
1.2 The Central Role of the Pair-HMM Forward Algorithm
2 Conventional GPU Acceleration Frameworks: CUDA and OpenCL
2.1 CUDA in Bioinformatics
2.2 OpenCL’s Cross-Platform Ambition
2.3 Barriers and Limitations of Traditional Frameworks
3 The Emergence and Technical Characteristics of WebGPU
3.1 Technical Background
3.2 High-Performance Computing Potential of WebGPU
3.3 Challenges and Limitations of WebGPU
4 Initial Explorations of WebGPU in Bioinformatics
5 Research Gap and Positioning of This Work
Methods
1 Synthetic Dataset Generation
2 Mathematical Model
3 Pair-HMM Forward Algorithm
4 System Design and Implementation
4.1 C++/CUDA Version
4.2 WebGPU Baseline
4.3 Optimized WebGPU Version
5 Summary
Results
1 Experimental Environment
2 Performance Data
2.1 RTX 2070 Super: Runtime and Relative Ratios
2.2 Apple M1 and Intel UHD 620: Cross-Platform Results
3 Correctness Verification: Relative Log-Likelihood Error
4 Summary
Discussion
1 Performance Analysis
1.1 RTX 2070 Super: Performance Scaling and CUDA Gap
1.2 Cross-GPU Bottlenecks
1.3 SFU Throughput Bottleneck
1.4 API Overhead: Pointer Rotation vs. BindGroup Reconstruction
1.5 Numerical Stability: Log-Space versus Scaling Coefficients
2 Numerical Stability and Error Analysis
3 Lessons from gpuPairHMM
4 Cross-Hardware Performance
4.1 Apple M1: Pros and Cons of a Unified-Memory Architecture (UMA)
4.2 Intel UHD 620: Driver Maturity and Scheduling Strategy
Future Work
1 Closing the Double-Precision Gap
2 Hybrid Acceleration with WASM+SIMD and WebGPU
2.1 Short sequences (length < 512)
2.2 Long sequences
3 Distributed Execution Across Edge and Cloud
Conclusion
1 Core Contributions
2 Academic and Industrial Impact
References

[1] Apple. Apple M1 Chip — Technical Overview. Apple Developer Documentation. 2020. URL: https://developer.apple.com/documentation/apple_silicon/apple_m1.
[2] Apple. Apple M2 Max Chip — Technical Overview. Apple Developer Documentation. 2023. URL: https://developer.apple.com/documentation/apple_silicon/apple_m2_max.
[3] S. S. Banerjee et al. “Hardware Acceleration of the Pair-HMM Algorithm for DNA Variant Calling”. Proc. 27th FPL. 2017, pp. 165–172. DOI: 10.23919/FPL.2017.8056826.
[4] Y.-C. Chou. Pair-HMM Forward: Reference & GPU-Accelerated Implementations – A GPU-Based Approach to Accelerate the Pair Hidden Markov Model Forward Algorithm for DNA Sequence Profile Alignment. [GitHub repository]. 2024. URL: https://github.com/yuchen0620/ChouYuchen-master-thesis.
[5] R. Durbin et al. Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids. Cambridge University Press, 1998.
[6] P. Ferragina and G. Manzini. “Opportunistic Data Structures with Applications”. Proc. 41st IEEE FOCS. 2000, pp. 390–398. DOI: 10.1109/SFCS.2000.892127.
[7] P. Ghosh et al. “Web3DMol: Interactive Protein Structure Visualization Based on WebGL”. Bioinformatics 34.13 (2018), pp. 2275–2277. DOI: 10.1093/bioinformatics/bty534.
[8] Google Chrome Developers. WebGPU Now Available in Chrome. Chromium Blog. 2024. URL: https://blog.chromium.org/2024/05/webgpu-now-available.html.
[9] Google Chrome Team. Chrome’s 2024 Recap for Devs: Re-imagining the Web with AI. Chrome for Developers Blog. 2024. URL: https://developer.chrome.com/blog/chrome-2024-recap.
[10] Illumina. NovaSeq X Series Reagent Kits — Specifications. 2024. URL: https://www.illumina.com/systems/sequencing-platforms/novaseq-x-plus/specifications.html.
[11] Intel. Intel UHD Graphics 620 — Product Specifications. Intel ARK. 2018. URL: https://ark.intel.com/content/www/us/en/ark/products/126789.
[12] B. Jones. Toji.dev Blog Series: WebGPU Best Practices. 2023. URL: https://toji.dev/webgpu-best-practices/.
[13] A. Klöckner et al. “PyCUDA and PyOpenCL: A Scripting-Based Approach to GPU Run-Time Code Generation”. Parallel Computing 38.3 (2012), pp. 157–174. DOI: 10.1016/j.parco.2011.09.001.
[14] K. Krampis, T. Booth, B. Chapman, et al. “Cloud BioLinux: Pre-configured and On-Demand Bioinformatics Computing for the Genomics Community”. BMC Bioinformatics 13 (2012), p. 42. DOI: 10.1186/1471-2105-13-42.
[15] B. Langmead et al. “Ultrafast and Memory-Efficient Alignment of Short DNA Sequences to the Human Genome”. Genome Biology 10.3 (2009), R25. DOI: 10.1186/gb-2009-10-3-r25.
[16] H. Li et al. “The Sequence Alignment/Map Format and SAMtools”. Bioinformatics 25.16 (2009), pp. 2078–2079. DOI: 10.1093/bioinformatics/btp352.
[17] H. Li and R. Durbin. “Fast and Accurate Long-Read Alignment with Burrows–Wheeler Transform”. Bioinformatics 26.5 (2010), pp. 589–595. DOI: 10.1093/bioinformatics/btq698.
[18] Y. Liu, S. Schrinner, et al. “GPU Acceleration in Genomics: A Comprehensive Survey”. Briefings in Bioinformatics 22.5 (2021), bbab042. DOI: 10.1093/bib/bbab042.
[19] Y. Liu, A. Wirawan, and B. Schmidt. “CUDASW++ 3.0: Accelerating Smith–Waterman Protein Database Search by Coupling CPU and GPU SIMD Instructions”. BMC Bioinformatics 14 (2013), p. 117. DOI: 10.1186/1471-2105-14-117.
[20] E. R. Mardis. “DNA Sequencing Technologies: 2006–2016”. Nature Protocols 12.2 (2017), pp. 213–218. DOI: 10.1038/nprot.2016.182.
[21] A. McKenna et al. “The Genome Analysis Toolkit: A MapReduce Framework for Analyzing Next-Generation DNA Sequencing Data”. Genome Research 20.9 (2010), pp. 1297–1303. DOI: 10.1101/gr.107524.110.
[22] MDN Web Docs. WebAssembly SIMD. 2023. URL: https://developer.mozilla.org/en-US/docs/WebAssembly/SIMD.
[23] MDN Web Docs. WebGPU API. 2025. URL: https://developer.mozilla.org/en-US/docs/Web/API/WebGPU_API.
[24] NVIDIA. GeForce RTX 2070 SUPER Founders Edition Specifications. Product brief. 2019. URL: https://www.nvidia.com/en-us/geforce/graphics-cards/rtx-2070-super/specs.
[25] NVIDIA Corporation. CUDA C++ Programming Guide v12.4. 2023. URL: https://docs.nvidia.com/cuda/cuda-c-programming-guide.
[26] B. Schmidt et al. “gpuPairHMM: High-Speed Pair-HMM Forward Algorithm for DNA Variant Calling on GPUs”. arXiv preprint arXiv:2411.11547 (2024). URL: https://arxiv.org/abs/2411.11547.
[27] J. E. Stone, D. Gohara, and G. Shi. “OpenCL: A Parallel Programming Standard for Heterogeneous Computing Systems”. Computing in Science & Engineering 12.3 (2010), pp. 66–73. DOI: 10.1109/MCSE.2010.69.
[28] TensorFlow.js Team. WebGPU Backend for TensorFlow.js. 2024. URL: https://www.tensorflow.org/js/guide/webgpu.
[29] M. Vasimuddin et al. “Efficient Architecture-Aware Acceleration of BWA-MEM for Multicore Systems”. Proceedings of IEEE IPDPS. 2019, pp. 314–324. DOI: 10.1109/IPDPS.2019.00041.
[30] W3C. WebGPU Specification. W3C Candidate Recommendation. 2024. URL: https://www.w3.org/TR/webgpu/.

Off-campus access: available immediately
