Author: Tsai, Yen-Lung (蔡彥隆)
Thesis title: A PIM-Based Filtering Mechanism for Accelerating HMM-Based Protein Sequence Alignment
Advisor: Ho, Chien-Chung (何建忠)
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Computer Science and Information Engineering
Year of publication: 2026
Academic year of graduation: 114 (ROC calendar)
Language: English
Number of pages: 36
Keywords: Processing-in-Memory, Hidden Markov Models (HMM), Protein Sequence Alignment
  • Protein sequence alignment is a core workload in modern bioinformatics, yet its throughput is increasingly limited by the memory wall when scanning large protein databases. In HMM-based pipelines such as HMMER, the MSV (Multiple Segment Viterbi) filter is the first-stage kernel that scores every sequence. It is predominantly memory-bound because it frequently accesses model parameters and dynamic-programming (DP) score vectors.

    This thesis presents GLAD PIM MSVfilter, a greedy load-balanced and data-locality-aware Processing-in-Memory (PIM) design that offloads MSV filtering onto a real processing-near-memory platform. GLAD PIM MSVfilter combines two components. The Sequence Greedy Distribution Strategy performs length-aware, load-balanced mapping of sequences to DPUs while preserving per-sequence locality. The MiW-SyncFree filter keeps the HMM profile resident in WRAM, assigns each sequence to a single tasklet so that no fine-grained synchronization is needed, and uses loop unrolling to improve scalar efficiency.

    On a 2,042-DPU system, GLAD PIM MSVfilter achieves up to a 3.8× speedup over a single-core CPU baseline while keeping host–DPU data movement below 1% of total runtime. For small HMM profiles, the DPU-based design matches or surpasses an 8-core CPU; for large profiles, a hybrid CPU–DPU workload partition further reduces runtime. These results show that PIM is an effective platform for accelerating the memory-bound filtering stage of HMM-based protein sequence alignment.
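
    The abstract does not spell out how the Sequence Greedy Distribution Strategy works internally, so the following is only a minimal sketch of one common length-aware greedy scheme: sequences are taken longest first and each is given, whole, to the DPU with the smallest load so far. It assumes that sequence length in residues is a reasonable proxy for MSV work. The function name `greedy_distribute` and the example lengths are hypothetical and are not taken from the thesis.

```python
import heapq
from typing import List, Sequence, Tuple


def greedy_distribute(
    seq_lengths: Sequence[int], num_dpus: int
) -> Tuple[List[List[int]], List[int]]:
    """Map whole sequences to DPUs, longest first, least-loaded DPU first.

    Hypothetical illustration only: returns (assignment, loads), where
    assignment[d] lists the sequence indices given to DPU d and loads[d]
    is the total number of residues assigned to DPU d.
    """
    if num_dpus <= 0:
        raise ValueError("num_dpus must be positive")

    # Min-heap of (current load in residues, DPU index).
    heap = [(0, dpu) for dpu in range(num_dpus)]
    heapq.heapify(heap)
    assignment: List[List[int]] = [[] for _ in range(num_dpus)]

    # Longest sequences first; ties are broken by index for determinism.
    order = sorted(range(len(seq_lengths)), key=lambda i: (-seq_lengths[i], i))

    for idx in order:
        load, dpu = heapq.heappop(heap)  # least-loaded DPU so far
        assignment[dpu].append(idx)      # the sequence is never split
        heapq.heappush(heap, (load + seq_lengths[idx], dpu))

    loads = [sum(seq_lengths[i] for i in ids) for ids in assignment]
    return assignment, loads


if __name__ == "__main__":
    # Made-up sequence lengths (in residues) and a tiny DPU count.
    example_lengths = [900, 400, 350, 300, 120, 80]
    groups, totals = greedy_distribute(example_lengths, num_dpus=3)
    for dpu, (ids, total) in enumerate(zip(groups, totals)):
        print(f"DPU {dpu}: sequences {ids}, total residues {total}")
```

    Because each sequence goes to exactly one DPU, per-sequence locality is preserved in this sketch; a real deployment would also have to respect per-DPU memory capacity, which the sketch ignores.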

    Abstract (in Chinese)
    Abstract
    Acknowledgements
    Table of Contents
    List of Tables
    List of Figures
    Chapter 1. Introduction
    Chapter 2. Background
      2.1. HMM-based Alignment and MSV Filter
        2.1.1. Profile HMMs for Protein Sequence Alignment
        2.1.2. Multi-stage Filtering and the MSV Filter in HMMER
        2.1.3. Performance Bottlenecks of the MSV Filter
      2.2. Processing-in-Memory (PIM) System Basics
        2.2.1. System Architecture
        2.2.2. Constraints and Limitations in PIM Systems
    Chapter 3. Motivation
    Chapter 4. Greedy Load-Balanced and Data-Locality-Aware PIM MSV Filter Design
      4.1. Design Overview
      4.2. Sequence Greedy Distribution Strategy
      4.3. MiW-SyncFree Filter
    Chapter 5. Evaluation
      5.1. Experimental Setups
      5.2. Experimental Results
    Chapter 6. Conclusion
    References

