National Cheng Kung University Theses and Dissertations System


Graduate student:	阮祈翰 Ruan, Chi-Han
Thesis title:	Improving Pangenome-based Genotyping Accuracy and Efficiency Using Subset Seeds and GPU Parallelization
Advisor:	賀保羅 Horton, Paul
Degree:	Master
Department:	College of Electrical Engineering and Computer Science - Department of Computer Science and Information Engineering
Year of publication:	2025
Academic year of graduation:	113 (ROC calendar)
Language:	English
Number of pages:	39
Keywords:	Genotyping, Subset Seeds, GPU

In recent years, alignment-free genotyping has attracted growing attention in the genomics community and has led to tools such as PanGenie, which achieves high genotyping accuracy using contiguous k-mers. Building on this approach, Häntze and Horton introduced MaskedPanGenie, which incorporates spaced seeds to improve sensitivity. Comparative analyses showed that MaskedPanGenie genotypes notably more accurately than the original PanGenie at 5× and 30× read coverage, but its substantially longer runtime is the main bottleneck in practical applications.
In this study, we first replace spaced seeds with a new seed design, subset seeds, which further improves genotyping accuracy. In addition, we apply GPU acceleration to the algorithm based on the Hidden Markov Model (HMM), significantly reducing the overall runtime.
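The difference between the three seed types named in the abstract can be sketched as follows. This is only an illustration under stated assumptions: the mask alphabet (`#` exact match, `@` purine/pyrimidine class only, `_` ignored), the function names, and the example masks are hypothetical and are not taken from the thesis, whose actual subset seed design is defined in its Methods chapter.

```python
# Illustrative sketch only: the mask symbols and example masks below are
# assumptions, not the seed designs used in PanGenie, MaskedPanGenie, or the thesis.
PURINE_PYRIMIDINE = {"A": "R", "G": "R", "C": "Y", "T": "Y"}


def apply_seed(kmer, mask):
    """Project a k-mer through a seed mask of the same length.

    '#' keeps the exact base, '@' keeps only its purine/pyrimidine class
    (so transitions A<->G and C<->T are tolerated), '_' ignores the base.
    """
    if len(kmer) != len(mask):
        raise ValueError("k-mer and mask must have the same length")
    out = []
    for base, symbol in zip(kmer, mask):
        if symbol == "#":
            out.append(base)
        elif symbol == "@":
            out.append(PURINE_PYRIMIDINE[base])
        elif symbol == "_":
            out.append("_")
        else:
            raise ValueError(f"unknown mask symbol: {symbol!r}")
    return "".join(out)


def seeded_kmers(sequence, mask):
    """Return every window of the sequence projected through the mask."""
    k = len(mask)
    return [apply_seed(sequence[i:i + k], mask) for i in range(len(sequence) - k + 1)]


CONTIGUOUS = "#####"  # ordinary contiguous k-mer
SPACED = "##_##"      # spaced seed: middle position is ignored entirely
SUBSET = "##@##"      # subset seed: middle position tolerates transitions only

reference = "ACGTA"
transition = "ACATA"    # G -> A at the middle position
transversion = "ACCTA"  # G -> C at the middle position
```

With these hypothetical masks, the contiguous seed separates all three sequences, the spaced seed maps all three to the same key, and the subset seed groups the reference with the transition variant but still distinguishes the transversion. That intermediate behaviour is the general idea behind subset seeds.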

Chinese Abstract
Abstract
Acknowledgements
Contents
List of Tables
List of Figures
Nomenclature
Introduction
Background & Related Work
  1 K-mers
    1.1 Contiguous k-mers
    1.2 Spaced seed k-mers
  2 K-mer Based Genotyping
    2.1 PanGenie
    2.2 MaskedPanGenie
  3 Related tools
    3.1 Jellyfish & MaskJelly
  4 Performance metrics
    4.1 Sensitivity
    4.2 Weight Genotype Concordance (wGC)
Methods
  1 Subset Seed
    1.1 Formal definition
    1.2 Illustrative example
  2 Hidden Markov Model (HMM)
    2.1 Genotyping
    2.2 Implement GPU
Data
  1 Pangenome reference
  2 Reads
Results
  1 Different seeds on Genotyping
    1.1 Sensitivity
    1.2 Genotyping results
  2 Runtime for MaskedPanGenie
Discussion & Future Work
Conclusion
Bibliography


Off-campus access: publicly available immediately
