
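Graduate student: Hsu, Chien-Yu (許芊瑀)
Thesis title: Benchmark evaluation of methylation calling software for bisulfite sequencing data (亞硫酸氫鹽測序數據的甲基化調用軟體之基準評估)
Advisor: Horton, Paul (賀保羅)
Degree: Master
Department: College of Electrical Engineering and Computer Science - Graduate Program of Artificial Intelligence
Year of publication: 2022
Academic year of graduation: 110
Language: English
Number of pages: 24
Chinese keywords: DNA甲基化 (DNA methylation), 亞硫酸氫鹽測序 (bisulfite sequencing), 對齊工具 (alignment tools)
English keywords: DNA methylation, bisulfite sequence, methylation calling tools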
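  • DNA methylation is one of the important epigenetic mechanisms and is involved in many biological processes, including transcriptional activity, genomic imprinting, development, and many diseases, including cancer. In recent years, advances in next-generation sequencing technology have made it possible to study DNA methylation genome-wide. Bisulfite sequencing is currently a popular and powerful method for detecting methylation, and also the most widely studied. Whole-genome bisulfite sequencing and reduced representation bisulfite sequencing are frequently used to generate DNA methylomes, so efficient tools are needed to align bisulfite sequencing data.
    Although many bisulfite tools have been developed, the lack of benchmark evaluations makes it hard for users to judge their accuracy. This thesis therefore benchmarks existing tools in order to devise strategies for predicting DNA methylation and to choose a suitable tool for each situation. We analyze three alignment tools, EAGLE-meth, BS-Seeker2, and Bismark, compare their ability to estimate methylation levels, and discuss how data from different species or different chromosome segments affect the analysis of methylation levels.
    The experimental results show that both BS-Seeker2 and Bismark compute methylation levels with very high accuracy. Compared with BS-Seeker2, however, Bismark offers a more complete set of features for visualizing the results. In addition, we consider EAGLE-meth to be an alignment tool that can be used to evaluate bisulfite data. EAGLE-meth can already predict the probability of methylation at most CpG sites, which is a great help for DNA methylation research.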

    DNA methylation is one of the important epigenetic mechanisms and is involved in many biological processes, including transcriptional activity, genomic imprinting, development, and many diseases, including cancer. In recent years, advances in next-generation sequencing technologies have provided opportunities to study genome-wide DNA methylation. Bisulfite sequencing is now a popular and powerful method for methylation detection and is also the most studied one. Whole-genome bisulfite sequencing and reduced representation bisulfite sequencing are often used to generate DNA methylomes, and efficient tools are needed to align bisulfite sequencing data.
    Although many bisulfite tools have been developed, the lack of benchmarking makes it difficult for users to judge the accuracy of the tools. In this thesis, we therefore benchmark existing tools in order to develop strategies for predicting DNA methylation and to select the appropriate tool for each situation. We analyze three alignment tools, EAGLE-meth, BS-Seeker2, and Bismark, compare their ability to evaluate methylation levels, and discuss how data from different species or different chromosome fragments affect the analysis of methylation levels.
    The experimental results show that both BS-Seeker2 and Bismark are very accurate in calculating methylation levels. Compared with BS-Seeker2, however, Bismark has a more complete set of features for visualizing the results of a methylation analysis. In addition, we consider EAGLE-meth a comparative tool for evaluating bisulfite data. EAGLE-meth is already able to predict the probability of methylation at most CpG sites, which is a great help for DNA methylation studies.
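    To make "accuracy in calculating methylation levels" concrete, here is a minimal sketch; it is not the evaluation code used in the thesis. It assumes the common per-site definition of methylation level (methylated reads divided by all reads covering the cytosine) and assumes mean absolute error against simulated ground truth as the accuracy measure, since this record does not state which metric was used. The names `CpGCall`, `methylation_level`, and `mean_absolute_error` are hypothetical and do not correspond to the output format of any of the three tools.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Tuple


@dataclass
class CpGCall:
    """Read counts reported by a methylation caller at one CpG site (hypothetical type)."""
    chrom: str
    pos: int
    methylated: int    # reads in which the cytosine was read as C (protected from conversion)
    unmethylated: int  # reads in which the cytosine was read as T (converted by bisulfite)


def methylation_level(call: CpGCall) -> Optional[float]:
    """Fraction of covering reads that support methylation; None if the site has no coverage."""
    depth = call.methylated + call.unmethylated
    if depth == 0:
        return None
    return call.methylated / depth


def mean_absolute_error(
    calls: Iterable[CpGCall],
    truth: Dict[Tuple[str, int], float],
) -> Optional[float]:
    """Average |estimated - true| level over sites that are covered and present in the truth set.

    Assumption: `truth` maps (chrom, pos) to the simulated methylation level in [0, 1].
    """
    errors = []
    for call in calls:
        level = methylation_level(call)
        key = (call.chrom, call.pos)
        if level is None or key not in truth:
            continue  # skip uncovered sites and sites without ground truth
        errors.append(abs(level - truth[key]))
    if not errors:
        return None
    return sum(errors) / len(errors)
```

    Running the same function on the calls from each tool against the same simulated truth set would give one number per tool to compare. Sites with zero coverage are skipped rather than counted as errors, which is itself a design choice that a real benchmark would need to state.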

    Chinese Abstract
    Abstract
    Acknowledgements
    Contents
    List of Tables
    List of Figures
    1 Introduction
      1.1 Background
      1.2 Motivation
      1.3 Research framework
    2 Materials and Related works
      2.1 MethylFASTQ
      2.2 Methylation calling tools
      2.3 EAGLE-meth
      2.4 BS-Seeker2
      2.5 Bismark
      2.6 Bowtie2
    3 Experiment and Results
      3.1 Dataset
        3.1.1 Bisulfite sequencing real data
        3.1.2 Generating simulation data using MethylFASTQ
      3.2 BS-Seeker2 and Bismark
        3.2.1 Experimental results of BS-Seeker2
        3.2.2 Experimental results of Bismark
        3.2.3 Compare the results of BS-Seeker2 and Bismark
      3.3 Simulated data analysis
    4 Conclusions and Future Work
      4.1 Conclusions
      4.2 Future Work
    Bibliography

