| Graduate student: | 許芊瑀 Hsu, Chien-Yu |
|---|---|
| Thesis title: | Benchmark evaluation of methylation calling software for bisulfite sequencing data |
| Advisor: | 賀保羅 Horton, Paul |
| Degree: | Master |
| Department: | College of Electrical Engineering and Computer Science - Graduate Program of Artificial Intelligence |
| Year of publication: | 2022 |
| Academic year of graduation: | 110 |
| Language: | English |
| Number of pages: | 24 |
| Keywords (translated from Chinese): | DNA methylation, bisulfite sequencing, alignment tools |
| Keywords (English): | DNA methylation, bisulfite sequence, methylation calling tools |

DNA methylation is an important epigenetic mechanism involved in many biological processes, including transcriptional activity, genomic imprinting, development, and many diseases, including cancer. In recent years, advances in next-generation sequencing technologies have made it possible to study DNA methylation genome-wide. Bisulfite sequencing is currently a popular and powerful method for methylation detection, and also the most widely studied one. Whole-genome bisulfite sequencing (WGBS) and reduced representation bisulfite sequencing (RRBS) are often used to generate DNA methylomes, so efficient tools are needed to align bisulfite sequencing data.

Although many bisulfite sequencing tools have been developed, the lack of benchmark evaluations makes it difficult for users to judge their accuracy. This thesis therefore benchmarks existing tools in order to devise strategies for predicting DNA methylation and to guide the choice of tool for a given situation. We analyze three alignment tools, EAGLE-meth, BS-Seeker2, and Bismark, compare their ability to estimate methylation levels, and discuss how the species of the data and the chromosome segment analyzed affect the estimated methylation levels.

The experimental results show that both BS-Seeker2 and Bismark calculate methylation levels very accurately. Compared with BS-Seeker2, however, Bismark offers a more complete set of features for visualizing the results of a methylation analysis. In addition, we consider EAGLE-meth to be an alignment tool that can be used to evaluate bisulfite data: it can already predict the probability of methylation at most CpG sites, which is a great help for DNA methylation studies.
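
To make the notion of "accuracy in the calculation of methylation levels" concrete, the following is a minimal sketch, not the procedure used in the thesis. It relies on the standard property of bisulfite sequencing that unmethylated cytosines are read as T while methylated cytosines remain C, so a site's methylation level can be estimated as the fraction of reads still showing C. The function names, the example coordinates and counts, and the choice of mean absolute error as the comparison metric are all illustrative assumptions.

```python
from typing import Dict, Optional, Tuple

Site = Tuple[str, int]  # (chromosome, position); hypothetical site key


def methylation_level(c_reads: int, t_reads: int) -> Optional[float]:
    """Estimate the methylation level of one cytosine.

    c_reads: reads still showing C (methylated, protected from conversion)
    t_reads: reads showing T (unmethylated, converted by bisulfite)
    Returns None when the site has no coverage.
    """
    total = c_reads + t_reads
    if total == 0:
        return None
    return c_reads / total


def mean_absolute_error(called: Dict[Site, Optional[float]],
                        truth: Dict[Site, float]) -> Optional[float]:
    """Average |called - truth| over sites present in both inputs.

    This metric is an assumption chosen for illustration only.
    """
    shared = [s for s in truth if called.get(s) is not None]
    if not shared:
        return None
    return sum(abs(called[s] - truth[s]) for s in shared) / len(shared)


# Made-up example data: known levels (e.g. from a simulator) and read counts
# reported by some caller as (C reads, T reads).
truth = {("chr1", 100): 0.75, ("chr1", 200): 0.5}
counts = {("chr1", 100): (3, 1), ("chr1", 200): (1, 3)}

called = {site: methylation_level(c, t) for site, (c, t) in counts.items()}
error = mean_absolute_error(called, truth)
```

With these made-up counts, the first site is called exactly and the second is off by 0.25, so `error` is 0.125. A real benchmark would compute such a score per tool, and possibly per species or chromosome segment, as the abstract describes.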