| Graduate student: | 張家銘 Chang, Chia-Ming |
|---|---|
| Thesis title: | MethylEAGLE: Methylation Explicit Alternative Likelihood Genome Evaluator Feasibility Study |
| Advisor: | 賀保羅 Paul Horton |
| Degree: | Master's |
| Department: | College of Electrical Engineering and Computer Science, Department of Computer Science and Information Engineering |
| Year of publication: | 2020 |
| Academic year of graduation: | 108 (ROC calendar; 2019–2020) |
| Language: | English |
| Number of pages: | 87 |
| Keywords: | Methylation, probability model |
DNA methylation is a form of epigenomic information that plays a central role in development and cancer. Recently, bisulfite sequencing has made it possible to infer DNA methylation on a genome-wide scale. However, this inference is made indirectly via read mapping, and current methods do not fully account for the uncertainties involved. Previous work from our group (EAGLE) has shown that an explicit probabilistic model of these uncertainties can effectively rank candidate genome variants according to how strongly the read data support them.
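To make the EAGLE idea concrete, here is a heavily simplified Python sketch of likelihood-ratio scoring. It is not EAGLE's actual implementation. It assumes ungapped reads already placed at offset 0 of short, hypothetical reference and alternative haplotypes, Phred base qualities, and errors spread uniformly over the three other bases. The function names and example sequences are illustrative only. A positive total log-likelihood ratio means the reads favor the alternative genome.

```python
import math


def phred_to_error(q: int) -> float:
    """Convert a Phred base quality to a base-call error probability."""
    return 10 ** (-q / 10)


def read_log_likelihood(read: str, quals: list[int], hap: str) -> float:
    """log P(read | haplotype) for an ungapped read aligned at offset 0.

    A matching base has probability 1 - e; a mismatch has probability e / 3
    (errors assumed uniform over the three other bases).
    """
    ll = 0.0
    for base, q, hap_base in zip(read, quals, hap):
        e = phred_to_error(q)
        ll += math.log(1 - e) if base == hap_base else math.log(e / 3)
    return ll


def log_mean_exp2(a: float, b: float) -> float:
    """Numerically stable log(0.5 * exp(a) + 0.5 * exp(b))."""
    m = max(a, b)
    return m + math.log(0.5 * math.exp(a - m) + 0.5 * math.exp(b - m))


def log_likelihood_ratio(reads, ref_hap: str, alt_hap: str, het: bool = False) -> float:
    """Sum over reads of log P(read | alternative) - log P(read | reference).

    With het=True the alternative genome is heterozygous, so each read is
    drawn from the reference or the alternative haplotype with probability 1/2.
    """
    total = 0.0
    for read, quals in reads:
        l_ref = read_log_likelihood(read, quals, ref_hap)
        l_alt = read_log_likelihood(read, quals, alt_hap)
        if het:
            l_alt = log_mean_exp2(l_ref, l_alt)
        total += l_alt - l_ref
    return total


ref, alt = "ACGTACGT", "ACGTTCGT"  # hypothetical A>T variant at position 4
alt_reads = [("ACGTTCGT", [30] * 8)] * 2
print(log_likelihood_ratio(alt_reads, ref, alt))            # positive: reads support alt
print(log_likelihood_ratio(alt_reads, ref, alt, het=True))  # smaller, still positive
```

Scoring in log space keeps per-read products from underflowing. The heterozygous case mixes the two haplotypes with a log-sum-exp style helper instead of exponentiating raw log-likelihoods.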
Here we extend the probabilistic model of EAGLE to handle bisulfite sequence data by explicitly modeling the bisulfite treatment process. In bisulfite sequencing experiments, DNA is denatured, and the resulting mixture of single-stranded DNA molecules is treated with bisulfite. Bisulfite deaminates unmethylated cytosines to uracil, which is read as thymine after PCR amplification, while methylated cytosines are protected from this conversion. Thus we expect the output of bisulfite sequencing to be a mixture of reads whose pattern depends on which strand of the DNA was deaminated. Importantly, this output differs from what we would expect if those genomic cytosine positions were simply C:G-to-T:A substitutions (ordinary single-nucleotide variants).
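The strand asymmetry described above can be shown with a small Python sketch. It bisulfite-converts each strand of a hypothetical duplex independently, then scores a read as an equal-weight mixture over the two converted strands. The following are all assumptions for illustration, not details taken from MethylEAGLE: the 1/2 strand weights, the constant 1% error rate, ungapped reads given in top-strand coordinates, and every function name.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def convert_both_strands(top: str, methylated: set[int]) -> tuple[str, str]:
    """Bisulfite-convert each strand of a duplex independently.

    `methylated` uses top-strand positions: a 'C' at i means the top-strand
    cytosine is methylated; a 'G' at i means the bottom-strand cytosine paired
    with it is methylated. Unmethylated C -> U, read as T after PCR.
    Returns (converted top 5'->3', converted bottom 5'->3').
    """
    n = len(top)
    conv_top = "".join(
        "T" if b == "C" and i not in methylated else b for i, b in enumerate(top)
    )
    bottom = revcomp(top)  # bottom-strand position k pairs with top position n-1-k
    conv_bottom = "".join(
        "T" if b == "C" and (n - 1 - k) not in methylated else b
        for k, b in enumerate(bottom)
    )
    return conv_top, conv_bottom


def read_likelihood(read: str, template: str, err: float = 0.01) -> float:
    """P(read | template) for an ungapped read with a uniform error rate."""
    p = 1.0
    for r, t in zip(read, template):
        p *= (1 - err) if r == t else err / 3
    return p


def strand_mixture_likelihood(read: str, top: str, methylated: set[int]) -> float:
    """P(read | methylation state), with the read given in top-strand coordinates.

    Assumes each read comes from the converted top or bottom strand with
    probability 1/2; bottom-strand templates are reverse-complemented.
    """
    conv_top, conv_bottom = convert_both_strands(top, methylated)
    return 0.5 * read_likelihood(read, conv_top) + 0.5 * read_likelihood(
        read, revcomp(conv_bottom)
    )


top = "ACGTTCGA"  # hypothetical duplex with CpGs at positions 1-2 and 5-6
conv_top, conv_bottom = convert_both_strands(top, {1, 2})  # only the first CpG methylated
print(conv_top)              # ACGTTTGA: unmethylated C at position 5 reads as T
print(revcomp(conv_bottom))  # ACGTTCAA: bottom strand shows G->A at 6, keeps C at 5
```

In the example, the unmethylated CpG shows up as C-to-T at position 5 in top-strand reads, but as G-to-A at position 6 in bottom-strand reads, which still show C at position 5. A genuine C:G-to-T:A substitution at position 5 would instead show T at position 5 on both strands.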
The two main differences between MethylEAGLE and the original EAGLE are (1) the need to consider single-stranded deamination and (2) the much greater number of cytosines that may be methylated (potentially every CpG) compared to the number of genome variants. In this work, we addressed the first by extending EAGLE to compute probabilities for each strand separately and then combine the results. The second is particularly problematic, because in principle every combination of methylated and unmethylated states among cytosines within a read length of each other should be considered. To address this, we implemented a heap-based search strategy that efficiently evaluates the most promising combinations. We hope that modeling this process will yield a more reliable interpretation of bisulfite sequence data.
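The heap-based search can be illustrated with a best-first enumeration of methylation states. This Python sketch is hypothetical and is not the thesis's actual search procedure. It assumes the combined log-likelihood decomposes into independent per-CpG terms (`site_llrs`), which real reads spanning several CpGs do not guarantee. Under that assumption, a min-heap yields the k highest-scoring methylated/unmethylated assignments in order, without scoring all 2^n of them.

```python
import heapq


def k_best_methylation_states(site_llrs: list[float], k: int):
    """Return up to k (cost, state) pairs in order of increasing cost.

    site_llrs[i] = log P(data | CpG i methylated) - log P(data | CpG i unmethylated).
    state[i] is True if CpG i is methylated. `cost` is how far the state's
    total log-likelihood falls below that of the best state; this is exact
    only under the simplifying assumption that per-CpG terms are independent.
    """
    n = len(site_llrs)
    best = tuple(llr > 0 for llr in site_llrs)
    # Flipping site i away from its preferred state costs |site_llrs[i]|.
    order = sorted(range(n), key=lambda i: abs(site_llrs[i]))
    costs = [abs(site_llrs[i]) for i in order]

    results = [(0.0, best)]
    # Heap entries: (cost, largest index into `order` flipped, flipped indices).
    heap = [(costs[0], 0, (0,))] if n else []
    while heap and len(results) < k:
        cost, last, flipped = heapq.heappop(heap)
        state = list(best)
        for j in flipped:
            state[order[j]] = not state[order[j]]
        results.append((cost, tuple(state)))
        if last + 1 < n:
            nxt = last + 1
            # Extend: additionally flip the next-cheapest site.
            heapq.heappush(heap, (cost + costs[nxt], nxt, flipped + (nxt,)))
            # Replace: swap the last flipped site for the next-cheapest one.
            heapq.heappush(
                heap, (cost - costs[last] + costs[nxt], nxt, flipped[:-1] + (nxt,))
            )
    return results[:k]


for cost, state in k_best_methylation_states([2.0, -0.5, 1.0], k=4):
    print(cost, state)
```

Each popped combination spawns at most two successors: flip one more site, or swap the last flipped site for the next one. Because the flip costs are sorted and non-negative, successors never cost less than their parent, so combinations leave the heap in order. Producing the k best states then takes roughly O(k log k) heap operations plus O(n) work per reported state, instead of 2^n evaluations.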
Our simulation results show that, in principle, our probabilistic model can correctly infer methylation across the whole genome. We also compare our program with BS-Seeker2 and with the previous framework. Although much work remains, our method improves on the previous work in both specificity and sensitivity.