| Graduate student: | 李建儒 Lee, Chien-Ju |
|---|---|
| Thesis title: | 藉由Illumina對邊序列解決重疊群圖之模糊三元結構來提升454組序 Improving 454 assembly by resolving ambiguous triad structures in the contig graph using Illumina paired-end data |
| Advisor: | 張天豪 Chang, Tien-Hao |
| Degree: | Master |
| Department: | College of Electrical Engineering and Computer Science - Department of Electrical Engineering |
| Year of publication: | 2011 |
| Academic year of graduation: | 99 |
| Language: | Chinese |
| Number of pages: | 54 |
| Chinese keywords: | 基因體組序 (genome assembly), 第二代基因定序 (next-generation sequencing) |
| English keywords: | genome assembly, next-generation sequencing |
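Genome sequencing is a central task in genomics research. Many next-generation sequencing (NGS) technologies are now available, such as Roche/454 pyrosequencing, Illumina Genome Analyzer (IGA), Helicos, and ABI SOLiD, and their low cost and high throughput have led to wide use in genome sequencing projects. However, the sequence data produced by NGS technologies often contain many errors, which makes the subsequent genome assembly more difficult.
Previous studies have shown that errors produced by a single sequencing technology can be corrected by combining and cross-checking data from two NGS technologies. This thesis proposes an assembly pipeline that combines data from the 454 and IGA technologies to improve assembly quality. The pipeline first builds a preliminary assembly from the 454 reads, then identifies the parts of the 454 assembly that are suspected to contain errors, called "triad structures", and resolves these ambiguous structures with two analyses based on aligning the IGA reads. We validated the pipeline on genome sequence data from two species; the results show that it effectively detects the triad structures in the assemblies and resolves more than half of them.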
Genome sequencing is the first step in studying a new genome. Recent advances in next-generation sequencing (NGS) technologies, such as Roche/454 pyrosequencing, Illumina Genome Analyzer (IGA), Helicos, and ABI SOLiD, provide high-throughput sequencing data at lower cost than conventional technologies such as Sanger sequencing. However, the data generated by NGS technologies usually contain many errors, which makes the subsequent genome assembly more difficult.
Previous studies have shown that these sequencing errors can be corrected by combining sequencing data from different NGS technologies. This study presents an assembly pipeline that uses sequencing data from both 454 and IGA technologies to improve genome assembly. The pipeline first assembles the 454 sequencing data to construct a preliminary assembly. Then the connections suspected of containing errors in the preliminary assembly, termed triad structures, are identified by a detection algorithm and resolved by two analyses that use the IGA sequencing data. We used two genomes to evaluate the proposed pipeline. The experimental results show that the pipeline can detect the triad structures in the preliminary assemblies and resolve more than half of them.
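The detect-then-resolve idea can be illustrated with a small Python sketch. The abstract does not define a triad structure or describe the two IGA-based analyses, so everything below is an assumption made for illustration only: a "triad" is taken to be one contig with exactly two competing successor contigs in the contig graph, and it is resolved by counting Illumina read pairs whose two mates map to the source contig and to one of the candidates. The function names, the `pair_links` input, and the `min_support` and `min_ratio` thresholds are hypothetical and are not taken from the thesis.

```python
from collections import Counter, defaultdict


def find_ambiguous_branches(edges):
    """Return {contig: sorted successors} for contigs with exactly two successors.

    `edges` is an iterable of (source_contig, target_contig) pairs from a
    contig graph. Treating such a branch as a "triad" is an assumption.
    """
    successors = defaultdict(set)
    for src, dst in edges:
        successors[src].add(dst)
    return {c: sorted(s) for c, s in successors.items() if len(s) == 2}


def resolve_branch(source, candidates, pair_links, min_support=3, min_ratio=3.0):
    """Pick the successor best supported by paired-end links, or None.

    `pair_links` is a list of (contig_of_mate1, contig_of_mate2) tuples,
    assumed to come from mapping Illumina read pairs onto the contigs.
    The thresholds are illustrative, not values from the thesis.
    """
    support = Counter()
    for left, right in pair_links:
        if left == source and right in candidates:
            support[right] += 1

    ranked = sorted(candidates, key=lambda c: support[c], reverse=True)
    best, runner_up = ranked[0], ranked[1]

    # Too few linking pairs: leave the branch unresolved.
    if support[best] < min_support:
        return None
    # The winner must clearly dominate the alternative.
    if support[best] < min_ratio * max(support[runner_up], 1):
        return None
    return best


edges = [("c1", "c2"), ("c1", "c3"), ("c2", "c4")]
branches = find_ambiguous_branches(edges)

pair_links = [("c1", "c2")] * 5 + [("c1", "c3")]
for source, candidates in branches.items():
    print(source, "->", resolve_branch(source, candidates, pair_links))
```

Returning `None` when the support is weak or nearly tied keeps the sketch conservative: a branch stays unresolved rather than being forced into a possibly wrong join, which is consistent with the abstract's report that only part of the detected structures were resolved.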
On campus: publicly available from 2016-08-03