Graduate student: 陳鄧安 (Deng, An Chen)
Thesis title (Chinese): 評估不同計算工具利用 Illumina 雙端定序16S rDNAs 做物種鑑定之準確度
Thesis title (English): Evaluating accuracy of species identification using Illumina paired-end sequences of 16S rDNAs by various computational tools
Advisor: 劉宗霖 (Liu, Tsung-Lin)
Degree: Master's
Department: College of Bioscience and Biotechnology, Institute of Bioinformatics and Biosignal Transduction
Year of publication: 2015
Academic year of graduation: 103 (ROC calendar, i.e., 2014-2015)
Language: Chinese
Number of pages: 52
Chinese keywords (translated): species identification accuracy, 16S rDNA, paired-end sequencing, bowtie2, soap2, blast, rdp
English keywords: taxonomy classification accuracy, 16S rDNA, paired-end read, bowtie2, soap2, blast, rdp
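    Metagenomics is the study of microbial DNA taken directly from the environment. Next-generation sequencing platforms are used to sequence this microbial DNA, and the resulting reads are aligned against a DNA database of known species to identify which microbes are present. The next-generation platforms, Illumina and 454, produce more sequences at a lower cost than traditional Sanger sequencing. The 454 platform produces longer reads (700-800 base pairs) but is more expensive. With improvements in the technology, Illumina paired-end sequencing can now reach 600 base pairs (2x300 base pairs) while producing more sequences at a lower cost. Illumina has therefore become an increasingly popular platform for metagenomic research. Previous studies focused mainly on 454 single-end reads, which are longer. No study has yet evaluated the accuracy of species identification tools on paired-end reads. This study uses simulated paired-end reads to build two datasets: (1) large (amplicon size greater than 600 base pairs, i.e., non-overlapping) and (2) small (amplicon size 400-500 base pairs, i.e., overlapping). Blast, RDP, SOAP2, and Bowtie2 were each used for species identification, and the accuracy of each tool was calculated. RDP was the most accurate on both simulated datasets. This study is the first to examine species identification tools on paired-end read data, and it finds RDP to be the better tool.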
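The abstract above separates the two datasets only by amplicon length. A small sketch may help show where that length comes from and why it determines whether the mates overlap. With 2x300 bp reads, R1 and R2 share 2 × 300 − L bases of an amplicon of length L. Amplicons of 400-500 bp therefore overlap by 100-200 bases, and amplicons longer than 600 bp leave a gap between the mates. The code below is a hypothetical illustration, not the thesis's pipeline. It assumes:

- exact in-silico PCR (IUPAC-aware primer matching, no mismatches allowed);
- error-free reads that start at the primer sites.

The helper names (`extract_amplicon`, `simulate_pair`, `mate_overlap`) are made up for this example.

```python
import re
from typing import Optional, Tuple

# Complement table covering IUPAC codes, so degenerate primers can be reverse-complemented.
COMPLEMENT = str.maketrans("ACGTRYKMSWBDHVN", "TGCAYRMKSWVHDBN")

# IUPAC code -> regex character class used to match a (possibly degenerate) primer.
IUPAC_CLASS = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence (IUPAC codes allowed)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def primer_regex(primer: str):
    """Compile a primer into a regex where degenerate bases match any allowed base."""
    return re.compile("".join(IUPAC_CLASS[b] for b in primer.upper()))


def extract_amplicon(ref: str, fwd: str, rev: str) -> Optional[str]:
    """Hypothetical exact-match in-silico PCR on the forward strand of `ref`.

    Returns the region from the forward primer through the reverse primer's
    binding site (primers included), or None if either primer fails to bind.
    """
    ref = ref.upper().replace("U", "T")  # accept RNA-alphabet reference sequences
    f = primer_regex(fwd).search(ref)
    if f is None:
        return None
    # The reverse primer anneals to the opposite strand, so on the forward strand
    # we search for its reverse complement, downstream of the forward primer.
    r = primer_regex(revcomp(rev)).search(ref, f.end())
    if r is None:
        return None
    return ref[f.start():r.end()]


def simulate_pair(amplicon: str, read_len: int = 300) -> Tuple[str, str]:
    """Error-free mates: R1 reads the amplicon forward, R2 reads its reverse complement."""
    return amplicon[:read_len], revcomp(amplicon)[:read_len]


def mate_overlap(amplicon_len: int, read_len: int = 300) -> int:
    """Bases shared by R1 and R2; zero or negative means the mates do not overlap."""
    return 2 * read_len - amplicon_len
```

Two examples match the small and large cases:

- `mate_overlap(450)` returns 150, an overlapping pair as in the small dataset.
- `mate_overlap(650)` returns -50, a 50-base gap as in the large dataset.

A real simulator would also inject sequencing errors according to a quality profile, which this sketch deliberately leaves out.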

    Metagenomics is the study of microbial DNA from environmental samples. Next-generation sequencing (NGS) produces far more reads at a lower cost than Sanger sequencing. Two common NGS platforms are Illumina and 454. The 454 platform offers read lengths of up to 800 bp. The Illumina platform produces more reads at a lower cost than 454 and offers read lengths of up to 600 bp (2x300 bp paired-end). As a result, many researchers prefer Illumina for metagenomic studies. However, no previous study has evaluated classification accuracy using Illumina paired-end reads. In this study, we evaluated the classification accuracy of four popular tools: Bowtie2, SOAP2, RDP, and Blast. We used the SILVA 16S rDNA database to simulate Illumina reads (2x300 bp) with two 16S primer sets:

    (1) a large dataset of non-overlapping paired-end reads, with an amplicon size over 600 bp;
    (2) a small dataset of overlapping paired-end reads, with an amplicon size between 400 and 500 bp.

    RDP was the most accurate of the four tools. We therefore suggest that RDP is the better tool for species classification with Illumina paired-end (PE) data.
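The abstract compares per-tool accuracy but does not define it in this record. One common way to score a classifier on simulated reads works as follows: take the lineage of the reference sequence each read was simulated from, and check whether the predicted lineage matches it down to a chosen taxonomic depth. The sketch below makes three assumptions that may differ from how the thesis scored Blast, RDP, SOAP2, and Bowtie2:

- accuracy is defined as described above;
- lineages are semicolon-separated strings;
- unassigned reads count as errors.

The function names and toy lineages are hypothetical.

```python
from typing import Dict, Tuple


def at_depth(lineage: str, depth: int) -> Tuple[str, ...]:
    """First `depth` levels of a semicolon-separated lineage string."""
    return tuple(level.strip() for level in lineage.split(";")[:depth])


def accuracy(truth: Dict[str, str], predicted: Dict[str, str], depth: int) -> float:
    """Fraction of simulated reads whose predicted lineage matches the true one
    down to `depth` levels. Reads missing from `predicted` (unassigned) count as
    incorrect, and so do predictions that stop above `depth`."""
    if not truth:
        raise ValueError("truth set is empty")
    correct = 0
    for read_id, true_lineage in truth.items():
        pred = predicted.get(read_id)
        if pred is not None and at_depth(pred, depth) == at_depth(true_lineage, depth):
            correct += 1
    return correct / len(truth)


# Toy, made-up labels: read id -> lineage of the sequence it was simulated from.
truth = {
    "r1": "Bacteria;P1;G1;G1 sp1",
    "r2": "Bacteria;P1;G1;G1 sp2",
    "r3": "Bacteria;P2;G2;G2 sp1",
}
# Toy output of one hypothetical classifier; r3 was left unassigned.
predicted = {
    "r1": "Bacteria;P1;G1;G1 sp1",
    "r2": "Bacteria;P1;G1;G1 sp1",
}

species_acc = accuracy(truth, predicted, depth=4)  # only r1 is right at species level
genus_acc = accuracy(truth, predicted, depth=3)    # r1 and r2 agree at genus level
```

Calling `accuracy` on each tool's parsed output gives one score per tool per depth. Looking at the species-level and genus-level scores separately shows whether a tool's errors are mostly near misses within the correct genus.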

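    Chinese and English Abstracts
    Acknowledgments
    Table of Contents
    List of Tables
    List of Figures
    Chapter 1. Introduction
      Section 1. Literature Review
        1. Metagenomics
        2. Bacterial 16S rDNA
        3. Next-generation sequencing platforms
        4. Universal primers
        5. Existing 16S rDNA databases
        6. Existing species identification tools
      Section 2. Motivation and Objectives
        1. Motivation
        2. Objectives
    Chapter 2. Data Collection and Methods
      Section 1. Large simulated data
        1. Selection and filtering of the reference database
        2. Filtering and curation of primers for simulation
        3. Evaluation and generation of simulated data
        4. Aligning simulated data with multiple identification tools
        5. Processing alignment results and evaluating accuracy
      Section 2. Small simulated data (unmerged)
        1. Selection and filtering of the reference database
        2. Filtering and curation of primers for simulation
        3. Evaluation and generation of simulated data
        4. Aligning simulated data with multiple identification tools
        5. Processing alignment results and evaluating accuracy
      Section 3. Small simulated data (merged)
        1. Merging paired-end reads
        2. Random sampling of simulated data
        3. Species identification tool methods
        4. Processing alignment results and evaluating accuracy
    Chapter 3. Results and Discussion
        1. Screening of the 16S rDNA database
        2. Primer screening and evaluation of the number of species captured
        3. Evaluation and selection of real quality profiles
        4. Results for the large simulated data
          (1) SOAP2: aligning mates separately vs. together
          (2) How the identification tools handle paired-end reads
          (3) Accuracy of the identification tools (large)
        5. Results for the small simulated data
          (1) Small simulated data (unmerged)
          (2) Small simulated data (merged)
        6. Comparison of the simulated datasets
          (1) Large vs. small paired-end data
          (2) Small paired-end data: merged vs. unmerged
        7. Comparison of species identification tools
    Chapter 4. Conclusion
    References
    Supplementary Material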
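Chapter 2, Section 3 merges the overlapping read pairs of the small dataset before classification. This record does not name the merging tool or its settings, so the following is only a simplified sketch of overlap-based merging. It proceeds in four steps:

1. Reverse-complement R2.
2. Try every suffix/prefix overlap of at least `min_overlap` bases.
3. Keep the overlap with the lowest mismatch fraction, preferring the longer overlap on ties.
4. Reject the pair if no overlap falls at or below `max_mismatch_frac`.

Real mergers also use base qualities to resolve disagreements inside the overlap; this sketch simply keeps R1's base. The function name and default thresholds are hypothetical.

```python
from typing import Optional

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of an A/C/G/T/N sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def merge_pair(r1: str, r2: str, min_overlap: int = 10,
               max_mismatch_frac: float = 0.1) -> Optional[str]:
    """Merge an overlapping read pair into one sequence, or return None.

    Assumes the amplicon is at least as long as each read, so the 3' end of R1
    overlaps the 5' end of reverse-complemented R2. Where the reads disagree
    inside the overlap, R1's base is kept (no quality scores are used).
    """
    if min_overlap < 1:
        raise ValueError("min_overlap must be at least 1")
    r1 = r1.upper()
    r2rc = revcomp(r2)
    best = None  # (mismatch fraction, -overlap length, overlap length)
    for ov in range(min_overlap, min(len(r1), len(r2rc)) + 1):
        tail, head = r1[len(r1) - ov:], r2rc[:ov]
        frac = sum(a != b for a, b in zip(tail, head)) / ov
        # Lower mismatch fraction wins; on a tie, the longer overlap wins.
        if frac <= max_mismatch_frac and (best is None or (frac, -ov) < best[:2]):
            best = (frac, -ov, ov)
    if best is None:
        return None
    return r1 + r2rc[best[2]:]


# Toy 16-base amplicon "GATCAAAACCCCGGTA" read as two 10-base mates (true overlap 4).
merged = merge_pair("GATCAAAACC", "TACCGGGGTT", min_overlap=4)
```

Here `merged` reconstructs the full 16-base amplicon. For the 2x300 bp, 400-500 bp amplicon case in the thesis, the true overlap would be 100-200 bases, well above the default `min_overlap`.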

    Full-text availability: on campus from 2017-08-31; off campus from 2017-08-31.