
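Graduate student: 邱聖博
Chiu, Sheng-Po
Thesis title: 橙黃壺菌 BL10 之全基因體及轉錄體的探討
Genomic and transcriptomic analyses of Aurantiochytrium sp. strain BL10
Advisor: 劉宗霖
Liu, Tsung-Lin
Degree: Master
Department: College of Bioscience and Biotechnology - Department of Biotechnology and Bioindustry Sciences
Year of publication: 2021
Academic year of graduation: 109 (ROC calendar, 2020-2021)
Language: English
Number of pages: 97
Keywords (translated from Chinese): eukaryote, polyunsaturated fatty acid, genome assembly, gene prediction, functional genome annotation, ploidy, arachidonic acid, alpha-linolenic acid
Keywords (English): Eukaryote, polyunsaturated fatty acid, genome assembly, gene prediction, gene annotation, ploidy estimation, arachidonic acid, alpha-linolenic acid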
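  • Aurantiochytrium sp. BL10 is a eukaryotic microalga that grows in estuaries and is already used commercially for the large-scale production of DHA, astaxanthin, and related compounds. To use its genome more efficiently for producing such polyunsaturated fatty acids, which benefit human health, the relevant genes must first be identified. A major difficulty in this study is that BL10 is a species that had not been sequenced and assembled before, so no reference answer was available for genome assembly, gene prediction, functional genome annotation, RNA assembly, or ploidy analysis. We combined Illumina and PacBio data into a successful assembly and, after assessing its correctness, masked the repetitive regions of the genome before carrying out gene prediction and functional annotation. We annotated many genes related to long-chain unsaturated fatty acids, fatty acid ligases, and the biosynthesis of arachidonic acid, alpha-linolenic acid, and zeaxanthin. We also ran differential expression experiments on samples taken at 10 hours (rapid growth phase), at 30 hours (lipid accumulation phase), and at 30 hours under blue light. Expression related to lipid accumulation was higher at 30 hours than at 10 hours, and expression of astaxanthin-related carotenoid genes was higher at 30 hours under blue light than at 30 hours; both observations agree with the biological experiments. Finally, a more comprehensive comparison covering both polyploid and haploid cases showed that BL10 is diploid. Genome assembly, gene prediction, functional annotation, and ploidy analysis were combined with the transcriptome data and organized into an optimized workflow.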

    According to previous studies, Aurantiochytrium sp. strain BL10 is a promising source of docosahexaenoic acid (DHA) and astaxanthin under certain conditions, and it is already used to produce DHA commercially at large scale. We have unveiled the genomic sequence of this non-model species through bioinformatics and identified genes involved in the production of many kinds of polyunsaturated fatty acids (PUFAs). This research obtained its 59-Mbp genome, characterized its genes by gene prediction, and then annotated the whole genome. It categorized around 11k genes, some of which were found by gene ontology to play significant roles in pathways producing certain long-chain fatty acids, such as arachidonic acid (ω-6), alpha-linolenic acid, and oleic acid (ω-9). The research also found many genes related to astaxanthin synthesis, including genes associated with beta-carotene synthase and zeaxanthin. Several genes that are highly expressed at the oil accumulation stage (30 hrs) are involved in polyunsaturated fatty acid metabolism, arachidonate-CoA ligase, and alpha-linolenic acid metabolism. Genes related to beta-carotene and zeaxanthin were more highly expressed at 30 hrs under blue light than at 30 hrs. By comparing different ploidy estimation tools, the research also found that BL10 is diploid. Finally, the research establishes a customized pipeline covering genome assembly, gene prediction, gene ontology, gene annotation, differential gene expression analysis, and ploidy estimation, which will facilitate future studies on BL10.

    Chinese Abstract (中文摘要)
    Abstract
    Acknowledgements
    Table of Contents
    Contents of Tables
    Contents of Figures
    Contents of Appendices
    Abbreviation List
    1. Research Background
      1-1 Aurantiochytrium sp. strain BL10
      1-2 De novo genome assembly, gene prediction and gene annotation
      1-3 Ploidy estimation
      1-4 Challenges in genome assembly, gene annotation and ploidy estimation
      1-5 RNA assembly
      1-6 Evaluation of genome and RNA assembly
      1-7 Differential gene expression and enrichment analysis
      1-8 Research Aim
    2. Materials and Methods
      2-1 Genome assembly
      2-2 Genome assembly software description
      2-3 Repeatmodeler and Repeatmasker
      2-4 De novo genome assembly assessment
      2-5 RNA assembly
      2-6 Genome structure annotation - PASA
      2-7 Preparation for training new species model in Augustus
      2-8 Gene prediction
      2-9 Blastp
      2-10 Gene ontology – Blast2GO
      2-11 Differential gene expression and enrichment analysis
      2-12 Ploidy estimation
    3. Results
      3-1 Quality evaluation of Sequencing data
      3-2 MaSuRCA genome assembly
      3-3 Repeatmasking
      3-4 RNA assembly
      3-5 Augustus
      3-6 Gene ontology and annotation
      3-7 Differential gene expression and enrichment analysis
      3-8 Ploidy estimation
    4. Discussion
      4-1 Overall
      4-2 MaSuRCA
      4-3 Trinity
      4-4 Blastp database
      4-5 Carotenoid biosynthetic process
      4-6 Differential gene expression and enrichment analysis
      4-7 Ploidy estimation
    References


    Full text availability: on campus, public from 2024-06-30; off campus, public from 2024-06-30