Author: Yang, Chien-Wen (楊茜雯)
Thesis title: Construction of a plant transcription regulatory database and integrated analysis platform based on ChIP-seq data (利用ChIP-seq大數據建立植物轉錄調控資料庫與整合分析平台)
Advisor: Chang, Wen-Chi (張文綺)
Degree: Master
Department: College of Bioscience and Biotechnology - Institute of Tropical Plant Sciences and Microbiology
Year of publication: 2024
Academic year of graduation: 112
Language: English
Pages: 64
Keywords: ChIP-seq database, Transcriptional regulatory network, Data mining, Transcription factor, Graphical visualization
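  • With continual advances in sequencing technology, chromatin immunoprecipitation sequencing (ChIP-seq) is now routinely used in studies of transcriptional regulation to detect transcription factor binding sites on chromosomes and to explore protein-chromatin interactions and regulatory relationships. Public resources such as GEO and SRA hold large amounts of ChIP-seq experimental data, but because these data lack systematic curation and analysis, they cannot be used efficiently. In recent years many research groups have developed ChIP-seq databases that offer more convenient data retrieval and downstream analysis, yet ChIP-seq databases designed for plant species remain very scarce, and their analysis functions are far more limited than those of animal databases. To address these problems, our laboratory previously developed a plant ChIP-seq database named PCBase (Plant ChIP-seq Database), which lets users look up protein-DNA regulatory relationships in plants, but its data integration and data visualization functions were still incomplete. This study therefore addresses the shortcomings of the original PCBase and upgrades it to PCBase 2.0, which covers seven higher plants and one algal species and includes 205 different transcription regulatory proteins, for a total of 1,982 ChIP-seq datasets. The database interface and functions were redesigned to improve functionality and data interpretability. In this updated version, we introduce a tagging system: by tagging the experimental tissue and treatment conditions of every ChIP-seq dataset, it improves the efficiency of data retrieval and enables integrated analysis. In addition, with the data tag index used to optimize the ChIP-seq analysis results, PCBase 2.0 adds the following analyses of gene and protein regulatory function: (1) analysis of protein binding-region preference on target genes under each treatment condition or tissue, (2) a tool for dynamically displaying regulator binding hotspots, and (3) functional enrichment analysis of the target genes of regulatory proteins under each treatment condition. PCBase 2.0 also uses MEME and STREME to analyze the characteristic sequence motifs of transcription factors, and the resulting transcription factor binding position weight matrices are used to scan any sequence for potential transcription factor binding regions. By remedying the shortcomings of the previous database's analysis system, we hope PCBase 2.0 offers users a friendlier ChIP-seq database platform that helps researchers more conveniently use experimentally validated big-data analysis results to study transcriptional regulatory relationships in plant cells.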
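As a rough illustration of why tagging helps retrieval and integrated analysis, the sketch below groups samples by a normalized tissue tag. It is only an assumption about how such grouping could look: the abstract does not describe how tags are assigned or stored, and the vocabulary, the sample IDs, and the helper names `tag_sample` and `cluster_by_tag` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical mapping from free-text words to normalized tissue tags.
TISSUE_TAGS = {
    "seedling": "seedling",
    "seedlings": "seedling",
    "root": "root",
    "roots": "root",
    "flower": "flower",
    "inflorescence": "flower",
}

# Hypothetical sample descriptions (not real PCBase 2.0 records).
SAMPLES = {
    "S1": "10-day-old seedlings, heat",
    "S2": "Root tips",
    "S3": "whole seedling",
    "S4": "unknown tissue",
}


def tag_sample(description, vocabulary):
    """Return the sorted tags found in a description, or ['untagged']."""
    words = description.lower().replace(",", " ").split()
    tags = {vocabulary[word] for word in words if word in vocabulary}
    return sorted(tags) or ["untagged"]


def cluster_by_tag(samples, vocabulary):
    """Group sample IDs under every tag assigned to them."""
    clusters = defaultdict(list)
    for sample_id, description in samples.items():
        for tag in tag_sample(description, vocabulary):
            clusters[tag].append(sample_id)
    return dict(clusters)


if __name__ == "__main__":
    for tag, sample_ids in cluster_by_tag(SAMPLES, TISSUE_TAGS).items():
        print(tag, sample_ids)
```

Once every sample carries a tag from a controlled vocabulary, any per-condition analysis reduces to iterating over these clusters instead of parsing free-text descriptions each time.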

    Chromatin immunoprecipitation sequencing (ChIP-seq) is a powerful technique for studying interactions between proteins and DNA across the entire genome. Public repositories such as GEO and SRA contain a vast amount of ChIP-seq data, but a significant portion of it remains unorganized or unanalyzed. This poses a challenge for researchers who want to obtain insights into gene regulation. Although previous studies have produced several ChIP-seq databases, there is still a shortage of plant-specific databases for ChIP-seq data mining and analysis. In this study, a database named PCBase 2.0 (http://PCBase.itps.ncku.edu.tw) was established to identify the transcriptional relationships of 205 different transcription regulators from 1,982 ChIP-seq samples across seven land plant species and one green alga. All ChIP-seq data were manually curated and systematically analyzed. To improve the interpretability and usability of ChIP-seq data, a tagging system was used to classify the disorganized ChIP-seq samples into corresponding tissue/treatment clusters. This supports the following integrated analyses: (i) searchable gene-centered interfaces that reveal binding preferences for genic regions across experimental condition clusters, (ii) an in-house peak-visualization function that dynamically illustrates the occupancy regions of all regulatory proteins, and (iii) significantly enriched functional annotations of regulator target genes under different conditions. Moreover, de novo motif discovery for each transcription factor was performed with the MEME and STREME algorithms. The resulting position weight matrices were then used to scan for potential transcription factor binding sites on any promoter sequence of interest. We believe that PCBase 2.0 provides a user-friendly platform for investigating transcriptional regulatory relationships based on experimental ChIP-seq data.
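The position weight matrix scan mentioned above can be sketched in a few lines. This is a minimal illustration of generic log-odds PWM scanning, not the scanning tool or scoring scheme used by PCBase 2.0. The count matrix, the uniform background, the pseudocount, and the score threshold are all assumed values, and only the forward strand is scanned.

```python
import math

# Hypothetical 4-position count matrix (one dict per motif position).
# Illustrative numbers only, not a motif taken from PCBase 2.0.
COUNTS = [
    {"A": 2, "C": 90, "G": 4, "T": 4},
    {"A": 92, "C": 2, "G": 3, "T": 3},
    {"A": 3, "C": 91, "G": 3, "T": 3},
    {"A": 2, "C": 3, "G": 92, "T": 3},
]

# Assumed uniform background base frequencies.
BACKGROUND = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}


def log_odds_matrix(counts, background, pseudocount=1.0):
    """Convert base counts into log2 odds against the background."""
    matrix = []
    for column in counts:
        total = sum(column.values()) + 4 * pseudocount
        matrix.append({
            base: math.log2(((column[base] + pseudocount) / total) / background[base])
            for base in "ACGT"
        })
    return matrix


def scan(sequence, pwm, threshold):
    """Return (start, window, score) for forward-strand windows scoring >= threshold."""
    sequence = sequence.upper()
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        if any(base not in "ACGT" for base in window):
            continue  # skip windows containing ambiguous bases such as N
        score = sum(pwm[i][base] for i, base in enumerate(window))
        if score >= threshold:
            hits.append((start, window, score))
    return hits


if __name__ == "__main__":
    pwm = log_odds_matrix(COUNTS, BACKGROUND)
    for start, window, score in scan("TTACACGTGGCACGAA", pwm, threshold=4.0):
        print(start, window, round(score, 2))
```

A practical scanner would usually also score the reverse complement and derive the threshold from a p-value or a fraction of the maximum score rather than a fixed number.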

    Chinese Abstract
    Abstract
    Acknowledgements
    INDEX OF CONTENTS
    LIST OF TABLES
    LIST OF FIGURES
    LIST OF ABBREVIATIONS
    1. INTRODUCTION
        1.1 The importance of transcriptional regulation in plant
        1.2 The current experimental methods in studying transcription regulation
        1.3 Introduction of chromatin immunoprecipitation sequencing
        1.4 The current ChIP-seq databases and the limitations of plant-specific resources
        1.5 The specific aims of this study
    2. MATERIALS AND METHODS
        2.1 ChIP-seq data collection
        2.2 Workflow of ChIP-seq data processing
        2.3 Integration of species genome annotation file and ChIP-seq analysis results
        2.4 Identification of TFBS and Construction of Promoter Analysis
        2.5 Construction of the website interface
    3. RESULTS AND DISCUSSIONS
        3.1 Overview of the website interface and the function improvements
            3.1.1 Gene Search
            3.1.2 Protein Search
            3.1.3 Promoter Analysis
            3.1.4 Genome Browser
            3.1.5 Download
        3.2 Case study of the Gene Search function
        3.3 Case study of the Protein Search function
    4. CONCLUSION AND PROSPECTS
    5. REFERENCE

    Full-text availability: on campus, open immediately; off campus, open immediately