Graduate student: | 楊茜雯 Yang, Chien-Wen |
---|---|
Thesis title: | 利用ChIP-seq大數據建立植物轉錄調控資料庫與整合分析平台 Construction of a plant transcription regulatory database and integrated analysis platform based on ChIP-seq data |
Advisor: | 張文綺 Chang, Wen-Chi |
Degree: | Master |
Department: | 生物科學與科技學院 - 熱帶植物與微生物科學研究所 (College of Bioscience and Biotechnology - Institute of Tropical Plant Sciences) |
Year of publication: | 2024 |
Graduation academic year: | 112 |
Language: | English |
Number of pages: | 64 |
Keywords (Chinese): | 染色體免疫沉澱測序資料庫、轉錄調控網路、資料探勘、轉錄因子、資料視覺化 |
Keywords (English): | ChIP-seq database, Transcriptional regulatory network, Data mining, Transcription factor, Graphical visualization |
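With the steady advance of sequencing technology, chromatin immunoprecipitation sequencing (ChIP-seq) is now widely used in studies of transcriptional regulation to detect the binding sites of transcription factors on chromatin and to explore protein-chromatin interactions and regulatory relationships. Public resources such as GEO and SRA store large amounts of ChIP-seq experimental data, but because these data lack systematic curation and analysis, they cannot be used efficiently. In recent years many research groups have developed ChIP-seq databases that offer more convenient data retrieval and downstream analysis, yet ChIP-seq databases designed for plant species remain very scarce, and their analysis functions are far more limited than those of animal databases. To address these problems, our laboratory previously developed a plant ChIP-seq database named PCBase (Plant ChIP-seq Database), which lets users look up protein-DNA regulatory relationships in plants, but its data integration and data visualization functions were still incomplete. This study therefore addresses the shortcomings of the original PCBase and upgrades it to PCBase 2.0, which covers seven higher plants and one algal species and includes 205 different transcriptional regulatory proteins, for a total of 1,982 ChIP-seq datasets. The database interface and functions were redesigned to improve both functionality and the interpretability of the data. In this updated version we introduce a tagging system: by tagging the experimental tissue and treatment conditions of every ChIP-seq dataset, we improve retrieval efficiency and enable integrated analysis. In addition, with the ChIP-seq analysis results organized by tag index, PCBase 2.0 adds three analyses of gene and protein regulatory function: (1) analysis of the binding-region preferences of proteins on target genes under each treatment condition or tissue, (2) a dynamic display tool for regulator binding hotspots, and (3) functional enrichment analysis of the target genes of regulatory proteins under each treatment condition. PCBase 2.0 also uses MEME and STREME to analyze the characteristic sequences (motifs) of transcription factors, and the resulting transcription factor binding position weight matrices are used to scan any sequence for potential transcription factor binding regions. By remedying the deficiencies of the earlier analysis system, we hope that PCBase 2.0 offers users a friendlier ChIP-seq database platform that helps researchers more conveniently use experimentally validated large-scale analysis results to study transcriptional regulatory relationships in plant cells.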
Chromatin immunoprecipitation sequencing (ChIP-seq) is a powerful technique for studying interactions between proteins and DNA across the entire genome. Public repositories such as GEO and SRA contain a vast amount of ChIP-seq data, but a significant portion of it remains unorganized or unanalyzed. This poses a challenge for researchers who want to gain insight into gene regulation. Although some ChIP-seq databases have been developed in previous studies, there is still a shortage of plant-specific databases for ChIP-seq data mining and analysis. In this study, a database named PCBase 2.0 (http://PCBase.itps.ncku.edu.tw) was established to identify the transcriptional relationships of 205 different transcription regulators from 1,982 ChIP-seq samples across seven land plant species and one green alga. All ChIP-seq data were manually curated and systematically analyzed. To improve the interpretability and usability of the ChIP-seq data, a tagging system was used to classify the disorganized ChIP-seq samples into corresponding tissue/treatment clusters. This tagging supports the following integrated analyses: (i) searchable gene-centered interfaces that reveal binding preferences across gene regions for each experimental condition cluster, (ii) an in-house peak-visualization function that dynamically illustrates the occupancy regions of all regulatory proteins, and (iii) significantly enriched functional annotations of the regulators' target genes under different conditions. Moreover, de novo motif discovery for each transcription factor was performed with the MEME and STREME algorithms. The resulting position weight matrices were further used to scan for potential transcription factor binding sites on any promoter sequence of interest. Taken together, we believe that PCBase 2.0 provides a user-friendly platform for investigating transcriptional regulatory relationships based on experimental ChIP-seq data.
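The idea of scanning a promoter sequence with a position weight matrix can be illustrated with a minimal Python sketch. This is not the PCBase 2.0 implementation: the toy count matrix, the pseudocount, the uniform background frequency, and the score threshold are all assumptions made for illustration, and the function names are hypothetical. The sketch converts per-position base counts into log2-odds scores and slides the matrix along the forward strand of a sequence, keeping windows whose summed score reaches the threshold.

```python
import math

BASES = "ACGT"


def counts_to_log_odds(counts, pseudocount=1.0, background=0.25):
    """Convert per-position base counts into a log2-odds weight matrix.

    The pseudocount and the uniform background are illustrative assumptions.
    """
    pwm = []
    for column in counts:
        total = sum(column[b] for b in BASES) + 4 * pseudocount
        pwm.append({
            b: math.log2(((column[b] + pseudocount) / total) / background)
            for b in BASES
        })
    return pwm


def scan_sequence(pwm, sequence, threshold):
    """Return (start, window, score) for forward-strand windows scoring >= threshold."""
    width = len(pwm)
    sequence = sequence.upper()
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        # Skip windows containing ambiguous bases such as N.
        if any(base not in BASES for base in window):
            continue
        score = sum(pwm[i][base] for i, base in enumerate(window))
        if score >= threshold:
            hits.append((start, window, round(score, 2)))
    return hits


# Invented counts for a hypothetical 4-bp motif resembling "CACG".
TOY_COUNTS = [
    {"A": 1, "C": 17, "G": 1, "T": 1},
    {"A": 17, "C": 1, "G": 1, "T": 1},
    {"A": 1, "C": 17, "G": 1, "T": 1},
    {"A": 1, "C": 1, "G": 17, "T": 1},
]

pwm = counts_to_log_odds(TOY_COUNTS)
hits = scan_sequence(pwm, "TTGACACGTGGCAT", threshold=5.0)
print(hits)
```

With these toy values only the exact `CACG` window at 0-based position 4 passes the threshold. A real scan would normally also check the reverse-complement strand and use a background model estimated from the genome, both of which are omitted here for brevity.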