Graduate student: 陳怡安 (Chen, Yi-An)
Thesis title: 阿拉伯芥基因轉錄調控網路暨啟動子預測模型之建立
Construction of transcriptional regulatory networks and promoter prediction model in Arabidopsis thaliana
Advisor: 張文綺 (Chang, Wen-Chi)
Degree: Master
Department: College of Biosciences and Biotechnology, Institute of Tropical Plant Sciences
Year of publication: 2012
Academic year of graduation: 100 (ROC calendar; 2011-2012 academic year)
Language: English
Number of pages: 41
Keywords: transcriptional regulatory network, transcription factor, promoter, transcription start site, support vector machine
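  • Constructing plant transcriptional regulatory networks has become an important topic in systems biology. As microarray data and transcription factor binding site (TFBS) data accumulate, building transcriptional regulatory networks has become an urgent task. To this end, this study built a database-assisted system, AtPAN (Arabidopsis thaliana Promoter Analysis Net). AtPAN predicts TFBSs, and the transcription factors (TFs) that bind them, in the promoters of one or more Arabidopsis genes. Based on microarray expression data and the literature, AtPAN can also analyze target genes together with the TFs co-expressed with them. Proteins that interact with these TFs are integrated as well, so that gene transcriptional regulatory networks can be reconstructed. AtPAN can also identify combinatorial TFs shared by the promoters of a group of genes. It further provides high-confidence TFs whose binding sites lie in conserved promoter regions of two homologous genes. However, not every gene in AtPAN has an experimentally verified transcription start site (TSS) to define a functional promoter. This study therefore applied a support vector machine (SVM) algorithm to identify high-confidence TSSs. The TSS prediction model integrates two kinds of input. The first is next-generation sequencing (NGS) data: histone H3 lysine 4 trimethylation (H3K4me3), histone H3 lysine 9 acetylation (H3K9ac), TSS tags, and nucleosome occupancy. The second is information on motifs enriched near TSSs. The model's precision, sensitivity, specificity, and accuracy were 91.27%, 90.67%, 91.33%, and 91.00%, respectively. Applying this model identifies a high-confidence TSS for each Arabidopsis gene. In summary, AtPAN uses high-throughput data to identify high-confidence TSSs. It also provides an easy-to-use web interface for reconstructing Arabidopsis gene transcriptional regulatory networks. The resource is freely available at http://AtPAN.itps.ncku.edu.tw/.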
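A common way to detect TFBSs in a promoter is to score every window against a position weight matrix (PWM) on both strands. The abstract does not describe AtPAN's scoring scheme. The sketch below therefore assumes a plain log2-odds PWM with a uniform background and a fixed score threshold. The motif counts, pseudocount, and threshold are hypothetical; they are not taken from AtPAN or from any matrix database.

```python
import math

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def log_odds_pwm(counts, pseudocount=0.5, background=0.25):
    """Turn per-position base counts into a log2-odds position weight matrix."""
    pwm = []
    for column in counts:
        total = sum(column[b] for b in BASES) + 4 * pseudocount
        pwm.append({b: math.log2((column[b] + pseudocount) / total / background)
                    for b in BASES})
    return pwm


def score(pwm, site):
    """Sum the per-position log-odds scores of one candidate site."""
    return sum(col[base] for col, base in zip(pwm, site))


def scan_promoter(seq, pwm, threshold):
    """Report (start, strand, score) for every window scoring >= threshold.

    `start` is the 0-based window position on the input (forward) sequence;
    strand '-' means the reverse complement of that window matched.
    """
    seq = seq.upper()
    width = len(pwm)
    hits = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        if any(b not in BASES for b in window):
            continue  # skip windows containing N or other ambiguity codes
        reverse = window.translate(COMPLEMENT)[::-1]
        for strand, site in (("+", window), ("-", reverse)):
            s = score(pwm, site)
            if s >= threshold:
                hits.append((start, strand, round(s, 3)))
    return hits


# Hypothetical motif: 10 observed sites, all matching the consensus TGAC.
counts = [{b: (10 if b == c else 0) for b in BASES} for c in "TGAC"]
pwm = log_odds_pwm(counts)
print(scan_promoter("CCTGACGGAAGTCAAA", pwm, threshold=7.0))
# [(2, '+', 7.229), (10, '-', 7.229)]
```

With this toy matrix, the threshold of 7.0 admits only exact consensus matches. A real scan would choose the threshold per matrix, or normalize the scores first.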

    Construction of transcriptional regulatory networks (TRNs) in plants has received considerable attention in systems biology. With the increasing availability of high-throughput data and transcription factor binding profiles, reconstruction of TRNs has become an urgent task. For this purpose, this work developed a database-assisted system, AtPAN (Arabidopsis thaliana Promoter Analysis Net). AtPAN detects transcription factor binding sites (TFBSs) and their corresponding transcription factors (TFs) in a promoter or a set of promoters in Arabidopsis. For further analysis, AtPAN returns co-expressed TFs and their target genes, derived from microarray expression data and the literature. Proteins interacting with the co-expressed TFs were also incorporated to reconstruct TRNs. Furthermore, combinatorial TFs can be identified in a group of gene promoters. AtPAN also provides high-confidence TFs whose binding sites are located in conserved regions between homologous genes. However, not all genes in AtPAN have an experimentally verified transcription start site (TSS) to define a functional promoter. Therefore, this study applied a support vector machine (SVM) algorithm to identify high-confidence TSSs. The TSS prediction model was built from two kinds of input. The first is several next-generation sequencing (NGS) datasets: H3K4me3, H3K9ac, TSS tags, and nucleosome occupancy. The second is information on motifs enriched around known TSSs. The model achieved a precision of 91.27%, a sensitivity of 90.67%, a specificity of 91.33%, and an accuracy of 91.00%. Applying this prediction model, a representative high-confidence TSS was identified for each Arabidopsis gene. In conclusion, this work identifies high-confidence TSSs from high-throughput sequencing data. It also provides a user-friendly web interface for investigating the mechanisms of gene regulation.
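The abstract describes the TSS model only at the level of its inputs (NGS signals and motif information) and its classifier (an SVM). The sketch below illustrates that idea under one assumption: each candidate position is summarized as a short vector of normalized scores. It trains a linear SVM in pure Python with the Pegasos stochastic subgradient method. The kernel, solver, feature definitions, and toy values are all assumptions for illustration. This is not the thesis's model, training procedure, or data.

```python
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def train_linear_svm(X, y, lam=0.01, iterations=50000, seed=0):
    """Pegasos: stochastic subgradient descent on the primal linear-SVM objective
        lam/2 * ||w||^2 + mean_i max(0, 1 - y_i * <w, x_i>).
    A constant 1.0 is appended to each vector so the last weight acts as a
    (regularized) bias term. Labels must be +1 or -1.
    """
    rng = random.Random(seed)
    data = [(list(x) + [1.0], label) for x, label in zip(X, y)]
    w = [0.0] * len(data[0][0])
    for t in range(1, iterations + 1):
        x, label = data[rng.randrange(len(data))]
        eta = 1.0 / (lam * t)                     # Pegasos step size
        violated = label * dot(w, x) < 1.0        # hinge loss is active
        w = [(1.0 - eta * lam) * wi for wi in w]  # shrink from the L2 term
        if violated:
            w = [wi + eta * label * xi for wi, xi in zip(w, x)]
    return w


def predict(w, features):
    """+1 = predicted TSS, -1 = predicted non-TSS."""
    return 1 if dot(w, list(features) + [1.0]) >= 0.0 else -1


# Hypothetical normalized features per candidate position:
# [H3K4me3, H3K9ac, TSS tags, motif score]; a nucleosome-occupancy
# column could be appended the same way.
X = [
    [0.90, 0.80, 0.70, 0.90],  # known TSS
    [0.80, 0.90, 0.80, 0.70],  # known TSS
    [0.85, 0.75, 0.90, 0.80],  # known TSS
    [0.10, 0.20, 0.10, 0.30],  # background position
    [0.20, 0.10, 0.30, 0.10],  # background position
    [0.15, 0.25, 0.20, 0.20],  # background position
]
y = [1, 1, 1, -1, -1, -1]

w = train_linear_svm(X, y)
print([predict(w, x) for x in X])
print(predict(w, [0.88, 0.82, 0.76, 0.84]), predict(w, [0.12, 0.18, 0.22, 0.16]))
```

This primal update applies only to a linear kernel. A nonlinear kernel such as RBF would need a dual or kernelized solver instead.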
The resource is now freely available at http://AtPAN.itps.ncku.edu.tw/.
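The four performance figures reported above follow the usual confusion-matrix definitions. The sketch below computes them from counts of true and false positives and negatives. The counts are hypothetical, because the abstract reports only the rates.

The sketch also checks that the reported rates are consistent with a test set holding equal numbers of TSS and non-TSS examples. Under that condition:

- accuracy is the mean of sensitivity and specificity, (90.67% + 91.33%) / 2 = 91.00%;
- precision is sensitivity / (sensitivity + 1 - specificity), which gives about 91.27%.

The balanced test set is inferred from this arithmetic; the abstract does not state it.

```python
import math


def classification_metrics(tp, fp, tn, fn):
    """Standard binary-classification metrics from confusion-matrix counts."""
    return {
        "precision": tp / (tp + fp),      # predicted TSSs that are real
        "sensitivity": tp / (tp + fn),    # real TSSs that were recovered
        "specificity": tn / (tn + fp),    # non-TSSs correctly rejected
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


def balanced_precision(sensitivity, specificity):
    """Precision implied when positives and negatives are equally numerous.

    With P = N: TP = sensitivity * P and FP = (1 - specificity) * P.
    """
    return sensitivity / (sensitivity + 1.0 - specificity)


# Hypothetical counts for a balanced set of 100 positive and 100 negative examples.
m = classification_metrics(tp=90, fp=8, tn=92, fn=10)
print({k: round(v, 4) for k, v in m.items()})
# With P = N, accuracy equals the mean of sensitivity and specificity.
print(math.isclose(m["accuracy"], (m["sensitivity"] + m["specificity"]) / 2))

# Cross-check against the sensitivity and specificity reported in the abstract.
print(round(balanced_precision(0.9067, 0.9133), 4))  # 0.9127
```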

    TABLE OF CONTENTS
    Abstract (Chinese)
    Abstract
    Acknowledgements
    LIST OF TABLES
    LIST OF FIGURES
    1. Introduction
      1.1. Overview of gene regulatory network (GRN)
      1.2. Transcription factors mediated gene regulation
      1.3. Promoter features contribute to transcription
      1.4. Related Works
        1.4.1. TFs/TFBSs and microarray, PPI databases
        1.4.2. Promoter databases and TSS prediction model
      1.5. Objective of this study
    2. Material and Methods
      2.1. Construction of AtPAN
        2.1.1. Content and interface of website
        2.1.2. Integration of external databases
        2.1.3. Identifying TFBSs in promoter sequences and homologous conserved regions
        2.1.4. Identifying co-expressed TFs and their target genes
        2.1.5. Identifying co-occurrence of TFBSs in a group of gene promoters
      2.2. Construction of TSS prediction model
        2.2.1. Concept and application of SVM
        2.2.2. Data processing and feature extraction
        2.2.3. Model training and evaluation
    3. Results
      3.1. AtPAN: Arabidopsis thaliana Promoter Analysis Net
        3.1.1. Case study of reconstruction of co-expressed TRNs
        3.1.2. Case study of reconstruction of TFs-TGs network in gene group analysis
        3.1.3. Promoter analysis: identification of TFBSs by input promoter sequence
        3.1.4. Cross Species: identification of TFBSs in conserved regions
      3.2. Construction of TSS prediction model
        3.2.1. Distribution of TSS-relevant features around TSSs
        3.2.2. Performance of TSS prediction model
    4. Discussion and Prospects
    5. References

    LIST OF TABLES
    Table 1 Features of core promoters or transcription start sites
    Table 2 Data statistics of AtPAN
    Table 3 Performance of validation models
    Table 4 Comparisons of AtPAN with the other databases

    LIST OF FIGURES
    Figure 1 Concepts of AtPAN
    Figure 2 System flow of AtPAN
    Figure 3 The problem of overlapping reads of multiple TSSs
    Figure 4 System flow of model training
    Figure 5 Search result interface of AtPAN
    Figure 6 Result of co-expressed TRN in AtPAN
    Figure 7 Result of TF/TFBSs scanning in AtPAN
    Figure 8 Output interface of gene group analysis
    Figure 9 Promoter and TRNs analysis result of PDF1.2 (LCR77) and PR4 (HEL) by AtPAN
    Figure 10 Output example of TFBSs drawing tool
    Figure 11 Promoter analysis result of UGT84B2
    Figure 12 Cross Species analysis results of Lhcb genes
    Figure 13 Distribution of features around TSSs

    Full-text availability: on campus from 2013-08-24; off campus from 2014-08-24