| Graduate student: | 孫佑杰 Sun, You-Jie |
|---|---|
| Thesis title: | 利用轉錄因子交互作用來提升順式調控模組的預測準確性 Enhancing cis-regulatory model prediction by considering transcription factor interactions |
| Advisor: | 張天豪 Chang, Tien-Hao |
| Degree: | Master |
| Department: | College of Electrical Engineering and Computer Science, Department of Electrical Engineering |
| Year of publication: | 2013 |
| Academic year of graduation: | 101 (ROC calendar, i.e., 2012-2013) |
| Language: | Chinese |
| Number of pages: | 46 |
| Chinese keywords: | 順式調控模組、轉錄因子交互作用 |
| English keywords: | cis-regulatory module, transcription factor interactions |
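A cis-regulatory module (CRM) is a DNA sequence of variable length, usually 100 to 1000 base pairs. At least four transcription factors bind to the promoter region, and the module contains at least 10 transcription factor binding sites (TFBSs). CRMs play an important role in eukaryotic gene regulation. Most current CRM prediction methods require both gene sequences and TFBS data, although some need only the gene sequences. Studies have shown that considering only the distances between TFBSs is not sufficient to predict CRMs; interactions between certain bound factors also affect CRM function. However, previous CRM prediction methods have not taken information about interactions between bound factors into account.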
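The quantitative description of a CRM above (roughly 100 to 1000 bp, at least 10 binding sites, at least four transcription factors) can be expressed as a simple filter. This is an illustrative sketch only. The `Site` record, the function name and the placeholder factor names (`TF1` to `TF4`) are hypothetical. The thresholds are the typical values quoted in the abstract, not strict rules from the thesis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Site:
    """One predicted binding site (hypothetical record type for this sketch)."""
    tf: str     # transcription factor name
    start: int  # 0-based start within the candidate region
    end: int    # exclusive end


def meets_crm_heuristics(region_length, sites, min_len=100, max_len=1000,
                         min_sites=10, min_tfs=4):
    """Return True if a region matches the typical CRM profile:
    length within [min_len, max_len] bp, at least min_sites binding sites,
    and sites for at least min_tfs distinct transcription factors."""
    if not (min_len <= region_length <= max_len):
        return False
    distinct_tfs = {site.tf for site in sites}
    return len(sites) >= min_sites and len(distinct_tfs) >= min_tfs


# Illustrative data: 12 sites spread over 4 placeholder factors.
sites = [Site(tf, 20 * i, 20 * i + 10)
         for i, tf in enumerate(["TF1", "TF2", "TF3", "TF4"] * 3)]
print(meets_crm_heuristics(600, sites))      # True
print(meets_crm_heuristics(1500, sites))     # False: longer than 1000 bp
print(meets_crm_heuristics(600, sites[:9]))  # False: only 9 sites
```

The thresholds are keyword parameters because the abstract presents them as typical values ("usually"). A real pipeline might relax them rather than treat them as hard cut-offs.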
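In this study, we propose a CRM prediction model for the CRM predictor Cluster-Buster that takes transcription factor interaction information into account. The model recalculates the CRM prediction scores output by the predictor according to the number of transcription factor interactions. First, our sequence data set and TFBS data are input into the CRM predictor, and the CRM candidates are ranked by the prediction scores it outputs. Then, the number of interacting transcription factor pairs is counted and entered into the prediction model, and the CRM candidates are re-ranked according to the model's score. The top-ranked CRM candidates are all supported by literature evidence rather than being false predictions by the CRM predictor. We selected 39 Drosophila melanogaster genes as experimental subjects. The sequence data set consists of 64 gene sequences, and the TFBS data set contains 122 entries. The experimental results show that our prediction model improves CRM prediction accuracy by 3.1%.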
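The key new step is counting how many pairs of transcription factors with sites in a candidate are known to interact. This is a minimal sketch under several assumptions. The interactions are assumed to be available as a list of protein-name pairs, for example exported from a protein–protein interaction database such as BioGRID. Each unordered pair of distinct factors is counted once, and self-interactions are ignored. The abstract does not state how the thesis handles repeated sites or homodimers. The function name and the factor names are hypothetical.

```python
from itertools import combinations


def count_interacting_pairs(tfs_in_candidate, interactions):
    """Count unordered pairs of distinct TFs present in a candidate that
    appear in the interaction list. Each pair is counted once, however
    many binding sites the two factors have (an assumption of this sketch)."""
    # Store interactions as unordered pairs so (A, B) and (B, A) match;
    # self-interactions are dropped.
    known = {frozenset(pair) for pair in interactions if pair[0] != pair[1]}
    present = sorted(set(tfs_in_candidate))
    return sum(1 for a, b in combinations(present, 2)
               if frozenset((a, b)) in known)


# Placeholder factors; TF2 has two sites but is counted once.
candidate_tfs = ["TF1", "TF2", "TF3", "TF2", "TF4"]
interactions = [("TF1", "TF2"), ("TF3", "TF2"), ("TF4", "TF9"), ("TF1", "TF1")]
print(count_interacting_pairs(candidate_tfs, interactions))  # 2
```

Using `frozenset` keys makes the lookup order-independent. The pair `("TF4", "TF9")` is not counted because `TF9` has no site in the candidate.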
A cis-regulatory module (CRM) is a stretch of DNA, usually 100 to 1000 base pairs long, that contains on the order of 10 or more binding sites for at least four transcription factors. CRMs play an important role in eukaryotic gene regulation. Existing CRM prediction methods rely on gene sequences and transcription factor binding site (TFBS) data, although some require only gene sequences. Research has shown that considering only the distances between TFBSs is not sufficient to predict CRMs, because protein–protein interactions between bound factors can also influence CRM function. However, no existing CRM prediction method takes such interaction data between bound factors into account.
In this study, we propose a new CRM prediction model that considers transcription factor interactions. It recalculates CRM prediction scores based on the number of interacting transcription factor pairs. First, we run a CRM predictor with our data set as input. Then, we rank the CRM candidates output by the predictor according to their predicted scores. Finally, we count the interacting transcription factor pairs in each candidate, input the counts into the model, and re-rank the CRM candidates by the model's total score. The top few candidates are all supported by literature evidence rather than being false predictions made by the CRM predictor. We selected 39 genes of Drosophila melanogaster as experimental subjects. The sequence data set consists of 64 gene sequences, and the transcription factor binding site data set covers 122 transcription factors. The experiments show that this prediction model increases the accuracy of CRM prediction by 3.1%.
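Once each candidate has a predictor score and an interaction-pair count, the re-ranking step reduces to sorting by a combined score. The abstract does not give the model's actual formula. This sketch therefore assumes a hypothetical linear combination, `predictor_score + weight * interaction_pairs`. The `Candidate` record, the `weight` parameter and the example scores are all illustrative, not values from the thesis.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    predictor_score: float  # score reported by the CRM predictor
    interaction_pairs: int  # number of interacting TF pairs in the candidate


def rank_by_predictor(candidates):
    """Initial ranking: sort by the CRM predictor's own score, highest first."""
    return sorted(candidates, key=lambda c: c.predictor_score, reverse=True)


def rerank(candidates, weight=1.0):
    """Re-rank by a combined score.

    ASSUMPTION: combined = predictor_score + weight * interaction_pairs.
    This linear form is a placeholder, not the thesis's published model."""
    return sorted(candidates,
                  key=lambda c: c.predictor_score + weight * c.interaction_pairs,
                  reverse=True)


candidates = [
    Candidate("cand1", 12.0, 0),
    Candidate("cand2", 11.5, 2),
    Candidate("cand3", 9.0, 1),
]
print([c.name for c in rank_by_predictor(candidates)])  # ['cand1', 'cand2', 'cand3']
print([c.name for c in rerank(candidates)])             # ['cand2', 'cand1', 'cand3']
```

With `weight=0.0` the combined score reduces to the predictor score, and the original order is recovered. Larger weights let interaction evidence override small differences in predictor score.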