
Graduate student: 許冠宇 (Shiu, Guan-Yu)
Thesis title: Improving Cis-regulatory Module Prediction by Integrating Multiple Predictors (組合數個順式調控模組預測器來改善順式調控模組預測準確率)
Advisor: 張天豪 (Chang, Tien-Hao)
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Electrical Engineering
Year of publication: 2013
Academic year of graduation: 101 (ROC calendar)
Language: Chinese
Number of pages: 45
Keywords: cis-regulatory module, transcription factor, transcription factor binding site
  • A cis-regulatory module (CRM) is a short stretch of deoxyribonucleic acid (DNA), roughly 100 to 1000 base pairs (bp) long, that contains 3 to 5 transcription factor binding sites (TFBSs) where transcription factors (TFs) bind. CRMs play a critical role in the transcription of their downstream genes, so understanding them helps to explain gene regulation and the related biological mechanisms.
    Many CRM predictors of different types have been proposed over the last decade. They can be roughly divided into four categories: window clustering, probabilistic modeling, phylogenetic footprinting, and discriminative modeling. However, no previous study has combined predictors from different categories. This study analyzes four existing CRM predictors (ClusterBuster, MSCAN, CisModule, and MultiModule) to find a combination of predictors that delivers higher accuracy than the individual predictors. The combined predictors were evaluated on 465 CRMs across 140 Drosophila melanogaster genes downloaded from the REDfly database. The experimental results show that the four merging methods improve prediction accuracy by 1.29%, 1.2%, 0.58%, and 0.45%, respectively.

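    Table of Contents
    List of Tables
    List of Figures
    Chapter 1 Introduction
    Chapter 2 Related Work
      2.1 Cis-regulatory module
      2.2 Cis-regulatory module predictors
        2.2.1 CisModule
        2.2.2 CisPlusFinder
        2.2.3 ClusterBuster
        2.2.4 MCAST
        2.2.5 Morph
        2.2.6 MSCAN
        2.2.7 MultiModule
        2.2.8 Summary of cis-regulatory module predictors
    Chapter 3 Datasets and Methods
      3.1 Data collection
        3.1.1 FlyBase
        3.1.2 iDMMPMM
        3.1.3 JASPAR
        3.1.4 REDfly
        3.1.5 UCSC
      3.2 Data processing and methods
        3.2.1 Sequences
        3.2.2 Transcription factor binding sites
        3.2.3 Method
    Chapter 4 Experimental Results and Discussion
      4.1 Prediction performance evaluation criteria
        4.1.1 Precision
        4.1.2 Sensitivity
        4.1.3 Specificity
        4.1.4 Accuracy
        4.1.5 F-measure
        4.1.6 AUC (area under the ROC curve)
      4.2 Effect of sequence length on cis-regulatory module prediction accuracy
      4.3 Combining predictors and results
        4.3.1 Combinations of two cis-regulatory module predictors
        4.3.2 Combinations of three cis-regulatory module predictors
        4.3.3 Combination of four cis-regulatory module predictors
      4.4 Comparison with other methods
        4.4.1 Comparison of merged results with existing cis-regulatory module predictors
        4.4.2 Score weight adjustment
    Chapter 5 Conclusion and Future Work
      5.1 Conclusion
      5.2 Future work
    References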


    Full-text access on campus: public from 2023-01-01
    Full-text access off campus: public from 2023-01-01