Graduate student: | 郭祐安 Kou, You-An |
---|---|
Thesis title: | 利用深度學習識別順式調控模組的交互作用 Genomic identification of cis-regulatory module interactions using deep learning |
Advisor: | 吳謂勝 Wu, Wei-Sheng |
Degree: | 碩士 Master |
Department: | 電機資訊學院 - 電機工程學系 Department of Electrical Engineering |
Year of publication: | 2024 |
Academic year of graduation: | 112 |
Language: | Chinese |
Number of pages: | 86 |
Keywords (Chinese): | 順式調控模組、表觀遺傳、深度學習、接收器操作特性曲線、精確率對召回率曲線 |
Keywords (English): | Cis-regulatory modules, epigenetics, deep learning, area under the receiver operating characteristic curve, area under the precision-recall curve |
順式調控模組(Cis-regulatory modules,CRM)之間的相互作用在後生動物基因表達調控中扮演著關鍵的角色。這些相互作用通過轉錄因子或調控蛋白精確地結合到CRM內的特定DNA序列來協調。這些分子參與者的協同作用形成了一個複雜的調控網絡,進而影響基因的轉錄活性。儘管我們對CRM的基本功能已有較為深入的了解,但要全面掌握它們如何在全基因組範圍內互動,仍然是一個挑戰。針對這一挑戰,我們開發了一個深度學習預測模型,旨在揭示CRM之間潛在的互動關係。該模型整合了三個關鍵的實驗性轉錄調控數據集。具體而言,這三個數據集分別為組蛋白修飾ChIP-seq信號、染色質結合蛋白ChIP-seq信號以及核小體變異體信號。這些數據集提供了關於基因表達調控的多層次信息,幫助我們更好地理解CRM的功能和相互作用。研究中所設計的模型特別考慮了來自表觀遺傳數據集且皆為胚胎時期,並使用了一種具有共享參數的多頭注意力機制,這種機制最初源自自然語言處理(NLP)領域。多頭注意力機制能夠處理沒有順序依賴性的輸入序列,並捕捉輸入數據與位置特徵之間的緊密關係。這種設計顯著增強了模型分析數據中複雜模式的能力,使我們能夠更全面地理解調控元件之間的互動。我們在基於染色體劃分的4重交叉驗證中對模型進行了評估,防止訓練集與測試集有依賴性。結果顯示,我們的模型在識別CRM之間的相互作用方面表現出色。具體來說,測試集在接收器操作特性曲線圖(auROC)和精確率對召回率曲線圖(auPRC)中分別達到了88.5%和93.4%的AUC。這些結果不僅表明我們的模型具有卓越的鑑別能力,也顯示了它在預測和理解CRMs間複雜相互作用方面的潛力,且有助於更進一步解開細胞中動態調控網絡的謎題。
Cis-regulatory modules (CRMs) play a crucial role in gene expression regulation in metazoans through their interactions. These interactions are coordinated by the precise binding of transcription factors or regulatory proteins to specific DNA sequences within the CRMs. The synergistic action of these molecular participants forms a complex regulatory network that influences transcriptional activity. Despite significant insights into the basic functions of CRMs, fully understanding how they interact across the genome remains challenging. To address this challenge, we have developed a deep learning predictive model designed to uncover potential interactions between CRMs. The model integrates three key experimental transcriptional regulatory datasets: histone modification ChIP-seq signals, chromatin-binding protein ChIP-seq signals, and nucleosome variant signals. These datasets provide multi-layered information on gene expression regulation, helping us better understand CRM functions and interactions. The model is specifically tailored to epigenetic datasets derived from embryonic stages and employs a multi-head attention mechanism with shared parameters, originally adapted from natural language processing (NLP). The multi-head attention mechanism can process input sequences without sequential dependencies, capturing tight relationships between input data and positional features. This design significantly enhances the model's ability to analyze complex patterns in the data, allowing for a more comprehensive understanding of CRM interactions. We evaluated the model using chromosome-partitioned 4-fold cross-validation, preventing dependency between training and test sets. The results demonstrate that our model excels at identifying interactions between CRMs. Specifically, the test set achieved area under the receiver operating characteristic curve (auROC) and area under the precision-recall curve (auPRC) scores of 88.5% and 93.4%, respectively. 
These findings indicate not only the model's excellent discriminatory power but also its potential to predict and understand complex CRM interactions, contributing to the further unraveling of dynamic regulatory networks within cells.
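The evaluation protocol described in the abstract (chromosome-partitioned 4-fold cross-validation scored by auROC and auPRC) can be illustrated with a small sketch. This is not the thesis's implementation: the function names `chromosome_folds`, `au_roc`, and `au_prc` are hypothetical, the toy data are made up, each CRM pair is assumed to lie on a single chromosome, chromosomes are balanced across folds with a simple greedy heuristic, and auPRC is approximated by average precision, which is only one common estimator.

```python
from collections import defaultdict


def chromosome_folds(pair_chroms, n_folds=4):
    """Assign every chromosome to exactly one fold (hypothetical helper).

    Returns the fold index of each CRM pair, so that pairs from the same
    chromosome never end up on both sides of a train/test split.
    """
    counts = defaultdict(int)
    for chrom in pair_chroms:
        counts[chrom] += 1
    fold_sizes = [0] * n_folds
    fold_of = {}
    # Greedy balancing: largest chromosomes first, into the smallest fold so far.
    for chrom, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        k = fold_sizes.index(min(fold_sizes))
        fold_of[chrom] = k
        fold_sizes[k] += n
    return [fold_of[chrom] for chrom in pair_chroms]


def au_roc(labels, scores):
    """auROC as the probability that a positive outscores a negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def au_prc(labels, scores):
    """Average precision: mean precision at the rank of each positive.

    Tied scores are kept in input order, so the value can depend on that order.
    """
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    return total / hits


# Toy data (hypothetical): chromosome, true label, and model score per CRM pair.
chroms = ["c1", "c1", "c2", "c3", "c3", "c3", "c4", "c5"]
labels = [1, 0, 1, 1, 0, 1, 0, 1]
scores = [0.9, 0.2, 0.6, 0.8, 0.4, 0.7, 0.3, 0.5]

folds = chromosome_folds(chroms, n_folds=4)
for k in range(4):
    test_chroms = {c for c, f in zip(chroms, folds) if f == k}
    train_chroms = {c for c, f in zip(chroms, folds) if f != k}
    # No chromosome appears in both the training and the test split.
    assert train_chroms.isdisjoint(test_chroms)

# Metrics computed on the pooled held-out scores of the toy example.
print(au_roc(labels, scores), au_prc(labels, scores))
```

Splitting by chromosome rather than by individual pair keeps overlapping or neighbouring CRMs out of both the training and test sets at once, which is the dependency the abstract says the partitioning is meant to prevent.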
[1] Yang, T.H., Transcription factor regulatory modules provide the molecular mechanisms for functional redundancy observed among transcription factors in yeast. BMC Bioinformatics, 2019. 20(1): p. 16.
[2] Wang, J., et al., Genome-Wide Analysis of the Distinct Types of Chromatin Interactions in Arabidopsis thaliana. Plant Cell Physiol, 2017. 58(1): p. 57-70.
[3] Hong, Z., et al., Identifying enhancer-promoter interactions with neural network based on pre-trained DNA vectors and attention mechanism. Bioinformatics, 2020. 36(4): p. 1037-1043.
[4] Schoenfelder, S. and P. Fraser, Long-range enhancer-promoter contacts in gene expression control. Nat Rev Genet, 2019. 20(8): p. 437-455.
[5] Xiao, M., Z. Zhuang, and W. Pan, Local Epigenomic Data are more Informative than Local Genome Sequence Data in Predicting Enhancer-Promoter Interactions Using Neural Networks. Genes (Basel), 2019. 11(1).
[6] Schoenfelder, S., et al., The pluripotent regulatory circuitry connecting promoters to their long-range interacting elements. Genome Res, 2015. 25(4): p. 582-97.
[7] Hardison, R.C. and J. Taylor, Genomic approaches towards finding cis-regulatory modules in animals. Nat Rev Genet, 2012. 13(7): p. 469-83.
[8] Villar, D., P. Flicek, and D.T. Odom, Evolution of transcription factor binding in metazoans - mechanisms and functional implications. Nat Rev Genet, 2014. 15(4): p. 221-33.
[9] Xu, H., et al., Exploring 3D chromatin contacts in gene regulation: The evolution of approaches for the identification of functional enhancer-promoter interaction. Comput Struct Biotechnol J, 2020. 18: p. 558-570.
[10] Brettmann, E.A., I.Y. Oh, and C. de Guzman Strong, High-throughput Identification of Gene Regulatory Sequences Using Next-generation Sequencing of Circular Chromosome Conformation Capture (4C-seq). J Vis Exp, 2018(140).
[11] Belton, J.M., et al., Hi-C: a comprehensive technique to capture the conformation of genomes. Methods, 2012. 58(3): p. 268-76.
[12] Wang, Z.J., Y.F. Zhang, and C.Z. Zang, BART3D: inferring transcriptional regulators associated with differential chromatin interactions from Hi-C data. Bioinformatics, 2021. 37(18): p. 3075-3078.
[13] Wang, H., B.B. Huang, and J.R. Wang, Predict long-range enhancer regulation based on protein-protein interactions between transcription factors. Nucleic Acids Research, 2021. 49(18): p. 10347-10368.
[14] Dzida, T., et al., Predicting stimulation-dependent enhancer-promoter interactions from ChIP-Seq time course data. PeerJ, 2017. 5: p. e3742.
[15] LeCun, Y., et al., Gradient-based learning applied to document recognition. Proceedings of the IEEE, 1998. 86(11): p. 2278-2324.
[16] Rivera, J., et al., REDfly: the transcriptional regulatory element database for Drosophila. Nucleic Acids Research, 2019. 47(D1): p. D828-D834.
[17] Marygold, S.J., et al., FlyBase: improvements to the bibliography. Nucleic Acids Research, 2013. 41(D1): p. D751-D757.
[18] Albig, C., et al., JASPer controls interphase histone H3S10 phosphorylation by chromosomal kinase JIL-1 in Drosophila. Nat Commun, 2019. 10(1): p. 5343.
[19] 尤毓淮。「以可解釋的深度學習方法辨識黑腹果蠅上順式調控模組之轉錄功能」。碩士論文,國立高雄大學資訊管理學系碩士班,2022。https://hdl.handle.net/11296/2c387v