
Graduate student: Kou, You-An (郭祐安)
Thesis title: Genomic identification of cis-regulatory module interactions using deep learning
Advisor: Wu, Wei-Sheng (吳謂勝)
Degree: Master
Department: College of Electrical Engineering and Computer Science, Department of Electrical Engineering
Year of publication: 2024
Academic year of graduation: 112 (ROC calendar)
Language: Chinese
Number of pages: 86
Keywords: cis-regulatory modules, epigenetics, deep learning, area under the receiver operating characteristic curve, area under the precision-recall curve
    Interactions between cis-regulatory modules (CRMs) play a crucial role in the regulation of gene expression in metazoans. These interactions are coordinated by the precise binding of transcription factors or regulatory proteins to specific DNA sequences within the CRMs. The combined action of these molecules forms a complex regulatory network that influences transcriptional activity. Although the basic functions of CRMs are fairly well understood, it remains challenging to determine how CRMs interact with one another across the whole genome.

    To address this challenge, we developed a deep learning model that predicts potential interactions between CRMs. The model integrates three kinds of experimental transcriptional regulation data: histone modification ChIP-seq signals, chromatin-binding protein ChIP-seq signals, and nucleosome variant signals. Together these datasets provide multi-layered information on gene expression regulation. The model is tailored to epigenetic datasets that all come from embryonic stages, and it employs a multi-head attention mechanism with shared parameters, a technique that originated in natural language processing (NLP). Multi-head attention can process input sequences that have no sequential dependency and can capture the close relationships between the input data and positional features, which strengthens the model's ability to analyze complex patterns in the data.

    We evaluated the model with 4-fold cross-validation partitioned by chromosome, so that the training and test sets are not dependent on each other. On the test set, the model achieved an area under the receiver operating characteristic curve (auROC) of 88.5% and an area under the precision-recall curve (auPRC) of 93.4%. These results indicate that the model discriminates well between interacting and non-interacting CRM pairs, and they show its potential for predicting and understanding complex CRM interactions, which may help to further unravel the dynamic regulatory networks within cells.
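    The evaluation protocol described in the abstract (cross-validation partitioned by chromosome, scored with auROC and auPRC) can be illustrated with a minimal sketch. This is not the thesis's implementation: the function names, the greedy fold assignment, the toy chromosome labels, and the toy scores are all assumptions made for illustration, and auPRC is approximated here by average precision. The record does not state how chromosomes were actually assigned to folds.

```python
from collections import defaultdict


def chromosome_folds(chromosomes, n_folds=4):
    """Assign every chromosome to exactly one fold (hypothetical greedy balancing)."""
    counts = defaultdict(int)
    for chrom in chromosomes:
        counts[chrom] += 1
    fold_sizes = [0] * n_folds
    fold_of = {}
    # Largest chromosome first, always into the currently smallest fold.
    for chrom, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        target = fold_sizes.index(min(fold_sizes))
        fold_of[chrom] = target
        fold_sizes[target] += n
    return fold_of


def split_indices(chromosomes, fold_of, test_fold):
    """Return (train, test) example indices; no chromosome appears in both."""
    train = [i for i, c in enumerate(chromosomes) if fold_of[c] != test_fold]
    test = [i for i, c in enumerate(chromosomes) if fold_of[c] == test_fold]
    return train, test


def auroc(labels, scores):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Step-wise area under the precision-recall curve (assumes no tied scores)."""
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank  # precision at each recalled positive
    return total / hits


# Toy data (assumed): one chromosome label per CRM pair.
toy_chromosomes = ["2L", "2L", "2L", "2R", "2R", "3L", "3L", "3R", "X", "X", "4"]
toy_fold_of = chromosome_folds(toy_chromosomes, n_folds=4)
for fold in range(4):
    train_idx, test_idx = split_indices(toy_chromosomes, toy_fold_of, fold)
    print(fold, sorted({toy_chromosomes[i] for i in test_idx}), len(train_idx))

# Toy predictions (assumed) to exercise the two metrics.
toy_labels = [1, 1, 0, 0]
toy_scores = [0.9, 0.4, 0.6, 0.1]
print(auroc(toy_labels, toy_scores), average_precision(toy_labels, toy_scores))
```

    Splitting by chromosome rather than by individual example keeps every CRM pair from a given chromosome on one side of the train/test boundary, which is the property the abstract relies on to avoid dependency between the two sets.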

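    Chinese Abstract
    Summary
    Acknowledgments
    Table of Contents
    List of Tables
    List of Figures
    Chapter 1 Introduction
        1-1 Research Background
        1-2 Research Motivation
        1-3 Research Objectives
    Chapter 2 Literature Review
        2-1 Transcription Factors and CRMs
        2-2 Interactions Between Transcription Factors and Cis-Regulatory Modules
        2-3 Inference from Hi-C Data and Differential Chromatin Interactions
        2-4 Predicting Enhancer-Promoter Interactions from Epigenetic Interactions
        2-5 Deep Learning Algorithms
            2-5-1 Convolutional Neural Networks
            2-5-2 Multi-Head Attention Mechanism
    Chapter 3 Methods
        3-1 Introduction to the Base Datasets
            3-1-1 Introduction to the Cis-Regulatory Module Dataset
            3-1-2 Data Preprocessing
        3-2 Hi-C Data Sources
            3-2-1 Chromosome Interaction Data and Preprocessing (Figure 3.2-1)
            3-2-2 Generation and Processing of Pair Files
        3-3 Data Preprocessing
            3-3-1 Cis-Regulatory Module Interaction Data
            3-3-2 Selecting Cis-Regulatory Module Pairs with Confident Interactions
            3-3-3 Epigenetic Data
        3-4 Introduction to the CRMI (ChIP-based) Model
            3-4-1 Feature Extraction Layer
            3-4-2 Prediction Layer for Whether a Cis-Regulatory Module Pair Interacts
            3-4-3 Model Overview
    Chapter 4 Results
        4-1 Learning Curves of the CRMI (ChIP-based) Model
        4-2 Experimental Results of the CRMI (ChIP-based) Model
        4-3 Test Function
            4-3-1 Generating Functions for the CRMI (ChIP-based) Model Test Set
            4-3-2 Data Sources for Each Function
            4-3-3 Labeling Principles
            4-3-4 Coordinate Principles
        4-4 Test Function Results
        4-5 Comparison of Different Datasets
            4-5-1 Learning Curves of the CRMI (TFBS-based) Model
            4-5-2 Learning Curves of the CRMI (ChIP&TFBS-based) Model
            4-5-3 Experimental Results
        4-6 Comparison of Different Lengths in the CRMI (ChIP-based) Model
            4-6-1 Learning Curves of the CRMI (ChIP-based) Model at Length 256
            4-6-2 Learning Curves of the CRMI (ChIP-based) Model at Length 1024
            4-6-3 Experimental Results
        4-7 Comparison with Existing Tools
            4-7-1 Comparison with Other Tools on the CRMI (ChIP-based) Model Test Set
            4-7-2 Comparison with Other Tools on the EPI Test Set
    Chapter 5 Interpretability
        5-1 Steps for Interpreting the Model with SHAP
        5-2 Analysis of the Five Most Explanatory Epigenetic Factors for Each Functional Interaction
        5-3 Analysis of the Five Most Explanatory Epigenetic Factors for Enhancers, Promoters, and Insulators in the Test Set
    Chapter 6 Discussion
        6-1 Effect of Including or Omitting the Multi-Head Attention Mechanism
            6-1-1 Learning Curves Without Multi-Head Attention
            6-1-2 Comparison with the Original Model
    Chapter 7 Conclusions and Contributions
        7-1 Conclusions
        7-2 Contributions
    References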

