
Graduate student: He, Guan-Liang (何冠良)
Thesis title: A deep learning approach to identify the target genes of cis-regulatory modules (基於深度學習辨別順式調控模組之目標基因)
Advisor: Wu, Wei-Sheng (吳謂勝)
Degree: Master
Department: College of Electrical Engineering and Computer Science, Department of Electrical Engineering
Year of publication: 2024
Academic year of graduation: 112 (ROC calendar)
Language: Chinese
Pages: 79
Keywords: deep learning, cis-regulatory modules, gene regulation, enhancer RNA
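  • In the genomes of multicellular organisms, transcriptional regulation is the process by which transcription factors (TFs) or regulatory proteins bind specific deoxyribonucleic acid (DNA) sequences to control gene expression. These sequences are called cis-regulatory modules (CRMs), and the genes they regulate are called target genes. This study aims to develop a comprehensive tool for predicting whether a CRM regulates a particular gene. Previous studies identified the target genes of CRMs mainly from interactions between CRMs and gene promoters. Recent studies, however, indicate that enhancers can also act on distant target genes through the transcription of enhancer RNAs (eRNAs), not only through enhancer-promoter interactions. It has also been found that the distance between a CRM and a gene does not directly explain their regulatory relationship: for example, a CRM may lie outside the promoter-proximal region of its target gene, and a gene may not be regulated by its nearest CRM. This study developed two models, one predicting the interaction between a CRM and a gene and one predicting their regulatory relationship at the eRNA level, so that both the interaction mechanism and the eRNA mechanism are taken into account. A given CRM is predicted to regulate a specific gene only when both an interaction and an eRNA-level regulatory relationship are predicted. The resulting prediction pipeline achieves an auROC of 84.1% and an auPRC of 97.4%, outperforming existing computational methods and tools at identifying whether a CRM regulates a given gene. This work offers new insight into the relationship between CRMs and their target genes and provides a useful tool for future studies of gene regulation.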

    Transcriptional regulation in the genomes of multicellular organisms involves the binding of transcription factors (TFs) or regulatory proteins to specific DNA sequences called cis-regulatory modules (CRMs) to control gene expression. This study aims to develop a comprehensive tool for predicting whether a CRM regulates a particular gene. Previous studies determined the target genes of CRMs primarily from the interaction between CRMs and gene promoters. However, recent findings suggest that enhancers can also target distant genes via the transcription of enhancer RNAs (eRNAs). The distance between a CRM and a gene has also been found not to explain their regulatory relationship directly. This study developed two models, one predicting interactions between CRMs and genes and one predicting eRNA-mediated regulatory relationships between them, thereby integrating the interaction and eRNA regulatory mechanisms. A given CRM is predicted to regulate a specific gene only when both an interaction and an eRNA-mediated regulatory relationship are predicted. The resulting pipeline achieved an auROC of 84.1% and an auPRC of 97.4%, outperforming existing computational methods and tools in identifying CRM-gene regulatory relationships. This research provides new insights into CRM-target gene relationships and offers a powerful tool for future gene regulation studies.
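
    The decision rule described in the abstract (a CRM-gene pair is called regulatory only when both models agree) can be illustrated with a minimal Python sketch. The function names, the 0.5 threshold, the use of the minimum of the two outputs as a continuous score, and the toy numbers are all assumptions made for illustration; the record does not state how the thesis combines or thresholds the model outputs.

```python
from typing import Sequence


def predicts_regulation(p_interaction: float, p_erna: float,
                        threshold: float = 0.5) -> bool:
    """AND rule: both hypothetical model outputs must reach the threshold."""
    return p_interaction >= threshold and p_erna >= threshold


def pipeline_score(p_interaction: float, p_erna: float) -> float:
    """Assumed continuous score: the weaker of the two model outputs.

    min(a, b) >= t holds exactly when a >= t and b >= t, so thresholding
    this score reproduces the AND rule above.
    """
    return min(p_interaction, p_erna)


def au_roc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """auROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative count as half a correct pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (invented): (interaction probability, eRNA probability) per pair.
pairs = [(0.9, 0.8), (0.9, 0.2), (0.3, 0.7), (0.6, 0.6)]
labels = [1, 0, 0, 1]

calls = [predicts_regulation(a, b) for a, b in pairs]
scores = [pipeline_score(a, b) for a, b in pairs]
print(calls)
print(au_roc(labels, scores))
```

    Taking the minimum is only one way to obtain a ranking score that is consistent with an AND rule; a product of the two probabilities would be another, and neither is claimed to be the thesis's choice.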

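    Abstract (Chinese)
    SUMMARY
      Introduction
      Materials and Methods
      Result and Discussion
      Conclusion
    Acknowledgements
    Table of Contents
    List of Tables
    List of Figures
    Chapter 1 Introduction
      1.1 Research background
      1.2 Research motivation
      1.3 Research objectives
    Chapter 2 Literature Review
      2.1 How cis-regulatory modules regulate target genes
      2.2 Literature review on enhancer target genes
      2.3 Literature review on enhancer RNA
      2.4 Review of algorithm-based approaches
      2.5 Problem summary
      2.6 Deep learning techniques relevant to this study
        2.6.1 Overview
        2.6.2 Introduction to one-dimensional convolutional layers
        2.6.3 Introduction to multi-head attention
    Chapter 3 Methods
      3.1 Cis-regulatory module datasets
        3.1.1 Overview of the cis-regulatory module dataset
        3.1.2 Gene coordinate dataset
      3.2 Chromosome interaction data
        3.2.1 Chromatin conformation capture data
        3.2.2 Generating and processing pair files
      3.3 Obtaining reliable interaction pair data
        3.3.1 Filtering chromosome interaction pairs for high-confidence interactions
      3.4 Epigenetic feature data and transcription factor binding site data
        3.4.1 Nucleosome variants and DNA accessibility
        3.4.2 Histone modifications
        3.4.3 Chromatin protein binding sites
        3.4.4 Epigenetic feature data for nucleosomes and chromatin
      3.5 Model training
        3.5.1 Prediction pipeline workflow
        3.5.2 CRM-gene interaction model
        3.5.3 eRNA regulation model
        3.5.4 Cross-validation method
    Chapter 4 Results
      4.1 Validation methods
        4.1.1 Model performance evaluation
      4.2 CRM-gene interaction model results
        4.2.1 Pretrained model for the CRM-gene interaction model
        4.2.2 Learning curves and test-set validation of the CRM-gene interaction model
      4.3 eRNA regulation model results
        4.3.1 Pretrained model for the eRNA regulation model
        4.3.2 Learning curves and test-set validation of the eRNA regulation model
      4.4 Test-set results for the prediction pipeline
      4.5 Exploring CRM-gene regulatory relationships through interactions
      4.6 Comparison of the prediction pipeline with other methods
        4.6.1 Correlation coefficient of DNase signals
        4.6.2 Assessing regulation by the distance between the transcription start site and the CRM
        4.6.3 Nearest-neighbor target gene
        4.6.4 Enhancer Atlas
        4.6.5 EnTDefs
        4.6.5 Targetfinder
        4.6.6 Comparison of this study with other approaches
    Chapter 5 Conclusions and Contributions
      5.1 Conclusions and contributions
      5.2 Future work
    References
    Research Acknowledgement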

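    On campus: publicly available from 2029-08-27
    Off campus: publicly available from 2029-08-27
    The electronic thesis has not yet been authorized for public release; for the print copy, please consult the library catalog.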