
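Graduate student: 林永泰 (Lin, Yong-Tai)
Thesis title: 基於跨步卷積預測第四位點胞嘧啶甲基化
DNA 4mC Site Prediction Method Based on Strided Convolution
Advisor: 張天豪 (Chang, Tien-Hao)
Degree: Master
Department: College of Electrical Engineering and Computer Science - Institute of Computer & Communication Engineering
Year of publication: 2021
Academic year of graduation: 109 (ROC calendar, 2020-2021)
Language: Chinese
Number of pages: 39
Chinese keywords: 第四位點胞嘧啶甲基化, 深度學習, 跨步卷積
English keywords: DNA N4-Methylcytosine, Deep Learning, Strided Convolution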
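  • N4-methylcytosine (4mC) is an important epigenetic modification involved in a variety of biological processes, and accurately identifying 4mC sites helps researchers understand epigenetic functions and mechanisms. 4mC sites can be identified through biological experiments such as DNA methylation sequencing, but these experiments are laborious and costly, so many recent studies have developed computational methods to identify 4mC sites accurately.
    Most previously proposed computational methods rely on machine learning, with input features designed from the physicochemical properties of DNA, which requires substantial biological expertise. This study proposes a model based on a convolutional neural network that learns features of the DNA sequence on its own. Compared with existing prediction methods, the proposed model achieves the best Matthews correlation coefficients on nematode (C. elegans), fruit fly (D. melanogaster), Arabidopsis (A. thaliana), and E. coli: 0.530, 0.534, 0.566, and 0.120, respectively.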

    DNA N4-methylcytosine (4mC) is an important epigenetic modification involved in various biological processes. Accurate identification of 4mC sites is essential to understanding its biological functions and mechanisms. 4mC sites can be identified through biological experiments such as DNA methylation sequencing, but these experiments are cumbersome and expensive. An accurate computational method for identifying 4mC sites in multiple species is therefore needed.
    Most previously proposed computational methods use machine learning, with features designed mostly from the physical and chemical properties of DNA. Designing such features requires considerable domain expertise. This work proposes a deep learning model based on a strided convolution network, which lets the model learn features of DNA sequences on its own, without hand-designed features.
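The idea of running a strided convolution over an encoded DNA sequence can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the thesis model: the one-hot encoding, the single hand-written filter `cg_filter`, the example sequence, and the helper names `one_hot` and `strided_conv1d` are all hypothetical, and a trained network would learn many filters rather than use a fixed one.

```python
# Illustrative sketch only; not the architecture described in the thesis.
# The encoding, filter weights, and function names are assumptions.

ALPHABET = "ACGT"


def one_hot(seq):
    """Encode a DNA string as a list of 4-element one-hot rows (A, C, G, T)."""
    return [[1.0 if base == letter else 0.0 for letter in ALPHABET]
            for base in seq.upper()]


def strided_conv1d(x, kernel, stride):
    """Valid (unpadded) 1D convolution along the sequence with a given stride.

    x:      list of L rows, each holding C channel values.
    kernel: list of K rows, each holding C weights (one output filter).
    Returns floor((L - K) / stride) + 1 output values.
    """
    length, width = len(x), len(kernel)
    out = []
    for start in range(0, length - width + 1, stride):
        total = 0.0
        for k in range(width):
            for c in range(len(kernel[k])):
                total += x[start + k][c] * kernel[k][c]
        out.append(total)
    return out


# Hypothetical filter that responds to the dinucleotide "CG".
cg_filter = [
    [0.0, 1.0, 0.0, 0.0],  # first position: weight on C
    [0.0, 0.0, 1.0, 0.0],  # second position: weight on G
]

encoded = one_hot("ACGTCGAT")
stride1 = strided_conv1d(encoded, cg_filter, stride=1)  # 7 outputs
stride2 = strided_conv1d(encoded, cg_filter, stride=2)  # 4 outputs
print(stride1)
print(stride2)
```

With stride 2 the filter is evaluated at every other position, so the output is roughly half as long. This is how a strided convolution downsamples a sequence without a separate pooling layer.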
    The proposed model is compared with existing 4mC prediction methods. According to our results, it achieves the best Matthews correlation coefficient on C. elegans (0.530), D. melanogaster (0.534), A. thaliana (0.566), and E. coli (0.120).
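For readers unfamiliar with the metric, the Matthews correlation coefficient (MCC) for a binary classifier can be computed from the four confusion-matrix counts. The sketch below uses the standard definition. The function name `mcc` and the example counts are hypothetical and are not taken from the thesis results.

```python
import math


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from binary confusion-matrix counts.

    Returns 0.0 when any marginal total is zero (a common convention for
    the otherwise undefined case).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical counts, for illustration only.
example = mcc(tp=80, tn=70, fp=30, fn=20)
perfect = mcc(tp=50, tn=50, fp=0, fn=0)
print(round(example, 4), perfect)
```

MCC ranges from -1 to 1. A value of 1 means perfect prediction, 0 means no better than chance, and negative values indicate systematic disagreement. This makes it more informative than accuracy on imbalanced data.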

    Chapter 1 Introduction
    Chapter 2 Related Work
      2.1 DNA Methylation
      2.2 DNA N4-Methylcytosine Prediction Methods
        2.2.1 iDNA4mc
        2.2.2 4mcPred
        2.2.3 4mcPred-SVM
        2.2.4 4mcPred-IFL
        2.2.5 Meta-4mcPred
        2.2.6 4mCCNN
        2.2.7 DNA4mc-LIP
        2.2.8 Comparison and Analysis of Web-Based N4-Methylcytosine Site Prediction Tools
      2.3 Convolutional Neural Network (CNN)
        2.3.1 Convolutional Layer
        2.3.2 Strided Convolutional Layer
        2.3.3 Fully Connected Layer
    Chapter 3 Methods
      3.1 Datasets
        3.1.1 Chen Dataset
        3.1.2 Manavalan Dataset
      3.2 Data Encoding
      3.3 Model Training and Testing Procedure
      3.4 Model Architecture
      3.5 Model Training Configuration
    Chapter 4 Experimental Results
      4.1 Performance Evaluation Metrics
      4.2 Per-Species Performance Evaluation
        4.2.1 Chen Dataset
        4.2.2 Manavalan Dataset
      4.3 Comparison of Strided Convolution and Pooling Layers
      4.4 Cross-Species Prediction Analysis
      4.5 Training the Model on Combined Species Samples
      4.6 Training on the Manavalan Dataset and Testing on the Chen Dataset
    Chapter 5 Conclusion
      5.1 Conclusion
      5.2 Future Work
    References


    Full-text availability: on campus, open immediately; off campus, open immediately