| Graduate student: | 林永泰 Lin, Yong-Tai |
|---|---|
| Thesis title: | 基於跨步卷積預測第四位點胞嘧啶甲基化 DNA 4mC Site Prediction Method based on Strided Convolution |
| Advisor: | 張天豪 Chang, Tien-Hao |
| Degree: | Master |
| Department: | College of Electrical Engineering and Computer Science - Institute of Computer & Communication Engineering |
| Year of publication: | 2021 |
| Academic year of graduation: | 109 |
| Language: | Chinese |
| Number of pages: | 39 |
| Chinese keywords: | 第四位點胞嘧啶甲基化, 深度學習, 跨步卷積 |
| Foreign-language keywords: | DNA N4-Methylcytosine, Deep Learning, Strided Convolution |
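N4-methylcytosine (4mC) is an important epigenetic modification involved in a variety of biological processes, and accurately identifying 4mC sites helps researchers understand epigenetic functions and mechanisms. 4mC sites can be identified through biological experiments such as DNA methylation sequencing, but such experiments are laborious and expensive, so in recent years many studies have developed accurate computational methods for identifying 4mC sites.
Most previously proposed computational methods rely on machine learning, with input features designed from the physicochemical properties of DNA, which requires substantial biological expertise. This study proposes a model based on a convolutional neural network that learns the features of DNA sequences on its own. Compared with other existing prediction methods, the proposed model achieves the best Matthews correlation coefficients on C. elegans, D. melanogaster, A. thaliana, and E. coli: 0.530, 0.534, 0.566, and 0.120, respectively.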
DNA N4-methylcytosine (4mC) is an important epigenetic modification involved in various biological processes. Accurate identification of 4mC sites is essential for understanding its biological functions and mechanisms. 4mC sites can be identified through biological experiments such as DNA methylation sequencing, but these experiments are cumbersome and expensive. An accurate computational method for identifying 4mC sites in multiple species is therefore needed.
Most previously proposed computational methods use machine learning, with features designed mostly from the physical and chemical properties of DNA; designing such features requires considerable domain expertise. In this work, a deep learning model is proposed. It is based on a strided convolution network, which allows the model to learn the features of DNA sequences on its own, without hand-crafted features.
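As a rough illustration of the idea, the sketch below one-hot encodes a DNA sequence and slides a single filter over it with a stride. It is not the thesis's architecture: the record does not state the kernel sizes, strides, or layer counts, so the names `one_hot_encode` and `strided_conv1d`, the toy "CG" filter, and the example sequence are all assumptions made for illustration.

```python
from typing import List

# One-hot channels in the (assumed) order A, C, G, T.
ONE_HOT = {
    "A": [1.0, 0.0, 0.0, 0.0],
    "C": [0.0, 1.0, 0.0, 0.0],
    "G": [0.0, 0.0, 1.0, 0.0],
    "T": [0.0, 0.0, 0.0, 1.0],
}


def one_hot_encode(seq: str) -> List[List[float]]:
    """Turn a DNA string into a list of 4-channel one-hot vectors."""
    return [ONE_HOT[base] for base in seq.upper()]


def strided_conv1d(
    x: List[List[float]], kernel: List[List[float]], stride: int
) -> List[float]:
    """Apply one filter along the sequence, moving `stride` positions per step.

    With no padding, the output length is (len(x) - len(kernel)) // stride + 1.
    """
    width = len(kernel)
    out = []
    for start in range(0, len(x) - width + 1, stride):
        total = 0.0
        for k in range(width):
            for c in range(len(kernel[k])):
                total += x[start + k][c] * kernel[k][c]
        out.append(total)
    return out


# Hypothetical filter that responds to the dinucleotide "CG".
cg_filter = [ONE_HOT["C"], ONE_HOT["G"]]
encoded = one_hot_encode("CGACGT")

dense = strided_conv1d(encoded, cg_filter, stride=1)    # every window
strided = strided_conv1d(encoded, cg_filter, stride=2)  # every second window
print(dense)    # [2.0, 0.0, 0.0, 2.0, 0.0]
print(strided)  # [2.0, 0.0, 0.0]
```

A stride larger than 1 shortens the output inside the convolution itself, which is why strided convolutions are often used in place of a separate pooling layer. In a trained network the filter weights are learned from data rather than fixed by hand as they are here.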
The proposed model is compared with existing 4mC prediction methods. According to our results, it achieves the best Matthews correlation coefficient on C. elegans (0.530), D. melanogaster (0.534), A. thaliana (0.566), and E. coli (0.120).
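For readers unfamiliar with the metric, the Matthews correlation coefficient (MCC) of a binary classifier is computed from the confusion-matrix counts as (TP·TN − FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). The minimal sketch below implements that standard formula. The function name `matthews_corrcoef_from_counts` and the counts passed to it are illustrative only and are not taken from the thesis; returning 0.0 for a zero denominator is a common convention that is assumed here.

```python
import math


def matthews_corrcoef_from_counts(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from confusion-matrix counts.

    Ranges from -1 (total disagreement) through 0 (no better than chance)
    to +1 (perfect prediction). Returns 0.0 when the denominator is zero,
    which is an assumed convention rather than part of the formula.
    """
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return numerator / denominator


# Illustrative counts only (not results from the thesis).
perfect = matthews_corrcoef_from_counts(tp=50, tn=50, fp=0, fn=0)
chance = matthews_corrcoef_from_counts(tp=25, tn=25, fp=25, fn=25)
inverted = matthews_corrcoef_from_counts(tp=0, tn=0, fp=50, fn=50)
print(perfect, chance, inverted)  # 1.0 0.0 -1.0
```

Because MCC uses all four confusion-matrix cells, it can stay low even when accuracy looks acceptable, so a score such as 0.120 indicates predictions only modestly better than chance.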