
Student: Chen, Kuan-Yu (陳冠宇)
Thesis title: A Novel Anticancer Peptide Predictor using End-to-end Deep Learning
Chinese title: 基於端到端深度學習的抗癌肽預測器
Advisor: Chang, Tien-Hao (張天豪)
Degree: Master
Department: Department of Electrical Engineering, College of Electrical Engineering and Computer Science
Year of publication: 2021
Graduation academic year: 110 (ROC calendar)
Language: Chinese
Number of pages: 65
Keywords: Pseudo Local Label, Pseudo Two-stage Model, End-to-end Training, Deep Learning, Anticancer Peptide

    Cancer, also known as malignant tumor, is a disease caused by a malfunction of the mechanisms that control cell division and proliferation. Current treatments include chemotherapy, radiotherapy, biological therapy, and surgery. Their side effects and high medical costs remain a major obstacle to effective cancer treatment.
    In the past few years, peptide-based therapy has emerged as a novel strategy for treating cancer, and peptides with anticancer activity are called anticancer peptides (ACPs). Owing to their mechanism of action, ACPs offer high target specificity, good therapeutic efficacy, low toxicity to humans, and ease of chemical modification and synthesis. Developing new ACPs and further exploring their mechanisms of action requires accurate ACP prediction, but verifying candidates through biological experiments is time-consuming and expensive. Consequently, many ACP predictors based on traditional machine learning or deep learning have been proposed.
    In this study, we propose an ACP predictor based on end-to-end deep learning and evaluate it on two datasets: the CancerGram dataset and the AntiCP2.0 dataset. The proposed method achieves state-of-the-art (SOTA) performance on the CancerGram dataset and performance comparable to the current SOTA method on the AntiCP2.0 dataset. The method includes a pseudo two-stage architecture trained end-to-end, which we found to be the main source of the overall performance improvement. We further conduct an experiment showing that end-to-end training makes the pseudo two-stage model perform more stably. Finally, we identify the task characteristics that suit a pseudo two-stage model and give suggestions on its use.
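    The abstract does not spell out how the pseudo two-stage structure is wired, so the following is only a toy illustration of the general idea, not the thesis's DeepMotif model. It assumes stage one scores every length-k window of a peptide (a candidate motif) with a linear scorer, and stage two max-pools those local scores into a single peptide-level probability. Because both stages sit in one differentiable pipeline, a single peptide-level loss trains them together (end-to-end). The helper `pseudo_local_labels` shows, hypothetically, what a conventional two-stage pipeline might use instead: each window simply inherits its peptide's label. The class `ToyPseudoTwoStage`, the window length `k=4`, the linear scorer, and the max pooling are all assumptions made for this sketch.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def windows(seq, k):
    """All length-k windows (candidate motifs) of a peptide sequence."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def pseudo_local_labels(seq, label, k):
    """Hypothetical pseudo local labels: every window inherits its peptide's label.

    A separately trained stage one could be fit on these pairs; the
    end-to-end toy model below never uses them.
    """
    return [(win, label) for win in windows(seq, k)]


class ToyPseudoTwoStage:
    """Hypothetical toy: stage 1 scores windows linearly, stage 2 max-pools them."""

    def __init__(self, k=4, seed=0):
        rng = random.Random(seed)
        self.k = k
        # One weight per (offset within the window, amino acid), plus a shared bias.
        self.w = [[rng.gauss(0.0, 0.01) for _ in AMINO_ACIDS] for _ in range(k)]
        self.b = 0.0

    def window_logit(self, win):
        return self.b + sum(self.w[j][AA_INDEX[aa]] for j, aa in enumerate(win))

    def forward(self, seq):
        """Return (peptide-level probability, index of the highest-scoring window)."""
        if len(seq) < self.k:
            raise ValueError("sequence shorter than the window length")
        logits = [self.window_logit(win) for win in windows(seq, self.k)]
        best = max(range(len(logits)), key=logits.__getitem__)
        return 1.0 / (1.0 + math.exp(-logits[best])), best

    def train_step(self, seq, label, lr=0.5):
        """One SGD step on the peptide-level binary cross-entropy (end-to-end).

        The gradient passes through the max pooling into the stage-one weights
        of the winning window only. Returns the loss measured before the update.
        """
        prob, best = self.forward(seq)
        grad = prob - label  # d(BCE)/d(logit) for a sigmoid output
        for j, aa in enumerate(windows(seq, self.k)[best]):
            self.w[j][AA_INDEX[aa]] -= lr * grad
        self.b -= lr * grad
        eps = 1e-12
        return -(label * math.log(prob + eps) + (1 - label) * math.log(1.0 - prob + eps))


if __name__ == "__main__":
    # Made-up toy peptides, for illustration only (not from either dataset).
    data = [("GLKLAKKLAG", 1), ("FAKKLAKKLG", 1), ("GSGDEPSGSE", 0), ("DDSGEGSPEG", 0)]
    model = ToyPseudoTwoStage(k=4)
    for _ in range(30):
        for seq, label in data:
            model.train_step(seq, label)
    for seq, label in data:
        print(seq, label, round(model.forward(seq)[0], 3))
```

    The max pooling is what lets a peptide-level label act as weak supervision for the local scorer: only the window that currently wins receives gradient, so the scorer is pushed to rank motif-like windows highest without ever seeing window-level labels.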

    Acknowledgements
    List of Figures
    List of Tables
    Chapter 1 Introduction
    Chapter 2 Related Work
      2.1 Anticancer Peptides
      2.2 Computational Prediction of Anticancer Peptides
          2.2.1.1 ACPred-FL
          2.2.1.2 AntiCP2.0
          2.2.1.3 CancerGram
        2.2.2 Deep Learning-Based Studies
          2.2.2.1 ACP-DL
          2.2.2.2 ACP-MHCNN
          2.2.2.3 DeepACP
      2.3 Deep Learning
          2.3.1.1 Convolutional Layer
          2.3.1.2 Skip Connections
        2.3.2 Recurrent Neural Network (RNN)
          2.3.2.1 Recurrent Unit
          2.3.2.2 Attention Mechanism
    Chapter 3 Methods
      3.1 Data Encoding
        3.1.1 Sequence Encoding
        3.1.2 Feature Encoding
      3.2 Model Architecture
        3.2.1 DeepMotif Model
          3.2.1.1 Pseudo Two-Stage Model
          3.2.1.2 Motif Embedding Module
          3.2.1.3 Sequence Classification Module
        3.2.2 DeepFeature Model
        3.2.3 Ensemble Learning Module
          3.2.3.1 K-fold Ensemble
          3.2.3.2 Harmonic Mean Ensemble
      3.3 Model Training and Validation Procedure
    Chapter 4 Results
      4.1 Datasets
        4.1.1 AntiCP2.0 Dataset
        4.1.2 CancerGram Dataset
      4.2 Evaluation Metrics
      4.3 Comparison with Existing Methods
      4.4 Ablation Study
        4.4.1 Importance of Each Component of the Full Model
        4.4.2 Importance of Each Component of the DeepMotif Model
        4.4.3 Importance of Each Component of the DeepFeature Model
        4.4.4 Importance of the Ensemble Learning Module
    Chapter 5 Discussion
      5.1 Dataset Dependence of Each Method
      5.2 Comparison of Motif Encoding Schemes
      5.3 Effect of End-to-End Training on the Pseudo Two-Stage Model
      5.4 Task Characteristics Suited to the Pseudo Two-Stage Model
    Chapter 6 Conclusion
      6.1 Conclusion
      6.2 Future Work
    References
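    Sections 3.2.3.1 and 3.2.3.2 name a K-fold ensemble and a harmonic mean ensemble, but the outline gives no formulas. A minimal sketch, assuming the K-fold ensemble averages the probabilities of k models, each trained on a different fold split, and the harmonic mean ensemble then combines the per-model probabilities (for example DeepMotif and DeepFeature) as n / Σ(1/p_i). The function names are hypothetical, and folds are contiguous without shuffling or stratification purely for brevity.

```python
def k_fold_splits(n, k):
    """Split indices 0..n-1 into k contiguous folds; return (train_idx, val_idx) pairs."""
    if not 2 <= k <= n:
        raise ValueError("need 2 <= k <= n")
    # The first n % k folds get one extra sample so every index is used once.
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    splits, start = [], 0
    for size in fold_sizes:
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        splits.append((train, val))
        start += size
    return splits


def k_fold_ensemble(fold_probs):
    """Hypothetical K-fold ensemble: mean probability over the k fold models."""
    return sum(fold_probs) / len(fold_probs)


def harmonic_mean_ensemble(probs):
    """Hypothetical harmonic mean ensemble of per-model probabilities.

    The harmonic mean is pulled toward the smallest input, so a peptide
    scores high only when every model assigns it a high probability.
    """
    if any(p < 0 for p in probs):
        raise ValueError("probabilities must be non-negative")
    if any(p == 0 for p in probs):
        return 0.0  # limit of n / sum(1/p) as any p -> 0
    return len(probs) / sum(1.0 / p for p in probs)
```

    For example, fold-averaged probabilities of 0.8 from one model and 0.6 from the other combine to 2 × 0.8 × 0.6 / (0.8 + 0.6), roughly 0.686, slightly below their arithmetic mean of 0.7. That pull toward the lower score is the usual reason to prefer a harmonic mean when false positives are costly.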


    Full text: open access on campus and off campus, available immediately.