| Graduate student: | 林攸蓁 Lin, You-Chen |
|---|---|
| Thesis title: | Anti-angiogenic Peptides Prediction Using CNN Based Feature Extraction |
| Advisor: | 張天豪 Chang, Tien-Hao |
| Degree: | Master |
| Department: | College of Electrical Engineering and Computer Science - Department of Electrical Engineering |
| Year of publication: | 2023 |
| Academic year of graduation: | 111 (ROC calendar) |
| Language: | Chinese |
| Number of pages: | 53 |
| Keywords: | Convolutional Neural Network (CNN), Feature Extraction, Anti-angiogenic Peptide |
Anti-angiogenic peptides (AAPs) are short chains of amino acids that inhibit angiogenesis by binding to vascular endothelial growth factor, thereby suppressing tumor growth and spread. AAPs have been shown to be a promising cancer treatment and can be combined with other therapies to improve therapeutic efficacy. Developing AAPs requires screening candidate peptides, and computational methods make it possible to screen large numbers of candidates quickly.
With the rapid growth of machine learning across many fields in recent years, many studies have applied it to AAP prediction. This study divides the features used in those studies into two categories: sequential and non-sequential. Sequential features retain positional information, so their order cannot be interchanged. Non-sequential features, by contrast, lack positional information but capture most of the aggregate characteristics of a peptide. In those studies, some sequential features encoded as two-dimensional arrays must first be stacked (concatenated) into one-dimensional vectors before classification. Stacking, however, discards the information along the sequence direction, so how to handle sequential features properly is the main focus of this study.
This study proposes a convolutional neural network (CNN)-based feature extraction method for predicting AAPs. Instead of stacking sequential features, the model extracts them with a CNN, which is better able to exploit sequential properties. Compared with other AAP prediction studies, the model achieves the best accuracy, sensitivity, specificity, Matthews correlation coefficient, and area under the curve reported to date.
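The contrast between stacking a two-dimensional sequential feature and convolving along its sequence axis can be illustrated with a minimal NumPy sketch. It is not the architecture or feature set used in the thesis: the one-hot encoding, the single hand-written kernel, the example peptides, and the names `one_hot`, `conv1d_max_pool`, and `motif_kernel` are all assumptions chosen for illustration.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot(peptide: str) -> np.ndarray:
    """Encode a peptide as a (length, 20) matrix; rows follow sequence order."""
    encoded = np.zeros((len(peptide), len(AMINO_ACIDS)))
    for position, residue in enumerate(peptide):
        encoded[position, AA_INDEX[residue]] = 1.0
    return encoded


def conv1d_max_pool(encoded: np.ndarray, kernels: np.ndarray) -> np.ndarray:
    """Slide each kernel along the sequence axis, then take the global maximum.

    encoded: (length, channels); kernels: (n_kernels, width, channels).
    Returns one feature per kernel.
    """
    n_kernels, width, _ = kernels.shape
    n_windows = encoded.shape[0] - width + 1
    responses = np.empty((n_kernels, n_windows))
    for start in range(n_windows):
        window = encoded[start:start + width]  # (width, channels)
        responses[:, start] = np.sum(kernels * window, axis=(1, 2))
    return responses.max(axis=1)


# Hypothetical kernel that responds to one arbitrary tripeptide (illustrative only).
motif_kernel = one_hot("RGD")[np.newaxis, :, :]  # shape (1, 3, 20)

peptide_a = "RGDAAAAA"  # tripeptide at the start
peptide_b = "AAAAARGD"  # same tripeptide at the end

# Stacking: the (8, 20) matrix becomes a 160-dimensional vector.
flat_a = one_hot(peptide_a).ravel()
flat_b = one_hot(peptide_b).ravel()

# Convolution along the sequence axis followed by global max pooling.
conv_a = conv1d_max_pool(one_hot(peptide_a), motif_kernel)
conv_b = conv1d_max_pool(one_hot(peptide_b), motif_kernel)

print("flattened length:", flat_a.shape[0])
print("coordinates active in both flattened vectors:", int(np.sum(flat_a * flat_b)))
print("convolutional feature (a, b):", conv_a[0], conv_b[0])
```

In the stacked vectors the shared tripeptide occupies disjoint coordinates, so a classifier working on them has to learn the pattern separately for each position. The convolutional feature gives the same response for both peptides because the kernel is applied at every position along the sequence.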