| Graduate student: | Ko, Yu-Cheng (葛淯承) |
|---|---|
| Thesis title: | Utilizing Diffusion-based Data Augmentation in Anti-Diabetic Peptide Prediction (利用擴散模型進行資料增強並用於預測抗糖尿病肽) |
| Advisor: | Chang, Tien-Hao (張天豪) |
| Degree: | Master |
| Department: | College of Electrical Engineering and Computer Science, Department of Electrical Engineering |
| Year of publication: | 2024 |
| Graduation academic year: | 112 (ROC calendar) |
| Language: | Chinese |
| Number of pages: | 64 |
| Keywords: | Word2Vec, Stable Diffusion, Deep Learning, Data Augmentation, Antidiabetic Peptide |
Diabetes mellitus is a chronic disease that occurs when the pancreas cannot produce enough insulin or when the body cannot effectively use the insulin it produces, which leads to elevated blood glucose. When blood glucose exceeds what the kidneys can handle, the excess sugar is excreted in the urine, which is the origin of the disease's Chinese name, 糖尿病 ("sugar-urine disease"). Current treatments include insulin injections and antidiabetic drugs, which are not only costly but also have negative effects on health. Given these drawbacks, there is an urgent need for antidiabetic therapies that are less harmful to the body.
In recent years, peptide-based therapies have been widely applied across medical fields because of their high target selectivity and low toxicity. Many bioactive peptides have been shown to have significant antidiabetic effects and are promising alternative therapies for diabetes management. However, verifying the antidiabetic activity of peptides through traditional biological experiments is both expensive and time-consuming. To accelerate the development of antidiabetic peptides, researchers have begun using computational methods to predict which peptides have antidiabetic activity.
This study proposes an antidiabetic peptide classifier that uses the Word2Vec model to extract features from amino acid sequences and applies Stable Diffusion to augment the amino acid sequence data, improving classifier performance. Validated on the AntiDMPpred dataset, our method achieves an accuracy of 0.8572, which is 10.82 percentage points higher than AntiDMPpred, the best existing predictor, demonstrating the superior performance of our approach in antidiabetic peptide classification. We also compared different data augmentation methods for amino acid sequences, and the results indicate that our augmentation method is the most effective one currently available.
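The two stages named in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the abstract does not state how sequences are tokenized for Word2Vec or which representation the diffusion model operates on. The sketch assumes overlapping 3-mers as the Word2Vec "words" and shows only the forward (noising) step of a denoising diffusion model, using the linear noise schedule from the DDPM paper. The peptide string, the stand-in embedding vector, and all function names are hypothetical.

```python
import math
import random


def kmer_tokens(sequence, k=3):
    """Split a peptide into overlapping k-mers (assumed tokenization, k is hypothetical)."""
    if len(sequence) < k:
        return [sequence]
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances beta_1..beta_T (DDPM default range)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """alpha_bar_t = product over s <= t of (1 - beta_s)."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def forward_noise(x0, alpha_bar_t, rng):
    """Sample x_t from q(x_t | x_0) = N(sqrt(alpha_bar_t) * x_0, (1 - alpha_bar_t) * I)."""
    scale = math.sqrt(alpha_bar_t)
    sigma = math.sqrt(1.0 - alpha_bar_t)
    return [scale * value + sigma * rng.gauss(0.0, 1.0) for value in x0]


if __name__ == "__main__":
    rng = random.Random(0)

    # Hypothetical peptide, used only to show the tokenization.
    print(kmer_tokens("GLPGPSGE", k=3))

    # Stand-in for a learned sequence embedding (random values, not real features).
    embedding = [rng.uniform(-1.0, 1.0) for _ in range(8)]

    alpha_bars = cumulative_alphas(linear_beta_schedule(1000))
    for t in (0, 499, 999):
        noisy = forward_noise(embedding, alpha_bars[t], rng)
        print(t, [round(v, 3) for v in noisy])
```

In a full diffusion-based augmentation pipeline, a network would be trained to reverse this noising process, and new samples would be drawn by denoising from random noise. That reverse model is omitted here because the abstract gives no details about it.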