
Graduate student: Lin, Hung-Chun (林泓鈞)
Thesis title: Build a Robust Antimicrobial Peptide Dataset by Controlling Composition
Advisor: Chang, Tien-Hao (張天豪)
Degree: Master's
Department: Department of Electrical Engineering, College of Electrical Engineering and Computer Science
Year of publication: 2021
Academic year of graduation: 109 (ROC calendar, i.e., 2020-2021)
Language: Chinese
Pages: 39
Keywords: Dataset, Deep Learning, Antimicrobial Peptide, Amino Acid Composition

    Antimicrobial peptides (AMPs), also known as host defense peptides, are widely present across living organisms and are an important part of the innate immune system. Many biological studies have confirmed that AMPs can kill Gram-negative bacteria, Gram-positive bacteria, fungi, viruses, cancer cells, and parasites. As multidrug-resistant superbugs draw growing concern, AMPs are regarded as an important alternative to traditional antibiotics because of their broad-spectrum and potent bactericidal activity.
    Many current studies predict AMPs by building AMP datasets, designing feature encodings, and training machine learning models to help discover novel AMPs. However, this study found that the positive and negative samples in current AMP datasets differ markedly in amino acid composition. This composition bias leads models to learn amino acid composition while ignoring the effect of residue order; models trained under this condition are prone to predicting even a completely shuffled AMP sequence as a functional AMP.
    To address this problem, this study proposes a sampling boosting method for preparing AMP datasets that balances the amino acid composition between positive and negative samples. The method can be added to existing dataset preparation pipelines so that the resulting datasets train better models. Using widely used AMP datasets as base datasets, this study rebuilds them with the proposed method and compares the results before and after. The results show that models trained on the boosted datasets perform better on every independent test set and learn more comprehensive features.
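The composition-bias argument can be illustrated with a minimal sketch, assuming amino acid composition (AAC) is computed as the per-residue frequency over the 20 standard amino acids. A shuffled sequence has exactly the same AAC as the original, so a model that relies mainly on composition has no signal for telling the two apart. The helpers `aac` and `shuffled` and the example sequence are hypothetical illustrations, not code or data from the thesis.

```python
import random
from collections import Counter

# One-letter codes of the 20 standard amino acids (assumed feature alphabet).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aac(seq: str) -> list[float]:
    """Amino acid composition: fraction of seq made up of each standard residue."""
    counts = Counter(seq)
    return [counts[a] / len(seq) for a in AMINO_ACIDS]


def shuffled(seq: str, seed: int = 0) -> str:
    """Return a random permutation of seq: same residues, different order."""
    rng = random.Random(seed)
    residues = list(seq)
    rng.shuffle(residues)
    return "".join(residues)


if __name__ == "__main__":
    peptide = "GIGKFLHSAKKFGKAFVGEIMNS"  # illustrative AMP-like sequence
    decoy = shuffled(peptide, seed=42)
    print(decoy)
    # Identical composition vectors: AAC alone cannot separate the two.
    print(aac(peptide) == aac(decoy))  # True
```

Because `aac` ignores residue order, every permutation of a sequence maps to the same feature vector; a test set of shuffled AMPs therefore probes whether a model has learned anything beyond composition.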
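The abstract does not spell out how the sampling boosting method works. As a purely hypothetical illustration of the general idea of balancing composition between classes, the sketch below greedily draws negative sequences from a candidate pool so that the negatives' mean AAC stays as close as possible to the positives' mean AAC under an L1 distance. The greedy strategy, the L1 distance, and all function names are assumptions made for illustration; this is not the thesis's algorithm.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aac(seq: str) -> list[float]:
    """Fraction of each of the 20 standard amino acids in seq."""
    counts = Counter(seq)
    return [counts[a] / len(seq) for a in AMINO_ACIDS]


def mean_aac(seqs: list[str]) -> list[float]:
    """Average composition vector of a set of sequences."""
    vecs = [aac(s) for s in seqs]
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def l1(u: list[float], v: list[float]) -> float:
    """L1 (Manhattan) distance between two composition vectors."""
    return sum(abs(a - b) for a, b in zip(u, v))


def composition_matched_sample(positives: list[str], candidates: list[str], k: int) -> list[str]:
    """Hypothetical greedy sampler: repeatedly add the candidate that keeps the
    mean AAC of the chosen negatives closest (L1) to the positives' mean AAC."""
    target = mean_aac(positives)
    pool = [(s, aac(s)) for s in candidates]
    chosen: list[str] = []
    total = [0.0] * len(AMINO_ACIDS)  # running sum of chosen AAC vectors
    while pool and len(chosen) < k:
        m = len(chosen) + 1
        # Pick the candidate whose addition gives the closest mean composition.
        best = min(
            range(len(pool)),
            key=lambda i: l1([(t + x) / m for t, x in zip(total, pool[i][1])], target),
        )
        seq, vec = pool.pop(best)
        chosen.append(seq)
        total = [t + x for t, x in zip(total, vec)]
    return chosen


if __name__ == "__main__":
    positives = ["KKLLKKLLKK", "KLKLKLKLKW"]
    candidates = ["KKLLKKLLAA", "DDEEDDEEGG", "KLKLKLKLAA", "PPPPGGGGSS"]
    print(composition_matched_sample(positives, candidates, 2))
    # ['KKLLKKLLAA', 'KLKLKLKLAA']
```

The sketch considers composition only; it ignores sequence length, which the chapter outline suggests the thesis also controls during dataset preparation.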

    Chapter 1 Introduction
    Chapter 2 Related Work
      2.1 Antimicrobial Peptides
      2.2 AMP Prediction Studies That Proposed Datasets
        2.2.1 iAMP-2L
        2.2.2 Empirical comparison of web-based antimicrobial peptide prediction tools
        2.2.3 iAMPpred
        2.2.4 AmPEP
        2.2.5 AMP Scanner
        2.2.6 AMP Gram
      2.3 Other AMP Prediction Studies
        2.3.1 Antimicrobial Peptide Identification Using Multi-scale Convolutional Network
        2.3.2 Using Heterogeneous Convolutional Neural Network to Predict Antimicrobial Peptide
      2.4 Sequence Alignment and Clustering Tools
        2.4.1 BLAST
        2.4.2 CD-HIT
    Chapter 3 Methods
      3.1 Sampling Boosting Method
      3.2 Xiao Dataset
      3.3 Veltri Dataset
      3.4 Model Architecture and Training
      3.5 Independent Test Sets
        3.5.1 Meher Dataset
        3.5.2 GabereNoble Dataset
        3.5.3 Manavalan Dataset
        3.5.4 Michal Dataset
      3.6 Test Set Designed Around Amino Acid Composition
    Chapter 4 Results
      4.1 Evaluation Metrics
      4.2 Xiao Dataset
        4.2.1 Comparison of the Xiao and Xiao Plus Datasets
        4.2.2 Effect of Using Sequence Fragments for Length Control
        4.2.3 Effect of Relaxing the Length Constraint
        4.2.4 Effect of Random Sampling
      4.3 Veltri Dataset
        4.3.1 Comparison of the Veltri and Veltri Plus Datasets
        4.3.2 Effect of Random Sampling
      4.4 Analysis and Discussion
    Chapter 5 Conclusion
      5.1 Conclusion
      5.2 Future Work
    References

    1. Lata, S., B. Sharma, and G.P. Raghava, Analysis and prediction of antibacterial peptides. BMC Bioinformatics, 2007. 8(1): p. 1-10.
    2. Su, X., et al., Antimicrobial peptide identification using multi-scale convolutional network. BMC Bioinformatics, 2019. 20(1): p. 1-10.
    3. Chou, K.C., Prediction of protein cellular attributes using pseudo-amino acid composition. Proteins: Structure, Function, and Bioinformatics, 2001. 43(3): p. 246-255.
    4. Xiao, X., et al., iAMP-2L: a two-level multi-label classifier for identifying antimicrobial peptides and their functional types. Analytical Biochemistry, 2013. 436(2): p. 168-177.
    5. Meher, P.K., et al., Predicting antimicrobial peptides with improved accuracy by incorporating the compositional, physico-chemical and structural features into Chou's general PseAAC. Scientific Reports, 2017. 7(1): p. 1-12.
    6. UniProt Consortium, UniProt: the universal protein knowledgebase. Nucleic Acids Research, 2017. 45(D1): p. D158-D169.
    7. Burdukiewicz, M., et al., Proteomic screening for prediction and design of antimicrobial peptides with AmpGram. International Journal of Molecular Sciences, 2020. 21(12): p. 4310.
    8. Veltri, D., U. Kamath, and A. Shehu, Deep learning improves antimicrobial peptide recognition. Bioinformatics, 2018. 34(16): p. 2740-2747.
    9. 沈柏妤, Using heterogeneous convolutional neural network to predict antimicrobial peptide (使用異質卷積神經網路預測抗微生物肽). 2020.
    10. Bhadra, P., et al., AmPEP: Sequence-based prediction of antimicrobial peptides using distribution patterns of amino acid properties and random forest. Scientific Reports, 2018. 8(1): p. 1-10.
    11. Dubos, R.J., Studies on a bactericidal agent extracted from a soil bacillus: I. Preparation of the agent. Its activity in vitro. The Journal of Experimental Medicine, 1939. 70(1): p. 1.
    12. Wang, G., X. Li, and Z. Wang, APD3: the antimicrobial peptide database as a tool for research and education. Nucleic Acids Research, 2016. 44(D1): p. D1087-D1093.
    13. Li, W. and A. Godzik, Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics, 2006. 22(13): p. 1658-1659.
    14. Huang, Y., et al., CD-HIT Suite: a web server for clustering and comparing biological sequences. Bioinformatics, 2010. 26(5): p. 680-682.
    15. Fu, L., et al., CD-HIT: accelerated for clustering the next-generation sequencing data. Bioinformatics, 2012. 28(23): p. 3150-3152.
    16. UniProt Consortium, UniProt: a hub for protein information. Nucleic Acids Research, 2015. 43(D1): p. D204-D212.
    17. UniProt Consortium, UniProt: a worldwide hub of protein knowledge. Nucleic Acids Research, 2019. 47(D1): p. D506-D515.
    18. Seshadri Sundararajan, V., et al., DAMPD: a manually curated antimicrobial peptide database. Nucleic Acids Research, 2012. 40(D1): p. D1108-D1112.
    19. Thomas, S., et al., CAMP: a useful resource for research on antimicrobial peptides. Nucleic Acids Research, 2010. 38(suppl_1): p. D774-D780.
    20. Waghu, F.H., et al., CAMP: Collection of sequences and structures of antimicrobial peptides. Nucleic Acids Research, 2014. 42(D1): p. D1154-D1158.
    21. Lata, S., N.K. Mishra, and G.P. Raghava, AntiBP2: improved version of antibacterial peptide prediction. BMC Bioinformatics, 2010. 11(1): p. 1-7.
    22. Zhao, X., et al., LAMP: a database linking antimicrobial peptides. PLoS ONE, 2013. 8(6): p. e66557.
    23. Thakur, N., A. Qureshi, and M. Kumar, AVPpred: collection and prediction of highly effective antiviral peptides. Nucleic Acids Research, 2012. 40(W1): p. W199-W204.
    24. Torrent, M., et al., Connecting peptide physicochemical and antimicrobial properties by a rational prediction model. PLoS ONE, 2011. 6(2): p. e16968.
    25. Kent, W.J., BLAT: the BLAST-like alignment tool. Genome Research, 2002. 12(4): p. 656-664.
    26. Jhong, J.-H., et al., dbAMP: an integrated resource for exploring antimicrobial peptides with functional activities and physicochemical properties on transcriptome and proteome data. Nucleic Acids Research, 2019. 47(D1): p. D285-D297.
    27. He, K., et al., Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016.
    28. Iandola, F., et al., DenseNet: Implementing efficient ConvNet descriptor pyramids. arXiv preprint arXiv:1404.1869, 2014.
    29. Hu, J., L. Shen, and G. Sun, Squeeze-and-excitation networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018.
    30. Johnson, M., et al., NCBI BLAST: a better web interface. Nucleic Acids Research, 2008. 36(suppl_2): p. W5-W9.
    31. Ye, J., S. McGinnis, and T.L. Madden, BLAST: improvements for better sequence analysis. Nucleic Acids Research, 2006. 34(suppl_2): p. W6-W9.
    32. Manavalan, B., et al., AIPpred: sequence-based prediction of anti-inflammatory peptides using random forest. Frontiers in Pharmacology, 2018. 9: p. 276.
    33. Marie, C., et al., Regulation by anti-inflammatory cytokines (IL-4, IL-10, IL-13, TGFβ) of interleukin-8 production by LPS- and/or TNFα-activated human polymorphonuclear cells. Mediators of Inflammation, 1996. 5(5): p. 334-340.
