
Graduate student: Liao, Ting-Yang (廖庭暘)
Thesis title: Anticancer Peptide Prediction Using Multi-scale Feature (使用多尺度序列特徵預測抗癌肽)
Advisor: Chang, Tien-Hao (張天豪)
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Electrical Engineering
Year of publication: 2021
Academic year of graduation: 109 (ROC calendar)
Language: Chinese
Number of pages: 27
Keywords: Deep Learning, Antimicrobial Peptide, Anticancer Peptide, Ensemble
  • Antimicrobial peptides (AMPs), also called host defense peptides (HDPs), are found in organisms across all classes of life and are an essential part of the innate immune response. Research has shown that AMPs can kill a range of targets, including Gram-negative and Gram-positive bacteria, viruses, and fungi. AMPs that can kill cancer cells are called anticancer peptides (ACPs).
    Traditional cancer therapies, such as surgery, radiotherapy, and chemotherapy, tend to increase tumor drug resistance and may also attack normal cells, causing many side effects. ACPs, as natural antibiotics, are less prone to inducing drug resistance. Their anticancer mechanisms can disrupt the membranes of cancer cells and trigger apoptosis, killing cancer cells without harming normal cells.
    In recent years, many studies have used machine learning to predict ACPs. Most of them use composition information at a single sequence scale as the input features of the model. Features taken from a single scale, however, miss the information carried at other sequence scales, which degrades model performance.
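    To make the notion of sequence scale concrete, the sketch below computes two composition features for a peptide: amino acid composition (one residue at a time) and dipeptide composition (adjacent pairs), the latter being one of the encodings listed in the thesis outline. It is a minimal illustration and not the thesis's actual feature pipeline; the function names and the example sequence are assumptions made for this example.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 pairs


def amino_acid_composition(seq):
    """Fraction of each standard residue in seq (20 values, scale = 1 residue)."""
    seq = seq.upper()
    if not seq:
        return [0.0] * len(AMINO_ACIDS)
    return [seq.count(a) / len(seq) for a in AMINO_ACIDS]


def dipeptide_composition(seq):
    """Fraction of each adjacent residue pair in seq (400 values, scale = 2 residues)."""
    seq = seq.upper()
    # Overlapping windows of length 2: positions (0,1), (1,2), ...
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    if not pairs:
        return [0.0] * len(DIPEPTIDES)
    return [pairs.count(d) / len(pairs) for d in DIPEPTIDES]


# Arbitrary example string over the standard alphabet (illustration only).
peptide = "GLFDIVKKVVGALGSL"
aac = amino_acid_composition(peptide)
dpc = dipeptide_composition(peptide)
print(len(aac), len(dpc))

# Single-residue composition ignores residue order; pair composition does not.
same_aac = amino_acid_composition("AAKK") == amino_acid_composition("AKAK")
same_dpc = dipeptide_composition("AAKK") == dipeptide_composition("AKAK")
print(same_aac, same_dpc)
```

    The two sequences AAKK and AKAK have identical amino acid composition but different dipeptide composition, which is the kind of information a single-scale feature cannot capture.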
    Drawing on previous work, this study proposes an ensemble deep learning model that combines two sub-models, each processing sequence features at a different scale. Experimental results show that the proposed model achieves the best Matthews correlation coefficient (MCC) on the AntiCP2.0 dataset (0.539). On the Cancergram dataset, which is imbalanced between positive and negative samples, it also achieves the best MCC (0.195).
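    The Matthews correlation coefficient is computed from the four cells of a binary confusion matrix as MCC = (TP·TN − FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). The sketch below shows why it is a useful metric on imbalanced data, where accuracy can look high even for a poor classifier. The confusion-matrix counts are hypothetical and are not results from the thesis; the function names are chosen for this example.

```python
import math


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from binary confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Common convention: return 0 when any row or column of the matrix is empty.
        return 0.0
    return (tp * tn - fp * fn) / denom


def accuracy(tp, tn, fp, fn):
    """Fraction of all samples that were classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


# Hypothetical imbalanced test set: 10 positives, 990 negatives.
always_negative = dict(tp=0, tn=990, fp=0, fn=10)   # predicts "negative" for everything
weak_classifier = dict(tp=5, tn=945, fp=45, fn=5)   # finds half of the positives
perfect = dict(tp=10, tn=990, fp=0, fn=0)

for name, counts in [("always negative", always_negative),
                     ("weak classifier", weak_classifier),
                     ("perfect", perfect)]:
    print(f"{name}: accuracy={accuracy(**counts):.3f}, MCC={mcc(**counts):.3f}")
```

    In this hypothetical setting the always-negative predictor reaches 99% accuracy while its MCC is 0, so MCC values should be compared between methods on the same dataset and not across datasets with different class balance.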

    Abstract (Chinese)
    Summary
    Introduction
    Material and methods
    Results and discussion
    Conclusion
    Table of contents
    List of figures
    List of tables
    Chapter 1 Introduction
    Chapter 2 Related work
      2.1 Anticancer peptide
      2.2 Anticancer peptide prediction studies
        2.2.1 AntiCP 2.0
        2.2.2 Cancergram
        2.2.3 ACP-DL
        2.2.4 ACPred-FL
      2.3 Neural network (NN)
        2.3.1 Convolutional layer
        2.3.2 Fully connected layer (FC)
        2.3.3 Multi Task Learning with Homoscedastic Uncertainty
      2.4 Tree-based models
        2.4.1 Extremely randomized trees
        2.4.2 eXtreme Gradient Boosting (XGBoost)
    Chapter 3 Methods
      3.1 Datasets
      3.2 Data preprocessing
      3.3 Data encoding
        3.3.1 One-hot encoding
        3.3.2 Dipeptide composition
      3.4 Model architecture
        3.4.1 Peptide aggregation model
        3.4.2 Pentapeptide-oriented model
      3.5 Model training and validation procedure
    Chapter 4 Results
      4.1 Evaluation criteria
      4.2 Comparison with existing methods
      4.3 Importance of each model component
      4.4 Model feature importance
      4.5 Pentapeptide sequence motif analysis
      4.6 Difference between true and approximate statistics
      4.7 Effect of the receptive field on model performance
    Chapter 5 Conclusion
      5.1 Conclusion
      5.2 Future work
    References


    Full-text access: on campus, available immediately; off campus, available immediately.