
Author: 黃子芸 (Huang, Tzu-Yun)
Thesis title: Predicting Phage-bacterial Contig Association with Self-supervised Learning (使用自監督學習預測噬菌體宿主關係)
Advisor: 張天豪 (Chang, Tien-Hao)
Degree: Master
Department: College of Electrical Engineering and Computer Science, Department of Electrical Engineering
Year of publication: 2023
Academic year of graduation: 111 (ROC calendar)
Language: Chinese
Number of pages: 32
Keywords (Chinese): 自監督學習, 深度學習, 噬菌體與宿主交互關係
Keywords (English): Self-supervised learning, deep learning, phage-host association
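  • A bacteriophage (or phage) is a virus that infects only bacterial cells. Associations between phages and their host bacteria play an important role in microbial communities, and phages serve as tools for research in microbiology and molecular biology. In medicine, phages can be used to treat bacterial infections; as antibiotic resistance grows more severe, applications such as phage therapy show considerable potential.
    With the rapid development of deep learning in recent years, a growing number of studies apply it to predicting phage-host interactions. Its advantage is that no hand-designed features are needed: raw phage or host sequence data can be fed in directly, and the model extracts useful features from the sequences itself. Methods that predict phage-host interactions, whether based on sequence-similarity measures or on deep learning models, often require complete phage and host genomes. Recent work points out that complete genomes are hard to obtain in metagenomic studies, where only fragment sequences (contigs) are available, so predicting phage-host interactions from contigs may be closer to real-world conditions.
    Following previous work, this study trains on phage contig and host contig sequences as input and proposes a multi-loss model that adds a self-supervised learning loss to the phage-host interaction prediction task. Experimental results show that the proposed model achieves a better AUC than previous work.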
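    To make "raw sequence as input" concrete, the following is a minimal sketch of one-hot encoding a contig, a scheme the thesis outline lists under data encoding. The channel order `ACGT`, the function name `one_hot_encode`, and the all-zero row for ambiguous bases are assumptions made for illustration; this record does not state the thesis's exact encoding.

```python
# Hypothetical channel order; the record does not specify one.
BASES = "ACGT"


def one_hot_encode(contig):
    """Map a DNA string to a list of 4-element rows, one row per base.

    Bases outside A/C/G/T (for example N) become an all-zero row.
    That choice is an assumption of this sketch, not the thesis's rule.
    """
    rows = []
    for base in contig.upper():
        row = [0, 0, 0, 0]
        idx = BASES.find(base)  # -1 when the base is not A, C, G, or T
        if idx >= 0:
            row[idx] = 1
        rows.append(row)
    return rows


encoded = one_hot_encode("ACGTN")
for base, row in zip("ACGTN", encoded):
    print(base, row)
```

    A matrix of this shape (sequence length by 4) is the kind of input a convolutional feature extractor can consume directly, without hand-designed features.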

    Bacteriophages (or phages) are viruses that exclusively infect bacterial cells. The interactions between phages and their host bacteria play a crucial role in microbial communities, and phages serve as valuable tools for studying microbiology and molecular biology. In medicine, phages can be used to treat bacterial infections, and bacteriophage therapy holds significant promise, particularly in the context of growing antibiotic resistance.
    In recent years, with the rapid advancement of deep learning, many researchers have used deep learning techniques to predict phage-host interactions. The advantage of deep learning lies in its ability to extract relevant features automatically from raw sequences, which eliminates the need for manual feature engineering. However, most existing methods for predicting phage-host interactions rely on complete phage and host genomes, which are often unavailable in metagenomic studies. It is therefore important to develop methods that work with shorter DNA fragments, known as contigs, instead of complete genome sequences.
    This work proposes a phage-host interaction predictor based on phage contigs and host contigs. The proposed predictor is a multi-loss model that uses cosine similarity as the self-supervised learning loss and binary cross-entropy as the phage-host interaction prediction loss. Experimental results show that the proposed model achieves a higher Area Under the Curve (AUC) than previous methods.
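    The multi-loss idea in the abstract can be sketched as a sum of two terms. This is an illustrative sketch only: the negative sign on the cosine term, the `ssl_weight` parameter, the two-view setup, and all function names are assumptions, since the record does not give the thesis's actual formulation.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen for this sketch
    return dot / (norm_u * norm_v)


def binary_cross_entropy(p, y, eps=1e-12):
    """BCE for one predicted probability p and one label y in {0, 1}."""
    p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def multi_loss(view_a, view_b, p_interact, label, ssl_weight=1.0):
    """Prediction loss plus a weighted self-supervised term.

    view_a, view_b: embeddings of two views of the same contig (assumed).
    p_interact: predicted probability that the phage infects the host.
    The self-supervised term is the negative cosine similarity, so that
    minimizing the loss pulls the two views together (assumed sign).
    """
    ssl_loss = -cosine_similarity(view_a, view_b)
    pred_loss = binary_cross_entropy(p_interact, label)
    return pred_loss + ssl_weight * ssl_loss


loss = multi_loss([1.0, 0.0], [1.0, 0.0], p_interact=0.5, label=1)
print(loss)
```

    With perfectly aligned views the cosine term contributes -1, so the total is dominated by how well the interaction probability matches the label; `ssl_weight` (hypothetical) would control the balance between the two objectives.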

    Acknowledgements
    Table of Contents
    List of Figures
    List of Tables
    Chapter 1 Introduction
    Chapter 2 Related Work
        2.1 Bacteriophage
        2.2 Phage-Host Association Prediction Studies
            2.2.1 VirHostMatcher
            2.2.2 WIsH
            2.2.3 ContigNet
        2.3 Convolutional Neural Network
            2.3.1 Convolutional Layer
            2.3.2 Global Average Pooling Layer
        2.4 Fully Connected Network
        2.5 Self-Supervised Learning
            2.5.1 SimCLR
            2.5.2 MoCo
            2.5.3 Clustering-Based Methods
            2.5.4 Contrastive Learning without Negatives
    Chapter 3 Methods
        3.1 Data Encoding
            3.1.1 One-Hot Encoding
            3.1.2 Codon Encoding
        3.2 Model Architecture
            3.2.1 Feature Extractor
            3.2.2 Predictor
            3.2.3 Fully Connected Classifier
        3.3 Model Training and Validation Procedure
    Chapter 4 Results
        4.1 Datasets
        4.2 Evaluation Metrics
        4.3 Comparison with Existing Methods
        4.4 Performance at Different Contig Lengths
        4.5 Ablation Study
    Chapter 5 Discussion
        5.1 Effects of Different Training Strategies and Methods
            5.1.1 Different Training Strategies
            5.1.2 Different Contrastive Learning Methods
        5.2 Visualization Comparison
            5.2.1 Different Training Strategies
            5.2.2 Different Contrastive Learning Methods
    Chapter 6 Conclusion
        6.1 Conclusion
        6.2 Future Work
    References


    Full-text availability: on campus, open immediately; off campus, open immediately