
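Graduate student: 張子千 (Chang, Tzu-Chien)
Thesis title: 利用領域對抗神經網路預測信號肽切割位點
Using Domain-Adversarial Neural Network to Predict Cleavage Sites of Signal Peptides
Advisor: 張天豪 (Chang, Tien-Hao)
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Electrical Engineering
Year of publication: 2019
Academic year of graduation: 107 (ROC calendar)
Language: Chinese
Number of pages: 40
Keywords (Chinese): 信號肽切割點預測 (signal peptide cleavage site prediction), 語意分割 (semantic segmentation), 領域對抗訓練 (domain-adversarial training)
Keywords (English): cleavage site of signal peptide, semantic segmentation, domain-adversarial training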
  • Signal peptides are short amino acid sequences located at the N-terminus of newly synthesized proteins. Their main function is related to protein sorting. Identifying signal peptides is a prerequisite for understanding a protein's function and destination, and it also helps drug design. However, signal peptides are poorly conserved and vary widely in amino acid composition, so it is difficult to recognize them directly from their sequence patterns. Early methods relied mainly on statistical techniques and neural networks. In recent years, deep learning has also been applied to signal peptide prediction and has achieved state-of-the-art performance.
    Previous work applied semantic segmentation, a technique developed from convolutional neural networks, to signal peptide prediction. Semantic segmentation was originally used to identify the region that a specific object covers in an image, a concept that closely matches signal peptide prediction. This study likewise uses semantic segmentation to identify the segment of a protein sequence covered by the signal peptide and then predicts its cleavage site, and it combines this technique with domain-adversarial training.
    In practice, annotated signal peptide data are scarce, whereas unannotated data are more plentiful. We modified an existing semantic segmentation architecture and simulated this situation by applying unannotated data to signal peptide prediction through domain-adversarial training, so that the segments identified by the model become more precise. The experimental results show that this method is effective on the eukaryote dataset. Replacing the feature extractor of the network and mixing the two datasets for training also made the predictions more accurate. Finally, when bacteria data were used as annotated data and eukaryote data as unannotated data, adding the unannotated eukaryote data improved the performance of a model that had been built with annotated bacteria data alone.
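
    As a rough illustration of how a per-residue segmentation output can be turned into a cleavage site, the sketch below assumes the model emits one signal-peptide probability per residue and that the cleavage site is taken as the end of the leading run of residues scored above a threshold. This record does not state the thesis's actual decoding rule, so the function name `cleavage_site_from_mask`, the 0.5 threshold, and the example scores are all hypothetical.

```python
from typing import List, Optional


def cleavage_site_from_mask(probs: List[float], threshold: float = 0.5) -> Optional[int]:
    """Return the 1-based position of the last residue in the leading
    signal-peptide run, or None if the sequence does not start with one.

    Assumption (not from the thesis record): `probs` holds one
    signal-peptide probability per residue, N-terminus first.
    """
    length = 0
    for p in probs:
        if p < threshold:
            break  # the leading signal-peptide run ends here
        length += 1
    return length if length > 0 else None


# Hypothetical per-residue scores for a 10-residue sequence.
scores = [0.9, 0.95, 0.8, 0.7, 0.6, 0.2, 0.1, 0.05, 0.3, 0.1]
print(cleavage_site_from_mask(scores))  # 5
```

    Under this assumption, a result of 5 means cleavage is predicted between residues 5 and 6, and a distance-based metric such as RMSE can then compare the predicted position with the annotated one.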

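    List of Figures
    List of Tables
    Chapter 1 Introduction
    Chapter 2 Related Work
      2.1 Signal Peptides and Cleavage Sites
      2.2 Prediction Methods for Signal Peptides and Cleavage Sites
        2.2.1 SignalP 4.0
        2.2.2 DeepSig
        2.2.3 SigUnet
      2.3 Deep Learning
        2.3.1 Convolutional Neural Network
        2.3.2 U-Net
        2.3.3 Generative Adversarial Network
        2.3.4 Multi-task Learning
        2.3.5 Domain-Adversarial Training
    Chapter 3 Methods
      3.1 Datasets
        3.1.1 SignalP Dataset
        3.1.2 SPDS17 Dataset
      3.2 Data Preprocessing
        3.2.1 Feature Encoding
        3.2.2 Label Encoding
      3.3 Network Model Architecture
      3.4 Model Sampling
      3.5 Model Training Configuration
    Chapter 4 Results
      4.1 Performance Evaluation Metrics
        4.1.1 Mean Squared Error and Root Mean Squared Error (MSE, RMSE)
        4.1.2 Standard Deviation
      4.2 Performance Evaluation on Eukaryote Samples
        4.2.1 Performance Evaluation of Data Ratios When Only Eukaryote Samples Are Used
        4.2.2 Performance After Replacing the Feature Extractor
        4.2.3 Effect on Model Performance of Mixing the SPDS17 Dataset into the SignalP Dataset
        4.2.4 Performance Evaluation of Feature Extractors on Eukaryote Samples
        4.2.5 Effect of Bacteria Data on Eukaryote Model Performance (Multi-task Learning)
        4.2.6 Effect of Bacteria Data on Eukaryote Model Performance (Domain-Adversarial Training Concept)
      4.3 Performance Evaluation on Bacteria Samples
        4.3.1 Effect of Eukaryote Data on Bacteria Model Performance (Multi-task Learning)
        4.3.2 Effect of Eukaryote Data on Performance for Bacteria Samples (Domain-Adversarial Training Concept)
    Chapter 5 Conclusion
      5.1 Discussion of Results
      5.2 Future Work
    References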


    Full-text availability: on campus, public from 2024-07-30; off campus, public from 2024-07-30