Graduate student: | 張子千 Chang, Tzu-Chien |
---|---|
Thesis title: | 利用領域對抗神經網路預測信號肽切割位點 Using Domain-Adversarial Neural Network to Predict Cleavage Sites of Signal Peptides |
Advisor: | 張天豪 Chang, Tien-Hao |
Degree: | Master |
Department: | College of Electrical Engineering and Computer Science - Department of Electrical Engineering |
Year of publication: | 2019 |
Academic year of graduation: | 107 |
Language: | Chinese |
Number of pages: | 40 |
Chinese keywords: | 信號肽切割點預測、語意分割、領域對抗訓練 |
English keywords: | cleavage site of signal peptide, semantic segmentation, domain-adversarial training |
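A signal peptide is a stretch of amino acids at the N-terminus of a nascent protein, and its main function relates to protein transport. Identifying signal peptides is a prerequisite for understanding a protein's function and destination, and it also helps drug design. However, because signal peptide sequences are very poorly conserved and their internal structure varies considerably, it is hard to tell whether a signal peptide is present directly from the order of the amino acids. Early methods relied mainly on statistics and artificial neural networks, but in recent years researchers have begun using deep learning to identify signal peptides, with quite good results.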
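This study applies semantic segmentation, a technique developed from convolutional neural networks in deep learning, together with domain-adversarial training to signal peptide prediction. Semantic segmentation was originally used to identify the region covered by a specific object in an image, a concept that is very close to signal peptide prediction: semantic segmentation can be used to identify the segment of a protein sequence covered by the signal peptide.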
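In practice, annotated signal peptide data are very scarce, while unannotated data are more plentiful. In this study we modify the architecture of an existing semantic segmentation model and simulate the real-world situation by applying unannotated data to signal peptide prediction, combined with the concept of domain-adversarial training, so that the segments identified by the semantic segmentation model are more precise. The experimental results show that this approach is indeed effective on the eukaryote dataset. Replacing the feature extractor in the network and mixing the two datasets were also effective. In addition, when training with bacterial data as annotated data and eukaryote data as unannotated data, we found that adding the unannotated eukaryote data also improves the performance of a model originally built using only annotated bacterial data.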
Signal peptides are amino acid sequence fragments located at the N-termini of nascent proteins. Their main function is related to protein sorting. Identifying signal peptides is a prerequisite for determining protein function and destination, and it is useful for drug design. However, signal peptides are poorly conserved and vary widely in amino acid composition, so it is difficult to recognize them from sequence patterns alone. Earlier studies used statistical and machine learning methods to recognize signal peptides. In recent years, deep learning has been used to predict signal peptides and has achieved state-of-the-art performance.
Previous work applied semantic segmentation, a technique based on convolutional neural networks, to signal peptide prediction. Semantic segmentation was originally used to identify specific objects in images. We likewise use it to identify a specific pattern in proteins, the signal peptide, and then predict the cleavage site of that signal peptide.
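As a minimal sketch of how a per-residue segmentation output could be turned into a cleavage-site prediction, the Python below assumes the model emits one signal-peptide score per residue. The function names, the `S`/`O` label alphabet, and the 0.5 threshold are illustrative assumptions and are not taken from the thesis.

```python
def labels_from_scores(scores, threshold=0.5):
    """Turn per-residue signal-peptide scores into a label string.

    'S' marks a residue predicted to lie inside the signal peptide and
    'O' marks any other residue. The 0.5 threshold is an assumption.
    """
    return "".join("S" if s >= threshold else "O" for s in scores)


def cleavage_site_from_labels(labels, sp_label="S"):
    """Return the 1-based position of the last signal-peptide residue.

    Only the uninterrupted run of `sp_label` starting at the N-terminus
    is counted; the cleavage site lies right after that residue.
    Returns None when the sequence does not start with a signal peptide.
    """
    length = 0
    for label in labels:
        if label != sp_label:
            break
        length += 1
    return length if length > 0 else None


# Hypothetical scores for an 8-residue toy sequence.
scores = [0.97, 0.95, 0.91, 0.80, 0.30, 0.10, 0.05, 0.02]
labels = labels_from_scores(scores)
site = cleavage_site_from_labels(labels)
print(labels, site)
```

For the toy scores above, the labels are `SSSSOOOO`, so the cleavage site is placed after residue 4. Counting only the run that starts at the N-terminus reflects the fact that signal peptides sit at the N-terminus; a real pipeline might need extra handling for noisy, fragmented predictions.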
In practice, annotated signal peptide data are scarce and hard to obtain, whereas unannotated data are more plentiful. In our work, we modified an existing neural network architecture and simulated this situation by applying unannotated data to the model through domain-adversarial training, to make the predictions more precise. The experimental results show that this method is effective on the eukaryote dataset. In addition, predictions also became more accurate when we replaced the feature extractor of the network or combined the two datasets for training. Finally, using bacterial data as annotated data and eukaryote data as unannotated data, we found that adding the unannotated eukaryote data improves the performance of a model trained only on annotated bacterial data.
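The core mechanism of domain-adversarial training is a gradient reversal layer placed between the feature extractor and a domain classifier. The sketch below shows that mechanism in plain Python, under the assumption that the thesis follows the standard formulation by Ganin et al.; the function names, the list-based gradients, and the `gamma` value are illustrative and do not come from the abstract.

```python
import math


def grl_forward(features):
    """Forward pass of a gradient reversal layer: the identity."""
    return list(features)


def grl_backward(grad_from_domain_classifier, lam):
    """Backward pass: flip the sign of the gradient and scale it by lam.

    The feature extractor therefore moves to *increase* the domain
    classifier's loss, which pushes it toward domain-invariant features.
    """
    return [-lam * g for g in grad_from_domain_classifier]


def dann_lambda(progress, gamma=10.0):
    """Ramp lam from 0 to about 1 as training progress goes from 0 to 1.

    This is the schedule used by Ganin et al.; gamma=10 is their default
    and is an assumption here, not a value reported in the abstract.
    """
    return 2.0 / (1.0 + math.exp(-gamma * progress)) - 1.0


# Toy gradient arriving from the domain classifier, halfway through training.
lam = dann_lambda(0.5)
reversed_grad = grl_backward([0.2, -0.4], lam)
print(round(lam, 4), reversed_grad)
```

In a setting like the one described, the annotated sequences would feed both the segmentation head and the domain classifier, while the unannotated sequences would feed only the domain classifier. The reversed gradient is what lets the unannotated data shape the shared features.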