Graduate student: Wu, Che-Ming (吳哲銘)
Thesis title: A Signal Peptide Prediction Method Based on Semantic Segmentation
Advisor: Chang, Tien-Hao (張天豪)
Degree: Master
Department: Department of Electrical Engineering, College of Electrical Engineering and Computer Science
Year of publication: 2019
Academic year of graduation: 107 (ROC calendar)
Language: Chinese
Number of pages: 40
Keywords: signal peptide prediction, semantic segmentation, convolutional neural network
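
Signal peptides are amino acid sequences located at the N-terminus of nascent proteins. Their main function is related to protein sorting and transport. Identifying signal peptides is a prerequisite for understanding a protein's function and destination, and it also aids drug design. However, signal peptide sequences are poorly conserved and vary considerably in internal structure, so it is difficult to tell directly from the amino acid sequence whether a signal peptide is present. In the past, identifying a signal peptide required experimentally isolating the nascent and mature forms of a protein and comparing their sequences. This approach is time-consuming, which has indirectly left labeled signal peptide data scarce. Many researchers have therefore used these limited samples to develop statistical and machine learning methods for identifying signal peptides, most of them built on hidden Markov models or neural networks. In recent years, deep learning has also been applied to the problem with quite good results.
In this study, we apply semantic segmentation, a technique developed from convolutional neural networks in deep learning, to signal peptide prediction. Semantic segmentation was originally used to identify the region of an image covered by a particular object, a concept closely analogous to signal peptide prediction: semantic segmentation can identify the segment of a protein sequence covered by a signal peptide. We modified the architecture of an existing semantic segmentation model and applied it to signal peptide prediction. To our knowledge, this is the first application combining semantic segmentation with molecular biology, and experimental results show that the approach is highly effective on eukaryotic datasets. Finally, we use a visualization method of our own design to present the model's internal values as charts, and through these charts we further analyze how the model distinguishes signal peptides.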
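The segmentation framing above treats a protein sequence like a one-dimensional image in which each residue plays the role of a pixel. A minimal sketch of that framing, assuming a binary per-residue label (1 = residue inside the signal peptide, 0 = otherwise) and assuming the predicted signal peptide is the first run of positive residues starting at the N-terminus, is shown below. The helper names `encode_labels` and `decode_mask`, the 0.5 threshold, and the `min_len` parameter are hypothetical; the thesis's own label encoding (Section 3.2.2) may differ.

```python
from typing import List, Optional, Tuple


def encode_labels(seq_len: int, cleavage_site: Optional[int]) -> List[int]:
    """Hypothetical per-residue mask: 1 for residues 1..cleavage_site, 0 after.

    cleavage_site is the 1-based position of the last signal-peptide residue,
    or None for proteins without a signal peptide (assumed convention).
    """
    if cleavage_site is None:
        return [0] * seq_len
    return [1 if i < cleavage_site else 0 for i in range(seq_len)]


def decode_mask(probs: List[float], threshold: float = 0.5,
                min_len: int = 1) -> Tuple[bool, Optional[int]]:
    """Turn per-residue scores into (has_signal_peptide, cleavage_site).

    Assumes a signal peptide must start at the N-terminus, so only the first
    contiguous run of residues at or above the threshold is read.
    """
    run = 0
    for p in probs:
        if p >= threshold:
            run += 1
        else:
            break  # the N-terminal run has ended
    if run >= min_len:
        return True, run
    return False, None


print(encode_labels(10, 4))
print(decode_mask([0.9, 0.8, 0.7, 0.6, 0.2, 0.1, 0.6, 0.1]))
```

With these assumptions, a mask such as `[1, 1, 1, 1, 0, ...]` decodes to a signal peptide ending at residue 4, and an isolated positive residue later in the sequence (the `0.6` at position 7 above) is ignored because it is not attached to the N-terminus.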

Signal peptides are amino acid sequence fragments located at the N-termini of nascent proteins. Their main function is related to protein sorting. Identifying signal peptides is a prerequisite for discovering protein function and destination, and it is useful in drug design. However, signal peptides are poorly conserved and highly variable in amino acid composition, so it is difficult to recognize them by checking their sequence patterns. In the past, signal peptides could be identified by using protein sequencing to compare mature proteins with nascent proteins. This is time-consuming, and it has indirectly left few labeled signal peptide data. Many researchers therefore turned to machine learning and statistical techniques to recognize signal peptides, and most of them proposed algorithms based on neural networks or hidden Markov models. In recent years, deep learning has been applied to this problem and has achieved state-of-the-art performance.
In this work, we apply semantic segmentation, a convolutional neural network technique, to signal peptide prediction. Semantic segmentation was originally used to identify specific objects in images; we use it to identify a specific pattern, the signal peptide, in proteins. We modified an existing neural network architecture so that it accepts protein data as input. To our knowledge, this is the first application that combines semantic segmentation and bioinformatics. Our experiments show good results on the eukaryote dataset. We also designed a method to visualize the weights of our semantic segmentation model, and with this method we analyze how the model recognizes signal peptides.
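The abstract says an existing segmentation architecture was modified so that it accepts protein data and labels every position. As a rough illustration only, the pure-Python sketch below one-hot encodes a sequence over the 20 standard amino acids and passes it through a one-level, U-Net-style encoder-decoder: convolution, max pooling, a bottleneck convolution, upsampling, concatenation with the skip connection, and a 1x1 convolution that yields one score per residue. The function names, the 20-letter alphabet, the layer sizes, the nearest-neighbour upsampling (the original U-Net uses learned up-convolutions), and the random untrained weights are all assumptions; this is not the model described in Section 3.3.

```python
import math
import random
from typing import List

Vec = List[float]
Weights = List[List[List[float]]]  # C_out x kernel x C_in

# Assumption: the 20 standard amino acids; any other letter encodes to all zeros.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot(seq: str) -> List[Vec]:
    """Encode a protein sequence as an L x 20 one-hot matrix (list of rows)."""
    rows = []
    for aa in seq.upper():
        row = [0.0] * len(AMINO_ACIDS)
        if aa in INDEX:
            row[INDEX[aa]] = 1.0
        rows.append(row)
    return rows


def conv1d(x: List[Vec], w: Weights, relu: bool = True) -> List[Vec]:
    """Zero-padded 'same' 1D convolution: output length equals input length."""
    k = len(w[0])
    pad = k // 2
    out = []
    for t in range(len(x)):
        row = []
        for filt in w:
            s = 0.0
            for j in range(k):
                pos = t + j - pad
                if 0 <= pos < len(x):
                    s += sum(wc * xc for wc, xc in zip(filt[j], x[pos]))
            row.append(max(0.0, s) if relu else s)
        out.append(row)
    return out


def max_pool(x: List[Vec], size: int = 2) -> List[Vec]:
    """Downsample along the sequence axis (channel-wise max over each window)."""
    return [[max(col) for col in zip(*x[i:i + size])] for i in range(0, len(x), size)]


def upsample(x: List[Vec], target_len: int) -> List[Vec]:
    """Nearest-neighbour upsampling by 2, cropped back to the skip length."""
    return [row for row in x for _ in range(2)][:target_len]


def rand_weights(c_out: int, k: int, c_in: int, rng: random.Random) -> Weights:
    return [[[rng.uniform(-0.5, 0.5) for _ in range(c_in)] for _ in range(k)]
            for _ in range(c_out)]


def tiny_unet(seq: str, hidden: int = 8, seed: int = 0) -> List[float]:
    """One-level U-Net-style pass with random (untrained) weights.

    Returns one probability per residue; only shapes and data flow are meaningful.
    """
    rng = random.Random(seed)
    x = one_hot(seq)                                                    # L x 20
    enc = conv1d(x, rand_weights(hidden, 3, 20, rng))                   # L x hidden (skip)
    mid = conv1d(max_pool(enc), rand_weights(hidden, 3, hidden, rng))   # ceil(L/2) x hidden
    up = upsample(mid, len(enc))                                        # L x hidden
    cat = [a + b for a, b in zip(enc, up)]                              # L x 2*hidden
    logits = conv1d(cat, rand_weights(1, 1, 2 * hidden, rng), relu=False)  # L x 1
    return [1.0 / (1.0 + math.exp(-row[0])) for row in logits]


probs = tiny_unet("MKKLLPTAAAGLLLLAAQPAMA")
print(len(probs))
```

The point of the skip concatenation is that the decoder sees both the coarse, pooled context and the full-resolution features, so the output keeps exactly one score per input residue even for odd-length sequences, which is what a per-residue signal peptide mask requires.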

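List of Figures
List of Tables
Chapter 1 Introduction
Chapter 2 Related Work
  2.1 Signal Peptides
  2.2 Signal Peptide Prediction Methods
    2.2.1 Phobius
    2.2.2 Philius
    2.2.3 SignalP 4.0
    2.2.4 DeepSig
  2.3 Deep Learning
    2.3.1 Convolutional Neural Networks
    2.3.2 U-Net
Chapter 3 Methods
  3.1 Datasets
    3.1.1 SignalP Dataset
    3.1.2 SPDS17 Dataset
  3.2 Data Preprocessing
    3.2.1 Feature Encoding
    3.2.2 Label Encoding
  3.3 Semantic Segmentation Model Architecture
  3.4 Hyperparameter Optimization
  3.5 Model Aggregation
  3.6 Model Training Configuration
Chapter 4 Experimental Results
  4.1 Evaluation Criteria
  4.2 Performance on Eukaryotic Samples
    4.2.1 SignalP Dataset
    4.2.2 SPDS17 Dataset
  4.3 Performance on Gram Bacterial Samples
  4.4 Effect of Model Complexity on Performance
  4.5 Training on Combined Eukaryotic and Gram Bacterial Samples
  4.6 Model Visualization
Chapter 5 Conclusion
  5.1 Discussion of Results
  5.2 Future Work
References
Appendix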

Full-text access: on campus, available immediately; off campus, available from 2021-01-01.