| Graduate student: | 簡仲培 Jian, Zhong-Pei |
|---|---|
| Thesis title: | 以模糊類神經網路從胺基酸序列預測蛋白質的結晶 Prediction of Protein Crystallization from Amino Acid Sequences by Fuzzy Neural Network |
| Advisor: | 鄭智元 Cheng, Chu-Yuan |
| Degree: | Master |
| Department: | College of Engineering - Department of Chemical Engineering |
| Publication year: | 2008 |
| Academic year of graduation: | 96 |
| Language: | Chinese |
| Number of pages: | 50 |
| Keywords (Chinese): | 生物資訊、蛋白質結晶、機器學習、特徵選擇、模糊-約略集合、模糊類神經網路 |
| Keywords (English): | bioinformatics, protein crystallization, machine learning, feature selection, fuzzy-rough set, fuzzy neural network |
X-ray crystallography, which requires that the protein first be crystallized, is currently the most common method for determining the tertiary structure of proteins. The factors that determine whether a protein crystallizes successfully remain poorly understood; however, crystallizability is an individual trait of a protein that correlates with its amino acid sequence. A fuzzy neural network model was therefore constructed to assess, from amino acid composition, the chance that a protein will produce diffraction-quality crystals. Amino acid sequences are represented by 51 features, and a fuzzy-rough set feature selection algorithm was used to reduce the dimensionality and improve prediction accuracy. With an accuracy of 84% and a Matthews correlation coefficient of 0.682, our approach outperforms recently published methods and can be helpful in screening for crystallizable proteins.
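For readers unfamiliar with the two metrics quoted in the abstract, the following is a minimal sketch of how accuracy and the Matthews correlation coefficient are conventionally computed from a binary confusion matrix. The function names and the example counts are hypothetical and chosen only for illustration; they are not taken from the thesis and do not reproduce its reported values.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient for a binary confusion matrix.

    Ranges from -1 (total disagreement) to +1 (perfect prediction).
    Returns 0.0 when any marginal sum is zero, a common convention.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical counts (crystallizable = positive class); illustration only.
tp, tn, fp, fn = 40, 45, 5, 10
print(f"accuracy = {accuracy(tp, tn, fp, fn):.3f}")
print(f"MCC      = {matthews_corrcoef(tp, tn, fp, fn):.3f}")
```

Unlike accuracy, the Matthews correlation coefficient uses all four cells of the confusion matrix, so it remains informative when crystallizable and non-crystallizable proteins are unevenly represented.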