Graduate student: 簡仲培 (Jian, Zhong-Pei)
Thesis title: Prediction of Protein Crystallization from Amino Acid Sequences by Fuzzy Neural Network (以模糊類神經網路從胺基酸序列預測蛋白質的結晶)
Advisor: 鄭智元 (Cheng, Chu-Yuan)
Degree: Master
Department: College of Engineering - Department of Chemical Engineering
Year of publication: 2008
Academic year of graduation: 96
Language: Chinese
Number of pages: 50
Keywords: bioinformatics, protein crystallization, machine learning, feature selection, fuzzy-rough set, fuzzy neural network
  • The most common method for determining the tertiary structure of proteins is X-ray crystallography, which requires that the protein first be crystallized. The factors that determine whether a protein crystallizes successfully remain poorly understood; however, crystallizability is an individual trait of a protein that correlates with its amino acid sequence. A fuzzy neural network model was therefore constructed to assess, from amino acid composition, the chance that a protein will produce diffraction-quality crystals. Amino acid sequences are represented by 51 features, and a fuzzy-rough set feature selection algorithm was used to reduce the dimensionality and improve the prediction accuracy. With an accuracy of 84% and a Matthews correlation coefficient of 0.682, the approach outperforms recently published methods and can be helpful in screening crystallizable proteins.
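    The abstract reports performance as accuracy and the Matthews correlation coefficient (MCC). As a reminder of how these two metrics are defined for a binary classifier (crystallizable vs. non-crystallizable), here is a minimal Python sketch. The function names and the confusion-matrix counts are hypothetical and chosen only for illustration; they are not the thesis's data or code.

```python
import math


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from binary confusion-matrix counts.

    Returns 0.0 when any marginal total is zero (a common convention).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical counts for illustration only (NOT results from the thesis).
tp, tn, fp, fn = 40, 45, 5, 10
print(f"accuracy = {accuracy(tp, tn, fp, fn):.3f}")
print(f"MCC      = {matthews_corrcoef(tp, tn, fp, fn):.3f}")
```

    Unlike accuracy, MCC uses all four cells of the confusion matrix, so it stays informative when the two classes are of unequal size. It ranges from -1 (total disagreement) through 0 (no better than chance) to +1 (perfect prediction).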

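    Abstract (Chinese)
    Abstract (English)
    Table of Contents
    List of Tables
    List of Figures
    Nomenclature
    Chapter 1 Introduction
      1-1 Research Motivation and Objectives
      1-2 Literature Review
      1-3 Research Method and Procedure
    Chapter 2 Fuzzy-Rough Set Feature Selection
      2-1 Rough Set Feature Selection
        2-1-1 Information system
        2-1-2 Indiscernibility relation
        2-1-3 Lower and upper approximations
        2-1-4 Dependency
        2-1-5 Feature selection
      2-2 Fuzzy-Rough Set Feature Selection
        2-2-1 Fuzzy equivalence relation
        2-2-2 Information measure for the fuzzy-rough set model
        2-2-3 Reduction algorithm
    Chapter 3 Fuzzy Neural Network
      3-1 Structure of the Fuzzy Neural Network
      3-2 Learning Algorithm of the Fuzzy Neural Network
    Chapter 4 Materials and Methods
      4-1 Datasets of Crystallizable and Non-Crystallizable Proteins
        4-1-1 Training and feature selection dataset
        4-1-2 Test dataset
      4-2 Protein Sequence Features
        4-2-1 General features
        4-2-2 Residue composition
        4-2-3 Residue class composition
      4-3 Feature Selection
      4-4 Prediction Method
    Chapter 5 Results and Discussion
      5-1 Feature Selection
      5-2 Prediction Performance
        5-2-1 Network architecture of the FNN
        5-2-2 Accuracy and Matthews correlation coefficient
        5-2-3 Comparison with other methods
        5-2-4 Fuzzy rules
        5-2-5 Membership functions
    Chapter 6 Conclusions
    Appendix 1 Calculation Procedure of Fuzzy-Rough Set Feature Selection
    Appendix 2 Execution Procedure of the Parameter-Addition Method
    References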


    Full text: publicly available on campus from 2010-06-12; off campus from 2010-06-12.