Graduate student: 莊博丞
Chuang, Po-Cheng
Thesis title: 基於胺基酸特性及表面資訊的蛋白質結構比對方法
A Protein Structure Alignment Method Based on Biological Characteristics of Amino Acid and Protein Surface Information
Advisor: 高宏宇
Kao, Hung-Yu
Degree: Master
Department: College of Electrical Engineering and Computer Science - Institute of Medical Informatics
Year of publication: 2009
Academic year of graduation: 97 (ROC calendar)
Language: English
Number of pages: 56
Keywords: Estrogen response elements, Relative solvent accessible area, Substitution matrix, Delaunay triangulation, Voronoi diagram, Protein structure alignment, Protein surface information
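  • Protein research has long been a highly valued research topic, and in the post-genome era structural information and studies on protein domains are being published rapidly. Protein research is also becoming more diverse, broadly covering protein surfaces, protein sequence information, protein structure information, transcription factor prediction, DNA-binding protein prediction, and protein-protein interaction. In all of these areas, protein structure alignment techniques play an important role. Many protein alignment methods have been proposed so far, but only a few align protein tertiary structures, and even fewer align protein surfaces. In this study we therefore propose a protein structure alignment method based on amino acid characteristics and surface information. The method uses the Voronoi diagram and Delaunay triangulation to select the amino acids that make up the protein surface. In addition, two matrices, an amino acid substitution matrix and a secondary structure substitution matrix, are combined with the relative solvent accessible area to filter out amino acids that differ too much in amino acid characteristics or secondary structure. Finally, a dedicated surface-similarity alignment method checks whether the two proteins contain triangular planes of high structural similarity, and the result is used to judge whether the two proteins have highly similar surfaces.
    In the experiments, the surface-information feature values of estrogen response element binding proteins and non-binding proteins differed by more than a factor of two. Furthermore, we aligned eight major classes of proteins that readily bind DNA and found that the quantified within-group and between-group values also differed by more than a factor of two. Finally, we applied the results to the prediction of transcription factors for estrogen response elements. In this part of the experiments, the classifiers' average precision improved by 5% and their average recall by 5.6%.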

    Protein research has long been an important field. In the post-genome era, the structures of protein domains are being published rapidly. Protein research has also diversified and now broadly includes studies of protein surfaces, protein sequence information, protein structure information, transcription factor prediction, DNA-binding protein prediction, and protein-protein interaction. Many protein alignment methods have been proposed for these studies, yet only a few concentrate on protein tertiary structure alignment, and alignments focused on protein surfaces are rarer still.
    In this research, we developed a protein structure alignment method based on amino acid features and surface information. The Voronoi diagram and Delaunay triangulation techniques are employed to select the amino acids that form the protein surface. In addition, two matrices, the Amino Acid Substitution Matrix and the Metric SSE Exchange Matrix, are combined with RASA (relative solvent accessible area) to filter out amino acids that are outliers with respect to amino acid features and secondary structure. Finally, a dedicated surface comparison method checks whether the two proteins contain triangle planes of high structural similarity and assesses whether they have highly similar surfaces.
    This research thus proposes a new protein alignment method. Experimental data showed a more than twofold difference in surface information between ERE (estrogen response element) binding proteins and non-ERE binding proteins. Furthermore, when we aligned eight major types of proteins prone to bind DNA, we found a more than twofold difference between the within-group and between-group results as well. Finally, we applied these results to ERE transcription factor prediction. In this part of the research, the classifiers' average precision improved by 5% and their average recall by 5.6%.
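
    Two steps named in the abstract, filtering residues by RASA and comparing surface triangles, can be illustrated with a minimal Python sketch. Everything specific in it is an assumption made for illustration and is not taken from the thesis: the function names, the 0.25 RASA threshold, the 0.5 side-length tolerance, the toy area values, and the use of sorted side lengths as the similarity measure. The substitution matrices and the Delaunay triangulation step are not modeled here.

```python
from itertools import combinations
from math import dist


def relative_asa(asa, max_asa):
    """RASA: observed solvent accessible area divided by a reference maximum."""
    return asa / max_asa


def surface_residues(residues, threshold=0.25):
    """Keep residues whose RASA reaches the (assumed) exposure threshold."""
    return [r for r in residues if relative_asa(r["asa"], r["max_asa"]) >= threshold]


def side_lengths(triangle):
    """Sorted side lengths of a triangle given as three 3D points."""
    return sorted(dist(p, q) for p, q in combinations(triangle, 2))


def triangles_similar(t1, t2, tol=0.5):
    """True if the sorted side lengths differ pairwise by at most tol (assumed measure)."""
    return all(abs(a - b) <= tol for a, b in zip(side_lengths(t1), side_lengths(t2)))


# Toy residues; the area values are made up for illustration only.
residues = [
    {"name": "ALA", "asa": 10.0, "max_asa": 100.0},   # RASA 0.10, treated as buried
    {"name": "LYS", "asa": 150.0, "max_asa": 200.0},  # RASA 0.75, treated as exposed
]
exposed = [r["name"] for r in surface_residues(residues)]

# Toy triangles: t2 is a shifted, slightly distorted copy of t1; t3 is twice as large.
t1 = [(0, 0, 0), (3, 0, 0), (0, 4, 0)]
t2 = [(1, 1, 1), (4.2, 1, 1), (1, 5, 1)]
t3 = [(0, 0, 0), (6, 0, 0), (0, 8, 0)]
print(exposed, triangles_similar(t1, t2), triangles_similar(t1, t3))
```

    Comparing sorted side lengths makes the check independent of where each triangle sits in space, which is one simple way to compare surface patches from two proteins without superimposing them first.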

    CHINESE ABSTRACT
    ABSTRACT
    TABLE LISTING
    FIGURE LISTING
    1. INTRODUCTION
    1.1. MOTIVATION
    1.2. METHOD
    2. RELATED WORK
    2.1 RELATED RESEARCH
    2.1.1. Alignment
    2.1.2. SAS and SES
    2.1.3. Voronoi Diagram
    2.1.4. Delaunay Triangulation
    2.1.5. Distance Matrix
    2.1.6. DSSP
    2.1.7. RMSD
    2.2 DATA RESOURCE
    2.2.1. PDB
    2.2.2. UniProt
    2.2.3. TRANSFAC
    3. METHOD
    3.1 OVERVIEW
    3.2 PROTEIN STRUCTURE TRIANGULATION
    3.3 SURFACE TRIANGLE RETRIEVE
    3.4 FILTERING OUT OUTLYING SURFACE TRIANGLES
    3.5 SURFACE SIMILARITY AND EXTENSION
    4. EXPERIMENTS
    4.1 RECOGNIZING PROTEINS WITH SIMILAR SURFACES
    4.2 MEASURING WHETHER OUR SYSTEM CAN ENHANCE THE OUTCOME ACCURACY OF OTHER RESEARCHES
    5. CONCLUSION AND FUTURE WORK
    6. REFERENCES


    On campus: available immediately
    Off campus: available from 2009-08-26