Graduate student: Cheng, Po-Han (鄭博瀚)
Thesis title: Protein Surface Search in DNA-binding Protein Prediction by Delaunay Triangulation Modeling
Advisor: Kao, Hung-Yu (高宏宇)
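Degree: Master
Department: College of Electrical Engineering and Computer Science, Department of Computer Science and Information Engineering
Year of publication: 2010
Academic year of graduation: 98 (ROC calendar; the 2009-2010 academic year)
Language: English
Number of pages: 54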
Keywords: Protein structure alignment, Protein surface search, Voronoi diagram, Delaunay triangulation, DNA-binding protein prediction, Relative solvent accessible area, Estrogen response elements

    In recent years, high-throughput structural proteomics and genome sequencing have led to a burst in the amount of available structure and sequence information. A wide range of protein studies has been published, including transcription factor prediction, DNA-binding protein prediction, homology modeling, and protein-protein interaction analysis. Structure alignment techniques have played an important role in these studies. Many protein alignment methods have been proposed, yet only a few focus on protein surfaces.
    In this research, we developed a DNA-binding protein prediction system based on protein surface search, using Voronoi diagrams and Delaunay triangulation to model the molecular surface. We also designed an iterative method to construct a continuous and closed protein surface. Finally, a system that integrates surface structure information is applied to search for common surfaces among proteins and to predict protein-DNA interactions.
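    The idea of iteratively carving a closed surface out of a Delaunay tetrahedralization can be pictured with a minimal Python sketch. It assumes the tetrahedra are already available as vertex-index quadruples (for example, the rows of `scipy.spatial.Delaunay(points).simplices`) and uses a hypothetical longest-edge threshold `max_edge` to decide which exposed tetrahedra to peel away. This is a generic alpha-shape-style reduction, not the ISB, ATR, or IATR rules of the thesis, and the helper names `boundary_faces` and `peel_surface` are illustrative.

```python
from collections import Counter
from itertools import combinations
from math import dist


def faces(tet):
    """Four triangular faces of a tetrahedron, each as a sorted vertex triple."""
    return [tuple(sorted(f)) for f in combinations(tet, 3)]


def boundary_faces(tets):
    """Faces owned by exactly one tetrahedron; together they bound the solid."""
    counts = Counter(f for t in tets for f in faces(t))
    return {f for f, n in counts.items() if n == 1}


def longest_edge(tet, coords):
    """Length of the longest of the six edges of a tetrahedron."""
    return max(dist(coords[a], coords[b]) for a, b in combinations(tet, 2))


def peel_surface(tets, coords, max_edge):
    """Iteratively strip exposed tetrahedra whose longest edge exceeds max_edge.

    Hypothetical rule: a tetrahedron is removed only if at least one of its
    faces lies on the current surface, so interior tetrahedra are examined
    only after their neighbours have been peeled away.
    Returns the remaining tetrahedra and their boundary faces.
    """
    tets = [tuple(int(v) for v in t) for t in tets]
    while True:
        surface = boundary_faces(tets)
        removable = {
            t for t in tets
            if longest_edge(t, coords) > max_edge
            and any(f in surface for f in faces(t))
        }
        if not removable:
            return tets, surface
        tets = [t for t in tets if t not in removable]


if __name__ == "__main__":
    # Two tetrahedra sharing the face (0, 1, 2); vertex 4 sits far below it.
    coords = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1), (0, 0, -5)]
    tets = [(0, 1, 2, 3), (0, 1, 2, 4)]
    kept, surface = peel_surface(tets, coords, max_edge=2.0)
    print(kept)             # [(0, 1, 2, 3)]
    print(sorted(surface))  # [(0, 1, 2), (0, 1, 3), (0, 2, 3), (1, 2, 3)]
```

    Because the surface is always recomputed as the set of faces belonging to exactly one remaining tetrahedron, it stays closed (though not necessarily manifold) after every removal round; the threshold only controls how deeply concave regions are exposed.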
    Compared with our previous method, the new method identifies and removes significantly more inner triangles. Furthermore, experimental results show that our method identifies surface residues better than the widely used approach based on a relative solvent accessible surface area (RASA) cutoff. When we searched the entire Protein Data Bank (PDB) for surfaces similar to that of Estrogen Receptor α (ERα), most members of the ERα family in Pfam were retrieved with high scores. Finally, we compared our predictions with the results of chemical experiments suggesting that several proteins tend to bind to Estrogen Response Elements (ERE). Notably, the 14-3-3β protein, which scored highly in our prediction, was shown to bind to ERE in a subsequent chemical experiment.
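    The two ways of labelling surface residues contrasted above can each be sketched in a few lines of Python. The sketch assumes per-residue ASA values (for instance, as reported by DSSP) and a caller-supplied table of reference maximum ASA values per residue type. The 0.25 default cutoff, the function names, and the numbers in the usage example are illustrative assumptions, since published cutoffs and maximum-ASA tables vary. The triangle-based function marks a residue as exposed if any of its atoms is a vertex of a surface triangle, which is one plausible criterion and not necessarily the one used in the thesis.

```python
def rasa(asa, residue_type, max_asa):
    """Relative solvent accessibility: observed ASA over the reference maximum."""
    return asa / max_asa[residue_type]


def surface_by_rasa(residues, max_asa, cutoff=0.25):
    """Indices of residues whose RASA is at least `cutoff`.

    residues: list of (residue_type, asa) pairs.
    cutoff: hypothetical default; studies use different thresholds.
    """
    return {i for i, (rtype, asa) in enumerate(residues)
            if rasa(asa, rtype, max_asa) >= cutoff}


def surface_by_triangles(surface_faces, atom_to_residue):
    """Residues owning at least one vertex of a surface triangle.

    surface_faces: iterable of vertex-index triples.
    atom_to_residue: maps each vertex (atom) index to a residue index.
    """
    return {atom_to_residue[v] for face in surface_faces for v in face}


if __name__ == "__main__":
    # Illustrative numbers only; this is not a real maximum-ASA table.
    max_asa = {"ALA": 100.0, "GLY": 80.0}
    residues = [("ALA", 50.0), ("GLY", 10.0), ("ALA", 25.0)]
    print(sorted(surface_by_rasa(residues, max_asa)))                # [0, 2]
    print(sorted(surface_by_triangles([(0, 1, 2)], [0, 0, 1, 2])))  # [0, 1]
```

    A fixed RASA cutoff classifies each residue in isolation, whereas the triangle-based labelling depends only on which atoms lie on the extracted surface, so the two can disagree on the same structure, as the toy example shows.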

    Chinese Abstract
    ABSTRACT
    Content
    Table listing
    Figure listing
    1. INTRODUCTION
        1.1 Background
        1.2 Motivation
        1.3 Our approach
        1.4 Paper structure
    2. RELATED WORK
        2.1 Related research
            2.1.1 Alignment
            2.1.2 Voronoi Tessellation
            2.1.3 Delaunay Triangulation
            2.1.4 DSSP
        2.2 Data resource
            2.2.1 PDB
            2.2.2 UniProt
            2.2.3 TRANSFAC
            2.2.4 BAliBASE 3.0
    3. METHOD
        3.1 Overview
        3.2 Protein Structure Triangulation
        3.3 Protein Surface Retrieval
            3.3.1 ISB (Initial Spherical Border): Method of Previous Work
            3.3.2 ATR (Accumulated Tetrahedron Reduction): Improvement of ISB
            3.3.3 IATR (Iterative Accumulated Tetrahedron Reduction): Improvement of ATR
        3.4 Identifying Similar Triangles
            3.4.1 Triangle Distance
            3.4.2 Amino Acid Substitution Matrix
            3.4.3 Metric SSE exchange matrix
            3.4.4 RASA
    4. EXPERIMENTS
        4.1 Dataset
        4.2 Experiment
            4.2.1 Evaluation of Surface Extraction
            4.2.2 Influence of Distance Threshold on Triangle Surface
            4.2.3 Evaluation of Extracted Surface
            4.2.4 Distinguishing Different Structure
            4.2.5 Experiment on PDB
            4.2.6 Prediction of Zf-C4 family
            4.2.7 Compare to Results of Chemical Experiment
    5. CONCLUSIONS AND FUTURE WORK
        5.1 Conclusions
        5.2 Future work
    6. REFERENCES
