| Graduate student: | 江秉錂 Chiang, Bing-Ling |
|---|---|
| Thesis title: | Development of iNucProt and pNucProt for analysis and prediction of the interactions between DNA-binding proteins and DNA |
| Advisor: | 曾新穆 Tseng, Vincent Shin-Mu |
| Degree: | Master |
| Department: | College of Electrical Engineering and Computer Science - Department of Computer Science and Information Engineering |
| Year of publication: | 2007 |
| Academic year of graduation: | 95 |
| Language: | Chinese |
| Number of pages: | 61 |
| Chinese keywords: | 生物資訊、資料探勘、蛋白質結構分析 |
| English keywords: | bioinformatics, data mining, protein structure analysis |
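Understanding the function of a nucleic acid-protein complex requires detailed three-dimensional structural information and the characteristics of the interactions between amino acids and their binding partners. In this thesis, we developed three resources: iNucProt, a tool that identifies interactions between DNA and amino acids and outputs them as diagrams and tables; dNucProt, a database summarizing interactions between amino acids and the base, sugar, and phosphate of DNA; and pNucProt, a tool for predicting interactions between a protein and DNA. The iNucProt server lets users upload a PDB file of a protein-DNA complex, identifies the interactions in the file, and outputs them as diagrams and tables. The results summarize protein-DNA interactions, including hydrogen bonds with bases, sugars, and the phosphate backbone, water- or metal-mediated contacts, van der Waals contacts, and hydrophobic interactions. iNucProt also provides schematic interaction diagrams representing the interactions between protein and DNA. dNucProt was derived by analyzing 978 PDB files with iNucProt. According to the results, van der Waals contacts, hydrogen bonds, water-mediated hydrogen bonds, metal-mediated hydrogen bonds, and hydrophobic interactions account for 68.59%, 21.03%, 7.11%, 2.85%, and 0.42% of all DNA-protein interactions, respectively. Amino acids R and K prefer to interact with base G through van der Waals contacts, hydrogen bonds, and water-mediated hydrogen bonds. In contrast, we identified amino acids (A, T, V, and K) that prefer to interact with bases (T and C) through hydrophobic interactions. The pNucProt server lets users upload the protein sequence of a protein-DNA complex and predicts the interactions between that sequence and DNA, using dNucProt together with iNucProt's analysis of the three-dimensional structures of homologous proteins. iNucProt, dNucProt, and pNucProt are all tools to help understand DNA-protein interactions; they are available at http://www.ncku-nucprot.idv.tw/.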
Understanding the function of a nucleic acid-protein complex requires a detailed picture of its 3D structure and the identity of the crucial amino acid residues involved in its interactions with its targets. In this study, we developed iNucProt, a tool to identify and plot the interactions between DNA and protein; dNucProt, a database that summarizes the interactions between amino acids and the base, sugar, and phosphate of DNA; and pNucProt, a tool to predict the interactions between DNA and protein. The iNucProt server lets users submit the 3D coordinates of a protein-DNA complex as a PDB file; it then identifies the interactions and presents them as schematic diagrams and tables. It provides a summary description of protein-DNA interactions with bases, sugars, and phosphate backbones, including hydrogen bonds, water- or metal-mediated contacts, van der Waals contacts, and hydrophobic interactions. It also generates schematic diagrams representing DNA-binding interactions. The dNucProt database was built by analyzing 978 PDB entries with iNucProt. The results showed that van der Waals contacts, hydrogen bonds, water-mediated hydrogen bonds, metal-mediated contacts, and hydrophobic interactions accounted for 68.59%, 21.03%, 7.11%, 2.85%, and 0.42% of all interactions between DNA and protein, respectively. Arginine and lysine prefer to interact with guanine via van der Waals contacts, hydrogen bonds, and water-mediated hydrogen bonds. In contrast, hydrophobic interactions were identified between amino acids (alanine, tyrosine, valine, and lysine) and the bases thymine and cytosine. The pNucProt server lets users submit the amino acid sequence of a DNA-binding protein; it then predicts the protein's DNA interactions using the dNucProt database and the iNucProt analysis of the 3D structures of homologous proteins. iNucProt, dNucProt, and pNucProt are comprehensive tools for understanding the interactions between DNA and protein and are available at http://www.ncku-nucprot.idv.tw/.
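The general idea of identifying protein-DNA contacts from PDB coordinates can be illustrated with a minimal distance-based sketch. This is not the iNucProt implementation: the 3.9 Å cutoff, the function names, the residue names `DA`/`DT`/`DG`/`DC`, and the atom-name rules for assigning a DNA atom to base, sugar, or phosphate are all assumptions made for illustration. The sketch does not distinguish hydrogen bonds from van der Waals or hydrophobic contacts, which would require additional geometric and chemical criteria.

```python
import math

# Assumed residue names for DNA nucleotides in a PDB file.
DNA_RESIDUES = {"DA", "DT", "DG", "DC"}
# Hypothetical distance cutoff in angstroms; not taken from the thesis.
CUTOFF = 3.9


def parse_atoms(pdb_text):
    """Read ATOM records using the fixed-column PDB layout."""
    atoms = []
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM"):
            continue
        atoms.append({
            "atom": line[12:16].strip(),
            "res": line[17:20].strip(),
            "chain": line[21],
            "resseq": int(line[22:26]),
            "xyz": (float(line[30:38]), float(line[38:46]), float(line[46:54])),
        })
    return atoms


def dna_part(atom_name):
    """Assign a DNA atom to phosphate, sugar, or base (assumed naming rules)."""
    if atom_name in {"P", "OP1", "OP2", "O1P", "O2P"}:
        return "phosphate"
    if atom_name.endswith("'") or atom_name.endswith("*"):
        return "sugar"
    return "base"


def find_contacts(atoms, cutoff=CUTOFF):
    """Return protein-DNA atom pairs whose distance is within the cutoff."""
    dna = [a for a in atoms if a["res"] in DNA_RESIDUES]
    protein = [a for a in atoms if a["res"] not in DNA_RESIDUES]
    contacts = []
    for p in protein:
        for d in dna:
            dist = math.dist(p["xyz"], d["xyz"])
            if dist <= cutoff:
                contacts.append(
                    (p["res"], p["atom"], d["res"], dna_part(d["atom"]), round(dist, 2))
                )
    return contacts


def atom_line(serial, name, res, chain, resseq, x, y, z):
    """Demo helper: build one ATOM record with correct column widths."""
    return (f"ATOM  {serial:>5} {name:<4} {res:>3} {chain}{resseq:>4}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}")


# Made-up coordinates, used only to exercise the functions.
SAMPLE = "\n".join([
    atom_line(1, "NH1", "ARG", "A", 10, 0.0, 0.0, 0.0),
    atom_line(2, "NZ", "LYS", "A", 11, 20.0, 0.0, 0.0),
    atom_line(3, "N7", "DG", "B", 1, 2.9, 0.0, 0.0),
    atom_line(4, "OP1", "DG", "B", 1, 0.0, 3.0, 0.0),
    atom_line(5, "C1'", "DG", "B", 1, 10.0, 10.0, 10.0),
])

if __name__ == "__main__":
    for contact in find_contacts(parse_atoms(SAMPLE)):
        print(contact)
```

Tallying the fourth field of each contact tuple over many structures would give per-part counts of the kind a summary database could store, although the actual classification scheme used by dNucProt is not specified here.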