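| Graduate student: | 吳建緯 Wu, Jian-Wei |
|---|---|
| Thesis title: | 運用綜合方法預測功能相關蛋白質 Predicting functionally related proteins using a Hybrid Approach |
| Advisor: | 張天豪 Chang, Tien-Hao |
| Degree: | Master |
| Department: | College of Electrical Engineering and Computer Science - Department of Electrical Engineering |
| Year of publication: | 2010 |
| Academic year of graduation: | 98 (ROC calendar) |
| Language: | Chinese |
| Number of pages: | 61 |
| Chinese keywords: | 功能相關蛋白質 (functionally related proteins), 綜合方法 (hybrid approach) |
| Foreign-language keywords: | functionally related proteins, hybrid approach |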
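In recent years, computational methods for predicting functionally related proteins have fallen roughly into two categories: methods based on genomic and evolutionary information, and methods based on sequence information. Methods based on genomic and evolutionary information are not limited by the scale of the data, but they require data beyond the protein sequences themselves and have been shown to perform poorly on eukaryotes. Methods based on sequence information achieve better precision but are very time-consuming when large amounts of data must be analyzed.
This thesis proposes a framework that attempts to combine the characteristics of these two classes of algorithms to improve the performance of functionally related protein prediction. We first use a method based on genomic and evolutionary information to estimate how likely each sample is to be positive, and then apply a method based on sequence information as a second-stage prediction to improve accuracy. Experimental results show that 84% of the top 50 high-confidence predictions of the proposed method are positive. Even at the top 2,000 high-confidence predictions, the proportion of positives is still 48%, more than 12% higher than that of five existing methods.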
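The percentages reported above are proportions of positives among the k highest-confidence predictions, often called precision at k. A minimal sketch of that calculation follows; the function name and the label list are hypothetical and are not taken from the thesis.

```python
def precision_at_k(ranked_labels, k):
    """Fraction of positives among the k highest-confidence predictions.

    ranked_labels: 0/1 labels sorted by descending prediction confidence.
    """
    if k < 1 or not ranked_labels:
        raise ValueError("k must be at least 1 and the list must be non-empty")
    top = ranked_labels[:k]
    return sum(top) / len(top)


# Hypothetical ranking: 42 positives among the top 50 predictions.
labels = [1] * 42 + [0] * 8
print(precision_at_k(labels, 50))  # 0.84
```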
Recent studies of functionally related proteins can be roughly divided into two groups: (a) those based on the observation that interacting proteins co-evolve and show patterns of co-occurrence across organisms, and (b) those that extract features from protein sequences and use machine learning techniques to build an abstract model of functionally related proteins. Evolution-based methods depend on the set of organisms used to construct the co-occurrence patterns and have been shown to be unsuitable for eukaryotes. Machine learning-based methods, on the other hand, suffer from high time complexity and might take months or years to complete an organism-wide analysis.
The proposed predictor of functionally related proteins is a two-stage framework that combines the two groups of techniques. In the first stage, the occurrence pattern of a protein is its score vector over a set of organisms, where the score of a protein for an organism is the highest bit score obtained with PSI-BLAST between that protein and the proteins of the organism. Protein pairs with similar score vectors are considered functionally related and passed to the second stage. In the second stage, a protein pair is represented by the conjoint triad frequencies of its two sequences, where a conjoint triad is a unit of three consecutive amino acids. These feature vectors are fed to a relaxed variable kernel density estimator for the second-stage prediction. Experimental results show that 84% of the 50 highest-confidence predictions of the proposed method were correct. This was much better than the five methods it was compared with, and the advantage held consistently for the top 100, 200, …, and 2000 predictions.
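The feature construction in the two stages can be sketched roughly as follows. This is an illustration under stated assumptions, not the thesis implementation: the abstract does not say how similarity between score vectors is measured, so Pearson correlation and the 0.8 threshold are assumptions; the seven-class amino acid grouping is the one commonly used with conjoint triads and is also assumed here; all function names and bit scores are hypothetical. The relaxed variable kernel density estimator of the second stage is not shown.

```python
from itertools import product
from math import sqrt

# Assumed seven-class amino acid grouping (not specified in the abstract).
CLASSES = ["AGV", "ILFP", "YMTS", "HNQW", "RK", "DE", "C"]
AA_TO_CLASS = {aa: i for i, group in enumerate(CLASSES) for aa in group}
TRIADS = list(product(range(len(CLASSES)), repeat=3))  # 7**3 = 343 triads


def pearson(u, v):
    """Pearson correlation between two equal-length score vectors."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = sqrt(sum((a - mu) ** 2 for a in u))
    sv = sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv) if su and sv else 0.0


def stage_one_filter(score_vectors, threshold=0.8):
    """Stage one: keep protein pairs whose score vectors are similar.

    score_vectors maps a protein name to its best bit score per organism.
    The similarity measure and threshold are assumptions of this sketch.
    """
    names = sorted(score_vectors)
    return [(a, b) for i, a in enumerate(names) for b in names[i + 1:]
            if pearson(score_vectors[a], score_vectors[b]) >= threshold]


def triad_frequencies(seq):
    """Relative frequency of each class triad over a sliding window of 3."""
    counts = dict.fromkeys(TRIADS, 0)
    total = 0
    for i in range(len(seq) - 2):
        window = seq[i:i + 3]
        # Skip windows containing residues outside the 20 standard ones.
        if all(aa in AA_TO_CLASS for aa in window):
            counts[tuple(AA_TO_CLASS[aa] for aa in window)] += 1
            total += 1
    return [counts[t] / total if total else 0.0 for t in TRIADS]


def pair_features(seq_a, seq_b):
    """Stage two input: concatenated triad frequencies of both sequences."""
    return triad_frequencies(seq_a) + triad_frequencies(seq_b)
```

With hypothetical score vectors such as `{"P1": [50, 0, 120, 80], "P2": [55, 5, 110, 85], "P3": [0, 90, 10, 0]}`, `stage_one_filter` keeps only the pair `("P1", "P2")`, and `pair_features` then yields a 686-dimensional vector for that pair. Filtering first means the more expensive sequence-based stage runs only on pairs that already look promising.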
[1] E. Snitkin, et al., "Comparative assessment of performance and genome dependence among phylogenetic profiling methods," BMC bioinformatics, vol. 7, p. 420, 2006.
[2] H. Ge, et al., "Integrating 'omic' information: a bridge between genomics and systems biology," TRENDS in Genetics, vol. 19, pp. 551-560, 2003.
[3] V. Colizza, et al., "Characterization and modeling of protein–protein interaction networks," Physica A: Statistical Mechanics and its Applications, vol. 352, pp. 1-27, 2005.
[4] S. Fields and O. Song, "A novel genetic system to detect protein-protein interactions," Nature, vol. 340, pp. 245-246, 1989.
[5] A. Gavin, et al., "Proteome survey reveals modularity of the yeast cell machinery," Nature, vol. 440, pp. 631-636, 2006.
[6] H. Zhu, et al., "Global analysis of protein activities using proteome chips," Science, vol. 293, p. 2101, 2001.
[7] B. Shoemaker and A. Panchenko, "Deciphering protein-protein interactions. Part II. Computational methods to predict protein and domain interaction partners," PLoS Comput Biol, vol. 3, p. e43, 2007.
[8] V. Ruano-Rubio, et al., "Comparison of eukaryotic phylogenetic profiling approaches using species tree aware methods," BMC bioinformatics, vol. 10, p. 383, 2009.
[9] T. Gambin and K. Walczak, "A new classification method using array Comparative Genome Hybridization data, based on the concept of Limited Jumping Emerging Patterns," BMC bioinformatics, vol. 10, p. S64, 2009.
[10] P. Kensche, et al., "Practical and theoretical advances in predicting the function of a protein by its phylogenetic distribution," Journal of the Royal Society Interface, vol. 5, p. 151, 2008.
[11] S. Singh and D. Wall, "Testing the accuracy of eukaryotic phylogenetic profiles for prediction of biological function," Evolutionary Bioinformatics Online, vol. 4, p. 217, 2008.
[12] A. Van Dijk, et al., "Predicting and understanding transcription factor interactions based on sequence level determinants of combinatorial control," Bioinformatics, vol. 24, p. 26, 2008.
[13] L. Fokkens and B. Snel, "Cohesive versus flexible evolution of functional modules in eukaryotes," PLoS Computational Biology, vol. 5, 2009.
[14] B. Lee and D. Kim, "A new method for revealing correlated mutations under the structural and functional constraints in proteins," Bioinformatics, vol. 25, p. 2506, 2009.
[15] J. Gilley and M. Coleman, "Endogenous Nmnat2 Is an Essential Survival Factor for Maintenance of Healthy Axons," 2010.
[16] S. Gong and T. Blundell, "Structural and Functional Restraints on the Occurrence of Single Amino Acid Variations in Human Proteins," 2010.
[17] P. Bourne, "STRUCTURAL BIOINFORMATICS."
[18] 蛋白質結構 (Protein structure). Available: http://www.bio.fju.edu.tw/excel/content05/html/33.htm
[19] ATP. Available: http://upload.wikimedia.org/wikipedia/commons/0/07/ATP_structure.svg
[20] KEGG PATHWAY. Available: http://www.genome.jp/dbget-bin/www_bget?map00630
[21] 王. 劉. 曾文慶, "生化反應路徑資料庫的內涵式查詢."
[22] E. Marcotte, et al., "Detecting protein function and protein-protein interactions from genome sequences," Science, vol. 285, p. 751, 1999.
[23] A. Enright, et al., "Protein interaction maps for complete genomes based on gene fusion events," Nature, vol. 402, pp. 86-90, 1999.
[24] F. Enault, et al., "Annotation of bacterial genomes using improved phylogenomic profiles," Bioinformatics-Oxford, vol. 19, pp. 105-107, 2003.
[25] A. Walhout, et al., "Protein interaction mapping in C. elegans using proteins involved in vulval development," Science, vol. 287, p. 116, 2000.
[26] A. Ramani and E. Marcotte, "Exploiting the co-evolution of interacting proteins to discover interaction specificity," Journal of Molecular Biology, vol. 327, pp. 273-284, 2003.
[27] R. Jothi, et al., "Co-evolutionary analysis of domains in interacting proteins reveals insights into domain–domain interactions mediating protein–protein interactions," Journal of Molecular Biology, vol. 362, pp. 861-875, 2006.
[28] C. Goh, et al., "Co-evolution of proteins with their interaction partners," Journal of Molecular Biology, vol. 299, pp. 283-293, 2000.
[29] J. Bock and D. Gough, "Predicting protein-protein interactions from primary structure," Bioinformatics, vol. 17, p. 455, 2001.
[30] J. Shen, et al., "Predicting protein–protein interactions based only on sequences information," Proceedings of the National Academy of Sciences, vol. 104, p. 4337, 2007.
[31] Y. Oyang, et al., "Data classification with radial basis function networks based on a novel kernel density estimation algorithm," IEEE Transactions on Neural Networks, vol. 16, pp. 225-236, 2005.
[32] S. Altschul, et al., "Basic local alignment search tool," Journal of Molecular Biology, vol. 215, pp. 403-410, 1990.
[33] A. Goffeau, et al., "Life with 6000 genes," Science, vol. 274, p. 546, 1996.
[34] J. Cherry, et al., "SGD: Saccharomyces genome database," Nucleic acids research, vol. 26, p. 73, 1998.
[35] H. Mewes, et al., "MIPS: a database for genomes and protein sequences," Nucleic acids research, vol. 27, p. 44, 1999.
[36] J. Zhu and M. Zhang, "SCPD: a promoter database of the yeast Saccharomyces cerevisiae," Bioinformatics, vol. 15, p. 607, 1999.
[37] A. Bairoch, et al., "The universal protein resource (UniProt)," Nucleic acids research, vol. 33, p. D154, 2005.
[38] E. Wingender, et al., "TRANSFAC: an integrated system for gene expression regulation," Nucleic acids research, vol. 28, p. 316, 2000.
[39] Y. Yuan, et al., "Prediction of interactiveness of proteins and nucleic acids based on feature selections," Molecular Diversity, pp. 1-7.
[40] C. Yu, et al., "Predicting protein-protein interactions in unbalanced data using the primary structure of proteins," BMC bioinformatics, vol. 11, p. 167, 2010.
[41] Y. Guo, et al., "Using support vector machine combined with auto covariance to predict protein-protein interactions from protein sequences," Nucleic acids research, 2008.
[42] S. Date and E. Marcotte, "Discovery of uncharacterized cellular systems by genome-wide analysis of functional linkages," Nature biotechnology, vol. 21, pp. 1055-1062, 2003.
[43] J. Sun, et al., "Refined phylogenetic profiles method for predicting protein-protein interactions," Bioinformatics, vol. 21, p. 3409, 2005.