
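Graduate student: 林子文 (Lin, Tzu-wen)
Thesis title: 利用蛋白質所包含之調控特性來預測蛋白質間交互作用
Predicting protein-protein interactions based on the regulatory characteristics of the gene sequences of the protein pairs
Advisor: 張天豪 (Chang, Tien-Hao)
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Electrical Engineering
Year of publication: 2014
Academic year of graduation: 102
Language: Chinese
Number of pages: 44
Chinese keywords: 系統發育譜 (phylogenetic profile), 機器學習 (machine learning), 蛋白質交互作用 (protein-protein interaction)
English keywords: Phylogenetic profile, Machine learning, Protein-protein interaction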
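  • Abstract (translated from the Chinese): Protein-protein interactions (PPIs) play an important role in the functions that organisms express, and identifying them helps explain how biological systems respond at the molecular level. To date, many intrinsic properties of proteins (including sequence, structure, and function) have been used to predict PPIs. However, no study has directly examined whether regulatory characteristics (for example, the transcription factors that regulate a protein's gene) affect PPIs. This study analyzes whether the regulatory characteristics of genes affect PPIs and builds a prediction model based on those characteristics.
    The study ran a comprehensive series of tests on regulatory characteristics. It collected 8 kinds of regulatory characteristics and encoded them into 12 feature vectors, covering DNA bendability, gene distance, gene size, nucleosome occupancy, TATA box, transcription factor binding evidence, transcription factor knockout evidence, and transcription factor binding site similarity. The experimental results show that gene distance contributes significantly to predicting whether a protein pair interacts, and that on Saccharomyces cerevisiae the method outperforms other predictors.
    This is the first study to examine the influence of regulatory characteristics on PPIs, and it confirms that regulatory characteristics should be included among the features rather than ignored. A feature model that includes them can help researchers find unknown molecular mechanisms. Finally, the study offers later work a new research direction focused on regulatory characteristics.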

    Protein-protein interactions (PPIs) are essential to diverse biological processes. Elucidating these PPIs helps us understand the mechanisms of biological systems at the molecular level. Various intrinsic protein features have been studied to predict PPIs. However, no studies have analyzed the regulatory features of the two interacting proteins. This study asks whether regulatory features still affect PPIs across the gap from gene to protein, and builds a regulatory feature-based model to predict PPIs.
    This study conducted a comprehensive analysis of regulatory features. It collected eight kinds of transcriptional characteristics and encoded them into 12 transcriptional features: DNA bendability, gene size, gene distance, nucleosome occupancy, TATA box information, TF binding and knockout information, and eight regulatory similarities based on TFBS data. The experimental results show that gene distance improved prediction performance and indicate that these regulatory features did influence PPI prediction across the gap from gene to protein. In Saccharomyces cerevisiae, our method predicted PPIs better than other methods.
    This work is the first study to discuss regulatory features in predicting PPIs, and the results suggest that this category of features should be considered in future work. The proposed regulatory characteristic encoding method has been shown to be capable of identifying whether two proteins interact. The constructed prediction model helps discover unknown molecular mechanisms of specific regulatory functions. Finally, this study encourages subsequent work on related topics to consider regulatory features.
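    As a rough illustration of what encoding gene distance and gene size for a protein pair could look like, here is a minimal Python sketch. It is not the thesis's encoding, which this record does not describe. The `Gene` record, the coordinate convention (1-based, inclusive), the zero distance for overlapping genes, and the same-chromosome flag are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Gene:
    """Hypothetical gene record; field names are illustrative only."""
    name: str
    chromosome: str
    start: int  # 1-based, inclusive (assumed convention)
    end: int    # 1-based, inclusive (assumed convention)


def gene_size(gene: Gene) -> int:
    """Length of the gene in bases under the inclusive-coordinate assumption."""
    return gene.end - gene.start + 1


def gene_distance(a: Gene, b: Gene) -> Optional[int]:
    """Number of bases between two genes on the same chromosome.

    Returns None when the genes lie on different chromosomes and 0 when
    they overlap. Both choices are assumptions for this sketch.
    """
    if a.chromosome != b.chromosome:
        return None
    gap = max(a.start, b.start) - min(a.end, b.end) - 1
    return max(gap, 0)


def encode_pair(a: Gene, b: Gene) -> List[float]:
    """Encode a gene pair as [same_chromosome, distance, size_a, size_b]."""
    distance = gene_distance(a, b)
    same_chromosome = 1.0 if distance is not None else 0.0
    return [
        same_chromosome,
        float(distance) if distance is not None else 0.0,
        float(gene_size(a)),
        float(gene_size(b)),
    ]
```

    A vector like this could be concatenated with other per-pair features before being passed to a classifier. The explicit same-chromosome flag is one way to keep "different chromosomes" distinct from "adjacent genes" when the distance slot is filled with 0.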

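    Table of Contents
    Chapter 1. Introduction
    Chapter 2. Related Work
      2.1 Deoxyribonucleic acid (DNA)
      2.2 Proteins
      2.3 Principles of protein-protein interaction (PPI)
      2.4 Computational methods for predicting functionally related proteins
        2.4.1 Gene clustering method
        2.4.2 Sequence co-evolution method
        2.4.3 Rosetta Stone method
        2.4.4 Classifier-based methods
      2.5 Introduction to the classifiers
        2.5.1 Relaxed variable kernel density estimation (RVKDE) classifier
        2.5.2 Support vector machine (SVM) classifier
        2.5.3 Comparison of the two classifiers
      2.6 Introduction to the databases
        2.6.1 BioGRID database
        2.6.2 SGD database
        2.6.3 YEASTRACT database
    Chapter 3. Datasets and Methods
      3.1 Datasets
        3.1.1 TATA box
        3.1.2 DNA bendability
        3.1.3 Nucleosome occupancy
        3.1.4 Gene distance and gene size data
        3.1.5 Transcription factor binding evidence and knockout evidence
        3.1.6 Transcription factor binding site similarity
      3.2 Methods
        3.2.1 Phylogenetic profile method
        3.2.2 Feature encoding
    Chapter 4. Experimental Results and Discussion
      4.1 Assessing the importance of gene sequence properties for PPI prediction
      4.2 Predicting PPIs with the selected gene sequence properties
      4.3 Applying the selected gene sequence properties to improve predictor accuracy
      4.4 Discussion and analysis
        4.4.1 Analysis of the optimal weights of sequence properties in the PPI predictor
        4.4.2 Discussion of the special nature of gene distance
    Chapter 5. Conclusion and Future Work
      5.1 Conclusion
      5.2 Future work
    References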


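    Full-text availability: on campus, public from 2018-08-27; off campus, not public.
    The electronic thesis has not been authorized for public release; for the print copy, consult the library catalog.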