Author: Chien, Chia-Yen (簡嘉姸)
Thesis title: Relative Distance Protein Fingerprint (RD-PFP) Algorithm Combined with Machine Learning for Searching DNA Mimic Proteins
Original title: 相對距離蛋白質指紋演算法(RD-PFP)結合機器學習尋找DNA擬態蛋白質
Advisor: Hsieh, Sun-Yuan (謝孫源)
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Computer Science and Information Engineering
Year of publication: 2023
Academic year of graduation: 111 (ROC calendar)
Language: English
Number of pages: 59
Keywords: Machine learning, Bioinformatics, DNA mimic proteins, Relative distance (RD), Protein fingerprint, Intelligent computing
  • DNA mimic proteins are relatively obscure control factors that share some similarities with DNA. They emulate the negative charge distribution of the phosphate backbone of DNA by using negatively charged amino acids, namely aspartic acid (Asp or D) and glutamic acid (Glu or E). Known DNA mimic proteins control various cellular mechanisms, such as transcription, DNA repair, and gene regulation, by interfering with the binding of DNA to effector proteins. In addition to being functionally important, DNA mimic proteins have potential for use in biotechnological applications. For example, the DNA mimic protein AcrIIA4 from Listeria monocytogenes prophages can improve the accuracy of gene editing by controlling the activity of CRISPR-Cas9. Therefore, DNA mimic proteins warrant further research. However, most DNA mimic proteins cannot be identified using traditional bioinformatics methods owing to their unique amino acid sequences and structural features. Accordingly, we developed a new protein fingerprint, called the relative distance protein fingerprint (RD-PFP), that can be used to analyze the distribution of amino acids on a protein surface. We optimized our RD-PFP by using machine learning and the characteristic feature of DNA mimic proteins (namely, their DNA-like negative charge distribution) to more accurately predict DNA mimicry from protein sequences. Our pioneering study contributes to the development of machine learning-based bioinformatics methods for screening DNA mimic proteins.
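    The abstract does not spell out how RD-PFP is computed, so the following is only a toy illustration of the underlying idea that acidic residues (D and E) and their spacing carry the DNA-like charge signal. It is not the thesis's algorithm: it works on the raw sequence rather than the protein surface, and the function names, the `max_gap` parameter, and the example sequence are all hypothetical.

```python
from collections import Counter
from typing import List

# Negatively charged residues named in the abstract: aspartic acid (D), glutamic acid (E).
ACIDIC = {"D", "E"}


def acidic_fraction(sequence: str) -> float:
    """Fraction of residues in the sequence that are D or E."""
    if not sequence:
        return 0.0
    return sum(1 for aa in sequence.upper() if aa in ACIDIC) / len(sequence)


def acidic_gap_histogram(sequence: str, max_gap: int = 5) -> List[float]:
    """Normalized histogram of sequence gaps between consecutive acidic residues.

    Bin k (1-based) holds the share of gaps equal to k positions;
    gaps larger than max_gap are counted in the last bin.
    This is an assumed, simplified stand-in for a distance-based fingerprint.
    """
    positions = [i for i, aa in enumerate(sequence.upper()) if aa in ACIDIC]
    gaps = [b - a for a, b in zip(positions, positions[1:])]
    if not gaps:
        return [0.0] * max_gap
    counts = Counter(min(g, max_gap) for g in gaps)
    return [counts.get(k, 0) / len(gaps) for k in range(1, max_gap + 1)]


if __name__ == "__main__":
    toy = "DEAADKKE"  # hypothetical sequence, not taken from the thesis
    print(acidic_fraction(toy))
    print(acidic_gap_histogram(toy))
```

    A feature vector of this kind could in principle be passed to a standard classifier; the fingerprint definition, optimization, and models actually used are described in Chapters 2 and 3 of the thesis.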

    Abstract (Chinese) i
    Abstract ii
    Acknowledgements iii
    Contents v
    List of Tables vii
    List of Figures viii
    Chapter 1. Introduction 1
    Chapter 2. Method 7
      2.1 Flow of Method 7
      2.2 Datasets Preparation 9
      2.3 Extraction of Basic Features 12
      2.4 RD-PFP Algorithm 16
      2.5 Model Candidates 18
      2.6 Metrics 19
    Chapter 3. Experiments and Results 21
      3.1 Data Distribution 22
      3.2 Fingerprint Generation 23
        3.2.1 Baseline RD-PFP 24
        3.2.2 RD-PFP Combination 26
        3.2.3 RD-PFP Optimization 30
      3.3 Ideal RD-PFP and Basic Features Performance 33
      3.4 Protein Prediction 35
    Chapter 4. Discussion 39
    Chapter 5. Conclusion 48
    References 50
    Appendix A. 58


    Full-text access: on campus, available immediately; off campus, available immediately.