| Field | Value |
|---|---|
| Graduate student | 鄭宇翔 Cheng, Yu-Hsiang |
| Thesis title | Integrating computational tools into unified platforms for studying protein phase separation and screening small molecule drugs |
| Advisor | 王淑鶯 Wang, Shu-Ying |
| Degree | Master |
| Department | College of Medicine, Department of Microbiology & Immunology |
| Year of publication | 2025 |
| Academic year of graduation | 113 |
| Language | English |
| Number of pages | 123 |
| Keywords | graphical user interface (GUI), liquid-liquid phase separation (LLPS), fluorescence recovery after photobleaching (FRAP), neural network, virtual drug screening |
Although computational methods can accelerate and simplify many experimental workflows, they are often underutilized in experiment-focused laboratories because translating experimental problems into computational solutions is difficult. This study therefore developed three programs that bridge this gap by addressing specific experimental needs. The tools were designed for simplicity to encourage their adoption in laboratory workflows, particularly in drug development and in the study of liquid-liquid phase separation (LLPS), an emerging protein mechanism involved in regulating diverse cellular responses. The three programs are: (1) FRAPy, a secure, end-to-end tool for analyzing fluorescence recovery after photobleaching (FRAP) data; (2) PhaseNet, a fast and scalable neural network for predicting protein LLPS propensity; and (3) MolDocker, a unified platform that streamlines molecular docking workflows to accelerate drug discovery from computation to experimental validation. Together, these programs demonstrate how specialized computational tools can be integrated seamlessly into experimental research to enhance efficiency and insight.
On-campus access: available from 2030-07-23