Graduate student: | 蔡孟勳 Tsai, Meng-Hsun |
---|---|
Thesis title: | 基因編輯CRISPR/CAS9之裂解點及裂解機率預測 – 相比於經驗評分與深度學習方法 Cross target computer prediction of cleavage sites of CRISPR/CAS9 – Comparison of hypothesis-driven and deep learning methods |
Advisor: | 賀保羅 Paul Horton |
Degree: | Master |
Department: | College of Electrical Engineering and Computer Science - Department of Computer Science and Information Engineering |
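Year of publication: | 2020 |
Academic year of graduation: | 108 |
Language: | English |
Number of pages: | 50 |
Keywords (translated from Chinese): | Gene editing, CRISPR, CAS, Machine learning, Deep learning, CK_Score, Cleavage prediction |
Keywords (English): | Gene editing, CRISPR, CAS, Machine learning, Deep learning, CK_Score, Cleavage probability |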
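The human genome is one of the domains that many geneticists want to understand and unlock, with the further goal of attaining an almost godlike ability to edit genes. Gene editing technology has progressed from the earliest zinc finger nucleases (ZFN) and TALEN, an artificially engineered restriction enzyme, to the most recent technology, CRISPR/CAS systems (Clustered Regularly Interspaced Short Palindromic Repeats and CRISPR-associated systems). Each newly developed experimental method has greatly improved the mode of operation, the procedure, the experiment time, and the performance of related industrial applications of gene editing.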
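Although the CRISPR/CAS system is simple to operate and shortens experiment time, gene editing still suffers from the off-target problem. Ideally, predictions of the editing sites and their cleavage probabilities could be obtained before the experiment, which would effectively reduce the time and money spent on experimental work. However, because the CRISPR/CAS experimental process is opaque, and because no special markers or conspicuous changes in the outcome can be introduced during the experiment, it is difficult to demonstrate from the final results which genomic sites have been cut, which makes the off-target editing problem even harder to solve. A number of studies have therefore explored using machine learning techniques to fit empirical data in order to make effective predictions.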
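So far, several papers have applied the currently most popular techniques, machine learning and deep learning, to cleavage site classification and cleavage probability prediction for CRISPR/CAS systems. Some scholars have also reviewed and compared these models, selecting and analyzing a few that are more representative and that predict cleavage probability better.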
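However, these models still have accuracy problems in prediction and are limited when applied to different cell lines (targets absent from the training dataset). This thesis therefore aims to improve prediction accuracy across datasets (cross-target). We first studied the literature on the CRISPR/Cas9 experimental procedure, collected datasets that can be used to train and evaluate models, and compared the currently best-performing prediction methods. After exploring several options, we developed a new algorithm that combines a deep learning classifier with CK_Score, an improved probabilistic scoring algorithm. Finally, we combined our methods into a tool that can predict the editing efficiency of the CRISPR/Cas9 system in human DNA sequences.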
Human genes are one of the domains that many geneticists want to understand and unlock, with the aim of gaining the ability to edit them. Biologists began a series of studies on gene editing in 1988. Early gene editing technology used zinc finger nucleases (ZFN) and an artificially made restriction enzyme called TALEN. More recently, a technology called CRISPR/CAS systems (Clustered Regularly Interspaced Short Palindromic Repeats and CRISPR-Associated Systems) has become increasingly common, since it can be applied to nearly any target gene simply by providing an appropriate guide RNA sequence. Each newly developed method has greatly improved the mode of operation, the procedure, the experiment time, and the performance of related industrial applications of gene editing.
Although the CRISPR/CAS system is simple to operate and shortens experiment time, the problem of off-target gene cleavage and editing remains. Ideally, such off-target events could be predicted before the experiment, reducing the time and money spent on experimental work. Unfortunately, the CRISPR/CAS mechanism is not understood in fine enough detail to allow automatic, quantitative prediction of cleavage sites from a precise physical model. Researchers have therefore resorted to machine learning techniques that fit empirical data to make such predictions.
So far, several papers have applied state-of-the-art machine learning and deep learning techniques to cleavage site classification and cleavage probability prediction for CRISPR/CAS systems. Some scholars have reviewed and compared these models and selected a few that are more representative and give better classification and prediction results.
However, these models still have accuracy problems and are limited when applied to targets they were not trained on. In this thesis we therefore aim to improve cross-target prediction accuracy. We began by surveying the literature on the CRISPR/Cas9 experimental procedure, the datasets available for training and evaluating models, and the current state-of-the-art prediction methods. After exploring some options, we developed a new method that combines a deep learning classifier with an improved probabilistic scoring algorithm called CK_Score. Finally, we designed a tool that can be used to predict the probability of CRISPR/Cas9 gene editing in human DNA sequences.