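Graduate student: 陳品豪 (Chen, Pin-Hao)
Thesis title: CLASH data analysis web tool (CLASH資料分析網頁工具)
Advisor: 吳謂勝 (Wu, Wei-Sheng)
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Electrical Engineering
Year of publication: 2020
Academic year of graduation: 108
Language: Chinese
Number of pages: 68
Keywords (Chinese, translated): CLASH analysis, RNA regulation, target prediction, web tool
Keywords (English): CLASH analysis, RNA regulation, target predict, web tool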
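  • In the cell, ribonucleic acid (RNA) is not only a messenger of genetic information. In the form of small regulatory RNAs (such as piRNA, miRNA, and siRNA) it also regulates other RNAs (target RNAs) and thereby affects the expression of many genes, playing an important role in mechanisms such as cell growth, apoptosis, and differentiation. A good understanding of the regulatory functions of RNA can therefore help us explore more biological functions, with applications in biomedicine and related fields.
    Traditionally, determining the relationship of a single (regulatory RNA)-(target RNA) pair requires one experiment, so studying many RNA-RNA pairs costs a great deal of time and effort. In 2011 the experimental technique CLASH (Cross-linking, ligation and sequencing of hybrids) was proposed, which can efficiently obtain large numbers of reads containing (regulatory RNA)-(target RNA) pairs. However, because the experimental procedure is imperfect, CLASH reads that contain a (regulatory RNA)-(target RNA) pair are only a minority, and neither the identities of the regulatory RNA and target RNA on a read nor their actual binding is known. Follow-up bioinformatics analysis is therefore needed to identify the regulatory RNA and target RNA implied in each read.
    Three analysis methods in the literature (piRTarBase, Hyb, CLAN) can currently analyze the reads obtained from CLASH experiments. However, each has its own strengths and weaknesses and a different workflow design, and all require the user to be familiar with a Command-Line Interface (CLI) for installation and operation. Moreover, all three methods ultimately report only the identities of the regulatory RNA and target RNA on a read, without further analyzing the likely binding region or the binding stability of the two. Nor do they compile the information that is truly helpful to biologists, for example which target RNAs a given regulatory RNA can regulate and which regulatory RNAs regulate a given target RNA. This study therefore integrates and improves these three methods to develop a more complete web-based analysis platform. Biologists only need to upload their data, without operating a Command-Line Interface, to run an analysis. The platform also provides a variety of information queries and supporting cross-references, and it consolidates and visualizes the analysis results for easy inspection.
    Finally, this study uses the web analysis tool to analyze ten CLASH datasets, examines the similarities and differences among the three analysis methods and the effects of the parameters designed into the platform, and concludes with a recommended way to run the analysis.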

    RNA interference (RNAi) plays an important role in post-transcriptional regulation of gene expression. The CLASH (Cross-linking, Ligation and Sequencing of Hybrids) technique can efficiently capture reads that contain information on (regulatory RNA)–(target RNA) interactions by physically joining the two RNAs. Nevertheless, reads containing such information are relatively rare; moreover, these reads do not directly reveal the identities of the regulatory RNA and target RNA. Follow-up bioinformatics analysis is therefore required. At present, three analysis methods (piRTarBase, Hyb, CLAN) can analyze the reads obtained from a CLASH experiment. However, each has its own advantages and disadvantages as well as a different workflow design, and all of them require biologists to be familiar with the command-line interface (CLI) for installation and operation. Furthermore, the three methods report only the identities of the regulatory RNA and target RNA on each read, without further analysis of the possible binding site and binding stability of the two RNAs. To address these limitations, we present a more complete web tool for the analysis workflow that allows biologists to analyze data without using a CLI, query a variety of information, and easily inspect the analysis results through visualization.

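    Abstract
    Extended Abstract
    Acknowledgements
    Table of Contents
    List of Figures
    List of Tables
    Chapter 1 Research Background and Motivation
    1.1 Regulatory functions of RNA
    1.1.1 RNA interference mechanism
    1.1.2 RNA-induced silencing complex (RISC)
    1.2 CLASH sequencing experiment and analysis
    1.2.1 CLASH sequencing experiment
    1.2.2 Post-sequencing analysis workflow
    1.2.2.1 Input data
    1.2.2.2 Removing the 5' linker
    1.2.2.3 Removing the 3' linker
    1.2.2.4 Finding RNA-RNA pair information
    1.3 Overview of existing CLASH analysis methods
    1.3.1 piRTarBase (finding RNA-RNA pair information)
    1.3.2 Hyb (finding RNA-RNA pair information)
    1.3.3 CLAN (finding RNA-RNA pair information)
    1.4 Research motivation
    Chapter 2 CLASH Analyst Modules
    2.1 Overview of the analysis workflow design
    2.2 Introduction to each input module
    2.2.1 Input data
    2.2.2 CLASH read preprocessing
    2.2.3 CLASH read quality selection
    2.2.3.1 Read Count quality selection
    2.2.3.2 RNAfold quality selection
    2.2.4 Finding RNA-RNA pair information
    2.2.4.1 Finding RNA-RNA pair information (piRTarBase)
    2.2.4.2 Finding RNA-RNA pair information (Hyb)
    2.2.4.3 Finding RNA-RNA pair information (CLAN)
    2.2.4.4 Post-processing of Hyb and CLAN results
    2.2.5 RNAup Analysis
    2.2.5.1 Targeted Region preprocessing
    2.2.5.2 RNAup computation: optimal binding region
    2.2.5.3 RNAup result analysis
    2.2.5.4 RNAup computation: binding site free energy
    2.2.6 Raw analysis results
    Chapter 3 The CLASH Analyst Analysis Website
    3.1 Website features
    3.1.1 Data upload and parameter selection
    3.1.1.1 Role of gene_file.csv
    3.1.2 E-mail confirmation and notification
    3.1.3 Job scheduling
    3.1.4 Browse page features
    3.1.4.1 Conditional browsing and organization
    3.1.5 Browsing modes
    3.1.5.1 Target RNA as the reference
    3.1.5.2 Regulatory RNA as the reference
    3.1.6 Visualization page features
    Chapter 4 Case Studies
    4.1 Input data
    4.1.1 Human datasets
    4.1.2 Nematode datasets
    4.1.3 Input parameters
    4.2 Comparison of RNA-RNA pair counts
    4.3 Comparison of RNA-RNA pair intersections
    4.3.1 Analysis workflow
    4.3.2 Discussion of results
    4.3.3 Further discussion: why Hyb yields fewer results than CLAN
    4.3.4 Further discussion: why piRTarBase yields fewer results than CLAN
    4.3.5 Further discussion: differences between Hyb and CLAN results
    4.3.6 Further discussion: differences between piRTarBase and CLAN results
    4.3.7 Further discussion: differences between piRTarBase and Hyb results
    4.3.8 Further discussion: intersection comparison without regard to the source of the RNA-RNA pairs
    4.4 Comparison of RNA-RNA pair quality
    4.4.1 Analysis workflow
    4.4.2 Discussion of results
    4.4.3 Further discussion: differences in high-quality RNA-RNA pairs
    4.5 Comparison of the time required to analyze RNA-RNA pairs
    4.6 Read quality selection: effect of RNAfold
    4.6.1 Analysis workflow
    4.6.2 Discussion of results
    4.6.2.1 Further discussion: proportion of reads that yield RNA-RNA pairs
    4.7 Read quality selection: effect of Read Count
    4.7.1 Analysis workflow
    4.7.2 Discussion of results
    4.7.2.1 Further discussion: proportion of reads that yield RNA-RNA pairs
    Chapter 5 Conclusions and Future Work
    5.1 Conclusions
    5.2 Future work
    References
    Appendices
    Appendix 1: RNAup selection at different score thresholds (-10, -20, -30)
    Appendix 2: Effect of RNAfold
    Appendix 3: Effect of Read Count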

    Full-text availability: on campus, public from 2025-07-26
    off campus, public from 2025-07-26