Author: Lin, Hua-Yi (林華毅)
Thesis title: Develop a platform for multi-faceted candidate gene extraction and benchmarking of different analysis pipelines from RNA-seq data (開發 RNA-seq 資料的多面向候選基因抽取與不同分析管線基準衡量之平台)
Advisor: Wu, Wei-Sheng (吳謂勝)
Degree: Master
Department: Department of Electrical Engineering, College of Electrical Engineering and Computer Science
Year of publication: 2024
Academic year of graduation: 112 (ROC calendar)
Language: Chinese
Number of pages: 95
Keywords: RNA-seq, differential expression, protein isoform change, promoter change, internal ribosome entry site
    RNA sequencing (RNA-seq) is a technique for quantifying the RNA molecules in a biological sample. It provides insight into gene expression: for example, it can reveal cellular gene expression patterns under different conditions and help explore condition-specific transcript diversity. RNA-seq is essential for discovering new genes, dissecting complex gene networks, and identifying potential disease-causing mechanisms. Gene regulation is multifaceted. Through alternative splicing, a single gene can be transcribed into different isoforms, which increases the functional diversity of proteins and plays an important role at different stages of development. These isoforms are expressed in different cell types and tissues and carry out distinct functions under specific physiological and pathological conditions. Aberrant isoform changes can contribute to diseases such as cancer, neurodegenerative diseases, and cardiovascular diseases. Promoter alterations and dysregulation have likewise been shown to be closely associated with cancer, and promoter usage also varies in response to different environmental stimuli.

    Because RNA-seq analysis is so important, a large number of analysis tools have been developed. As their diversity has grown, however, overly complex tool combinations and analysis workflows have made further analysis difficult for researchers. These tools also usually require highly skilled bioinformaticians to operate, and the way their results are presented can make them hard for users to collect and interpret. To address this problem, we developed a tool that automatically runs multiple RNA-seq pipelines, simplifies data preprocessing when several pipelines are explored, and reduces the cost of learning each tool. In addition, the tool systematically performs the following analyses to extract lists of candidate genes under different experimental conditions: (1) differential expression analysis, (2) isoform changes, (3) promoter changes, and (4) screening for internal ribosome entry sites (IRES) for cap-independent translation analysis. We demonstrate the use of the tool on RNA-seq results provided by Assistant Professor Chung-Te Chang of the Institute of Biochemistry and Molecular Biology, National Yang Ming Chiao Tung University. We also developed a simple, stand-alone, user-friendly web interface that runs on a virtual machine; with just a few mouse clicks, the tool can screen out candidate genes from the different facets within a few hours. Overall, this work presents a user-friendly web interface that makes it easier to integrate existing RNA-seq analysis pipelines and to systematically identify candidate gene lists from designed experiments.
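
The abstract names the analyses but not their formulas, so the following is only a minimal sketch of three quantities that the table of contents lists among the candidate-gene criteria: fold change, isoform proportion, and multiple hypothesis testing. The function names, the pseudocount of 1.0, and the choice of the Benjamini-Hochberg procedure are assumptions made for illustration; the thesis may define and correct these differently.

```python
import math


def log2_fold_change(control, treated, pseudocount=1.0):
    """log2 ratio of mean expression, treated over control.

    The pseudocount (an assumed value) avoids division by zero
    for genes with no expression in one condition.
    """
    mean_c = sum(control) / len(control)
    mean_t = sum(treated) / len(treated)
    return math.log2((mean_t + pseudocount) / (mean_c + pseudocount))


def isoform_fractions(isoform_expr):
    """Share of a gene's total expression contributed by each isoform."""
    total = sum(isoform_expr.values())
    if total == 0:
        # Inactive gene: no isoform carries any expression.
        return {name: 0.0 for name in isoform_expr}
    return {name: value / total for name, value in isoform_expr.items()}


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that the adjusted
    # values stay monotone in the rank.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted
```

Under these assumptions, a gene could be kept as a candidate when its absolute log2 fold change and adjusted p-value pass user-chosen thresholds, and an isoform change could be flagged when an isoform's fraction shifts between conditions. The same fraction idea applies to promoters if transcripts are first grouped by promoter.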

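    Chinese Abstract
    Summary
    Acknowledgments
    Table of Contents
    List of Tables
    List of Figures
    Chapter 1 Research Background and Motivation
        1.1 Research Background
        1.2 Research Motivation
        1.3 Research Objectives
    Chapter 2 Literature Review
        2.1 RNA-Seq Protocol
        2.2 Adapters
        2.3 GSEA (Gene set enrichment analysis)
        2.4 Translation Initiation
            2.4.1 Cap-Dependent Translation Initiation
            2.4.2 Cap-Independent Translation Initiation
    Chapter 3 Methods and Tools
        3.1 Experimental Tools
            3.1.1 Sequence quality control tool: Trim Galore
            3.1.2 Genome mapping tools: Bowtie, Tophat2, STAR
            3.1.3 Sorting tool: samtools
            3.1.4 Transcript assembly and transcript counting tools: cufflinks, RSEM, HTSeq
            3.1.5 Final transcript assembly tool: cuffmerge
            3.1.6 Differential analysis tools: cuffdiff, edgeR, DESeq2
        3.2 Acquisition and Preprocessing of Multi-Faceted Gene Data
            3.2.1 Extraction of expression information
            3.2.2 Excluding cases where one transcript maps to different promoters or track_ids
            3.2.3 Excluding inactive genes/isoforms/promoters
            3.2.4 Normalization of gene expression
                3.2.4.1 Cufflinks normalization method
                3.2.4.2 DESeq normalization method
                3.2.4.3 edgeR normalization method
        3.3 Candidate Gene Selection Criteria
            3.3.1 Hypothesis testing
                3.3.1.1 Student's t-test
                3.3.1.2 Multiple hypothesis testing
            3.3.2 Fold change
            3.3.3 Isoform/promoter proportion
        3.4 Selecting Differentially Expressed Genes
        3.5 Selecting Genes with Isoform Changes
        3.6 Selecting Genes with Promoter Changes
        3.7 Upset plot
        3.8 IRES Identification Method
    Chapter 4 Results
        4.1 MVC Architecture
        4.2 Web System Architecture
        4.3 Data Transfer API for the Web Interface
        4.4 Network Sockets
        4.5 Web Page Demonstration
    Chapter 5 Case Study
        5.1 Case Data
        5.2 Presentation of Analysis Results
            5.2.1 Scatter plots of GO and KEGG pathway analysis results
            5.2.2 GSEA results
            5.2.3 Upset plot comparing results across pipelines
    Chapter 6 Conclusions and Future Work
        6.1 Conclusions
        6.2 Future Work
    Chapter 7 References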

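    On-campus access: public from 2029-08-27
    Off-campus access: public from 2029-08-27
    The electronic thesis has not yet been authorized for public release; for the print copy, please consult the library catalog.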