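| Field | Value |
|---|---|
| Graduate student | Yang, Jhih-Ting (楊智婷) |
| Thesis title | Single-cell Differential Expression Analysis Based on Zero-inflated Negative Binomial Model (Chinese title: 利用零膨脹負二項模型於單細胞 RNA-seq 基因差異表達之研究) |
| Advisor | Li, Chung-I (李俊毅) |
| Degree | Master's |
| Department | Department of Statistics, College of Management |
| Year of publication | 2019 |
| Academic year of graduation | 107 (ROC calendar, i.e., 2018-2019) |
| Language | Chinese |
| Number of pages | 45 |
| Chinese keywords (translated) | single-cell sequencing, differential gene expression, zero-inflated negative binomial model |
| English keywords | single-cell RNA-seq, differential gene expression analysis, zero-inflated negative binomial model |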
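In recent years, advances in high-throughput single-cell RNA sequencing (scRNA-seq) technology have revealed cell-to-cell heterogeneity. However, scRNA-seq data contain a large number of zero counts, are highly variable, and have complex gene expression distributions; these characteristics strongly affect the statistical methods and computational tools used for differential expression analysis. Based on the zero-inflated negative binomial model, this study proposes a statistical method for detecting differentially expressed genes. Through simulation studies and real-data analysis, it also compares the performance of the proposed method with that of existing differential expression methods. The comparison shows that the proposed method outperforms the other methods in terms of the area under the receiver operating characteristic (ROC) curve (AUC) and the F1 score.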
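The proposed method rests on the zero-inflated negative binomial (ZINB) model. As a minimal sketch, assuming the common mean/size parametrization (mean $\mu$, size $\theta$, so the NB variance is $\mu + \mu^2/\theta$) and a zero-inflation probability $\pi$, the ZINB mass function is $P(Y=0) = \pi + (1-\pi)\left(\frac{\theta}{\theta+\mu}\right)^{\theta}$ and $P(Y=y) = (1-\pi)\,\mathrm{NB}(y;\mu,\theta)$ for $y > 0$. The summary does not state which parametrization scDEseq uses, so this illustrates the model family only, not the author's implementation; the function names are hypothetical.

```python
import math
from typing import Iterable


def nb_logpmf(y: int, mu: float, theta: float) -> float:
    """Log-pmf of a negative binomial with mean mu and size (inverse dispersion) theta.

    Under this parametrization the variance is mu + mu**2 / theta.
    """
    return (
        math.lgamma(y + theta)
        - math.lgamma(theta)
        - math.lgamma(y + 1)
        + theta * math.log(theta / (theta + mu))
        + y * math.log(mu / (theta + mu))
    )


def zinb_pmf(y: int, pi: float, mu: float, theta: float) -> float:
    """P(Y = y) under a ZINB: a structural zero with probability pi,
    otherwise a draw from NB(mu, theta)."""
    if not (0.0 <= pi < 1.0 and mu > 0.0 and theta > 0.0):
        raise ValueError("require 0 <= pi < 1, mu > 0, theta > 0")
    if y < 0:
        return 0.0
    nb = math.exp(nb_logpmf(y, mu, theta))
    if y == 0:
        # Zeros arise both from the point mass (dropout) and from the NB component.
        return pi + (1.0 - pi) * nb
    return (1.0 - pi) * nb


def zinb_loglik(counts: Iterable[int], pi: float, mu: float, theta: float) -> float:
    """Log-likelihood of one gene's counts when all cells share (pi, mu, theta)."""
    return sum(math.log(zinb_pmf(y, pi, mu, theta)) for y in counts)
```

With pi = 0.3, mu = 5 and theta = 2, P(Y = 0) = 0.3 + 0.7 × (2/7)^2 = 5/14 ≈ 0.357, compared with (2/7)^2 ≈ 0.082 without zero inflation, while the mean drops to (1 - pi) × mu = 3.5. That extra mass at zero is what lets the model absorb dropout events.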
Recent advances in high-throughput single-cell RNA sequencing (scRNA-seq) technology have revealed cellular heterogeneity. However, the distinct challenges of scRNA-seq data, including stochastic dropout, increased variability, and complex expression distributions, have implications for both the statistical methodology and the computational development of differential gene expression analysis. In this thesis, we propose a new method, scDEseq, which employs a zero-inflated negative binomial model to identify genes that are differentially expressed between biological conditions in scRNA-seq data. In addition, we use simulated and real data to evaluate the proposed method and compare it with existing methods in terms of sensitivity, specificity, accuracy, precision, F1 score, and AUC (area under the ROC curve). The results indicate that the proposed method performs strongly overall, particularly in terms of F1 score and AUC.
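The evaluation criteria listed in the abstract can be computed from a gene-level confusion matrix against the known DE status of simulated genes. This sketch assumes, hypothetically, that each method returns one p-value per gene, that a gene is called DE when its (ideally multiplicity-adjusted) p-value is below a threshold `alpha`, and that AUC is obtained by ranking genes by p-value; the summary does not specify these details, and the helper names are hypothetical.

```python
from typing import Dict, List, Sequence


def classification_metrics(truth: Sequence[bool], pvalues: Sequence[float],
                           alpha: float = 0.05) -> Dict[str, float]:
    """Sensitivity, specificity, accuracy, precision and F1 for DE calls.

    truth[i] is True if gene i is truly DE (known in a simulation);
    gene i is called DE when pvalues[i] < alpha (hypothetical decision rule).
    """
    called = [p < alpha for p in pvalues]
    tp = sum(t and c for t, c in zip(truth, called))
    fp = sum(c and not t for t, c in zip(truth, called))
    fn = sum(t and not c for t, c in zip(truth, called))
    tn = sum(not t and not c for t, c in zip(truth, called))

    def ratio(num: int, den: int) -> float:
        # Undefined ratios (e.g. precision when nothing is called DE) become NaN.
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),  # true positive rate (recall)
        "specificity": ratio(tn, tn + fp),  # true negative rate
        "accuracy": ratio(tp + tn, tp + fp + fn + tn),
        "precision": ratio(tp, tp + fp),
        "f1": ratio(2 * tp, 2 * tp + fp + fn),  # harmonic mean of precision and recall
    }


def auc_from_pvalues(truth: Sequence[bool], pvalues: Sequence[float]) -> float:
    """AUC when genes are ranked by p-value (smaller = stronger evidence of DE).

    Mann-Whitney identity: AUC is the probability that a random truly DE gene
    has a smaller p-value than a random non-DE gene, with ties counted as 1/2.
    """
    pos: List[float] = [p for t, p in zip(truth, pvalues) if t]
    neg: List[float] = [p for t, p in zip(truth, pvalues) if not t]
    if not pos or not neg:
        raise ValueError("need at least one DE and one non-DE gene")
    score = 0.0
    for p_pos in pos:
        for p_neg in neg:
            if p_pos < p_neg:
                score += 1.0
            elif p_pos == p_neg:
                score += 0.5
    return score / (len(pos) * len(neg))
```

For example, with truth = [True, True, True, False, False, False] and p-values [0.001, 0.02, 0.30, 0.04, 0.50, 0.90], alpha = 0.05 gives TP = 2, FP = 1, FN = 1 and TN = 2, so all five threshold metrics equal 2/3, while AUC = 8/9 because only the pair (0.30, 0.04) is ranked the wrong way. The pairwise loop costs O(n_DE × n_nonDE); a rank-sum formulation scales better to genome-wide gene counts.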