| Graduate student: | Chen, Pei-Chun (陳沛群) |
|---|---|
| Thesis title: | Semi-Reference Ensemble Learning for Cell Type Annotation in scRNA-seq Data (單細胞 RNA 定序資料的半參考式集成學習之細胞類型標註) |
| Advisor: | Tai, An-Shun (戴安順) |
| Degree: | Master |
| Department: | College of Management, Department of Statistics |
| Year of publication: | 2025 |
| Academic year of graduation: | 113 |
| Language: | English |
| Number of pages: | 100 |
| Keywords (Chinese): | 單細胞 RNA 測序、細胞種類標註、集成學習、分類 |
| Keywords (English): | Single-cell RNA sequencing (scRNA-seq), Cell Type Annotation, Ensemble Learning, Classification |

Single-cell RNA sequencing (scRNA-seq) is a revolutionary technology that quantifies gene expression at the resolution of individual cells, significantly advancing our understanding of cellular diversity and function. Cell type annotation is an essential step in resolving tissue heterogeneity, yet experimental annotation is impractical for large-scale data, making statistical and computational methods indispensable in scRNA-seq analysis. As the volume of sequencing data grows exponentially, many methods from statistical modeling and data science have been proposed for cell type annotation. However, integrating the predictions of different methods remains a major challenge, especially when annotating unknown or previously uncharacterized cell populations. In this study, we systematically compare a range of cell type annotation methods and propose a novel semi-reference ensemble learning framework that combines supervised (reference-based) and unsupervised (reference-free) approaches. The ensemble strategy uses weighted voting, with each method's performance in prior experiments serving as its weight, to improve annotation accuracy. The proposed semi-reference ensemble workflow integrates predictions from multiple leading methods to provide accurate and flexible cell type annotation for scRNA-seq data, including the detection of unknown cell types.
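
The weighted-voting idea described in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation: the function name `weighted_vote`, the method names, the example weights, and in particular the `min_share` threshold used to fall back to an "unknown" label are all assumptions made for illustration. The abstract only states that method performance serves as the voting weight and that unknown cell types can be detected.

```python
from collections import defaultdict


def weighted_vote(predictions, weights, unknown_label="unknown", min_share=0.5):
    """Combine the labels that several methods assign to one cell.

    predictions: dict mapping method name -> predicted label for the cell
    weights:     dict mapping method name -> non-negative weight
                 (e.g. accuracy measured in earlier benchmark experiments)
    min_share:   hypothetical cutoff; if the winning label holds less than
                 this fraction of the total weight, return unknown_label
    """
    scores = defaultdict(float)
    for method, label in predictions.items():
        scores[label] += weights[method]

    total = sum(scores.values())
    if total == 0:
        return unknown_label

    best_label, best_score = max(scores.items(), key=lambda kv: kv[1])
    # Assumed rule: weak consensus is treated as a possible unknown cell type.
    if best_score / total < min_share:
        return unknown_label
    return best_label


# Hypothetical weights for three annotation methods.
weights = {"method_a": 0.9, "method_b": 0.8, "method_c": 0.6}

# Two methods agree, so their combined weight dominates.
agreed = weighted_vote(
    {"method_a": "T cell", "method_b": "T cell", "method_c": "B cell"}, weights
)

# All three methods disagree, so no label reaches the assumed threshold.
disputed = weighted_vote(
    {"method_a": "T cell", "method_b": "B cell", "method_c": "NK cell"}, weights
)

print(agreed, disputed)
```

In the first call the label "T cell" collects 1.7 of the 2.3 total weight and is returned; in the second call the strongest label holds only 0.9 of 2.3, which is below the assumed 0.5 cutoff, so the cell is flagged as unknown. How the thesis actually derives weights and decides on unknown cells may differ from this sketch.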