scGSVA provides wrapper functions to perform GSVA and UCell enrichment analysis for single-cell RNA-seq data. The package includes functions to build gene set annotations for almost all species and generate publication-ready figures based on enrichment results.
- Multiple enrichment methods: GSVA, ssGSEA, and UCell scoring
- Flexible annotation building: Support for KEGG, GO, and MSigDB gene sets across 20+ species
- Batch processing: Efficient handling of large datasets with chunked computation
- Rich visualization: Violin plots, dot plots, ridge plots, heatmaps, feature plots, and spatial plots
- Statistical testing: Differential pathway analysis with limma, t-test, Wilcoxon, and ANOVA
- Spatial transcriptomics: Full support for Visium and other spatial platforms
library(devtools)
install_github("guokai8/scGSVA")
# For UCell support (optional)
BiocManager::install("UCell")set.seed(123)
library(scGSVA)
data(pbmcs)
# Build annotation
hsko <- buildAnnot(species = "human", keytype = "SYMBOL", anntype = "KEGG")
# Run enrichment analysis
res <- scgsva(pbmcs, hsko, method = "ssgsea")
# Or use UCell with custom maxRank
res <- scgsva(pbmcs, hsko, method = "UCell", maxRank = 3000)vlnPlot(res, features = "Wnt.signaling.pathway", group_by = "groups")
# With split violin: split.plot = TRUE and split.by = "condition"dotPlot(res, features = "Wnt.signaling.pathway", group_by = "groups")ridgePlot(res, features = "Wnt.signaling.pathway", group_by = "groups")featurePlot(res, features = "Wnt.signaling.pathway", reduction = "tsne")Heatmap(res, group_by = "groups")barPlot(res, features = c("Wnt.signaling.pathway", "MAPK.signaling.pathway"),
group_by = "groups")lollipopPlot(res, features = c("Wnt.signaling.pathway", "MAPK.signaling.pathway"),
group_by = "groups")# Linear model-based analysis (limma)
findPathway(res, group = "groups")
# Statistical tests (t-test, Wilcoxon, ANOVA)
sigPathway(res, group = "groups", test.use = "wilcox")# Get all genes in a pathway with expression values
genes(res, features = "Wnt.signaling.pathway")
# Get top influential genes driving pathway scores
topGenes(res, features = "Wnt.signaling.pathway", n = 10)
# With group-wise statistics
topGenes(res, features = "Wnt.signaling.pathway", n = 10, group = "groups")# Get summary statistics for pathways
summaryPathway(res, features = c("Wnt.signaling.pathway", "MAPK.signaling.pathway"),
group_by = "groups")# KEGG pathways
hsko <- buildAnnot(species = "human", keytype = "SYMBOL", anntype = "KEGG")
# GO terms
hsgo <- buildAnnot(species = "human", keytype = "SYMBOL", anntype = "GO")
# Check supported species
showData()# Hallmark gene sets
hallmark <- buildMSIGDB(species = "human", keytype = "SYMBOL",
anntype = "HALLMARK")
# KEGG pathways from MSigDB
msig_kegg <- buildMSIGDB(species = "human", keytype = "SYMBOL",
anntype = "KEGG")
# GO Biological Process
go_bp <- buildMSIGDB(species = "human", keytype = "SYMBOL",
anntype = "BP")
# Check available annotation types
msigdbinfo()Available anntype values: HALLMARK, KEGG, REACTOME, BIOCARTA, GO, BP, CC, MF, CGP, MIR, TFT
If you cannot connect to Zenodo, download the MSigDB file manually:
# 1. Download manually (use VPN or mirror):
# Human: https://zenodo.org/records/15800824/files/msigdb.2025.1.Hs.rds
# Mouse: https://zenodo.org/records/15800824/files/msigdb.2025.1.Mm.rds
# 2. Load and use:
msig_data <- readRDS("msigdb.2025.1.Hs.rds")
hallmark <- buildMSIGDB(species = "human", keytype = "SYMBOL",
anntype = "HALLMARK", msigdb_data = msig_data)library(Seurat)
library(SeuratData)
# Load spatial data
brain <- LoadData("stxBrain", type = "anterior1")
# Run enrichment
hsko <- buildAnnot(species = "human", keytype = "SYMBOL", anntype = "KEGG")
res <- scgsva(brain, hsko, assay = "Spatial")
# Visualize on tissue
spatialFeaturePlot(res, features = "Wnt.signaling.pathway")The scGSVA package uses the GSVA package for GSVA/ssGSEA analysis and optionally UCell for UCell scoring. The package is under active development.
For questions or issues, please contact guokai8@gmail.com or open an issue at https://github.com/guokai8/scGSVA/issues
- Added offline mode for
buildMSIGDB()withmsigdb_dataparameter (for users in China) - Added
topGenes()function to extract influential genes per pathway - Added
maxRankparameter for UCell method - Added
barPlot()andlollipopPlot()visualization functions - Added
summaryPathway()for pathway statistics across groups - Fixed msigdbr compatibility (gs_collection/gs_subcollection)
- Improved batch processing for large datasets
- Added comprehensive tutorial vignette