📚 Documentation: https://zaoqu-liu.github.io/recall/
recall (Calibrated Clustering with Artificial Variables) is a statistical framework designed to protect against over-clustering in single-cell RNA-sequencing (scRNA-seq) data analysis by controlling for the impact of double-dipping.
In standard scRNA-seq pipelines, unsupervised clustering is used to identify biologically distinct cell types, followed by differential expression testing between the resulting clusters. Because the same data are used both to define the clusters and to test them (double-dipping), over-partitioned data yield artificially small P-values and inflated false discovery rates in downstream analyses. recall addresses this fundamental statistical challenge through a knockoff-inspired calibration procedure.
- FDR-controlled clustering: Integrates knockoff filter methodology to control the false discovery rate
- Algorithm agnostic: Compatible with Louvain, Leiden, and other clustering algorithms
- Seurat integration: Seamless integration with Seurat V4 and V5 workflows
- Cross-platform: Full support for Linux, macOS, and Windows
- Scalable: Efficiently handles large-scale scRNA-seq datasets
install.packages("recall", repos = "https://zaoqu-liu.r-universe.dev")
# Install devtools if not already installed
if (!requireNamespace("devtools", quietly = TRUE))
  install.packages("devtools")
devtools::install_github("Zaoqu-Liu/recall")
devtools::install_github("immunogenomics/presto")
library(Seurat)
library(recall)
# Load your single-cell data
# seurat_obj <- CreateSeuratObject(counts = your_counts_matrix)
# Standard Seurat preprocessing
seurat_obj <- NormalizeData(seurat_obj)
seurat_obj <- FindVariableFeatures(seurat_obj)
seurat_obj <- ScaleData(seurat_obj)
seurat_obj <- RunPCA(seurat_obj)
seurat_obj <- FindNeighbors(seurat_obj)
seurat_obj <- RunUMAP(seurat_obj, dims = 1:10)
# recall clustering (drop-in replacement for FindClusters)
seurat_obj <- FindClustersRecall(seurat_obj, resolution_start = 0.8)
# Visualize results
DimPlot(seurat_obj, group.by = "recall_clusters")
The recall algorithm implements a three-stage calibration procedure:
Inspired by knockoff variables (Barber & Candès, 2015), we augment the expression matrix with synthetic "knockoff" genes that preserve the marginal distribution of real genes but are known a priori not to contribute to any biological signal. Supported generative models include:
- ZIP: Zero-Inflated Poisson (default, fast)
- NB: Negative Binomial
- ZIP-copula: ZIP with Gaussian copula for gene-gene correlations
- NB-copula: NB with Gaussian copula
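To make the knockoff idea concrete, here is a minimal base-R sketch of the ZIP option: each gene gets one synthetic counterpart drawn from a zero-inflated Poisson fitted to that gene alone. The helper name `simulate_zip_knockoffs()`, the method-of-moments fit, and the toy matrix are assumptions made for illustration; this is not the package's internal implementation.

```r
# Hypothetical helper (not part of recall): draw one knockoff gene per real gene
# from a zero-inflated Poisson (ZIP) fitted by the method of moments.
# `counts` is a genes x cells matrix of non-negative integer counts.
simulate_zip_knockoffs <- function(counts) {
  n_cells <- ncol(counts)
  knockoffs <- t(apply(counts, 1, function(x) {
    m <- mean(x)
    v <- var(x)
    if (m == 0) return(rep(0, n_cells))   # gene never expressed: all-zero knockoff
    excess <- max(v / m - 1, 0)           # overdispersion relative to a plain Poisson
    lambda <- m + excess                  # Poisson rate of the non-inflated component
    pi0 <- excess / lambda                # probability of a structural zero
    structural_zero <- rbinom(n_cells, 1, pi0) == 1
    ifelse(structural_zero, 0, rpois(n_cells, lambda))
  }))
  rownames(knockoffs) <- paste0("knockoff_", seq_len(nrow(counts)))
  colnames(knockoffs) <- colnames(counts)
  knockoffs
}

# Toy data (made up for this example): 5 genes x 200 cells with extra zeros
set.seed(1)
counts <- matrix(
  rpois(5 * 200, lambda = 2) * rbinom(5 * 200, 1, 0.6),
  nrow = 5,
  dimnames = list(paste0("gene", 1:5), paste0("cell", 1:200))
)
knockoffs <- simulate_zip_knockoffs(counts)

# Augmented matrix: real genes stacked on top of their knockoffs
augmented <- rbind(counts, knockoffs)
```

Because each knockoff is sampled independently of cell identity, it roughly matches the gene's marginal distribution while carrying no cluster structure, which is the property the calibration relies on.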
Both original and knockoff features undergo identical preprocessing (normalization, scaling, PCA) and clustering, ensuring knockoffs experience the same double-dipping as real genes.
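For each cluster pair, we compute a knockoff filter statistic for every gene, contrasting the differential expression evidence for the real gene with that for its knockoff.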
Clusters are merged if no genes pass the knockoff filter at a target FDR (default: 0.05). The algorithm iteratively reduces resolution until all cluster pairs exhibit statistically significant differential expression.
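The merge decision described above can be sketched in base R. The sketch assumes the per-gene statistic is the difference in -log10 P-values between a real gene and its knockoff, and applies the knockoff+ threshold of Barber & Candès (2015). The function name `knockoff_select()` and the example P-values are hypothetical; the package's internals may differ.

```r
# Hypothetical helper: return the indices of genes passing the knockoff+ filter.
# W > 0 means the real gene shows stronger evidence than its knockoff.
knockoff_select <- function(W, fdr = 0.05, offset = 1) {
  candidates <- sort(unique(abs(W[W != 0])))
  for (t in candidates) {
    # Estimated false discovery proportion at threshold t
    fdp_hat <- (offset + sum(W <= -t)) / max(sum(W >= t), 1)
    if (fdp_hat <= fdr) return(which(W >= t))
  }
  integer(0)  # no threshold achieves the target FDR
}

# Made-up P-values for one cluster pair: 25 strongly separated genes, 2 noise genes
p_real  <- c(rep(1e-8, 25), 0.3, 0.6)
p_knock <- c(rep(0.5, 25), 0.6, 0.3)

# Assumed form of the statistic: W_j = -log10(p_j) - (-log10(p_knockoff_j))
W <- -log10(p_real) + log10(p_knock)

selected <- knockoff_select(W, fdr = 0.05)
merge_clusters <- length(selected) == 0  # merge the pair only if nothing is selected
```

With `offset = 1` and a target FDR of 0.05, at least 20 genes must beat their knockoffs before any are selected, so weakly separated cluster pairs tend to be merged.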
| Function | Description |
|---|---|
| FindClustersRecall() | Main clustering function using knockoff calibration |
| FindClustersCountsplit() | Alternative method using count splitting |
| seurat_workflow() | Complete Seurat preprocessing pipeline |
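For readers unfamiliar with count splitting, the idea behind the alternative method in the table can be sketched with binomial thinning in base R: each count is split at random into a training part (used for clustering) and a test part (used for differential expression). The helper `count_split()`, the `epsilon` parameter, and the toy matrix are illustrative assumptions, not the interface of FindClustersCountsplit().

```r
# Hypothetical helper: split a matrix of non-negative integer counts into
# train and test parts by binomial thinning.
count_split <- function(counts, epsilon = 0.5) {
  train <- matrix(
    rbinom(length(counts), size = as.vector(counts), prob = epsilon),
    nrow = nrow(counts),
    dimnames = dimnames(counts)
  )
  list(train = train, test = counts - train)
}

# Toy data (made up for this example): 4 genes x 50 cells
set.seed(42)
toy_counts <- matrix(rpois(4 * 50, lambda = 3), nrow = 4)
parts <- count_split(toy_counts, epsilon = 0.5)
```

If the counts are Poisson distributed, the train and test parts are independent, so clusters estimated on `parts$train` can be tested on `parts$test` without double-dipping.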
# Use Negative Binomial with copula for better correlation modeling
seurat_obj <- FindClustersRecall(
  seurat_obj,
  null_method = "NB-copula",
  resolution_start = 1.0,
  reduction_percentage = 0.1,
  cores = 4
)
# Count splitting approach (Neufeld et al., 2022)
seurat_obj <- FindClustersCountsplit(
  seurat_obj,
  resolution_start = 0.8,
  algorithm = "leiden"
)
If you use recall in your research, please cite:
DenAdel, A., Ramseier, M., Navia, A., Shalek, A., Raghavan, S., Winter, P., Amini, A., & Crawford, L. (2025). A knockoff calibration method to avoid over-clustering in single-cell RNA-sequencing. American Journal of Human Genetics. https://doi.org/10.1016/j.ajhg.2025.01.001
@article{denadel2025knockoff,
  title={A knockoff calibration method to avoid over-clustering in single-cell RNA-sequencing},
  author={DenAdel, Alan and Ramseier, Megan and Navia, Andrew and Shalek, Alex and Raghavan, Srivatsan and Winter, Peter and Amini, Arash and Crawford, Lorin},
  journal={American Journal of Human Genetics},
  year={2025},
  publisher={Elsevier}
}
- Barber, R. F., & Candès, E. J. (2015). Controlling the false discovery rate via knockoffs. The Annals of Statistics, 43(5), 2055-2085.
- Neufeld, A., Gao, L. L., Pober, J., & Witten, D. (2022). Inference after latent variable estimation for single-cell RNA sequencing data. Biostatistics, 24(1), 33-51.
For questions, bug reports, or feature requests, please open an issue on GitHub or contact Zaoqu Liu.
This project is licensed under the MIT License - see the LICENSE file for details.