ECODA: Exploratory Compositional Data Analysis for scRNA-seq Cohorts

This repository contains the code to reproduce the results and figures from the paper: "Cell type composition drives patient stratification in single-cell RNA-seq cohorts".

Overview

Single-cell RNA sequencing (scRNA-seq) enables high-resolution characterization of cellular heterogeneity, but summarizing this data for cohort-level analysis remains a challenge. We benchmarked several state-of-the-art sample representation methods, including deep generative models and factor decomposition, against a simple baseline: ECODA (Exploratory Compositional Data Analysis).

Key Findings

Performance: Centered log-ratio (CLR)-transformed cell-type proportions (ECODA) consistently match or outperform more complex methods in recovering known biological groupings in an unsupervised setting.
Efficiency: ECODA requires orders of magnitude fewer computational resources and produces embeddings in seconds.
Robustness: The approach is highly robust to technical batch effects and various cell-type annotation strategies.
Interpretability: Biological stratification is often driven by a small subset of highly variable cell types (HVCs), providing direct mechanistic insights.
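
The CLR-transformed cell-type proportions behind the findings above can be sketched in a few lines. This is a minimal, standard-library illustration and not the code from functions.R or the scECODA package: the function name `clr_transform`, the sample and cell-type names, the counts, and the pseudocount of 0.5 used to avoid taking the log of zero are all assumptions made for the example.

```python
import math


def clr_transform(counts, pseudocount=0.5):
    """Centered log-ratio transform of one sample's cell-type counts.

    The pseudocount is an illustrative assumption for handling cell types
    with zero cells; the repository may treat zeros differently.
    """
    shifted = [c + pseudocount for c in counts]
    total = sum(shifted)
    proportions = [s / total for s in shifted]       # compositional closure
    logs = [math.log(p) for p in proportions]
    mean_log = sum(logs) / len(logs)                 # log of the geometric mean
    return [value - mean_log for value in logs]      # centered log-ratios


# Hypothetical cohort: cells counted per annotated cell type in each sample.
cell_types = ["B cell", "T cell", "Monocyte", "NK cell"]
counts_by_sample = {
    "patient_A": [120, 600, 250, 30],
    "patient_B": [40, 300, 640, 0],
}

# One CLR vector per sample serves as the sample-level embedding.
embedding = {s: clr_transform(c) for s, c in counts_by_sample.items()}

for sample, values in embedding.items():
    print(sample, [round(v, 3) for v in values])
```

Each sample is reduced to one value per cell type, and the values of a sample sum to zero. Because the transform depends only on ratios between cell types, a sample with ten times as many cells in the same proportions gets the same vector (exactly so when no pseudocount is added).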

Repository Contents

QC_filtering/: Pre-filtering and quality control, performed individually for each dataset with the respective scripts in this folder.
Preprocess_datasets.Rmd: Standardized preprocessing and automated annotation for all cohorts used in the study.
Process_data.ipynb: Notebook that runs the benchmarked methods implemented in Python.
MAIN_Analysis.Rmd: Core script to run the benchmark and generate paper figures.
functions.R: Underlying R functions for ECODA, CLR transformations, and separation metric calculations (ANOSIM, ARI, Modularity).

The scECODA R package for scalable cohort-level analysis is available at github.com/carmonalab/scECODA.
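
Of the separation metrics listed for functions.R, the adjusted Rand index (ARI) is the simplest to illustrate: it compares a clustering of samples with their known biological groups and corrects for agreement expected by chance. The sketch below applies the standard pair-counting formula using only the Python standard library. It is not the repository's implementation, and the function name `adjusted_rand_index`, the group labels, and the cluster assignments are hypothetical.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between two labelings of the same samples."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("both labelings must cover the same samples")

    n = len(labels_true)
    total_pairs = comb(n, 2)
    if total_pairs == 0:
        return 1.0  # zero or one sample: the labelings trivially agree

    # Contingency counts and their marginals.
    pair_counts = Counter(zip(labels_true, labels_pred))
    true_counts = Counter(labels_true)
    pred_counts = Counter(labels_pred)

    # Number of sample pairs placed together under each view.
    sum_pairs = sum(comb(c, 2) for c in pair_counts.values())
    sum_true = sum(comb(c, 2) for c in true_counts.values())
    sum_pred = sum(comb(c, 2) for c in pred_counts.values())

    expected = sum_true * sum_pred / total_pairs   # agreement expected by chance
    max_index = 0.5 * (sum_true + sum_pred)
    if max_index == expected:
        return 1.0  # degenerate case, e.g. a single cluster in both labelings
    return (sum_pairs - expected) / (max_index - expected)


# Hypothetical example: known groups versus clusters found on an embedding.
known_groups = ["tumor", "tumor", "healthy", "healthy"]
clusters = [0, 0, 1, 2]
print(round(adjusted_rand_index(known_groups, clusters), 3))
```

A value of 1 means the clusters reproduce the known groups exactly (cluster names do not matter), and values near 0 indicate agreement no better than chance.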

Reference

If you use ECODA or this benchmark code in your research, please cite our preprint:

Cell type composition drives patient stratification in single-cell RNA-seq cohorts. Halter, C., Andreatta, M., & Carmona, S. J. (2026). bioRxiv. doi: 10.64898/2026.03.27.714811v1

