scLDM is a latent diffusion model built on a novel, fully transformer-based VAE architecture for exchangeable data that uses a single set of fixed-size, permutation-invariant latent variables. The model introduces a Multi-head Cross-Attention Block (MCAB) that serves two purposes: it acts as a permutation-invariant pooling operator in the encoder and as a permutation-equivariant unpooling operator in the decoder. This unified approach eliminates the need for separate architectural components to handle varying set sizes. Our latent diffusion model is trained with a flow matching loss and linear interpolants, following the Scalable Interpolant Transformers (SiT) formulation (Ma et al., 2024), with a denoiser parameterized by Diffusion Transformers (DiT) (Peebles & Xie, 2023). This allows for better modeling of the complex distribution of cellular states and enables controlled generation through classifier-free guidance.
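To make the pooling idea concrete, here is a minimal sketch of cross-attention pooling in pure Python. It is not the scLDM implementation: it uses a single head with identity key/value projections, and the names `cross_attention_pool`, `queries`, and `tokens` are hypothetical. It only illustrates why attending from a fixed set of queries to a variable-size set yields a fixed-size output that does not depend on the order of the set elements.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention_pool(queries, tokens):
    """Pool a variable-size set of tokens into len(queries) latent vectors.

    Simplification (assumption): single head, no learned key/value
    projections. The MCAB described above is multi-head.
    """
    dim = len(queries[0])
    pooled = []
    for q in queries:
        # Scaled dot-product score between this query and every set element.
        scores = [sum(qi * ti for qi, ti in zip(q, t)) / math.sqrt(dim) for t in tokens]
        weights = softmax(scores)
        # Weighted sum over the set, so the order of `tokens` cannot matter.
        pooled.append([sum(w * t[d] for w, t in zip(weights, tokens)) for d in range(dim)])
    return pooled


rng = random.Random(0)
dim, num_latents, set_size = 4, 2, 5
# Random stand-ins for the learned queries and the input set elements.
queries = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(num_latents)]
tokens = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(set_size)]

shuffled = tokens[:]
rng.shuffle(shuffled)

z = cross_attention_pool(queries, tokens)
z_shuffled = cross_attention_pool(queries, shuffled)

# The latents agree up to floating-point summation order.
max_diff = max(abs(a - b) for row, row_s in zip(z, z_shuffled) for a, b in zip(row, row_s))
print(f"latents: {len(z)} x {len(z[0])}, max difference after shuffling: {max_diff:.2e}")
```

The number of output vectors is set by the number of queries, not by the size of the input set, which is what lets a single block handle sets of varying size.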
Please refer to the documentation, in particular, the API documentation.
You need to have Python 3.11 or newer installed on your system. If you don't have Python installed, we recommend installing uv.
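If you are unsure which interpreter your environment uses, a short check like the following can confirm that it meets the requirement before you install. This is only a convenience sketch; the helper name `python_is_supported` is hypothetical and not part of scldm.

```python
import sys

MIN_PYTHON = (3, 11)  # minimum version stated above


def python_is_supported(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True if the (major, minor) version meets the minimum."""
    return tuple(version_info[:2]) >= minimum


current = f"{sys.version_info.major}.{sys.version_info.minor}"
if python_is_supported():
    print(f"Python {current} is supported.")
else:
    print(f"Python {current} is too old; 3.11 or newer is required.")
```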
To install the latest release of scldm from PyPI:
pip install scldm "cellarium-ml @ git+https://github.com/cellarium-ai/cellarium-ml.git"
# or
uv pip install scldm "cellarium-ml @ git+https://github.com/cellarium-ai/cellarium-ml.git"
This model uses cellarium-ml. Currently,
the most recent version on PyPI (0.0.7) is not compatible with anndata>=0.10.9,
which this model uses, so you must install a newer version of cellarium-ml from source.
You can install cellarium-ml separately with:
pip install "cellarium-ml @ git+https://github.com/cellarium-ai/cellarium-ml.git"
# or
uv pip install "cellarium-ml @ git+https://github.com/cellarium-ai/cellarium-ml.git"
To download model checkpoints and other required artifacts:
scldm-download-artifacts
# or
uv run scldm-download-artifacts
This will automatically download all artifacts to the _artifacts subdirectory. You
can change this with the --destination flag. If you don't want to download all
files, you can specify --group datasets, --group vae_census, --group fm_observational, and/or
--group fm_perturbation to download just those artifacts.
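If you want to drive the download from a script, the flags above can be assembled programmatically. The sketch below only builds and prints the command; it assumes that `--group` may be repeated once per group (suggested by the "and/or" wording, but not verified here). The helper `build_download_command` and the `checkpoints` directory are hypothetical.

```python
import shlex

# Group names as listed above.
KNOWN_GROUPS = ("datasets", "vae_census", "fm_observational", "fm_perturbation")


def build_download_command(groups=(), destination=None):
    """Assemble the argument list for scldm-download-artifacts.

    Assumption: --group may be passed once per requested group.
    """
    unknown = [g for g in groups if g not in KNOWN_GROUPS]
    if unknown:
        raise ValueError(f"unknown artifact group(s): {unknown}")
    cmd = ["scldm-download-artifacts"]
    if destination is not None:
        cmd += ["--destination", str(destination)]
    for group in groups:
        cmd += ["--group", group]
    return cmd


# Example: fetch only the VAE and observational flow matching artifacts.
cmd = build_download_command(
    groups=["vae_census", "fm_observational"],
    destination="checkpoints",  # hypothetical target directory
)
print(shlex.join(cmd))
# To actually run it: import subprocess; subprocess.run(cmd, check=True)
```

Calling `build_download_command()` with no arguments yields the bare command, which downloads everything to the default location as described above.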
See the changelog.
If you found a bug, please use the [issue tracker][issue-tracker].
Palla G., Babu S., Dibaeinia P., Li D., Khan A., Karaletsos T., Tomczak J.M. Scalable Single-Cell Gene Expression Generation with Latent Diffusion Models. arXiv, 2025.