Skip to content

jaydu1/scVAEIT

Repository files navigation

PyPI PyPI-Downloads

Variational autoencoder for multimodal mosaic integration and transfer learning

This repository contains implementations of scVAEIT for integration and imputation of multi-modal datasets. scVAEIT (Variational autoencoder for multimodal single-cell mosaic integration and transfer learning) was originally proposed by [Du22] for single-cell genomics data. scVAEIT is a deep generative model based on a variational autoencoder (VAE) with masking strategies, which can integrate and impute multi-modal single-cell data, such as single-cell DOGMA-seq, CITE-seq, and ASAP-seq data. scVAEIT has also been extended to impute single-cell proteomic data in [Moon24], though it is also applicable to other types of data. scVAEIT is implemented in Python, and an R wrapper is also available.

For R users, reticulate can be used to call scVAEIT from R. The documentation and tutorials using both Python and R are available at scvaeit.readthedocs.io.

Check out the example folder for illustrations of how to use scVAEIT:

Example Language Notebooks
Imputation of ADT Python Badge imputation_1modality.ipynb
Imputation of RNA and ADT Python Badge imputation_2modalities.ipynb
Integration of RNA, ADT, and peaks Python Badge integration_3modalities.ipynb
Imputation of RNA R Badge imputation_scRNAseq.ipynb
Imputation of peptides R Badge imputation_peptide.ipynb

For preparing your own data to run scVAEIT, please read about:

Example Language Notebooks
Prepare input data Python Badge prepare_data_input.ipynb

Reproducibility Materials

The code for reproducing results in the paper [Du22] can be found in the folder Reproducibility materials. The large preprocessed dataset that contains DOGMA-seq, CITE-seq, and ASAP-seq data from GSE156478 can be accessed through Google Drive.

Dependencies

The package can be installed via PyPI:

pip install scVAEIT

Alternatively, the dependencies can be installed via the following commands:

mamba create --name tf python=3.9 -y
conda activate tf
mamba install -c conda-forge "tensorflow>=2.12, <2.16" "tensorflow-probability>=0.12, <0.24" pandas jupyter -y
mamba install -c conda-forge "scanpy>=1.9.2" matplotlib scikit-learn -y

If you are using conda, simply replace mamba above with conda.

The code is only tested on Linux and MacOS. If you are using Windows, installing the dependencies pip instead of conda is more convenient.

References

  • [Du22] Du, J. H., Cai, Z., & Roeder, K. (2022). Robust probabilistic modeling for single-cell multimodal mosaic integration and imputation via scVAEIT. Proceedings of the National Academy of Sciences, 119(49), e2214414119.
  • [Moon24] Moon, H., Du, J. H., Lei, J., & Roeder, K. (2024). Augmented Doubly Robust Post-Imputation Inference for Proteomic data. bioRxiv, 2024-03.

About

Variational autoencoder for single-cell integration and transfer learning.

Resources

License

Stars

Watchers

Forks

Packages

No packages published