This repository contains implementations of scVAEIT for integration and imputation of multi-modal datasets. scVAEIT (Variational autoencoder for multimodal single-cell mosaic integration and transfer learning) was originally proposed by [Du22] for single-cell genomics data. scVAEIT is a deep generative model based on a variational autoencoder (VAE) with masking strategies, which can integrate and impute multi-modal single-cell data, such as single-cell DOGMA-seq, CITE-seq, and ASAP-seq data. scVAEIT has since been extended to impute single-cell proteomic data in [Moon24], and it is applicable to other types of data as well. scVAEIT is implemented in Python, and an R wrapper is also available.
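
To give a rough sense of what a masking strategy means in this setting, the sketch below hides a random subset of entries in a toy cells-by-features matrix and scores a prediction only on the hidden entries. This is a generic illustration and not scVAEIT's actual implementation: the function names, the masking probability, and the column-mean stand-in for the model are all assumptions made for the example.

```python
import numpy as np


def random_feature_mask(shape, mask_prob, rng):
    """Return a boolean array; True marks entries hidden from the model."""
    return rng.random(shape) < mask_prob


def masked_mse(x_true, x_pred, mask):
    """Mean squared error computed over the masked (hidden) entries only."""
    n_masked = int(mask.sum())
    if n_masked == 0:
        return 0.0
    return float((((x_true - x_pred) ** 2) * mask).sum() / n_masked)


rng = np.random.default_rng(0)

# Toy count matrix: 4 cells by 6 features (hypothetical data).
x = rng.poisson(2.0, size=(4, 6)).astype(float)

# Hide roughly 30% of the entries and zero them out in the model input.
mask = random_feature_mask(x.shape, mask_prob=0.3, rng=rng)
x_in = np.where(mask, 0.0, x)

# Stand-in for a trained model: predict each hidden entry with the mean of
# the visible entries in the same feature column (assumption for the demo).
visible = ~mask
col_mean = (x * visible).sum(axis=0) / np.maximum(visible.sum(axis=0), 1)
x_pred = np.where(mask, col_mean, x)

loss = masked_mse(x, x_pred, mask)
print(f"masked entries: {int(mask.sum())}, masked MSE: {loss:.3f}")
```

In a real model, the encoder would receive both `x_in` and `mask`, and the loss on hidden entries would drive the network to learn how to impute them.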
For R users, reticulate can be used to call scVAEIT from R.
The documentation and tutorials using both Python and R are available at scvaeit.readthedocs.io.
Check out the example folder for illustrations of how to use scVAEIT:
| Example | Notebook |
|---|---|
| Imputation of ADT | imputation_1modality.ipynb |
| Imputation of RNA and ADT | imputation_2modalities.ipynb |
| Integration of RNA, ADT, and peaks | integration_3modalities.ipynb |
| Imputation of RNA | imputation_scRNAseq.ipynb |
| Imputation of peptides | imputation_peptide.ipynb |
To prepare your own data for scVAEIT, see:

| Example | Notebook |
|---|---|
| Prepare input data | prepare_data_input.ipynb |
The code for reproducing results in the paper [Du22] can be found in the folder Reproducibility materials.
The large preprocessed dataset that contains DOGMA-seq, CITE-seq, and ASAP-seq data from GSE156478 can be accessed through Google Drive.
The package can be installed via PyPI:

    pip install scVAEIT

Alternatively, the dependencies can be installed via the following commands:

    mamba create --name tf python=3.9 -y
    conda activate tf
    mamba install -c conda-forge "tensorflow>=2.12, <2.16" "tensorflow-probability>=0.12, <0.24" pandas jupyter -y
    mamba install -c conda-forge "scanpy>=1.9.2" matplotlib scikit-learn -y

If you are using conda, simply replace mamba above with conda.
The code has only been tested on Linux and macOS. If you are using Windows, installing the dependencies with pip instead of conda is more convenient.
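
After installation, it can be useful to confirm that the installed versions fall within the ranges listed above. The following is a minimal sketch using only the Python standard library; the helper names (`parse_version`, `in_range`, `check_environment`) are hypothetical and not part of scVAEIT, and the simple version parser ignores pre-release suffixes.

```python
import re
from importlib import metadata

# Version ranges copied from the install commands above:
# (minimum, inclusive; maximum, exclusive, or None for no upper bound).
REQUIRED = {
    "tensorflow": ("2.12", "2.16"),
    "tensorflow-probability": ("0.12", "0.24"),
    "scanpy": ("1.9.2", None),
}


def parse_version(text):
    """Turn '2.15.0' or '2.15.0rc1' into a tuple of leading integers."""
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def in_range(version, minimum, maximum=None):
    """Check minimum <= version < maximum using tuple comparison."""
    v = parse_version(version)
    if v < parse_version(minimum):
        return False
    return maximum is None or v < parse_version(maximum)


def check_environment(required=REQUIRED):
    """Return a dict mapping each package name to a short status string."""
    report = {}
    for name, (minimum, maximum) in required.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "not installed"
            continue
        status = "ok" if in_range(installed, minimum, maximum) else "out of range"
        report[name] = f"{installed} ({status})"
    return report


for package, status in check_environment().items():
    print(f"{package}: {status}")
```

Running this inside the activated environment prints one line per package; any entry marked as not installed or out of range points to a dependency worth reinstalling.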
- [Du22] Du, J. H., Cai, Z., & Roeder, K. (2022). Robust probabilistic modeling for single-cell multimodal mosaic integration and imputation via scVAEIT. Proceedings of the National Academy of Sciences, 119(49), e2214414119.
- [Moon24] Moon, H., Du, J. H., Lei, J., & Roeder, K. (2024). Augmented Doubly Robust Post-Imputation Inference for Proteomic data. bioRxiv, 2024-03.