The official implementation of the ICML 2026 paper (poster) Modeling Temporal scRNA-seq Data with Latent Gaussian Process and Optimal Transport.
Single-cell RNA sequencing provides insights into gene expression at single-cell resolution, yet inferring temporal processes from these static snapshot measurements remains a fundamental challenge. Current approaches utilizing neural differential equations and flows are prone to overfitting and lack careful consideration of biological variability. In this work, we propose a generative framework that models population trends using a latent heteroscedastic Gaussian process (GP) approximated by Hilbert space methods. To address the absence of genuine cell trajectories, we leverage an optimal transport (OT) objective that aligns generated and observed population distributions. Our method explicitly captures biological heterogeneity by incorporating cell-specific latent time and cell type conditioning, which disentangle temporal asynchrony from trajectories toward different cell types. We demonstrate state-of-the-art performance on complex interpolation and extrapolation benchmarks and introduce a novel gradient-based strategy for inferring perturbation trajectories.
To install requirements:
Basic setup for non-GPU usage:
bash setup.sh
For the GPU setup, replace environment_basic.yaml with environment_gpu.yaml in setup.sh, then run the same command.
First, create a directory to store all datasets.
mkdir ../data
Then download the datasets from here and place them in the ../data directory. Three scRNA-seq datasets are provided: zebrafish embryo, Drosophila, and Schiebinger2019. Each includes both the raw and the pre-processed data, which can be used for training and evaluating the LGP-OT model.
Train the model using the following command, where you can specify the dataset and split type:
python scripts/LGPOT.py --data_name <dataset_name> --split_type <split_type> --seed <random_seed>
| Argument | Options | Mapping |
|---|---|---|
| data_name | zebrafish, drosophila, wot | zebrafish = ZB, drosophila = DR, wot = SC (Schiebinger2019) |
| split_type | three_interpolation, three_forecasting, two_forecasting, remove_recovery | Easy: three_interpolation; Medium: three_forecasting (two_forecasting for ZB); Hard: remove_recovery |
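To run every benchmark in the table, it can help to generate the training commands rather than type them by hand. The following is a minimal sketch, not part of the repository: the helper names `split_types_for` and `build_commands` are hypothetical, the seed values are placeholders, and it assumes that zebrafish uses two_forecasting in place of three_forecasting, as the mapping column suggests.

```python
import shlex

# Dataset names copied from the data_name row of the table.
DATASETS = ("zebrafish", "drosophila", "wot")


def split_types_for(data_name):
    """Return the easy, medium and hard split types for one dataset."""
    # Assumption: zebrafish (ZB) swaps three_forecasting for two_forecasting.
    medium = "two_forecasting" if data_name == "zebrafish" else "three_forecasting"
    return ("three_interpolation", medium, "remove_recovery")


def build_commands(seeds=(0,)):
    """Build one training command per (dataset, split type, seed)."""
    commands = []
    for data_name in DATASETS:
        for split_type in split_types_for(data_name):
            for seed in seeds:
                argv = [
                    "python", "scripts/LGPOT.py",
                    "--data_name", data_name,
                    "--split_type", split_type,
                    "--seed", str(seed),
                ]
                # shlex.join quotes arguments safely for a POSIX shell.
                commands.append(shlex.join(argv))
    return commands


if __name__ == "__main__":
    # Placeholder seeds; choose the ones you actually want to run.
    for command in build_commands(seeds=(0, 1, 2)):
        print(command)
```

The script only prints the commands, so they can be reviewed first and then pasted into a shell or a job scheduler.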
Used code bases:
- DGBFGP (Balık et al., 2025): https://github.com/YigitBalik/DGBFGP
- scNODE (Zhang et al., 2024): https://github.com/rsinghlab/scNODE
References:
- Balık, M. Y., Sinelnikov, M., Ong, P., & Lähdesmäki, H. (2025). Bayesian Basis Function Approximation for Scalable Gaussian Process Priors in Deep Generative Models. In Forty-second International Conference on Machine Learning.
- Zhang, J., Larschan, E., Bigness, J., & Singh, R. (2024). scNODE: generative model for temporal single cell transcriptomic data prediction. Bioinformatics, 40, ii146-ii154.