YigitBalik/LGP-OT

Latent Gaussian Process with Optimal Transport (LGP-OT)

The official implementation of the ICML 2026 paper (poster) Modeling Temporal scRNA-seq Data with Latent Gaussian Process and Optimal Transport.

LGP-OT overview.

Abstract

Single-cell RNA sequencing provides insights into gene expression at single-cell resolution, yet inferring temporal processes from these static snapshot measurements remains a fundamental challenge. Current approaches utilizing neural differential equations and flows are sensitive to overfitting and lack careful considerations of biological variability. In this work, we propose a generative framework that models population trends using a latent heteroscedastic Gaussian process (GP) approximated by Hilbert space methods. To address the absence of genuine cell trajectories, we leverage an optimal transport (OT) objective that aligns generated and observed population distributions. Our method explicitly captures biological heterogeneity by incorporating cell-specific latent time and cell type conditioning to disentangle temporal asynchrony and trajectories to different cell types. We demonstrate state-of-the-art performance on complex interpolation and extrapolation benchmarks and introduce a novel gradient-based strategy for inferring perturbation trajectories.
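To give a rough feel for what an OT objective between a generated and an observed cell population looks like, here is a minimal NumPy sketch of an entropy-regularised (Sinkhorn) transport cost between two point clouds. It is an illustration only and not the loss implemented in this repository: the function name `sinkhorn_cost`, the squared Euclidean cost, the uniform weights, and the `reg` and `n_iters` values are all assumptions made for the example.

```python
import numpy as np


def sinkhorn_cost(x, y, reg=0.05, n_iters=200):
    """Entropy-regularised OT cost between two uniformly weighted point clouds.

    Hypothetical helper for illustration; not part of the LGP-OT code base.
    """
    n, m = x.shape[0], y.shape[0]
    # Pairwise squared Euclidean distances, shape (n, m).
    cost = ((x[:, None, :] - y[None, :, :]) ** 2).sum(axis=-1)
    # Uniform marginals: every cell carries the same mass.
    a = np.full(n, 1.0 / n)
    b = np.full(m, 1.0 / m)
    # Scale the regularisation by the largest cost to avoid underflow in exp.
    kernel = np.exp(-cost / (reg * cost.max()))
    u = np.ones(n)
    v = np.ones(m)
    for _ in range(n_iters):
        # Alternately rescale rows and columns to match the marginals.
        u = a / (kernel @ v)
        v = b / (kernel.T @ u)
    plan = u[:, None] * kernel * v[None, :]
    return float((plan * cost).sum()), plan


rng = np.random.default_rng(0)
observed = rng.normal(size=(50, 2))
generated_close = rng.normal(size=(50, 2))       # same distribution
generated_far = rng.normal(size=(50, 2)) + 5.0   # shifted distribution

cost_close, _ = sinkhorn_cost(observed, generated_close)
cost_far, _ = sinkhorn_cost(observed, generated_far)
print(f"close: {cost_close:.3f}  far: {cost_far:.3f}")
```

The cost is small when the two populations overlap and grows as they drift apart, which is the property that makes such an objective usable when no per-cell trajectories are available.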

Requirements

To install requirements:

Basic setup for non-GPU usage:

bash setup.sh

For the GPU setup, replace environment_basic.yaml with environment_gpu.yaml in setup.sh, then run the same command.

Datasets Pre-Requirements

First, create a directory to store all datasets.

mkdir ../data

Then, download the datasets from here and place them in the ../data directory. The download covers three scRNA-seq datasets: zebrafish embryo, drosophila, and Schiebinger2019. Each dataset contains both the raw and the pre-processed data, which can be used for training and evaluating the LGP-OT model.

Train the model using the following command, where you can specify the dataset and split type:

python scripts/LGPOT.py --data_name <dataset_name> --split_type <split_type> --seed <random_seed>

| Argument | Options | Mapping |
| --- | --- | --- |
| data_name | zebrafish, drosophila, wot | zebrafish = ZB; drosophila = DR; wot = SC (Schiebinger2019) |
| split_type | three_interpolation, three_forecasting, two_forecasting, remove_recovery | Easy: three_interpolation; Medium: three_forecasting (two_forecasting for ZB); Hard: remove_recovery |
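
The options above can be combined programmatically when launching several runs. The following is a small sketch that builds the training command for each dataset on the medium task. The helper names `build_command` and `medium_split` and the seed values are hypothetical; only the script path, the three flags, and the option names come from this README. How the script itself validates its arguments is not shown here, so the checks below are an assumption.

```python
import shlex

# Option names as listed in the table; abbreviations are the ones used in the mapping column.
DATASETS = {"zebrafish": "ZB", "drosophila": "DR", "wot": "SC"}
SPLITS = (
    "three_interpolation",
    "three_forecasting",
    "two_forecasting",
    "remove_recovery",
)


def medium_split(data_name):
    """Return the medium-difficulty split: two_forecasting for ZB, else three_forecasting."""
    return "two_forecasting" if data_name == "zebrafish" else "three_forecasting"


def build_command(data_name, split_type, seed):
    """Build the argument list for scripts/LGPOT.py (hypothetical helper)."""
    if data_name not in DATASETS:
        raise ValueError(f"unknown data_name: {data_name!r}")
    if split_type not in SPLITS:
        raise ValueError(f"unknown split_type: {split_type!r}")
    return [
        "python", "scripts/LGPOT.py",
        "--data_name", data_name,
        "--split_type", split_type,
        "--seed", str(seed),
    ]


# Print one command per dataset and seed; the seeds are arbitrary example values.
for name in DATASETS:
    for seed in (0, 1, 2):
        print(shlex.join(build_command(name, medium_split(name), seed)))
```

Each printed line can be pasted into a shell, or the argument list can be passed to `subprocess.run` to launch the run directly.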

Used code bases:

References:

  1. Balık, M. Y., Sinelnikov, M., Ong, P., & Lähdesmäki, H. (2025). Bayesian Basis Function Approximation for Scalable Gaussian Process Priors in Deep Generative Models. In Forty-second International Conference on Machine Learning.
  2. Zhang, J., Larschan, E., Bigness, J., & Singh, R. (2024). scNODE: generative model for temporal single cell transcriptomic data prediction. Bioinformatics, 40, ii146-ii154.
