scvelo-rs

A Rust + PyO3 port of scVelo.

Rust implementations of scVelo's heaviest bottlenecks, such as recover_dynamics and velocity_graph, with bit-exact or near-bit-exact results relative to the original scVelo (see Numerical equivalence under Benchmarks).

Highlights

End-to-end on real workflows. The canonical scvelo Pancreas tutorial (3,696 cells, full pipeline incl. PCA, KNN, latent time) runs 11.5× faster - 15 min on stock scvelo becomes 1.3 min on scvelo-rs. The Rust recover_dynamics kernel is doing the heavy lifting (~160× isolated); the remaining workflow time is preprocessing and downstream analysis that pass through unchanged. See Benchmarks and vendor/ for the full end-to-end numbers.
Bit-exact equivalence to scVelo on 99.9% of genes - the residual drift is at f64 ULP scale (per-gene Pearson r = 1.0000 across fit_alpha, fit_beta, fit_gamma, fit_t_).
Drop-in: import scvelo_rs.patch and every downstream call to scv.tl.{recover_dynamics, velocity, velocity_graph} routes to Rust. Or import scvelo_rs as scv for the full API.
Cross-platform wheels for Linux x86_64/aarch64, macOS arm64, Windows x86_64. Single abi3-py310 wheel covers Python 3.10–3.13.
CPU-only. Runs anywhere Python runs - laptop, HPC, Docker, ARM. No CUDA. No Numba. No JIT warmup.

# Option 1 - drop-in import.
import scvelo_rs as scv

adata = scv.datasets.pancreas()
scv.pp.filter_and_normalize(adata)
scv.pp.moments(adata)
scv.tl.recover_dynamics(adata)
scv.tl.velocity(adata, mode="dynamical")
scv.tl.velocity_graph(adata)
scv.pl.velocity_embedding_stream(adata, basis="umap")

# Option 2 - monkey-patch. Keep your existing `import scvelo as scv`;
# only the three patched hot paths get the Rust kernel.
import scvelo as scv
import scvelo_rs.patch  # noqa: F401   # patches scv.tl.{recover_dynamics, velocity, velocity_graph}

scv.tl.recover_dynamics(adata)         # bit-exact, no other code change needed

Installation

pip install scvelo-rs

scVelo and scanpy are runtime dependencies (used for plotting, dataset I/O, DPT/PAGA pass-through). They are pulled in automatically.

Quick start

Three usage patterns, ordered from the least to the most invasive migration.

1. Monkey-patch (zero code changes)

Add one line at the top of your existing scVelo script:

import scvelo as scv
import scvelo_rs.patch  # noqa: F401   # swaps scv.tl.{recover_dynamics, velocity, velocity_graph}

adata = scv.datasets.pancreas()
scv.pp.filter_and_normalize(adata)
scv.pp.moments(adata)
scv.tl.recover_dynamics(adata)         # now Rust-backed
scv.tl.velocity(adata, mode="dynamical")
scv.tl.velocity_graph(adata)

Originals are preserved at scv.tl.<name>_original for A/B comparison.
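The backup-then-swap pattern described above can be sketched in a few lines. This is a generic illustration, not the code in scvelo_rs.patch: patch_namespace, slow_square_sum, and fast_square_sum are hypothetical names, and a SimpleNamespace stands in for scv.tl so the example runs without scVelo installed.

```python
from types import SimpleNamespace


def patch_namespace(namespace, replacements):
    """Swap functions on `namespace`, keeping each original as `<name>_original`."""
    for name, fast_impl in replacements.items():
        backup = f"{name}_original"
        # Back up only once, so applying the patch twice does not overwrite
        # the true original with an already-patched function.
        if not hasattr(namespace, backup):
            setattr(namespace, backup, getattr(namespace, name))
        setattr(namespace, name, fast_impl)


# Stand-ins for illustration only; these are not scVelo functions.
def slow_square_sum(values):
    return sum(v * v for v in values)


def fast_square_sum(values):
    total = 0.0
    for v in values:
        total += v * v
    return total


tl = SimpleNamespace(square_sum=slow_square_sum)
patch_namespace(tl, {"square_sum": fast_square_sum})

data = [1.0, 2.0, 3.0]
patched = tl.square_sum(data)            # routed to the replacement
original = tl.square_sum_original(data)  # original kept for A/B comparison
print(patched == original)
```

With the real package, the same A/B check would presumably call scv.tl.recover_dynamics and scv.tl.recover_dynamics_original on two copies of the same AnnData object and compare the fitted parameters.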

2. Drop-in import

Replace import scvelo as scv with import scvelo_rs as scv. The scvelo_rs.{tl, pp, pl, datasets} namespaces expose scVelo's full public API; the hot loops route through Rust, everything else passes through scVelo unchanged.

import scvelo_rs as scv

adata = scv.datasets.pancreas()
scv.pp.filter_and_normalize(adata)
scv.pp.moments(adata)
scv.tl.recover_dynamics(adata)
scv.tl.velocity(adata, mode="dynamical")
scv.tl.velocity_graph(adata)
scv.pl.velocity_embedding_stream(adata, basis="umap")

3. Direct call

import scvelo_rs
scvelo_rs.recover_dynamics(adata)      # same signature as scv.tl.recover_dynamics

See examples/ for runnable end-to-end scripts.

Benchmarks

Measured on a developer workstation, single-threaded n_jobs=1. GitHub-hosted CI runners (2 cores) will show smaller speedups - these numbers illustrate the gap rather than serving as a hardware-neutral benchmark. The full suite lives in notebooks/02_benchmarks.py and stamps the runner's CPU/RAM into the regenerated table automatically.

Real-world end-to-end workflows

This is the bar the project is held to: published scvelo pipelines, on the real datasets and downstream tools people actually use, run end-to-end on both backends with identical input - not a microbench of one kernel. It's what a user sees when they swap import scvelo as scv for import scvelo_rs as scv on their own analysis. All five are registered in notebooks/02_benchmarks.py (category="vendor"), run on the CI cron (every other day), and live in vendor/workflows/.

workflow	cells	tissue · downstream	speedup	numerical match
Pancreas tutorial (Bastidas-Ponce 2019)	3,696	mouse pancreas · `latent_time`	11.5× (919 s → 80 s)	near-bit-exact
Differential kinetics (Pijuan-Sala E7.5)	~21k	mouse embryo · `differential_kinetic_test` + per-cluster refit	21.6× full pipeline; 593× on the test step alone	`differential_kinetic_test` bit-exact
CellRank-2 hematopoiesis (Setty 2019)	~24k	human bone marrow · CellRank `VelocityKernel` + GPCCA fate mapping	~8.9× end-to-end	near-bit-exact
PBMC 68k (Zheng 2017)	~68k	human PBMCs · `latent_time`	~5–8 h → ~10–20 min (cron)	near-bit-exact
Mouse gastrulation atlas (Pijuan-Sala 2019)	~116k	mouse embryo · atlas-scale dynamical	stock OOM/timeout → ~15–30 min	scvelo-rs-only

Why these are representative, not cherry-picked: the pancreas tutorial is the canonical scvelo intro; CellRank fate-mapping is the single most common downstream consumer of recover_dynamics; PBMC 68k (Zheng 2017, ~5,000 citations) is the standard high-volume stress dataset; and the 116k gastrulation atlas is exactly where stock scvelo is documented to OOM/time out (scvelo issues #247, #756, #405) - the bench reports those as SKIPPED and produces a scvelo-rs-only number.

Numerical equivalence. differential_kinetic_test is bit-exact vs scvelo given identical fits (fit_diff_kinetics matches 2000/2000 string-equal; per-cluster p-values agree to f64 ULP, ~1e-16). recover_dynamics / velocity are near-bit-exact: with input layers cast to float64, the per-gene parameters differ from scvelo by a median relative difference of 0, rising to at most ~3e-3 on a small set of Nelder-Mead saddle-point outlier genes (≈1% of fitted genes); that residual propagates into velocity / fit_t in the full pipeline. fit_scaling, fit_std_u/s, and fit_likelihood are bit-exact (the latter matches to 3.6e-16 after porting scvelo's get_likelihood(weighted='upper')). See the phase log in CLAUDE.md for the per-gene breakdown.
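The quantities quoted above (median relative difference, worst-case relative difference, distance in f64 ULPs) can be computed along the following lines. This is a minimal sketch in plain Python so that it runs without dependencies; the helper names relative_drift and ulp_distance are hypothetical, the sample values are made up, and the project's own parity checks may be implemented differently.

```python
import math
import statistics


def relative_drift(reference, candidate):
    """Per-element |candidate - reference| / |reference|.

    Falls back to the absolute difference where the reference is exactly 0.
    """
    if len(reference) != len(candidate):
        raise ValueError("inputs must have the same length")
    drifts = []
    for ref, cand in zip(reference, candidate):
        diff = abs(cand - ref)
        drifts.append(diff / abs(ref) if ref != 0.0 else diff)
    return drifts


def ulp_distance(ref, cand):
    """Difference expressed in units in the last place of the reference value."""
    return abs(cand - ref) / math.ulp(ref)


# Made-up per-gene values standing in for one fitted parameter.
reference = [0.5, 1.25, 2.0, 0.75, 3.0]
candidate = [0.5, 1.25, 2.0 + math.ulp(2.0), 0.75 * (1 + 3e-3), 3.0]

drift = relative_drift(reference, candidate)
print("median relative drift:", statistics.median(drift))
print("max relative drift:", max(drift))
print("ULP distance, gene 2:", ulp_distance(reference[2], candidate[2]))
```

In this toy data most genes match exactly, one differs by a single ULP, and one outlier differs by about 3e-3, which is the shape of result the paragraph above describes. On real AnnData columns the same arithmetic would more naturally be done with NumPy arrays.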

Latest CI run (auto-updated every other day)

The tables below are regenerated every other day by the benchmarks workflow on a GitHub-hosted runner and committed here automatically - reproducible CI measurements, not hand-entered numbers. They cover the --long tier (synthetic micro-benchmarks isolating each kernel, plus the pancreas and CellRank vendor workflows); the atlas-scale extra-long benches exceed CI's 6 h job cap and are summarised in the real-world table above. A rolling per-run retrospective (last 100 runs, one compact JSON line each) is kept in notebooks/_artifacts/benchmark_history.jsonl.

Run on: AMD EPYC 9V74 80-Core Processor, 2C/4T, 16.8 GB RAM, Linux-6.17.0-1018-azure-x86_64-with-glibc2.39, Python 3.12.13 (github-actions)
Generated: 2026-06-11 08:57 UTC

Measured single-threaded with n_jobs=1. GitHub-hosted CI runners (2 cores) show smaller speedups than developer workstations - these numbers illustrate the gap rather than serving as a hardware-neutral benchmark.

13 measurements: 6 speed + 5 memory + 2 vendor (real workflows).

Speed (wall time)

benchmark	cells	genes	ops	scvelo	scvelo-rs	ratio	bit-exact
speed_recover_dynamics_5k	5,000	50	recover_dynamics	24.7 s	3.96 s	6.24×	-
speed_velocity_20k	20,000	100	recover_dynamics,velocity	195.63 s	36.02 s	5.43×	-
speed_velocity_graph_20k	20,000	100	recover_dynamics,velocity,velocity_graph	206.32 s	37.15 s	5.55×	-
speed_full_pipeline_50k (LONG)	50,000	100	recover_dynamics,velocity,velocity_graph	601.69 s	108.33 s	5.55×	✓ PASS (14/14)
speed_recover_dynamics_100k (LONG)	100,000	30	recover_dynamics	343.42 s	72.0 s	4.77×	✓ PASS (13/13)
speed_compute_dynamics_5k	5,000	50	compute_dynamics	0.07 s	0.01 s	7.0×	-
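The ratio column is the scvelo wall time divided by the scvelo-rs wall time. A minimal timing sketch follows; wall_time and speedup are hypothetical helpers, and taking the best of several repeats is an assumption rather than the documented behaviour of notebooks/02_benchmarks.py.

```python
import time


def wall_time(fn, *args, repeats=3):
    """Best-of-`repeats` wall-clock seconds for fn(*args)."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        best = min(best, time.perf_counter() - start)
    return best


def speedup(baseline_seconds, candidate_seconds):
    """How many times faster the candidate is than the baseline."""
    return baseline_seconds / candidate_seconds


# Reproduce the ratio for the speed_recover_dynamics_5k row above.
print(f"{speedup(24.7, 3.96):.2f}x")  # prints 6.24x
```

To time the two backends directly, one would pass each backend's recover_dynamics and a fresh copy of the same AnnData object to wall_time, since the function writes its results into the object it is given.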

Memory (peak heap)

benchmark	cells	genes	ops	scvelo	scvelo-rs	saved	bit-exact
mem_recover_dynamics_5k	5,000	50	recover_dynamics	79.9 MB	0.0 MB	+79.9 MB	-
mem_velocity_graph_20k	20,000	100	recover_dynamics,velocity,velocity_graph	1697.2 MB	424.1 MB	+1273.1 MB	-
mem_steady_state_layers	5,000	200	recover_dynamics,velocity,velocity_graph	239.9 MB	47.9 MB	+192.0 MB	-
mem_full_pipeline_50k (LONG)	50,000	100	recover_dynamics,velocity,velocity_graph	4679.5 MB	1212.0 MB	+3467.5 MB	-
mem_oom_crash_100k (LONG)	100,000	30	recover_dynamics,velocity,velocity_graph	6979.3 MB	1929.1 MB	+5050.2 MB	-
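In the table above, the last numeric column is the scvelo peak minus the scvelo-rs peak. One way to take such a measurement in Python is the standard-library tracemalloc module, sketched below. Whether the benchmark suite uses tracemalloc is not stated here, so treat this as an assumption; peak_heap_mb and build_list are hypothetical names, and 1 MB is taken as 10^6 bytes.

```python
import tracemalloc


def peak_heap_mb(fn, *args):
    """Peak traced heap usage in MB while fn(*args) runs."""
    tracemalloc.start()
    try:
        fn(*args)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak / 1e6


def build_list(n):
    """Toy workload: allocate a list of n floats."""
    return [float(i) for i in range(n)]


small = peak_heap_mb(build_list, 1_000)
large = peak_heap_mb(build_list, 200_000)
print(f"small={small:.2f} MB, large={large:.2f} MB, saved={large - small:+.2f} MB")
```

Note that tracemalloc only sees allocations made through Python's memory allocator. Memory allocated directly by native code, such as a Rust extension, is not counted, so a comparison across backends measured this way needs to be read with that in mind.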

Vendor workflows (real-world end-to-end)

workflow	cells	genes	scvelo	scvelo-rs	speedup	bit-exact
vendor_pancreas_tutorial (LONG)	3,696	2000	413.95 s	71.56 s	5.78×	✓ PASS (15/15)
vendor_cellrank2_hematopoiesis (LONG)	24,000	2000	346.18 s	119.79 s	2.89×	✓ PASS (15/15)

Build from source

Requires Rust 1.75+ and Python 3.10+.

git clone https://github.com/ilaykav/scvelo-rs
cd scvelo-rs
python -m venv .venv && source .venv/bin/activate   # or .venv\Scripts\activate on Windows
pip install -e ".[dev]"
maturin develop --release
pytest tests/unit tests/integration

The Rust crates nalgebra (SVD for PCA) and hnsw_rs (HNSW for KNN) are pure-Rust - no OpenBLAS, no vcpkg, no system C libraries. Cross-platform builds work out of the box.

Documentation

A full Sphinx site (Quick start, Installation, Migration from scVelo, Architecture, Numerical parity, Benchmarks) is in the works for v0.2. Until then, this README, the CHANGELOG, and the runnable scripts under examples/ and notebooks/ cover the same ground.

Contributing

Bug reports, PRs, and benchmark contributions are welcome. See CONTRIBUTING.md - the short version is:

git clone https://github.com/ilaykav/scvelo-rs
cd scvelo-rs
pip install -e ".[dev]"
maturin develop --release
pytest tests/unit tests/integration

Bit-exact equivalence to scVelo is the contract for every Rust-backed function. PRs that move per-gene drift above 1e-9 need a documented reason.

License

Released under BSD-3-Clause. The Rust kernels are independent reimplementations of theislab's published algorithms - credit for the underlying methods belongs to La Manno et al. 2018 (RNA velocity, Nature, doi:10.1038/s41586-018-0414-6) and Bergen et al. 2020 (scVelo, Nat Biotechnol, doi:10.1038/s41587-020-0591-3).

Citing this work

scvelo-rs is a faithful port: the method is Bergen et al. 2020, the implementation is this repository. Always cite the original scVelo paper as the primary reference; cite the version of scvelo-rs you used as a software dependency (pip show scvelo-rs or scvelo_rs.__version__).

@article{bergen2020generalizing,
  title   = {Generalizing RNA velocity to transient cell states through dynamical modeling},
  author  = {Bergen, Volker and Lange, Marius and Peidli, Stefan and
             Wolf, F. Alexander and Theis, Fabian J.},
  journal = {Nature Biotechnology},
  year    = {2020},
  doi     = {10.1038/s41587-020-0591-3}
}

@software{scvelo_rs,
  title   = {scvelo-rs: a Rust acceleration of scVelo's dynamical model},
  author  = {Kavitzky, Ilay},
  year    = {2026},
  version = {0.1.0},
  url     = {https://github.com/ilaykav/scvelo-rs},
  note    = {Rust + PyO3 port of Bergen et al. 2020 (doi:10.1038/s41587-020-0591-3)}
}
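To fill in the version field of the software entry, the installed version can also be read programmatically with the standard-library importlib.metadata module, which reports the same value as pip show. A small sketch follows; installed_version is a hypothetical helper name.

```python
from importlib import metadata


def installed_version(distribution):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


version = installed_version("scvelo-rs")
print(version if version is not None else "scvelo-rs is not installed")
```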

Authored and maintained by Ilay Kavitzky. Contribution guidelines are in CONTRIBUTING.md.

Reporting bugs and feature requests

Open an issue at github.com/ilaykav/scvelo-rs/issues.

Bug reports - include:

scvelo-rs version (pip show scvelo-rs)
OS and Python version
A minimal reproducer (a small .h5ad slice plus the calls that fail is usually enough)
What you expected vs what you got

Parity issues (a fitted parameter or velocity vector differs from the original scvelo): include both runs' values for the affected gene/cell, the relative drift, and which fixture you ran on.

Feature requests - describe the workflow you can't do today, not just the API you'd like. Atlas-scale parity reports are especially welcome.

For anything else, email ilay.kavitzky@gmail.com.
