1 unstable release
Uses new Rust 2024
| Version | Released |
|---|---|
| 0.1.0 | Apr 18, 2026 |
peprs - A spicy 🌶️ library for managing biological sample metadata to enable reproducible and scalable bioinformatics
Don't let sample metadata parsing bottleneck your pipelines!
About this project
peprs is a Rust implementation of the PEP specification and its expanded ecosystem. In short, PEP is a framework for managing biological sample metadata. PEP is a community-driven effort to create a fast, reliable, reusable, and scalable library for handling biological sample metadata.
PEP and its ecosystem are developed and maintained by the Databio team. As a challenge and learning experience, we have been rewriting the core components of the PEP ecosystem in Rust for performance and reliability.
We are starting with the core PEP specification for metadata management and will expand to include the full ecosystem (looper, pephub-client, pipestat). The core PEP specification is implemented in the peprs-core crate. The Python bindings are implemented in the peprs-py crate.
📦 Modules
- peprs-core — Core library implementing the PEP specification. With the core module, users can create PEP objects and manipulate them.
- peprs-eido — Schema-based validation of PEP projects against JSON schemas with eido-specific extensions (imports, tangible file checks).
- peprs-cli — Command-line interface with inspect, validate, and convert subcommands.
- peprs-py — Python bindings via PyO3. Exposes the Project class with full Polars/Pandas DataFrame interoperability.
- pephub-client — Work in progress
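To give a feel for what schema-based validation checks, here is a toy sketch in plain Python. It is not the peprs-eido API and not a JSON Schema implementation; `REQUIRED_ATTRIBUTES` and `missing_attributes` are hypothetical names used only for illustration. It covers the simplest rule such a schema can express: every sample must carry certain attributes.

```python
# Toy stand-in for schema-based validation (hypothetical names, not the peprs-eido API).
REQUIRED_ATTRIBUTES = ["sample_name", "protocol"]  # assumed for illustration


def missing_attributes(samples, required=REQUIRED_ATTRIBUTES):
    """Map each offending sample index to the required attributes it lacks."""
    problems = {}
    for index, sample in enumerate(samples):
        # An attribute counts as missing if it is absent or empty.
        missing = [name for name in required if not sample.get(name)]
        if missing:
            problems[index] = missing
    return problems


samples = [
    {"sample_name": "frog_1", "protocol": "RNA-seq"},
    {"sample_name": "frog_2", "protocol": ""},
]
print(missing_attributes(samples))  # {1: ['protocol']}
```

Reporting every offending sample at once, rather than stopping at the first failure, tends to be more useful when a sample table has thousands of rows.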
⚙️ Installation
Python (recommended)
pip install peprs
or with uv
uv pip install peprs
Python (from source)
To build and install the Python package from source (requires maturin and a Rust toolchain):
git clone https://github.com/pepkit/peprs.git
cd peprs/peprs-py
maturin develop
Rust
Add to your Cargo.toml:
[dependencies]
peprs-core = { git = "https://github.com/pepkit/peprs" }
CLI
Prebuilt binaries are published to GitHub Releases for Linux, macOS, Windows, and FreeBSD (x86_64 and aarch64).
Using ubi (cross-platform, no Rust required)
ubi auto-detects your platform, downloads the right archive, and installs peprs:
ubi --project khoroshevskyi/peprs --in ~/.local/bin
Using cargo-binstall
cargo binstall peprs-cli
Manual download
Grab the archive for your platform from the releases page, extract it, and place the peprs binary on your PATH. For example, on Linux x86_64:
curl -L https://github.com/khoroshevskyi/peprs/releases/latest/download/peprs-Linux-x86_64-musl.tar.gz \
| tar xz -C ~/.local/bin/
From source
cargo install --path peprs-cli
Using Python
pip install peprs
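After installing, it can help to confirm what is actually visible in the current environment. The sketch below uses only the standard library; `installation_status` is a hypothetical helper, and it assumes the Python module is importable as peprs and the CLI binary is named peprs, as described above.

```python
import importlib.util
import shutil


def installation_status():
    """Report whether the peprs Python package and CLI binary can be found."""
    return {
        # True if `import peprs` would find a package in this environment.
        "python_package": importlib.util.find_spec("peprs") is not None,
        # True if a `peprs` executable is on PATH.
        "cli_on_path": shutil.which("peprs") is not None,
    }


print(installation_status())
```

The check only locates the package and binary; it does not import or run them, so it is safe to call in an environment where peprs is missing.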
🐍 Quick Python example
import peprs
# Load a PEP from a YAML config file
project = peprs.Project("path/to/project_config.yaml")
# or
project = peprs.Project.from_pephub("databio/example:default")
# Inspect the project
print(project.name)
print(project.description)
print(len(project)) # number of samples
# Get samples as a Polars DataFrame
df_pl = project.to_polars()
print(df_pl)
# Get samples as a Pandas DataFrame
df_pd = project.to_pandas()
print(df_pd)
# Look up a single sample by name
sample = project.get_sample("3-1_11102016")
# Iterate over samples
for sample in project.samples:
    print(sample)
# Convert projects
project.write_csv("output.csv")
project.write_yaml("output.yaml")
project.write_json("output.json")
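The example above needs a project config to load. As a minimal sketch, the snippet below writes a two-sample PEP into a temporary directory using only the standard library. The `pep_version` and `sample_table` keys and the `sample_name` column follow the PEP specification; the sample names, the `organism` column, and the `write_minimal_pep` helper are made up for illustration.

```python
import csv
import tempfile
from pathlib import Path


def write_minimal_pep(directory):
    """Write a sample table and a project config; return the config path."""
    directory = Path(directory)
    table_path = directory / "sample_table.csv"
    with table_path.open("w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["sample_name", "organism"])  # header row
        writer.writerow(["frog_1", "Xenopus laevis"])
        writer.writerow(["frog_2", "Xenopus laevis"])

    config_path = directory / "project_config.yaml"
    # The config refers to the sample table by a path relative to itself.
    config_path.write_text("pep_version: 2.1.0\nsample_table: sample_table.csv\n")
    return config_path


config_path = write_minimal_pep(tempfile.mkdtemp())
print(config_path.read_text())
# The resulting path could then be passed to peprs.Project(str(config_path)).
```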
Benchmarks
Comparison of peppy (pure Python) vs peprs (Rust bindings). Times are averaged over 3 runs per sample size; the column headers give the number of samples.
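The benchmark script itself is not shown here. As a minimal sketch of how an average over 3 runs could be measured, assuming a wall-clock timer: `average_runtime` is a hypothetical helper, and the lambda is a stand-in workload where a real measurement would construct a project instead.

```python
import time


def average_runtime(func, runs=3):
    """Call func `runs` times and return the mean wall-clock time in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        func()
        durations.append(time.perf_counter() - start)
    return sum(durations) / len(durations)


# Stand-in workload; a real measurement might time
# lambda: peprs.Project("path/to/project_config.yaml") instead.
elapsed = average_runtime(lambda: sum(range(100_000)))
print(f"{elapsed:.6f} s")
```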
Initialization Time (seconds)
| Library | 5 | 20 | 100 | 500 | 1,000 | 5,000 | 10,000 | 50,000 | 100,000 | 600,000 |
|---|---|---|---|---|---|---|---|---|---|---|
| peppy | 0.019 | 0.026 | 0.096 | 0.428 | 0.851 | 4.226 | 8.700 | 44.017 | 87.613 | 297.433 |
| peprs | 0.003 | 0.002 | 0.002 | 0.003 | 0.004 | 0.014 | 0.036 | 0.043 | 0.068 | 0.339 |
| speedup | 7x | 15x | 50x | 149x | 196x | 306x | 244x | 1,021x | 1,288x | 877x |
Validation Time (seconds)
| Library | 5 | 20 | 100 | 500 | 1,000 | 5,000 | 10,000 | 50,000 | 100,000 | 600,000 |
|---|---|---|---|---|---|---|---|---|---|---|
| peppy | 0.004 | 0.006 | 0.017 | 0.070 | 0.166 | 0.685 | 1.380 | 6.928 | 14.208 | 84.452 |
| peprs | 0.012 | 0.001 | 0.002 | 0.008 | 0.008 | 0.038 | 0.079 | 0.423 | 0.794 | 4.339 |
| speedup | 0.4x | 9x | 10x | 9x | 20x | 18x | 17x | 16x | 18x | 19x |
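The speedup rows appear to be the peppy time divided by the peprs time. A quick check against a few columns of the tables above (`speedup` is a helper defined here for illustration):

```python
def speedup(peppy_seconds, peprs_seconds):
    """Return how many times faster peprs was than peppy."""
    return peppy_seconds / peprs_seconds


# (peppy, peprs) timings copied from the tables above.
init_100k = speedup(87.613, 0.068)
init_600k = speedup(297.433, 0.339)
validation_600k = speedup(84.452, 4.339)
validation_5 = speedup(0.004, 0.012)

print(round(init_100k), round(init_600k), round(validation_600k))  # 1288 877 19
print(validation_5 < 1)  # True
```

A ratio below 1, as in the 5-sample validation column, means peprs was slower than peppy for that run. Ratios recomputed from the rounded table values will not always match the printed speedups exactly, presumably because those were derived from unrounded timings.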
🚀 Afterword
We are looking forward to integrating this project with WDL, Snakemake, and Nextflow. All contributions are welcome. Please open an issue or submit a pull request.