1 unstable release

Uses new Rust 2024

0.1.0 Apr 18, 2026

#492 in Biology


Used in 2 crates

MIT license

160KB
3K SLoC

peprs - A spicy 🌶️ library for managing biological sample metadata to enable reproducible and scalable bioinformatics

Don't let sample metadata parsing bottleneck your pipelines!

About this project

peprs is a Rust implementation of the PEP specification and its expanded ecosystem. In short, PEP is a framework for managing biological sample metadata. PEP is a community-driven effort to create a fast, reliable, reusable, and scalable library for handling that metadata.

PEP and its ecosystem are developed and maintained by the Databio team. As a challenge and a learning experience, we have been rewriting the core components of the PEP ecosystem in Rust for performance and reliability.

We are starting with the core PEP specification for metadata management and will expand to include the full ecosystem (looper, pephub-client, pipestat). The core PEP specification is implemented in the peprs-core crate. The Python bindings are implemented in the peprs-py crate.

📦 Modules

  • peprs-core — Core library implementing the PEP specification. With the core module, users can create PEP objects and manipulate them.
  • peprs-eido — Schema-based validation of PEP projects against JSON schemas with eido-specific extensions (imports, tangible file checks).
  • peprs-cli — Command-line interface with inspect, validate, and convert subcommands.
  • peprs-py — Python bindings via PyO3. Exposes the Project class with full Polars/Pandas DataFrame interoperability.
  • pephub-client — Work in progress
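To make the "tangible file checks" mentioned for peprs-eido more concrete, here is a minimal sketch of the idea in plain Python: every sample attribute listed as tangible must point to a file that exists on disk. This is not the peprs-eido API. The function name `missing_tangible_files`, the `read1` attribute, and the sample names are all hypothetical, and plain dictionaries stand in for real sample objects.

```python
import os
import tempfile


def missing_tangible_files(samples, tangible_attrs):
    """Return (sample_name, attribute, path) for each required file that is absent.

    Hypothetical helper for illustration only; not part of peprs.
    """
    problems = []
    for sample in samples:
        for attr in tangible_attrs:
            path = sample.get(attr)
            # A missing attribute counts as a failure, just like a missing file.
            if not path or not os.path.isfile(path):
                problems.append((sample["sample_name"], attr, path))
    return problems


with tempfile.TemporaryDirectory() as tmp:
    existing = os.path.join(tmp, "frog_1.fastq")
    open(existing, "w").close()  # create an empty placeholder file

    samples = [
        {"sample_name": "frog_1", "read1": existing},
        # This path is never created, so the check should flag it.
        {"sample_name": "frog_2", "read1": os.path.join(tmp, "frog_2.fastq")},
    ]
    problems = missing_tangible_files(samples, tangible_attrs=["read1"])

print([(name, attr) for name, attr, _ in problems])
```

In a real project the list of tangible attributes would come from the schema rather than being passed by hand.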

⚙️ Installation

pip install peprs

or with uv

uv pip install peprs

Python (from source)

To build and install the Python package from source (requires maturin and a Rust toolchain):

git clone https://github.com/pepkit/peprs.git
cd peprs/peprs-py
maturin develop

Rust

Add to your Cargo.toml:

[dependencies]
peprs-core = { git = "https://github.com/pepkit/peprs" }

CLI

Prebuilt binaries are published to GitHub Releases for Linux, macOS, Windows, and FreeBSD (x86_64 and aarch64).

Using ubi (cross-platform, no Rust required)

ubi auto-detects your platform, downloads the right archive, and installs peprs:

ubi --project khoroshevskyi/peprs --in ~/.local/bin

Using cargo-binstall

cargo binstall peprs-cli

Manual download

Download the archive for your platform from the releases page, extract it, and place the peprs binary on your PATH. For example, on Linux x86_64:

curl -L https://github.com/khoroshevskyi/peprs/releases/latest/download/peprs-Linux-x86_64-musl.tar.gz \
  | tar xz -C ~/.local/bin/

From source

cargo install --path peprs-cli

Using Python

pip install peprs

🐍 Quick Python example

import peprs

# Load a PEP from a YAML config file
project = peprs.Project("path/to/project_config.yaml")
# or
project = peprs.Project.from_pephub("databio/example:default")

# Inspect the project
print(project.name)
print(project.description)
print(len(project))  # number of samples

# Get samples as a Polars DataFrame
df_pl = project.to_polars()
print(df_pl)

# Get samples as a Pandas DataFrame
df_pd = project.to_pandas()
print(df_pd)

# Look up a single sample by name
sample = project.get_sample("3-1_11102016")

# Iterate over samples
for sample in project.samples:
    print(sample)

# Write the project out in other formats
project.write_csv("output.csv")
project.write_yaml("output.yaml")
project.write_json("output.json")

Benchmarks

Comparison of peppy (pure Python) and peprs (Rust-backed Python bindings). Column headers give the number of samples in the project; each timing is the average of 3 runs at that sample size.
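As a rough illustration of how a mean over three runs could be measured, the sketch below times a callable with `time.perf_counter`. It is not the benchmark script behind these numbers: `average_runtime` and `placeholder_task` are hypothetical names, and the placeholder workload exists only so the snippet runs without any PEP files on disk.

```python
import statistics
import time
from typing import Callable


def average_runtime(task: Callable[[], object], runs: int = 3) -> float:
    """Return the mean wall-clock time of `task` in seconds over `runs` calls."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        task()
        durations.append(time.perf_counter() - start)
    return statistics.mean(durations)


# Stand-in workload (an assumption for this sketch). In a real comparison the
# task would load a project instead, for example:
#   lambda: peprs.Project("path/to/project_config.yaml")
def placeholder_task() -> int:
    return sum(range(100_000))


mean_seconds = average_runtime(placeholder_task, runs=3)
print(f"mean over 3 runs: {mean_seconds:.6f} s")
```

Passing the work as a callable keeps the timing loop identical for both libraries, so only the loader being measured changes.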

Initialization Time (seconds)

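| Library | 5 | 20 | 100 | 500 | 1,000 | 5,000 | 10,000 | 50,000 | 100,000 | 600,000 |
|---|---|---|---|---|---|---|---|---|---|---|
| peppy | 0.019 | 0.026 | 0.096 | 0.428 | 0.851 | 4.226 | 8.700 | 44.017 | 87.613 | 297.433 |
| peprs | 0.003 | 0.002 | 0.002 | 0.003 | 0.004 | 0.014 | 0.036 | 0.043 | 0.068 | 0.339 |
| speedup | 7x | 15x | 50x | 149x | 196x | 306x | 244x | 1,021x | 1,288x | 877x |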

Validation Time (seconds)

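| Library | 5 | 20 | 100 | 500 | 1,000 | 5,000 | 10,000 | 50,000 | 100,000 | 600,000 |
|---|---|---|---|---|---|---|---|---|---|---|
| peppy | 0.004 | 0.006 | 0.017 | 0.070 | 0.166 | 0.685 | 1.380 | 6.928 | 14.208 | 84.452 |
| peprs | 0.012 | 0.001 | 0.002 | 0.008 | 0.008 | 0.038 | 0.079 | 0.423 | 0.794 | 4.339 |
| speedup | 0.4x | 9x | 10x | 9x | 20x | 18x | 17x | 16x | 18x | 19x |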
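The speedup rows are the peppy time divided by the peprs time. The short sketch below recomputes that ratio for the two largest project sizes, using timings copied from the tables above. The published timings are rounded to milliseconds, so ratios recomputed for the smallest projects can differ from the printed speedups; the dictionary layout and the `speedup` helper are illustrative choices, not part of peprs.

```python
# Timings in seconds, copied from the benchmark tables (largest sizes only).
init_seconds = {
    100_000: {"peppy": 87.613, "peprs": 0.068},
    600_000: {"peppy": 297.433, "peprs": 0.339},
}
validation_seconds = {
    100_000: {"peppy": 14.208, "peprs": 0.794},
    600_000: {"peppy": 84.452, "peprs": 4.339},
}


def speedup(timings: dict) -> float:
    """How many times faster peprs is than peppy for one measurement."""
    return timings["peppy"] / timings["peprs"]


for label, table in (("init", init_seconds), ("validation", validation_seconds)):
    for n_samples, timings in table.items():
        print(f"{label:<10} {n_samples:>7,} samples: {speedup(timings):.0f}x")
```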

🚀 Afterword

We are looking forward to integrating this project with WDL, Snakemake, and Nextflow. All contributions are welcome. Please open an issue or submit a pull request.

Dependencies

~51–73MB
~1M SLoC