
ssb

Simple benchmarking for Rust, with hierarchical call tree, based on fastrace

2 releases

Uses new Rust 2024

0.1.1 Mar 12, 2026
0.1.0 Mar 12, 2026

#503 in Machine learning

MIT/Apache

46KB
857 lines

ssb - Simple stupid benchmarking

Instrumented benchmarking for Rust, powered by fastrace.

One run, full breakdown — see how every phase of your code performs, not just the total.

Benchmark: bench_pipeline  (43000 iterations)
────────────────────────────────────────────────────────────────────
 span                           min       median        p95        max
────────────────────────────────────────────────────────────────────
 bench_pipeline              44.80 µs    45.12 µs    48.90 µs   102.3 µs
 ├── parse                   14.13 µs    14.27 µs    15.10 µs    32.1 µs
 ├── process                  2.15 µs     2.18 µs     2.45 µs     8.2 µs
 │   └── subprocess           1.79 µs     1.82 µs     2.01 µs     6.1 µs
 └── serialize                70.0 ns     80.0 ns    120.0 ns   450.0 ns
────────────────────────────────────────────────────────────────────

Why ssb?

Other libraries measure one function at a time. ssb measures the whole call tree: just annotate functions with #[fastrace::trace] or open a span with Span::enter_with_local_parent(). As an additional bonus, the instrumentation is not limited to benchmarks; the same spans can power tracing in production.

Quick start

[dev-dependencies]
ssb = { git = "..." }
fastrace = { version = "0.7", features = ["enable"] }

[[bench]]
name = "my_bench"
harness = false

use ssb::Bench;
use fastrace::Span;

fn parse(data: &[u8]) -> Vec<u32> {
    let _s = Span::enter_with_local_parent("parse");
    data.chunks(4).map(|c| u32::from_le_bytes(c.try_into().unwrap())).collect()
}

#[fastrace::trace]
fn process(items: Vec<u32>) -> u64 {
    items.iter().map(|&x| x as u64 * 31).sum()
}

fn pipeline(data: &[u8]) -> u64 {
    process(parse(data))
}

fn bench_pipeline() {
    let data = vec![0u8; 1024];
    pipeline(&data);
}

ssb::bench_main!(bench_pipeline);

Run with cargo bench. The first run prints stats and records a baseline; subsequent runs compare against the saved baseline automatically.

Grouped benchmarks

fn bench_parsers(bench: &mut Bench) {
    let data = vec![0u8; 1024];
    let mut bench = bench.group("parsing");

    bench.name("chunks").run(|| parse_chunks(&data));
    bench.name("manual").run(|| parse_manual(&data));
}

All benchmarks in a group appear in a single table:

Comparison Group: parsing
────────────────────────────────────────────────────────
 span            baseline      current  change (median)
────────────────────────────────────────────────────────
 chunks          29.36 µs     29.44 µs  +0.27% unchanged
─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─
 manual          19.90 µs     19.82 µs  -0.40% unchanged
────────────────────────────────────────────────────────
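The change column can be read as the percent delta of the current median against the baseline median, bucketed by a noise threshold. A minimal sketch of that classification, assuming a hypothetical 1% cutoff (the crate's actual threshold may differ):

```rust
// Sketch of the "change (median)" column: percent delta of current vs
// baseline median, with a noise threshold. The 1% cutoff is an assumption
// for illustration, not necessarily what ssb uses.
fn change_label(baseline_ns: f64, current_ns: f64, threshold_pct: f64) -> (f64, &'static str) {
    let pct = (current_ns - baseline_ns) / baseline_ns * 100.0;
    let label = if pct.abs() <= threshold_pct {
        "unchanged"
    } else if pct > 0.0 {
        "regressed"
    } else {
        "improved"
    };
    (pct, label)
}

fn main() {
    // The "chunks" row from the table above: 29.36 µs -> 29.44 µs.
    let (pct, label) = change_label(29_360.0, 29_440.0, 1.0);
    println!("{pct:+.2}% {label}"); // prints "+0.27% unchanged"
}
```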

Stupid analysis

The analysis will not satisfy everyone's needs: it may not be resistant to OS scheduling spikes when benchmarking very small functions. It is, however, accurate enough to benchmark medium-sized functions and to record results for later comparison.
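For intuition, the min / median / p95 / max columns in the tables above can be derived from the raw per-iteration durations with a simple nearest-rank percentile. This is a self-contained sketch, not ssb's actual implementation:

```rust
// Hypothetical sketch: deriving per-span summary stats (min / median /
// p95 / max) from raw nanosecond samples. Not the crate's real code.
fn percentile(sorted: &[u64], p: f64) -> u64 {
    // Nearest-rank percentile over an already-sorted slice.
    let idx = ((p / 100.0) * (sorted.len() as f64 - 1.0)).round() as usize;
    sorted[idx]
}

fn main() {
    // Durations in nanoseconds, e.g. collected for one span across iterations.
    let mut samples: Vec<u64> = vec![44_800, 45_120, 48_900, 102_300, 45_000];
    samples.sort_unstable();
    println!(
        "min={} median={} p95={} max={}",
        samples[0],
        percentile(&samples, 50.0),
        percentile(&samples, 95.0),
        samples[samples.len() - 1],
    );
}
```

With only a few thousand samples per span, a single scheduling spike lands in the max (and sometimes p95) column while leaving the median stable, which is why the median is used for baseline comparison.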

License

Dual-licensed under MIT or Apache-2.0, at your option.

Dependencies

~1.6–2.9MB
~54K SLoC