lean-bench

Documentation: https://kim-em.github.io/lean-bench/

Microbenchmark library for Lean 4. Declare a benchmark and the complexity model you expect; run it; the harness reports per-call times across an adaptive parameter ladder and the ratio C = perCallNanos / complexity(n). If your model fits, C is roughly constant.

The ladder shape is auto-picked from the declared complexity: doubling for polynomial growth, linear-inside-the-cap-bracket for exponential.

Try it

From the repo root:

$ lake build && ./.lake/build/bin/fib_benchmark_example run LeanBench.Examples.Fib.goodFib

Builds the library + examples and runs a linear-Fibonacci benchmark over params 0, 1, 2, 4, 8, …, 134_217_728. Tail of the output:

       32_768  197.481 µs     ×2^9  C= 6.027
       65_536  374.301 µs     ×2^9  C= 5.711
      131_072  747.345 µs     ×2^8  C= 5.702
      262_144    1.496 ms     ×2^7  C= 5.708
      524_288    2.989 ms     ×2^6  C= 5.702
    1_048_576    5.979 ms     ×2^6  C= 5.702
    …
  verdict: consistent with declared complexity (cMin=5.702, cMax=6.863, β=-0.005)

Tombstones † on earlier rungs mark the warmup-trim region — those rows aren't included in the slope fit.

goodFib is declared as O(n); observed per-call time scales linearly across 22 doublings (n=2 through n=134M); C stabilises near 5.7 ns per iteration once the param is large enough that the small fixed per-call cost amortises. β is the log-log slope of C vs param over the trimmed tail — β ≈ 0 is what a correct complexity declaration looks like.

Use it in your project

import LeanBench

def myFib (n : Nat) : UInt64 := Id.run do
  let mut a : UInt64 := 0; let mut b : UInt64 := 1
  for _ in [0:n] do let c := a + b; a := b; b := c
  return a

setup_benchmark myFib n => n + 1
-- Or with per-benchmark overrides on `BenchmarkConfig`:
-- setup_benchmark myFib n => n + 1 where {
--   maxSecondsPerCall := 0.5
--   paramCeiling := 4096
-- }

def main (args : List String) : IO UInt32 :=
  LeanBench.Cli.dispatch args

CLI flags layer on top of declared defaults, so you can tighten a single run without recompiling:

$ lake exe my_benchmarks run myFib --max-seconds-per-call 0.25 --param-ceiling 1024

If you don't yet know the right complexity model, declare your best guess (or 1) and pass --auto-fit to see a ranked list of suggestions from a fixed catalog (1, n, n log n, n², n³, 2ⁿ) — the heuristic ranks each model by how constant C = perCallNanos / model(n) ends up across the ladder. See the deployed quickstart.

See the deployed quickstart for the full flag list.

lakefile.toml:

[[require]]
name = "lean-bench"
git = "https://github.com/kim-em/lean-bench.git"
rev = "main"

[[lean_exe]]
name = "my_benchmarks"
root = "MyBenchmarks"

Then lake exe my_benchmarks list / run NAME / compare A B / verify / profile NAME --profiler "perf record -g --". See the deployed quickstart and the Profiling page.

Fixed-problem benchmarks

For workloads with a single canonical input — "this hard problem takes ~1.2 s; the FLINT equivalent takes ~0.8 s" — there's a second registration form with no parameter sweep:

def factorXOverFTwo : Unit → Polynomial Bool := fun () => …
setup_fixed_benchmark factorXOverFTwo

(Bare def : α registrations are rejected — closed pure expressions get folded into compile-time constants. See issue #54 and the quickstart.)

The runner does one warmup call followed by repeats measured calls (default 5), reports median / min / max wall time, and hash-checks across repeats. run, compare, list, and verify all dispatch by registration kind. See the deployed quickstart for the full guide.

Status

v0.1. Linux and macOS are exercised in CI. Windows builds in CI but the test suite isn't run there yet — reports of breakage welcome. See PLAN.md for the v0.2+ roadmap, the deployed quickstart for the user guide, the Pitfalls page for Lean-specific benchmarking pitfalls (bignum Nat, forcing evaluation, sharing, warm vs. cold), the Profiling page for the external-profiler integration (perf, samply, heaptrack, /usr/bin/time -v), and the Result schema page for the JSONL result-row schema and its evolution rules.

Design

Subprocess-per-batch. The child does the timing and the auto-tuned inner-repeat loop; the parent spawns directly via IO.Process.spawn, reads JSONL stdout, and kicks off a sibling Task that kills the child via IO.Process.Child.kill if a single batch exceeds maxSecondsPerCall + killGraceMs. No external timeout(1) dependency. We use subprocesses because Lean has no portable in-process interrupt for arbitrary computation, so the only reliable way to enforce a wallclock cap is to give each measurement its own process.

See the Design page for the full architectural rationale, including limits and known caveats.

Caveats

Each measurement is a child process, so very fast operations have a per-spawn noise floor in the milliseconds. The harness measures this floor itself and prints it with every report; rows whose total wall time is below signalFloorMultiplier × floor (default 10×) are flagged [<floor] and excluded from the verdict, with an actionable advisory line at the bottom of the report. For workloads that don't fit the built-in auto / doubling / linear ladders, declare a custom one with where { paramSchedule := .custom #[…] }.

Lean's Nat is bignum, the compiler hoists pure work out of loops unless its result is observed, and reference-counted sharing can flip operations between O(1) and O(n) depending on aliasing — these and other Lean-specific traps are covered in the Pitfalls page. Read it before declaring a benchmark you're going to commit to a CI baseline.

The verdict is a thresholded log-log slope (|β| ≤ 0.15 over the trimmed tail), not a statistical test. β's sign tells you direction; read the raw ratios for magnitude.

Name		Name	Last commit message	Last commit date
Latest commit History 58 Commits
.github/workflows		.github/workflows
LeanBench		LeanBench
docs		docs
examples		examples
test		test
.gitignore		.gitignore
LeanBench.lean		LeanBench.lean
PLAN.md		PLAN.md
README.md		README.md
lake-manifest.json		lake-manifest.json
lakefile.toml		lakefile.toml
lean-toolchain		lean-toolchain

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

lean-bench

Try it

Use it in your project

Fixed-problem benchmarks

Status

Design

Caveats

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

lean-bench

Try it

Use it in your project

Fixed-problem benchmarks

Status

Design

Caveats

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages