Replaying ACL2 proofs as kernel-checked Lean 4 proofs.
ACL2Lean imports theorems from ACL2 into Lean 4 with the Lean kernel as the sole trust anchor: ACL2 acts as an untrusted proof-search oracle, and every imported result is re-certified by a Lean proof object that mirrors ACL2's own proof — the actual clause tree its waterfall produced, replayed step by step. Genuine, faithful replay is the product: a proof that passes the kernel but does not mirror ACL2's reasoning does not count here.
Importing a theorem runs through a pipeline; every stage except ACL2's own proof search is this repo's code:
- ACL2 source — a
.lispfile ofdefuns anddefthms; ACL2 searches for proofs (untrusted oracle). - Instrumented ACL2 — the
acl2/submodule (branchacl2-lean-output) adds logging to ACL2's rewriter/simplifier that emits a structured proof log: runes, rewrite steps, substitutions, induction schemes, type-prescription corollaries, decision-procedure discharge nodes. Every inserted region carries aTRACE-LOG[...]tag (just check-acl2-tagsenforces the convention). - Proof-log parser (
ACL2Lean/ProofLog.lean) — log text → structured trace events. - Proof-tree reconstruction (
ACL2Lean/ClauseTree.lean,ACL2Lean/ProofTree.lean) — events → a single proof tree for the whole development: the clause tree ACL2's waterfall actually is, with per-literal rewriter detail. Unlinkable structure hard-fails. - Source translation (
ACL2Lean/WorldGen.lean,ACL2Lean/Translator.lean) — the same ACL2 source → a LeanWorld(function definitions) and the mirror-theorem statement: that the formula is eventually true in ACL2's sense —∃ N, ∀ f ≥ N, ∃ v, evalOpt f world env <formula> = some v ∧ v ≠ nil(truthiness, not exact-t; non-nilis ACL2's notion of truth). - ACL2-logic interpreter (
ACL2Lean/EvalOpt.lean,ACL2Lean/Logic.lean) — the fuel-bounded semantic model that defines what the mirror theorem means (differential-tested against real ACL2). - Proof replay (
ACL2Lean/Replay/Driver.lean,ACL2Lean/Replay/EvalLemmas.lean) — recurses the reconstructed tree and emits a Lean proof object for the mirror theorem, node by node; the Lean kernel checks it. The replay does no inference: if the tree lacks the information to replay a step, the fix is more instrumentation at the ACL2 source, never a heuristic in Lean. The sole ratified exception is decision-procedure leaves (clauses ACL2 itself closes by a verdict-only procedure), which are discharged by a kernel-checked decision procedure on the precisely-stated leaf obligation. - Native bridge (
ACL2Lean/Imported/) — the mirror theorem is decoded into a native Lean statement (e.g.(xs ++ ys).length = xs.length + ys.length), so the imported fact is usable as an ordinary Lean theorem.Imported/Lifting.leanis the lifting library: representations of Lean types in ACL2's value space (Rep, with ACL2 recognizers as the type discipline), correspondences between ACL2 functions and Lean operations (Implements), and the generic decode lemmas.Imported/NativeMirrors.leanis the catalog of native theorems proved end-to-end through the driver — its header scoreboard is the live status.
The kernel certifies the proof object for the mirror theorem exactly as stated by stages 5–6. Until a native-theorem bridge exists for a given result, a bug in stages 2–6 could yield a kernel-accepted proof of a subtly wrong statement — so each stage is validated against the real artifacts (differential testing of the interpreter, log↔tree fidelity checks, adversarial audits). Once a result is decoded to a native Lean statement, the entire ACL2 pipeline becomes untrusted for it: a bug anywhere makes the composed proof fail to typecheck; it can never certify a false native theorem.
Prerequisites. Lake + the pinned Lean toolchain
(installed automatically by elan),
just for the convenience recipes, and a
Common Lisp (SBCL is what the build uses) to build the instrumented ACL2.
Clone with the submodule (the acl2/ fork is required):
git clone --recursive https://github.com/septract/ACL2Lean.git
# or, in an existing clone: git submodule update --init --recursiveImportant — proof logs are a build input. The Lean sources embed proof
logs at compile time (include_str), and the logs (acl2_samples/**/*.proof-log)
are gitignored generated artifacts. A fresh clone therefore will not
lake build until the logs are generated. Build the instrumented ACL2 once,
then capture the logs:
# 1. Build the instrumented ACL2 (SBCL + a full ACL2 build — this is slow)
just build-acl2
# 2. Regenerate the compile-critical proof logs. This uses only the ACL2
# image, no Lean — so it avoids the build bootstrap cycle.
./scripts/capture-proof-log.sh acl2_samples/simple.lisp acl2_samples/recon-tests/*.lisp
# 3. Fetch the prebuilt mathlib cache (otherwise lake builds mathlib from
# source, which takes hours), then build.
lake exe cache get
lake buildjust capture-all-logs additionally recaptures the larger books.txt corpus
(needed for the full coverage sweep); it is not required just to build.
The toolchain is pinned in lean-toolchain; build with
Lake and
just:
lake build # type-check everything (incl. tests)
just test # unit tests
just ci # conformance gate: build + tests + driver coverage
just driver-coverage # replay the driver over the whole sample corpus
lake exe acl2lean dump-proof-tree <f> # inspect a reconstructed proof tree
lake exe acl2lean parse-proof-log <f> # parse/display a raw proof log
lake exe acl2lean gen-world <file> # generate World + theorem stubs from .lisp
lake exe acl2lean eval "<expr>" # evaluate an s-expressionIf a build fails with a missing-.proof-log error, regenerate the logs as in
Getting started above (the capture script force-invalidates the Lean modules
that embed them, so there is no silent staleness).
| Path | Contents |
|---|---|
acl2/ |
The instrumented ACL2 fork (submodule, branch acl2-lean-output) |
ACL2Lean/ |
Parser, s-expression core, interpreter, translator |
ACL2Lean/Replay/ |
The replay driver and its atomic evaluation lemmas |
ACL2Lean/Imported/ |
The lifting library and the native mirror catalog |
Tests/ |
Unit tests, driver tests, the corpus-wide coverage harness |
acl2_samples/ |
Authored corpus sources (recon-tests/ is the reconstruction suite) + captured logs; upstream books are referenced directly from the acl2/ submodule (see books.txt) |
docs/plans/ |
Design plans — 2026-06-10_generality-design.md is the governing plan |
docs/notes/ |
Investigation notes and surveys |
TODO.md |
The running backlog across all tracks |
CLAUDE.md |
The working rules (fidelity requirements, audit practices) |
This is a research prototype. The architecture is validated end-to-end — proof logs are parsed and reconstructed into the real clause tree, whole theorems (including well-founded induction with induction hypotheses, preprocess/clausify composition, and decision-procedure leaves) replay into kernel-checked Lean proof objects, and a catalog of results is decoded to native Lean statements. What remains is breadth, and the frontiers are enumerable rather than open-ended.
Concretely, on the sample corpus the driver fully replays a subset of
theorems; every theorem it cannot yet replay stops at a named frontier
(an explicit throwError, never a silent skip or a sorry) that maps to a
specific backlog item. The largest open area is induction generality
(multi-literal pushed clauses, multiple induction hypotheses, multi-variable
and non-acl2-count measures, merged/mutual schemes). Other frontiers:
previously-proved theorems used as rewrite rules, some :use/:induct hint
shapes, and a handful of decision-procedure leaves awaiting an SMT backend.
Beyond the corpus, the translator (stage 5) currently handles
defun/defthm/mutual-recursion; encapsulate, include-book,
defconst, macros, and guard verification are not yet supported, so the
larger ACL2 books in acl2_samples/ are aspirational targets, not passing
imports. The intended scope (a CORE tier targeting roughly the Milawa
fragment plus the ratified decision-procedure carve-out, with EXTENDED and
OUT tiers) is set out in docs/plans/2026-06-10_generality-design.md.
Live status lives in the repo, not here: the scoreboard at the top of
ACL2Lean/Imported/NativeMirrors.lean lists the native theorems proved
through the driver (each a build-enforced regression), the coverage table
from just driver-coverage reports per-theorem replay status over the whole
corpus, and TODO.md tracks the frontiers.
BSD 3-Clause — see LICENSE. Copyright (c) The ACL2Lean Authors.
(The acl2/ submodule is upstream ACL2 plus our instrumentation and carries
its own license — see acl2/LICENSE.)