span bridges a LaTeX mathematics paper and its Lean 4 formalization. It
indexes raw Lean declarations, parses the paper's standard \label{...} commands, and
uses a ledger (build/<paper>.lea, JSON) to align paper objects
and Lean declarations. The ledger records, for every object, which Lean declarations realize it and whether they are formalized.
Optional span: comments in Lean can provide hints, but they are not required:
the core model is raw Lean facts + paper objects + a curated ledger. For
dependency graphs, span derives edges deterministically from the labeled
LaTeX, the Lean source/build, and the ledger.
It is both a CLI and a Python library.
<project>/
latex/ main.tex # paper, indexed by \label{...}
lean/ *.lean # Lean source; span indexes declarations directly
build/ <paper>.lea # curated/generated ledger (+ spines)
span_notes.json # optional per-entry notes, keyed by align id
span auto-detects the project root (the nearest ancestor with latex/ and
lean/); override with --project/--paper/--lean-source/--out.
--paper and --lean-source are repeatable. With no explicit
Lean input, span indexes lean/ directly and, when build/<paper>.lea
already exists, uses that ledger as the curated alignment.
span build # reconcile -> build/main.lea (+ spines) and run checks
span init-ledger # create a starter ledger from paper ids and raw Lean declarations
span build --lean-source lean
# preferred: index raw Lean declarations directly
span build --lean-source lean --ledger build/main.lea
# validate/update using an existing curated alignment ledger
span build --full-lean-coverage
# include every unaligned raw Lean declaration as a lean half-span
span build --paper latex/main.tex --paper latex/appendix.tex
# reconcile a corpus of multiple labeled paper roots
span build --latex # also run a pdflatex compile check (C8)
span build --decls # also check named lean decls exist in source (C13)
span check [--strict] # run the verification suite; nonzero exit on errors
span check --decls # add C13: cross-check ledger decl names against the Lean source
span compare --ledger build/main.lea
# print the curated paper<->Lean alignment table
span compare --against build/main.lea # diff the entry set against a saved ledger
span stats [--json] # statistics from the paper, Lean spine, and alignment
span graph --paper-facing --render --dot build/deps_statements.dot
# paper-facing Graphviz graph: one node per paper object,
# with native edges from LaTeX + Lean + ledger
span graph --paper-facing --edge-provenance build/edges.json --dot build/g.dot
# also write a machine-readable edge provenance audit
span graph --edge-source lean --dot build/deps_lean.dot
# Lean-facing graph from elaborator dependencies
span graph --edge-source lean --lean-filter ledger --dot build/deps_aligned.dot
# Lean-facing graph restricted to ledger-owned declarations
span order # paper statement order vs a dependency-respecting order (C12 detail)
span lean | span paper # emit either spine as JSON
span vocabulary [--json] # print the controlled vocabulary: kinds, statuses, checks
span --version
The directory examples/one_theorem/ is a complete ledger-first
project aligning a human-readable proof of 1=1 with a Lean 4 formalization.
examples/one_theorem/
latex/main.tex
lean/One.lean
build/main.lea
From that directory, the normal workflow is:
span build
span check
span compare
span stats
span graph --paper-facing --dot build/one.dotTo initialize the ledger from the paper and raw Lean source:
rm build/main.lea
span init-ledgerThe paper label names the paper object:
\begin{theorem}[One equals one]\label{thm:one}
\(1=1\).
\end{theorem}The ledger supplies the Lean declaration:
"lean_decls": ["One.one_eq_one"]For corpus builds, the output ledger is build/corpus.lea unless --out points
elsewhere. The paper side is the coproduct of the input paper summands, the Lean
side is the coproduct of the input Lean summands, and the ledger records the
alignment span between them. If a ledger entry needs to point to a Lean object
whose id is duplicated across Lean summands, qualify the ref with the source path
recorded by span, the basename, or the basename stem. Ambiguous unqualified
refs are reported by C15 as ambiguity across coproduct summands.
Paper labels identify paper objects:
\begin{theorem}[One equals one]\label{thm:one}
...
\end{theorem}The ledger supplies the Lean declarations:
{
"id": "align:one",
"kind": "theorem",
"title": "One equals one",
"paper_ids": ["thm:one"],
"lean_decls": ["One.one_eq_one"]
}span check --lean-source lean --ledger build/main.lea then deterministically
checks that the named paper objects and Lean declarations still exist and remain
kind-compatible. The ledger can be created by hand, by an LLM-assisted matching
workflow, or by span init-ledger and editing it.
span graph writes Graphviz DOT. To make a PDF, run Graphviz separately:
span graph ... --dot build/graph.dot
dot -Tpdf build/graph.dot -o build/graph.pdfWith --render, statement labels are typeset with LaTeX into cached PNG files
under the output directory. This needs a local TeX installation and either
dvipng or a PDF-to-PNG tool supported by span.
The paper-facing graph is the figure-style graph:
span graph --paper-facing --render \
--paper latex/main.tex \
--ledger build/main.lea \
--dot build/deps_statements.dot \
--edge-provenance build/deps_statements_edges.json
dot -Tpdf build/deps_statements.dot -o build/deps_statements.pdfIt has one displayed node per ledger owner:
- ordinary aligned entries are owned by their first
paper_idslabel; - curated Lean half-spans are owned by their ledger entry id;
- paper half-spans are shown as orange nodes;
- Lean half-spans are shown as grey nodes.
When the project uses raw Lean source (the default), --paper-facing defaults
to --edge-source native. Native edges are derived deterministically from the
labeled LaTeX paper, the Lean source/build, and the ledger. No LLM,
external graph file, or hand-written edge list is used.
The native edge semantics are:
- solid statement dependency: Lean elaborator dependencies between
ledger-owned declarations, projected through the ledger. If a declaration
owned by paper object
Pdepends on a declaration owned byQ, the graph getsQ -> P. - dashed proof dependency: explicit LaTeX proof structure, including
references inside
proofenvironments, references feeding inline labeled proof objects, inline labels inside proofs, and simple summary sentences such as "Theorem P follows from Q and R". - orange dashed forward proof dependency: a dashed proof dependency whose prerequisite appears later in paper order. These are allowed: they record exposition order, not a statement-level circularity.
By default, native paper-facing graphs use a transitive display reduction, so
redundant cover edges are omitted from the drawn figure. The JSON written by
--edge-provenance records every raw edge, whether it was displayed,
transitively reduced, or suppressed by display policy, together with Lean
declaration pairs or LaTeX line/rule provenance.
The Lean-derived edges are statement/formal dependencies. Lean theorem proof
terms are not retained in imported .olean files, so theorem proof-body
dependencies are supplied by explicit LaTeX proof references rather than by
attempting to recover proof terms from Lean artifacts.
The display policy is deliberately separate from edge derivation. By default, span suppresses formal edges into inline definition anchors, since prose labels for definitions are usually anchors rather than separately displayed definition environments. Project-specific section suppression belongs in the ledger, not in span code:
"graph": {
"paper_facing": {
"display_policy": {
"suppress_formal_edges_from_sections": ["Formal verification"],
"suppress_formal_edges_into_inline_definition_anchors": true
}
}
}suppress_formal_edges_from_sections removes formal display edges from objects
in the named sections into objects outside the same section. This is useful for
certification or implementation-wrapper sections whose low-level formal
dependencies would otherwise clutter a paper-facing mathematical figure. The
raw suppressed edges remain visible in the --edge-provenance JSON audit.
Use --edge-reduction none to disable the transitive display reduction.
span build and span check report C12 as a warning if native Lean dependency
querying cannot run, for example in a tiny non-Lake example. span graph and
span order require that query to succeed because their primary output is the
dependency graph/order itself.
The Lean-facing graph keeps Lean declarations as graph nodes. With raw Lean
source, --edge-source lean asks Lean for elaborator dependencies among the
selected declarations:
span graph --edge-source lean --dot build/deps_lean.dot
dot -Tpdf build/deps_lean.dot -o build/deps_lean.pdfTo inspect only declarations named by the ledger, use:
span graph --edge-source lean --lean-filter ledger \
--dot build/deps_lean_aligned.dot
dot -Tpdf build/deps_lean_aligned.dot -o build/deps_lean_aligned.pdfUse --edge-reduction none to keep all derived Lean edges, or leave the
default auto behavior to transitively reduce Lean-derived display graphs.
Direct Lean mode also understands inert span: comments immediately before
declarations. These are optional hints; they are not required when a ledger
supplies the alignment.
Single-line form:
-- span: id=thm:one kind=theorem title="One equals one"
theorem one_eq_one ... := by
...Doc-comment form:
/-- span:
id=thm:one
kind=theorem
title=One equals one
statement=The distinguished object equals itself.
-/
theorem one_eq_one ... := by
...Supported keys for native workflows are id, kind, title, and
statement. The declaration itself supplies the fully-qualified Lean
declaration name. If no span: comment is present, the declaration is still
indexed under its fully-qualified name.
import span
lean = span.parse_lean_sources(["lean"])
paper = span.parse_paper("latex/main.tex")
entries = span.reconcile(lean, paper)
ledger = span.build_ledger_dict(lean, paper, entries, "main.tex", "lean")
for f in span.run_checks(lean, paper, entries):
print(f.severity, f.code, f.message)Recognized paper/Lean spine kinds are:
definition, lemma, theorem, proposition, corollary, claim, conjecture,
remark, example, notation, convention, assumption, axiom, constant
constant is normally a paper-side label kind. Run span vocabulary for the
full compatibility table, span statuses, and check families.
The span tool is, at the top level, the one span that bridges the paper side and the Lean side:
Paper <--- span ---> Lean
That global span is assembled from one small span per aligned concept. The
structure is self-similar, not a name clash: you run span (the tool) and
read spans (the entries) in the ledger (.lea) it writes.
- Each ledger entry is an alignment span: an apex (the aligned concept) with
a paper leg and a lean leg. The canonical
span_statusis the span's shape:coherent,paper_half,lean_half, orincoherent. - The checks are grouped by alignment family, shown in every report: object (well-defined and kind-preserving: C1-C4, C8, C10, C13), coherence (native dependency order: C12), coverage (paper and Lean half-spans: C6, C7), and fiber (shared formalizations and fan-out: C5).
- Fan-out is a fiber (
largest_fiberinspan stats), andbuild/compare/statsprint an alignment verdict naming each obstruction.
C1 id uniqueness · C2 dangling refs · C3 kind compatibility · C4 stated results
in labeled environments · C5 duplicate/shared formalizations · C6 lean half-span
coverage · C7 partial formalization · C8 LaTeX compile check · C10
compatible-kind notes · C12 statement order is a linear extension of native
Lean statement dependencies lifted through the ledger · C13 (opt-in, --decls)
named ledger declarations exist in the Lean source with a matching kind · C15
corpus id ambiguity/source qualification. See CHECKS.md.
C13 is the strongest machine-checkable paper<->Lean consistency check that
needs no proof-assistant run and no LLM: a heuristic source scan confirms every
named declaration exists and that a definition node is backed by a def (a
theorem node by a theorem/lemma).
Native graph generation runs Lean (lake env lean) to traverse the constants
each selected declaration's statement (type) references, through
project-internal helpers and stopping at other selected declarations. This
builds the statement/formal dependency edges used by span graph --paper-facing
and span graph --edge-source lean. Proof-only theorem dependencies are not
available from imported oleans, so paper-facing proof edges are derived from
explicit LaTeX references instead.
These automated checks cannot verify that the two statements mean the same thing -- that semantic faithfulness is the specification gap, which only a human or (heuristically) an LLM can judge.