Let rocq-piler do the heavy lifting for your proofs.
This project demonstrates how a Language Server Protocol (LSP) interface to the Rocq proof assistant (formerly Coq) enables neurosymbolic programming: the seamless integration of neural networks (LLMs) with symbolic reasoning systems (proof assistants).
Neurosymbolic programming combines:
- Neural/Statistical AI (LLMs): Pattern recognition, natural language understanding, heuristic search
- Symbolic AI (Proof assistants): Logical reasoning, formal verification, guaranteed correctness
This approach leverages the complementary strengths of both paradigms:
| Neural (LLM) | Symbolic (Rocq) |
|---|---|
| Intuition & pattern matching | Rigorous logical deduction |
| Generates candidate solutions | Verifies correctness |
| Probabilistic & approximate | Deterministic & exact |
| Learns from examples | Reasons from axioms |
| Fast but fallible | Slow but infallible |
Historically, interacting with proof assistants required:
- Deep knowledge of proof tactics and syntax
- Understanding of internal proof state representations
- Manual, tedious proof construction
- Limited automation capabilities
This created a barrier between AI systems (which excel at pattern matching and generation) and proof assistants (which excel at verification).
The Language Server Protocol provides a standardized, machine-friendly interface that:
- Exposes Proof State: Query current goals, hypotheses, and proof context at any position
- Enables Interactive Editing: Apply tactics and immediately observe their effects
- Supports Speculative Execution: Try tactics without committing to file edits (Pétanque API)
- Provides Structured Feedback: Parse errors, type information, and verification results
This transforms the proof assistant into a verification oracle that an LLM can query interactively.
In this implementation, the bridge is also project-aware: it can detect Coq/Rocq project roots, reopen documents under the correct workspace, and keep rocq-lsp aligned with _CoqProject, _RocqProject, or dune-project layouts.
┌─────────────┐
│ LLM │ ← Reads: proof goals, context, error messages
│ (Neural) │ → Generates: candidate tactics, proof strategies
└──────┬──────┘
│
↓ (proposes tactics)
┌──────────────┐
│ LSP Server │ ← Bridges neural and symbolic systems
│ (Interface) │ → Translates between LLM and Rocq
└──────┬───────┘
│
↓ (executes & verifies)
┌──────────────┐
│ Rocq │ ← Verifies: type-checks, proves correctness
│ (Symbolic) │ → Returns: success/failure, updated goals
└──────────────┘
Workflow:
- LLM reads proof goal via LSP (
coq_open_goals) - LLM generates candidate tactic(s) based on patterns learned from training
- Tactics are applied via LSP (
coq_insert_tactic) - Rocq verifies the tactic is type-correct and logically sound
- If successful: proof progresses (new subgoals or QED)
- If failed: error message guides LLM to try alternatives
- Loop continues until proof is complete
This architecture achieves:
Guided Generation: The LLM doesn't need to be "correct" — only creative. Rocq acts as a discriminator that filters out invalid proofs.
Incremental Verification: Each step is verified immediately, preventing cascading errors.
Explainable AI: Every LLM decision is justified by a formal proof that can be audited by humans.
Learning from Mistakes: Structured error feedback allows the LLM to refine its search strategy.
Safe Automation: The LLM can explore aggressively because Rocq guarantees soundness.
The LLM can:
- Query the current proof goal: "Prove that
forall n, n + 0 = n" - Understand the hypothesis context: "Given:
n : nat" - Generate relevant tactics: "I'll try induction on
n"
LSP Tool: coq_open_goals — returns structured goal representation
The LLM can:
- Try multiple tactics without modifying the file
- Quickly discard failures
- Only commit tactics that successfully reduce the goal
LSP Tools: coq_get_state_at_pos, coq_run_tactic, coq_goals_for_state (Pétanque API)
The LLM can:
- Incrementally build proofs tactic-by-tactic
- Respond to verification failures by adjusting strategy
- Learn which patterns work in different contexts
LSP Tool: coq_insert_tactic — applies tactic and returns updated goals
The LLM can:
- Receive structured error messages from Rocq
- Understand type mismatches, unification failures, etc.
- Adjust tactics based on specific error feedback
LSP Feature: All tools return structured error information
rocq-piler uses a content-addressed admit system: every open goal in a proof has a unique hash computed from its goal text. This turns proof construction into a clean cycle:
focus_proof → get hashes → insert_tactic(admit_hash=X) → repeat until Qed
1. focus_proof(file="Thm.v", name="deep_conj")
← 1 admit: abc123 L7: (True / True) / True
2. insert_tactic(admit_hash="abc123", tactic="split.\n- admit.\n- admit.")
← 2 admits remaining:
def456 L8: True / True (hyps: empty, goal: True / True)
ghi789 L9: True (hyps: empty, goal: True)
3. insert_tactic(admit_hash="def456", tactic="split; exact I.")
insert_tactic(admit_hash="ghi789", tactic="exact I.")
← Qed applied
Why hashes? Every admit with the same goal text shares the same hash. Close four True admits at once — across any bullet depth — in a single call. The LLM reads hashes from focus_proof, writes tactics for them, and counts down to Qed.
See examples/dep_vec.v for a full dependently-typed language (Nat + Vec n) with both preservation and progress theorems proved entirely via hash-driven MCP calls:
1. focus_proof → hash for the theorem goal
2. insert_tactic(admit_hash=X, tactic="{induction + 7 stubs}") → 7 case hashes with hyps+goal inline
3-9. insert_tactic(admit_hash=<case>, tactic="...") × 7 → Qed
Each response shows hypotheses + goal per admit, so the LLM writes each case's tactic without extra focus_proof calls.
| Tool | Description |
|---|---|
focus_proof |
One-stop shop: proof state, script, all open admits with hashes + hyps + goals |
insert_tactic |
Insert a tactic; pass admit_hash to target specific admits by hash; multi-line blocks supported |
try_step |
Speculatively execute a tactic without modifying the file |
open_goals |
Quick goal query with compact/Prev mode options |
proof_state |
Full proof context including name, statements, goal state |
undo_step |
Roll back the last N edit operations |
reset_proof |
Wipe a proof body to fresh Admitted. |
add_lemma / delete_lemma |
Insert/remove lemma stubs |
delete_step |
Remove the last tactic line from a proof |
edit_file |
Find-and-replace text edits with LSP re-sync |
check_file |
Force full document checking |
check_range |
Get diagnostics for a specific line range |
search_lemmas / inspect_term / inspect_about / locate_term / require_lib |
Knowledge & search |
snap_state / exec_tactic / state_goals |
Low-level Pétanque API |
- Content-addressed admits: every open goal gets a hash — same goal text = same hash, close all matching admits at once across any bullet depth
- Hypotheses + goal per admit:
focus_proofandinsert_tacticresponses include full hypothesis context for each open admit - Multi-line stub insertion: open a proof, insert N bulleted admits, and get N hashes back — all in one
insert_tacticcall - Re-seal with multi-goal expansion: when a tactic produces N > 1 goals, they are optionally expanded to N individually addressable bulleted admits
- Auto-Qed: proof automatically closed when all tactic admits and focused goals are eliminated; guarded against premature firing when background goals remain
- Dynamic workspace switching: automates restarting
rocq-lspunder different project roots - Speculative execution: try tactics safely via
try_stepor Pétanque state APIs before committing - Bullet-aware proof navigation:
focus_proofreports bullet stack depth, given-up counts, and sibling context - Safe undo/reset: tracks edit operations, supports multi-step undo
- Structured Interface: No need to parse Rocq syntax or output
- Immediate Feedback: Know instantly if a tactic succeeds
- Safe Exploration: Can't generate unsound proofs
- Reduced Search Space: Type system eliminates invalid candidates
- AI Assistance: Let LLMs handle tedious proof details
- Guaranteed Soundness: All proofs are formally verified
- Interactive Refinement: Guide AI when it gets stuck
- Explainable Results: Every proof step is auditable
- Benchmark Platform: Test AI proof capabilities
- Formal ML: Train models on verified code
- Safe Code Generation: Generate formally verified programs
- Hybrid Intelligence: Study human-AI-proof collaboration
LLMs can tackle routine lemmas while experts focus on complex proofs.
Generate code with machine-checked correctness proofs (e.g., crypto, compilers).
Convert natural language math into Rocq, assisted by LLMs.
Interactive tutoring systems that teach both intuition (LLM) and rigor (Rocq).
Build verifiably safe AI systems with formal guarantees.
# Install Rocq and rocq-lsp
opam install coq-lsp # rocq-lsp coming soon
# Verify installation
coq-lsp --version # or rocq-lsp --versioncd rocq-piler
npm install
npm run buildAdd to your MCP settings:
{
"mcpServers": {
"rocq-piler": {
"command": "node",
"args": [
"/path/to/rocq-piler/dist/index.js",
"--workspace-root",
"/path/to/your/rocq/project"
]
}
}
}The server can start with one workspace root and later switch automatically when a tool call targets a file inside a different Rocq project.
Then in OpenCode:
You: "Help me prove that addition is commutative in Rocq"
OpenCode: [Uses coq_open_goals to see the goal]
OpenCode: [Uses coq_insert_tactic to try "induction n"]
OpenCode: [Iteratively builds the proof using LSP feedback]
OpenCode: "Proof complete! Here's what I did..."
See examples/dep_vec.v for a complete, working example of a simple dependently-typed language:
Language: Nat + Vec n — base type Nat, dependent Vec n of length n
Theorems proved via MCP tools:
- ✅ Preservation (7 cases):
has_type t T → step t t' → has_type t' T— proved in 9 MCP calls - ✅ Progress (7 cases):
has_type t T → value t ∨ ∃ t', step t t'— proved in 9 MCP calls
Both proofs use the same pattern: focus_proof → multi-line stub → case hashes with hyps+goal → close by hash one-by-one.
A larger PCF+References benchmark is in benchmarks/complete/pcf_ref.v:
- ✅ 9 helper lemmas + preservation theorem (21 cases)
- ✅ Full heap semantics with store extension, weakening, and substitution
- ✅ All cases closed with
Qed - ✅ Total cost: 4 cents ($0.0351 actual, 50 API calls, DeepSeek V4)
| Benchmark | Theorem | Cases | Lemmas | API Calls | Cost |
|---|---|---|---|---|---|
Dep Vec (dep_vec.v) |
Preservation + Progress | 7+7 | 0 | 18 | — |
PCF + Ref (pcf_ref.v) |
Preservation | 21 | 9 | 50 | $0.04 |
A PCF language extended with mutable references (allocation, dereference, assignment), heap semantics, and store typing. The preservation theorem establishes that well-typed programs remain well-typed after a step, with the store typing possibly extended.
Auxiliary lemmas proved: extends_refl, extends_trans, nth_error_extends, store_weakening, shift_typing, subst_typing, heap_ok_extends, heap_lookup_has_type, heap_ok_update
Source: benchmarks/incomplete/pcf_ref.v (challenge) → benchmarks/complete/pcf_ref.v (solution)
examples/dep_vec.v— Dependent type system (Nat + Vec n) with preservation and progress theorems, proved hash-drivenexamples/test_issues.v— PCF + References (21-case preservation theorem)examples/example.v— Simple examples for each MCP tool
This LSP-based architecture represents a paradigm shift:
From: "AI generates code, humans verify"
To: "AI and proof assistants collaborate in real-time"
From: "LLMs are black boxes"
To: "Every LLM decision has a formal proof"
From: "Formal verification is for experts"
To: "LLMs make formal methods accessible"
As demonstrated by the proofs in this repository:
- ✅ Hash-driven proof construction:
focus_proof→ get hashes →insert_tactic(admit_hash=X)→ repeat until Qed - ✅ Multi-case induction proofs: 7 cases closed in 9 total MCP calls with full hypothesis context per case
- ✅ Multi-bullet depth: same hash works across
-/+/*bullet levels — close all matching admits in one call - ✅ Dependent type progress + preservation: Nat + Vec n language fully verified
- ✅ Interactive refinement: speculative execution via
try_stepwith immediate feedback
As LLMs improve and proof assistants expose richer APIs:
- Automated formalization of mathematical papers
- AI-assisted discovery of new theorems
- Verified AI systems with formal safety guarantees
- Natural language to formal proof translation
- Large-scale formal verification of real-world software
- Specification: See
MCP_COQ_LSP_SPEC.md - Implementation: See
docs/IMPLEMENTATION.md - Quick Start: See
docs/QUICKSTART.md - Project config detection: See
docs/PROJECT-CONFIG-DETECTION.md - Rocq Documentation: https://rocq-prover.org/
- rocq-lsp Project: https://github.com/ejgallego/coq-lsp
This is a research prototype demonstrating neurosymbolic programming principles. Contributions welcome:
- Add support for other proof assistants (Lean, Isabelle)
- Improve tactic suggestion heuristics
- Build benchmark suites for AI proof capabilities
- Develop training datasets from verified code
MIT License - See LICENSE file for details
Built with: TypeScript, rocq-lsp, Model Context Protocol
Design: Content-addressed admits, hash-driven workflow, full hypothesis context per goal