GitHub - m0n0x41d/haft: Engineering decision engine that knows when decisions are stale. Frame, compare, and decide with evidence decay and parity enforcement. For Claude Code, Cursor, Gemini CLI, Codex and more.

FPF governance substrate for AI-assisted software delivery.

Your agents (Claude Code, Codex) write code fast. Most repositories are not ready for serious harness engineering: the target system is underspecified, the enabling system is implicit, term maps are missing, and runtime evidence is detached from the spec. Haft makes the project harnessable before it scales execution.

What is Haft?

Haft is a governance substrate that makes a repository harnessable for principal-led FPF engineering work. It turns problem frames, comparisons, decisions, commissions, and evidence into auditable artifacts, with enforcement at the kernel boundary.

Specify → Think → Run → Govern.

Not a coding agent. Not a documentation generator. The handle between the tool and the hand: the part that turns raw model capability into formal specification, governed decisions, bounded commissions, and evidence-backed engineering work.

Three surfaces, one artifact graph

Haft is consumed through three surfaces over one .haft/ artifact graph:

Skills + slash commands in your coding agent (Claude Code, Codex, OpenCode, Cursor) — workflow skills auto-trigger; /h-frame /h-decide /h-verify ... run manually
CLI (haft problem, haft solution, haft decision, ...) — manual access, no LLM in the loop
MCP server (haft serve) — programmatic access for any LLM agent over the Model Context Protocol

The kernel MCP server is the cross-host enforcement surface: it validates arguments server-side and returns structured errors for FPF violations (missing required fields, parity gaps, weakest-link omissions, predictions without verify_after). Skills carry the procedure; the kernel carries the gates.
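
As a rough illustration of server-side gating with structured errors, the sketch below collects every violation and returns it in machine-readable form. The `GateError` shape, the error codes, and the required-field list are hypothetical; only `verify_after` is a name taken from the text above, and none of this is Haft's actual schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GateError:
    """A machine-readable violation the caller can act on (hypothetical shape)."""
    code: str
    field: str
    message: str


def validate_decision(args: dict) -> list[GateError]:
    """Return every violation instead of failing on the first one."""
    errors: list[GateError] = []

    # Required fields (illustrative list, not Haft's real schema).
    for name in ("title", "selected_variant", "rollback_plan"):
        if not args.get(name):
            errors.append(GateError("missing_required_field", name, f"{name} is required"))

    # A prediction without verify_after cannot be checked later.
    for i, prediction in enumerate(args.get("predictions", [])):
        if not prediction.get("verify_after"):
            errors.append(GateError(
                "prediction_without_verify_after",
                f"predictions[{i}].verify_after",
                "each prediction needs a verify_after date",
            ))
    return errors


errors = validate_decision({
    "title": "Pick a queue",
    "selected_variant": "v2",
    "predictions": [{"claim": "p95 latency under 200 ms"}],
})
for e in errors:
    print(e.code, e.field)
```

Returning all violations at once lets an agent repair its arguments in one round trip rather than discovering gates one by one.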

What changed in v8

v8 dropped the standalone interactive agent (haft agent), the TUI, and the desktop wrappers. Haft no longer competes with general coding agents on the runtime surface — it adds governance discipline on top of whichever agent you already use. The pivot, with parity-compared variants, rollback plan, and falsifiable predictions, is recorded in .haft/decisions/dec-20260525-v8-architecture-pivot-from-standalone-agent-to-g-bbe45cb7.md.

Upgrading from v7? See MIGRATION-v8.md — the upgrade checklist plus what was dropped (haft agent, TUI, desktop, v7 helper commands).

Built on First Principles Framework

FPF by Anatoly Levenchuk — a rigorous, transdisciplinary architecture for thinking.

The skill set (h-frame, h-explore, h-compare, h-decide, h-verify, and the full catalog below) gives your agent an FPF-native operating system for engineering decisions: framing before solutions, characterization before comparison, parity enforcement, evidence with congruence penalties, weakest-link assurance, and a cycle that reopens itself when evidence ages or a measurement fails.

The framing and comparison skills auto-trigger on operator context. The binding step (h-decide, h-commission) is manual-only per the Transformer Mandate: agents frame and compare; the human principal records the binding choice.

haft fpf search (and haft_query(action="fpf") from MCP) searches the indexed FPF specification. Retrieval is hybrid: exact pattern id first, then keyword (FTS5) fused with semantic recall over baked section vectors, so a reworded "how do I think about X" finds the pattern that answers it. The vectors ship inside the binary; semantic recall degrades to keyword when the embedding sidecar is absent.
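
The retrieval order described above can be sketched as follows. The fusion rule here is reciprocal rank fusion, which is an assumption chosen for illustration; the text does not say which fusion method Haft uses. The function name, its parameters, and the sample rankings are hypothetical.

```python
def hybrid_search(query, pattern_ids, keyword_ranked, semantic_ranked=None, k=60):
    """Exact id first; otherwise fuse keyword and semantic rankings.

    keyword_ranked / semantic_ranked are lists of section ids, best first.
    semantic_ranked=None models the missing embedding sidecar.
    """
    # 1. An exact pattern id short-circuits everything else.
    if query in pattern_ids:
        return [query]

    # 2. No semantic recall available: degrade to keyword order.
    if semantic_ranked is None:
        return list(keyword_ranked)

    # 3. Reciprocal rank fusion (an assumed fusion rule, not confirmed by Haft).
    scores = {}
    for ranking in (keyword_ranked, semantic_ranked):
        for rank, section in enumerate(ranking, start=1):
            scores[section] = scores.get(section, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda s: (-scores[s], s))


ids = {"B.5.2", "A.6.B", "B.4.1"}
print(hybrid_search("B.5.2", ids, ["A.6.B"], ["B.4.1"]))
print(hybrid_search("rival hypotheses", ids, ["A.6.B", "B.5.2"], ["B.5.2", "B.4.1"]))
print(hybrid_search("rival hypotheses", ids, ["A.6.B", "B.5.2"]))
```

A section that appears in both rankings accumulates score from each, which is why a reworded query can still surface the right pattern.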

Install

curl -fsSL https://raw.githubusercontent.com/m0n0x41d/haft/main/install.sh | bash

The install URL still points at the historical quint-code path. The installed binary is haft.

Then, in your project, run haft init with the flag for your host agent:

haft init            # Claude Code (default)
haft init --local    # Claude Code, repo-local commands
haft init --codex    # Codex CLI / Codex App
haft init --all      # Claude Code + Codex

Claude Code and Codex are the supported hosts. Cursor, Gemini CLI, and OpenCode have experimental config flags (--cursor, --gemini, --opencode) while their runtime and docs converge.

Cursor: after init, open Settings → MCP → find haft → enable the toggle. Cursor adds MCP servers disabled by default.

What init does per tool

The binary is the same; only the MCP config and command/skill install locations differ.

Tool	MCP config	Commands / prompts	Skills
Claude Code	`.mcp.json` (project root)	`~/.claude/commands/` (or `.claude/commands/` with `--local`)	`~/.claude/skills/` (15 skills)
Codex CLI / App	`.codex/config.toml`	`~/.codex/prompts/` (or `.codex/prompts/` with `--local`)	`~/.agents/skills/` (15 skills)

Project-scoped configs (.mcp.json, .codex/config.toml) use portable project-root paths, so they are safe to commit for shared repositories.

Existing project? Run /h-onboard after init. It builds a parseable target-system spec, enabling-system spec, term map, and spec-coverage graph — not just a codebase summary.

Check spec carriers locally:

haft spec check
haft spec check --json

haft spec check is deterministic L0/L1/L1.5 only: it parses fenced yaml spec-section blocks, checks required structural fields, validates known carrier shapes, and confirms the term-map carrier parses. It makes no L2 semantic judgment, no LLM review, and no L3 runtime claim.
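
To make "structural only" concrete, here is a minimal sketch of a check in that spirit: find fenced blocks, confirm required keys exist, and judge nothing else. The fence info string `yaml spec-section`, the required field names, and the sample spec are assumptions for illustration, not the real carrier format.

```python
import re

FENCE = "`" * 3  # built indirectly so this example can sit inside a fenced block
REQUIRED = ("id", "kind", "statement")  # hypothetical required fields

BLOCK_RE = re.compile(
    rf"^{FENCE}yaml spec-section\n(.*?)^{FENCE}\s*$",
    re.MULTILINE | re.DOTALL,
)


def check_spec(text):
    """Return (block_index, missing_field) pairs; an empty list means clean."""
    findings = []
    for index, match in enumerate(BLOCK_RE.finditer(text)):
        # Structural check only: collect top-level key names, no semantics.
        keys = set(re.findall(r"^([A-Za-z_][\w-]*):", match.group(1), re.MULTILINE))
        findings.extend((index, name) for name in REQUIRED if name not in keys)
    return findings


doc = "\n".join([
    "# Target system",
    FENCE + "yaml spec-section",
    "id: TS-001",
    "kind: invariant",
    "statement: Ingestion never drops acknowledged events.",
    FENCE,
    FENCE + "yaml spec-section",
    "id: TS-002",
    "statement: Queue depth is observable.",
    FENCE,
])
print(check_spec(doc))
```

Because the check never interprets the statement text, the same input always yields the same findings.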

How It Works

Seven MCP tools

Tool	What it does
`haft_note`	Micro-decisions — atomic facts with typed anchors, validation, auto-expiry
`haft_problem`	Frame problems, declare comparison dimensions with indicator roles
`haft_solution`	Explore variants with diversity check, compare under parity
`haft_decision`	Decision contracts: invariants, claims, evidence, baseline lifecycle
`haft_commission`	WorkCommission lifecycle for execution harnesses
`haft_refresh`	Lifecycle management for every artifact kind
`haft_query`	Search, status dashboard, code graph (callers/callees/impact/explore — each reached symbol fused with the decisions governing it), FPF spec search

Fifteen skills installed by `haft init`

Skill	Mode	What it does
h-reason	auto (umbrella)	Full FPF reasoning palette in one entry — framing, exploration, comparison, verification, notes, plus slideument patterns (Goldilocks, NQD, BLP, Scaling-Law Lens). Manual `/h-reason` always works; auto-fires on broad "let's think this through" signals where no specialized skill matches sharply.
h-frame	auto	Frame a problem with B.4.1 stabilize + problem typing + umbrella-word repair
h-diagnose	auto	Diagnose a failure with parallel hypothesis testing (one Agent subagent per hypothesis to prevent anchoring)
h-explore	auto	Generate distinct candidate variants with NQD diversity discipline (parallel direction-assigned agents)
h-compare	auto	Fair comparison with dim-wise parallel scoring + Pareto front (not a scalar winner)
h-decide	manual	Record a binding DecisionRecord with full DRR — Transformer Mandate (`disable-model-invocation`)
h-verify	auto	Baseline → measure → evidence loop with drift detection
h-status	auto	Read-only project FPF state dashboard
h-onboard	auto	First-frame ceremony for projects new to haft
h-spec-cover	auto	Spec-coverage check with blind/stale module triage
h-note	auto	Lightweight micro-decision recording
h-commission	manual	WorkCommission lifecycle — manual per Transformer Mandate (`disable-model-invocation`)
h-abduct	subroutine	Pure B.5.2 abductive four-step (frame prompt → ≥3 rivals → filters → prime)
h-boundary-unpack	subroutine	A.6.B L/A/D/E decomposition of boundary statements
h-semio-review	subroutine	X-FANOUT-AUDIT — concept-rename / spec-consistency audit

Auto-triggering skills fire when their description matches operator context. Manual-only skills (h-decide, h-commission) require explicit invocation per the Transformer Mandate — binding artifacts come from the human principal, not the agent. Subroutines (h-abduct, h-boundary-unpack, h-semio-review) are called from other skills or invoked explicitly when working a specific FPF sub-discipline.

Routing reliability is testable: haft check routing runs 40 golden prompts (current pass rate 82.5%).
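
A golden-prompt check of this kind reduces to comparing the routed skill with the expected one and reporting a fraction. The sketch below uses a toy keyword router and four made-up cases; the real routing is done by the host agent, and the actual 40 prompts are not shown in this README.

```python
def pass_rate(cases, route):
    """Fraction of golden prompts that route to the expected skill."""
    passed = sum(1 for prompt, expected in cases if route(prompt) == expected)
    return passed / len(cases)


def toy_route(prompt):
    # Stand-in router: real routing is done by the host agent, not by keywords.
    if "failing" in prompt:
        return "h-diagnose"
    if "pick" in prompt or "choose" in prompt:
        return "h-explore"
    return "h-reason"


golden = [
    ("tests are failing after the deploy", "h-diagnose"),
    ("we need to pick a queue", "h-explore"),
    ("did the cache redesign work", "h-verify"),
    ("let's think this through", "h-reason"),
]
print(f"{pass_rate(golden, toy_route):.1%}")  # 3 of 4 toy cases pass
print(f"{33 / 40:.1%}")                       # 82.5% is consistent with 33 of 40
```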

Evidence workflow

Attach evidence with haft_decision(action="evidence", ...). Evidence carries formality levels (F0–F3), congruence levels (CL0–CL3), and expiry dates. Trust scores (R_eff) degrade as evidence ages; stale evidence triggers refresh. Use haft_decision(action="measure", ...) for post-implementation verification.
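
One way to picture a trust score that decays is a weakest-link rule over evidence items, each penalized for low congruence and discounted once expired. The penalty table, the expired-evidence score of 0.1, and the `Evidence` fields are hypothetical; this is not Haft's R_eff formula, and formality levels are left out to keep the sketch short.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical penalties: lower congruence (context mismatch) costs more trust.
CL_PENALTY = {3: 0.0, 2: 0.1, 1: 0.4, 0: 0.9}


@dataclass(frozen=True)
class Evidence:
    base_score: float   # 0..1, how strongly the evidence supports the claim
    congruence: int     # CL0..CL3
    valid_until: date


def evidence_score(item, today):
    """Score one item; expired evidence is treated as weak (assumed value 0.1)."""
    if today > item.valid_until:
        return 0.1
    return max(0.0, item.base_score - CL_PENALTY[item.congruence])


def r_eff(items, today):
    """Weakest link: the decision is only as trusted as its worst evidence."""
    return min((evidence_score(i, today) for i in items), default=0.0)


items = [
    Evidence(0.9, 3, date(2026, 9, 1)),
    Evidence(0.8, 2, date(2026, 6, 1)),
]
print(round(r_eff(items, date(2026, 5, 1)), 2))  # both items fresh
print(round(r_eff(items, date(2026, 7, 1)), 2))  # second item has expired
```

Taking the minimum rather than the average means one expired item is enough to pull the whole decision toward refresh.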

Harness — execution engine (beta, Codex only)

The harness implements code from DecisionRecord artifacts under a real Codex agent in an isolated workspace. It is beta, and the execution agent is Codex only; there is no Claude execution path. A single-commission haft harness run is the trustworthy operator path. Drain mode and auto-apply have been validated only on docs-class commissions, so treat them as beta on production-code commissions.

Two entry points spawn the engine. haft run implements one decision directly:

haft run dec-20260414-001

It reads the decision's invariants, claims, and affected files from the graph, builds a prompt with full reasoning context, spawns a Codex agent with the invariants as guardrails, and takes a baseline snapshot on completion.

haft harness runs commissioned work through Open-Sleigh, with scope guards (allowed_paths / forbidden_paths), per-commission locks, and discrete revertable apply commits:

haft harness run --prepare-only      # create/reuse commissions, do not start runtime
haft harness run                     # create/reuse commissions and start Open-Sleigh
haft harness run --drain --concurrency 4   # drain the queue (apply still manual by default)
haft harness status                  # inspect active/recent runs
haft harness result wc-...           # inspect one completed run and its workspace diff
haft harness apply wc-...            # apply a completed workspace patch to this checkout
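
A scope guard over allowed_paths / forbidden_paths can be sketched as a filter on the files a run changed. The precedence used here (a forbidden match always wins) and the glob matching are assumptions; the README does not specify how Haft evaluates these lists. The file names are made up.

```python
from fnmatch import fnmatchcase


def in_scope(path, allowed_paths, forbidden_paths):
    """Assumed semantics: a forbidden match always wins; otherwise require an allowed match."""
    if any(fnmatchcase(path, pattern) for pattern in forbidden_paths):
        return False
    return any(fnmatchcase(path, pattern) for pattern in allowed_paths)


def out_of_scope(changed, allowed_paths, forbidden_paths):
    """List the changed files that the commission was not allowed to touch."""
    return [p for p in changed if not in_scope(p, allowed_paths, forbidden_paths)]


violations = out_of_scope(
    ["internal/queue/worker.go", "internal/queue/secrets.env", "db/schema.sql"],
    allowed_paths=["internal/queue/*"],
    forbidden_paths=["*.env"],
)
print(violations)
```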

Commissions carry a delivery_policy. The default workspace_patch_manual keeps changes in the isolated workspace until you run haft harness apply. workspace_patch_auto_on_pass applies a passing run as a discrete commit; blocked_policy / failed runs wait for an operator decision.
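
The policy rules in the paragraph above amount to a small decision function. The two policy names come from the text; the outcome label "passed" and the returned step names are hypothetical.

```python
def next_step(delivery_policy, run_outcome):
    """Map a finished run to what happens to its workspace patch."""
    if run_outcome != "passed":
        # Failed or policy-blocked runs never apply on their own.
        return "wait_for_operator"
    if delivery_policy == "workspace_patch_auto_on_pass":
        return "apply_as_discrete_commit"
    if delivery_policy == "workspace_patch_manual":
        return "hold_until_haft_harness_apply"
    raise ValueError(f"unknown delivery_policy: {delivery_policy}")


print(next_step("workspace_patch_manual", "passed"))
print(next_step("workspace_patch_auto_on_pass", "passed"))
print(next_step("workspace_patch_auto_on_pass", "failed"))
```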

Broad harness execution is blocked for needs_onboard projects by default. For intentional tactical out-of-spec work, pass --force-skip-specs "<reason>"; haft records the reason on the selected commissions.

Release archives bundle the Open-Sleigh BEAM runtime, so normal harness use needs no Elixir/Mix install:

~/.haft/runtimes/open-sleigh/current

The lower-level surface is the haft_commission MCP tool and the haft commission CLI (create-from-decision, create-batch, create-from-plan, list, show, requeue, cancel, claim, ...). Every commission action becomes a typed artifact transition, never a free prompt:

SpecSection(s) → DecisionRecord → WorkCommission → RuntimeRun → Evidence → SpecCoverage
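
The idea of "a typed transition, never a free prompt" can be sketched as a lookup table that rejects anything not listed. The action names (claim, cancel, requeue) come from the CLI list above; the state names and the specific edges are hypothetical.

```python
# Hypothetical commission states; the action names come from the CLI list above.
TRANSITIONS = {
    ("queued", "claim"): "claimed",
    ("queued", "cancel"): "cancelled",
    ("claimed", "cancel"): "cancelled",
    ("claimed", "requeue"): "queued",
    ("failed", "requeue"): "queued",
}


def apply_action(state, action):
    """Reject anything that is not an explicit, typed transition."""
    try:
        return TRANSITIONS[(state, action)]
    except KeyError:
        raise ValueError(f"illegal transition: {action} from {state}") from None


print(apply_action("queued", "claim"))
try:
    apply_action("cancelled", "claim")
except ValueError as err:
    print(err)
```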

Cookbook — common workflows

Record an architectural choice

operator (to Claude Code): "we need to pick a queue for the new ingestion path"
↓ h-explore auto-triggers, generates 3+ distinct variants with NQD diversity
↓ h-compare auto-triggers, scores dim-wise in parallel, surfaces the Pareto front
↓ operator picks a variant, then explicitly types:
/h-decide
↓ kernel validates required DRR fields; missing fields → structured error
↓ on pass: DRR written to .haft/decisions/, ready for `haft run`
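
The "Pareto front, not a scalar winner" step in the flow above can be sketched as a dominance filter. The variant names and scores are made up, every dimension is treated as higher-is-better, and constraint handling is omitted, so this is plain Pareto filtering rather than Haft's comparison logic.

```python
def dominates(a, b):
    """a dominates b if it is at least as good everywhere and better somewhere.

    Scores are 'higher is better' on every dimension.
    """
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(variants):
    """Return the names of variants that no other variant dominates."""
    return sorted(
        name for name, scores in variants.items()
        if not any(dominates(other, scores)
                   for other_name, other in variants.items() if other_name != name)
    )


variants = {
    # (throughput, operability, cost efficiency), hypothetical scores
    "kafka":    (9, 4, 5),
    "sqs":      (6, 9, 7),
    "postgres": (4, 8, 7),
    "redis":    (7, 6, 6),
}
print(pareto_front(variants))
```

Three variants survive here because each is best on some trade-off; choosing among them is the step left to the operator.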

Diagnose a failure with rival hypotheses

operator: "tests are failing on the schema migration after the deploy"
↓ h-diagnose auto-triggers, spawns 3+ parallel Agent subagents, one per hypothesis
↓ each subagent reads only what its hypothesis needs (no anchoring)
↓ results merged, ranked by the FPF B.5.2 filter chain
↓ if confirmed: /h-note records the diagnosis; if architectural: /h-frame

Verify a decision still holds

operator: "did dec-20260420-cache-redesign actually work"
↓ h-verify auto-triggers
↓ reads decision predictions + valid_until + baseline file hashes
↓ measures observable claims (test output, metric query, ...)
↓ writes evidence with CL/freshness; updates R_eff
↓ if R_eff < 0.5 → marks stale; if predictions failed → reopens the problem
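
The last step of the flow above can be written as a small verdict function. The 0.5 threshold comes from the flow; the order in which the two conditions are checked and the returned labels are assumptions.

```python
def verify_outcome(r_eff, predictions_held):
    """predictions_held: list of booleans, one per measured prediction."""
    if not all(predictions_held):
        # A failed prediction is treated as the stronger signal (assumed precedence).
        return "reopen_problem"
    if r_eff < 0.5:
        return "stale"
    return "holds"


print(verify_outcome(0.8, [True, True]))
print(verify_outcome(0.3, [True, True]))
print(verify_outcome(0.8, [True, False]))
```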

Quick operator status

haft check          # CI-friendly governance verification (exit 0 clean / 1 findings)
haft check routing  # sanity-check skill routing reliability
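
For CI wiring, the documented exit codes (0 clean, 1 findings) can be translated by a thin wrapper. The helper name and the handling of other exit codes are assumptions, and a stand-in process replaces the haft binary so the sketch runs anywhere.

```python
import subprocess
import sys


def governance_gate(command):
    """Run a check command and translate its exit code (0 clean, 1 findings)."""
    code = subprocess.run(command, check=False).returncode
    if code == 0:
        return "clean"
    if code == 1:
        return "findings"
    return f"error (exit {code})"


# In CI this would be governance_gate(["haft", "check"]); a stand-in process is
# used here so the example runs without the haft binary installed.
print(governance_gate([sys.executable, "-c", "raise SystemExit(1)"]))
```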

From the host agent: /h-status for the full dashboard.

What Makes It Different

Decisions are live — computed trust scores (R_eff) degrade as evidence ages
Comparison is honest — parity enforced, constraint-aware Pareto elimination, anti-Goodhart observation indicators
Reasoning fused with code — haft_query surfaces the decisions governing a symbol while you read or traverse it, so a governed node never reads as safe-to-change
Memory across sessions — related past decisions surface during framing, similar variants during exploration
The loop closes — failed measurements reopen decisions, evidence decay triggers review, drift detection flags violations
Decisions are contracts — invariants, claims with thresholds, rollback plan, valid-until date
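
The "reasoning fused with code" point can be illustrated with an impact walk over a call graph that attaches governing decisions to each symbol it reaches. The symbols, the graph, and the mapping are invented for the example, and the traversal is a plain breadth-first search, not Haft's code-graph implementation.

```python
from collections import deque

CALLERS = {  # callee -> callers (hypothetical symbols)
    "queue.Publish": ["ingest.Handle"],
    "ingest.Handle": ["api.PostEvent"],
    "api.PostEvent": [],
}
GOVERNED_BY = {"ingest.Handle": ["dec-20260414-001"]}  # symbol -> decision ids


def impact(symbol):
    """Breadth-first walk over callers, attaching governing decisions to each node."""
    seen, queue, result = {symbol}, deque([symbol]), []
    while queue:
        current = queue.popleft()
        result.append((current, GOVERNED_BY.get(current, [])))
        for caller in CALLERS.get(current, []):
            if caller not in seen:
                seen.add(caller)
                queue.append(caller)
    return result


for name, decisions in impact("queue.Publish"):
    print(name, decisions)
```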

Roadmap

v8 — Governance Substrate Pivot (current)

Standalone interactive agent, TUI, and desktop wrappers dropped. Haft is a kernel + CLI + MCP server + 15 skills, shared across Claude Code, Codex, OpenCode, and Cursor over one .haft/ artifact graph. The kernel MCP returns structured errors as hard enforcement gates; binding artifacts stay manual-only per the Transformer Mandate. Rationale: dec-20260525-v8-architecture-pivot-....

Shipped history lives in CHANGELOG.md.

Next

The defensible edge is the fusion of the code graph with the reasoning graph, and the liveness of that reasoning graph — not raw code-graph coverage. Active directions (idea-stage, tracked as live problems under .haft/): edit-time invariant guardrails on governed symbols, trust state surfaced at the symbol, coherence checks over the governance graph, and a run-time harness that measures whether the fused graph actually reduces reads and prevents broken decisions. None are committed to a release version.

Requirements

Go 1.25+ — building from source
Claude Code or Codex — plugin mode
Rust toolchain — only to build the embedding sidecar (haft-embed) from source; without it, FPF semantic search degrades to keyword

License

MIT

