Mission-enforced Claude agent runner. You write what "done" looks like as shell commands; swarmd keeps a Claude agent working until those commands pass — across crashes, API outages, and context resets. It doesn't let the agent quit early, game its own check, or drift off task.
Plain claude -p "do X; pytest should pass" |
swarmd |
|
|---|---|---|
| API 424 mid-run | session dies, work lost | workflow resumes from where it crashed |
| Agent "done early" | stops when Claude says so | blocks completion until criteria pass for hold_window_sec straight |
| Agent edits tests to pass | you find out later | 6-dim anti-cheat panel fires on every pass-transition |
| Agent stuck in loop | no detection | pattern detector emits a loop finding |
| Agent drifts off task | no detection | cadence-driven goal-drift + progress audits |
| Worker machine reboots | everything lost | Temporal persists state; next worker picks up |
You give swarmd two things: a mission (natural language goal) and success criteria (shell commands whose exit codes define done). Everything else — planning, code, commits, decisions — is the agent's job, same as running claude directly.
When you swarm launch mission.yaml, this happens:
- A Temporal workflow starts on the swarm worker daemon. Its job: run enforcement, not write code.
- The workflow spawns a
claudesubprocess in your workspace with the mission prose. Claude does the actual work — files, tools, subagents, whatever it picks. - In parallel, the workflow runs a verifier loop every
run_every_sec: check tamper (are locked files unmodified?) → enforce invariants (no_mock, test_count_floor, etc.) → run every criterion's shell check in parallel → update state. - Three child workflows run alongside:
- PatternDetector — tails
events.jsonl, flags loops, oscillation, scope-shrinking - LLMCritic — cadence-driven Haiku calls for progress audit + goal-drift; fires the 6-dim anti-cheat panel (scope_reduction, mock_out, tautology, hardcode, off_criterion, coordinated_edit) on every criterion pass-transition
- ResourceMonitor — zombie processes, memory pressure, disk
- PatternDetector — tails
- When every criterion passes, the workflow enters a hold window. If they stay green for
hold_window_sec, a completion judge runs six preconditions (no open cheat/fabrication/tamper findings, no critic disagreements, per-criterion anti-cheat verdictpass) before allowing the transition tocomplete. - Transient errors (HTTP 424/429/5xx, timeouts) become Temporal retries; terminal errors (400/401, auth) halt the mission with a clear reason.
What you specify: mission prose, workspace path, success criteria, optional invariants. What you don't specify: any plan, any steps, any agent behavior. Claude figures that out.
- Temporal server — external dependency (
brew install temporal). Persistent state. swarm worker— long-running daemon that polls Temporal and executes workflows + activities. Run one or more; more workers = more missions in parallel. Restart at will — state survives.- Per mission at runtime: 1 parent workflow + 3 child workflows + 1
claudesubprocess + up to 6 parallel anti-cheat activities on each pass-transition.
pip install swarmdRequires Python 3.10+, Temporal (brew install temporal), and claude on PATH.
# mission.yaml
mission: "Add full test coverage to auth.py"
workspace: "/abs/path/to/your/project"
success_criteria:
- id: tests_pass
check: "pytest auth/ -q"
timeout_sec: 120
- id: coverage_floor
check: "coverage report --include=auth.py --fail-under=90"
timeout_sec: 30
- id: no_mocks # anti-cheat floor
check: "! grep -rE 'unittest.mock|MagicMock' auth/"
timeout_sec: 10
verification:
run_every_sec: 30
hold_window_sec: 60temporal server start-dev &
swarm worker &
swarm launch mission.yaml # → workflow_id=mission-abc123
swarm status mission-abc123
swarm findings mission-abc123 --tail 50
swarm abort mission-abc123 --reason "criteria were wrong"- Design spec — full architecture
- Mission schema — every field
- Examples — reference missions