A PM agent orchestrates specialized agents that implement, review, test, and ship. Roles are mechanically enforced, memory lives in plain files, and state survives context compaction. No service, no vector DB, no infra to host.
- One giant Claude session drifts. It forgets decisions after a compaction, loses the thread on long runs, and edits whatever file it feels like.
- "Agent framework" tools (AutoGen, CrewAI, AutoGPT) want you to write Python orchestration, stand up a vector DB, and babysit infra — just to coordinate a few roles.
- Role boundaries that live only in prompts get ignored the moment the model is confident enough to "just fix it."
Forge fixes this inside Claude Code itself — with hooks that actually enforce the rules, flat files instead of services, and a pipeline that scales to the task.
| Forge | Raw Claude Code | AutoGen / CrewAI / AutoGPT | |
|---|---|---|---|
| Runs inside Claude Code (no runtime) | ✅ | ✅ (1 session) | ❌ (Python host) |
| Roles enforced by hooks, not prompts | ✅ | ❌ | |
| Memory with no service (flat files) | ✅ | ❌ (vector DB) | |
| Survives context compaction | ✅ | ❌ | |
| Pipeline scales to task (L1–L4) | ✅ | ❌ | |
| Zero infra / DB / daemon | ✅ | ✅ | ❌ |
git clone https://github.com/ajadi/forge.git
# install into your project (pick a preset)
bash forge/install.sh /path/to/your/project --preset solo # 13 core agents
bash forge/install.sh /path/to/your/project --preset full # 37 agents
# then open the project in Claude Code and run:
/f-startPrefer it everywhere? bash forge/install-global.sh installs the agents/commands/skills
into ~/.claude/, then /f-setup-project wires any folder.
You give the PM agent a task. It assesses complexity and routes it through the matching pipeline — no full ceremony for a typo, full rigor for a feature.
| Level | What | Pipeline |
|---|---|---|
| L1 | One-line fix | Developer → RealityCheck → commit |
| L2 | Small change | Developer → CodeReview → RealityCheck → commit |
| L3 | Feature | Architect → Developer → CodeReview + Security → UnitTest → RealityCheck → commit |
| L4 | Major feature | Full pipeline + design panel (Consilium) |
flowchart TD
U([You: a task]) --> O[Orchestrator · Claude Code]
O --> PM{{PM agent — routes by complexity L1–L4}}
PM --> DEV[Developer · implements]
PM --> CR[Code Reviewer]
PM --> SEC[Security Analyst]
PM --> RC[Reality Checker · final gate]
PM --> ARCH[Architect · L3–L4 design]
DEV --> MEM[(memory/*.md · flat-file memory)]
CR --> MEM
SEC --> MEM
RC ==> CM([✓ commit])
MEM -. grep before guessing .-> PM
Four things keep it from drifting, leaking, or running up cost:
- Mechanically enforced roles — a
Write|Editguard hook stops the PM and read-only agents from editing source. Boundaries are enforced, not just documented. - File-based memory — project knowledge lives in
memory/*.md; agents grep before they guess, append after tasks, mark superseded facts~~strikethrough~~. No service, no embeddings, nothing to install. - Token economy — large non-source reads are delegated to a cheap model; noisy shell output is trimmed before it reaches context. The reasoning model stays on code.
- Long-session durability — state is dumped before compaction and rehydrated exactly once after, so long runs don't lose the thread.
Forge is v2.4, MIT-licensed, and used in real multi-agent builds.
- ✅ Complexity-routed pipeline (L1–L4) with 13 core + 24 extension agents
- ✅ Hook-enforced role boundaries, commit/push guards, production-safety stop
- ✅ Flat-file memory protocol + repo-access modes (private/shared/public)
- ✅ Long-session durability (pre-compact dump + one-shot rehydrate), token economy
- ✅ Autopilot — run a backlog end-to-end, halt only on questions/regressions/deploys
- 📋 Roadmap — richer memory via Ember (see below)
Forge's built-in memory is flat Markdown files — zero-config and perfect for most projects. For semantic recall across sessions and machines, pair it with Ember — a local-first MCP memory server (SQLite + on-device embeddings + tiered retrieval). Forge ships an optional capture adapter that feeds pipeline events (PM steps, reality-checks, decisions, retrospectives) straight into Ember. Use either alone, or both together.
Hooks — what runs when (all fail open)
| Hook | Event | What it does |
|---|---|---|
session-start · detect-gaps |
SessionStart | Load context; reset turn counter; warn on missing files |
contract-reminder |
UserPromptSubmit | Re-inject operating contract + active task objective/done-criteria |
rehydrate |
UserPromptSubmit | After compaction, one-shot re-inject of critical state, then silent |
turn-counter |
UserPromptSubmit | Session depth meter; soft-warn to checkpoint (default 40 turns) |
validate-commit · validate-push |
PreToolUse(Bash) | Block --no-verify, force-push, staged .env, framework leaks |
bash-filter |
PreToolUse(Bash) | Trim noisy commands to lean forms (token saver) |
coworker-read-gate |
PreToolUse(Read) | Delegate large non-source reads to a cheap model; source exempt |
role-write-guard |
PreToolUse(Write|Edit) | Enforce AGENTS.md boundaries — PM/read-only roles can't edit source |
check-blockers |
PostToolUse(Task) | Detect open questions after an agent runs |
pre-compact |
PreCompact | Dump full state to a durable snapshot + set rehydrate marker |
stop-check · session-stop |
Stop | Block stopping mid-pipeline or with unrecorded source changes |
Tuning knobs: COWORKER_READ_GATE, ROLE_WRITE_GUARD, FORGE_BASH_FILTER, FORGE_DEPTH_SOFT, …
Agents & extensions
Core (13, always installed): pm · developer · code-reviewer · reality-checker · architect · business-analyst · decomposer · handoff-validator · unit-tester · database-architect · rapid-prototyper · context-summarizer · status
Extensions (install per need):
| Extension | Agents | Best for |
|---|---|---|
| ext-security | security-analyst, dependency-auditor | APIs, auth, web apps |
| ext-frontend | smoke/e2e-tester, ui-designer, ux-interviewer, a11y | UI/UX |
| ext-devops | devops, env-manager, git-workflow, migration-validator | CI/CD, deploys |
| ext-quality | performance-profiler, refactoring, test-reviewer, integration-tester | Large codebases |
| ext-docs | documentation, changelog-agent | OSS / teams |
| ext-planning | estimator, consilium | L3–L4 planning |
| ext-reflection | reflect, dream, optimizer, onboarding, retro, platform-sync | Long-running projects |
Key commands
All user-invocable commands use the f- namespace to avoid collision with built-ins.
| Command | What it does |
|---|---|
/f-start |
Guided onboarding for a new project |
/f-fix TASK-XXX |
Quick fix — run PM on a specific task |
/f-decompose |
Break a feature into parallelizable tasks |
/f-autopilot |
Run the backlog end-to-end; halt only on questions/regressions/deploys |
/f-status · /f-next-task |
Project state / what to do next |
/f-audit |
Adaptive multi-agent project audit swarm |
/f-hotfix |
Emergency fix bypassing the normal pipeline |
/f-spike |
Technical spike to validate a hypothesis |
Install details, repo-access modes, prerequisites
Install flags: --preset solo|small-team|full, --ext name1,name2, --name "...",
--rollback, --apply-proposal, --list. Every run backs up CLAUDE.md, settings.json,
manifest.md, .gitignore to .claude/backup-TIMESTAMP/; CLAUDE.md is merged additively,
hard conflicts pause with a proposal file instead of overwriting.
Repo-access modes (manifest.md → repo_access, default private-solo): controls whether
framework state (.claude/, memory/, tasks/, CLAUDE.md) is committed. Switch before the
first shared/public commit:
scripts/switch-repo-access.sh public --commitPrerequisites: Claude Code CLI · Git Bash on Windows (hooks use #!/bin/bash) ·
Python 3.9+ (optional — only for merging an existing CLAUDE.md/settings.json).
Troubleshooting
| Problem | Fix |
|---|---|
| Hooks error on Windows | Make Git Bash the default shell — hooks use #!/bin/bash. |
python3 not found |
On Windows Python installs as python; Forge handles it (Python is optional). |
Empty memory/ on first run |
Not an error — memory seeds itself as tasks close. |
install.sh exits code 2 |
Hard CLAUDE.md conflict — read .claude/CLAUDE.md.merge-proposal.md, then install.sh --apply-proposal (or --rollback). |
| Framework files leaked to a public branch | switch-repo-access.sh blocks the switch; use git filter-repo or cut a fresh branch. |
Pipeline ideas (autonomy ladder, stop rules, handoff contracts, Ralph Loop, escalation matrix) inspired by alexeykrol/coursevibecode.
MIT © 2026 David Sandler