Spawn cloud agents at your GitHub repos. Watch them work live, steer them mid-run, get a branch back.
A network of interconnected burrows. Agents that operate in isolation, self-manage, self-repair, and self-improve.
Warren is a self-hostable control plane for ephemeral coding agents. Runs are short-lived and sandboxed: they complete a task, validate the changes, push a branch, and spin down. Point it at your repos, dispatch from a browser or CLI, watch the events stream live, and reap the result. One container, one volume, one HTTP API, one UI.
A fresh install needs nothing but a GitHub URL and a prompt. The built-in claude-code agent ships inline; pick it, paste your repo, write what you want done. Power features (versioned prompt libraries, persistent agent memory, an integrated issue queue, a steerable alternative harness, a shared coordination substrate) light up when you opt into them.
Warren is built for engineering teams self-hosting their own agent infrastructure. The deployment unit is one team or one org running one warren on their own box, their own Fly account, or their own cluster. Run it for yourself on a home server today; the org-readiness roadmap extends the same architecture to a 50+ engineer organization without forcing a fork.
Status: stable (0.3.12), running on Fly.io in continuous use against real GitHub repos. The end-to-end path is covered by 27 scenario-based acceptance tests in scripts/acceptance/, including manual runs, cron triggers, multi-worker placement, the Postgres backend, per-run preview environments, restart recovery, cost tracking, seeds-extensions roundtrip, serial plan-run dispatch, and plan-run + Plot composition. The active frontier is the org-readiness cluster: SSO, remote workers, MCP, audit, budgets, GitHub App auth. See ROADMAP.md.
- One image, one volume. The supervisor (src/supervisor/main.ts) is the container ENTRYPOINT. It spawns the sandbox runtime first, waits for the unix socket, then spawns warren. SIGTERM/SIGINT forward to both children; the runtime restarts under a 5-in-60s budget on unexpected exit.
- Native sandboxing per run. Every run gets a fresh bwrap-isolated workspace under /data/burrow/. The host is unreachable; warren talks to the runtime over a unix socket with a shared bearer token.
- Built-in agents. claude-code, sapling, and pi ship inline (src/registry/builtins/), so dispatching a run needs no extra setup.
- Live event stream. NDJSON events are persisted to warren's SQLite log and tailed over GET /runs/:id/events?follow=1. The UI, CLI (warren run), and HTTP clients all consume the same stream.
- Steerable mid-run. POST /runs/:id/steer lands a message in the agent's inbox; the next turn picks it up. POST /runs/:id/cancel aborts cleanly.
- Scheduled runs. .warren/triggers.yaml defines cron triggers per project; the in-process scheduler dispatches them on the same composition path as manual runs.
- Serial plan-run dispatch. Projects shipping .seeds/ can POST /plan-runs against a seeds plan; warren walks the plan's children one at a time, spawning one run per child and gating each on the previous PR merging before the next dispatches. Re-dispatching the same plan after some children have closed resumes from the next open child.
- Three thin clients of one pipeline. Web UI, warren admin CLI, and HTTP API all flow through the same composition path (SPEC §4.3).
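The "5-in-60s" restart budget mentioned for the supervisor can be read as a sliding-window limit. The following is a minimal sketch of that idea, not the code in src/supervisor/main.ts: the class name `RestartBudget`, its method, and the choice to simply report `false` once the budget is spent are all assumptions made for illustration.

```typescript
// Hypothetical sliding-window restart budget; not warren's supervisor code.
class RestartBudget {
  private readonly stamps: number[] = [];
  private readonly maxRestarts: number;
  private readonly windowMs: number;

  constructor(maxRestarts: number = 5, windowMs: number = 60000) {
    this.maxRestarts = maxRestarts; // the "5" in 5-in-60s
    this.windowMs = windowMs; // the "60s" window, in milliseconds
  }

  /** Records a restart at `nowMs`; returns false when the budget is exhausted. */
  tryRestart(nowMs: number): boolean {
    // Forget restarts that have aged out of the window.
    for (;;) {
      const oldest = this.stamps[0];
      if (oldest === undefined || nowMs - oldest < this.windowMs) break;
      this.stamps.shift();
    }
    if (this.stamps.length >= this.maxRestarts) return false;
    this.stamps.push(nowMs);
    return true;
  }
}
```

A caller would invoke `tryRestart(Date.now())` each time the child exits unexpectedly and stop restarting when it returns `false`. What the real supervisor does at that point is not stated here.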
git clone https://github.com/jayminwest/warren && cd warren
cp .env.example .env && $EDITOR .env
docker compose up -d
open http://localhost:8080

Paste your WARREN_API_TOKEN, click Projects → Add, give it a GitHub URL. Then Dispatch run, pick claude-code, write a prompt, hit go. The events panel streams; when the run completes warren pushes a branch you can open a PR from.
Required environment variables (see .env.example for the full list):
| Variable | Purpose |
|---|---|
| WARREN_API_TOKEN | Bearer token on every route except /healthz. openssl rand -hex 32. |
| BURROW_API_TOKEN | Token the sandbox runtime requires to bind. openssl rand -hex 32. |
| WARREN_BURROW_TOKEN | Token warren's runtime client sends. Must equal BURROW_API_TOKEN; they are the two ends of one channel. |
| ANTHROPIC_API_KEY | Forwarded to agent runtimes that need it. |
| GITHUB_TOKEN | Forwarded for project clones + branch pushes. |
The compose file applies the four bwrap-required security flags (apparmor=unconfined, seccomp=unconfined, systempaths=unconfined, cap_add: SYS_ADMIN). These relax the outer container so the runtime's nested userns sandboxes can come up. Removing any one of them breaks sandbox provisioning.
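The required variables, and the rule that WARREN_BURROW_TOKEN must equal BURROW_API_TOKEN, are easy to get wrong when editing .env by hand. Below is a small pre-flight sketch that checks both. The function name `checkEnv` and its message strings are illustrative assumptions and are not part of warren or of warren doctor.

```typescript
type Env = Record<string, string | undefined>;

// Variable names taken from the required-variables table.
const REQUIRED_VARS = [
  "WARREN_API_TOKEN",
  "BURROW_API_TOKEN",
  "WARREN_BURROW_TOKEN",
  "ANTHROPIC_API_KEY",
  "GITHUB_TOKEN",
];

/** Returns a list of human-readable problems; an empty list means the env looks usable. */
function checkEnv(env: Env): string[] {
  const problems: string[] = [];
  for (const name of REQUIRED_VARS) {
    const value = env[name];
    if (value === undefined || value.trim() === "") {
      problems.push(`${name} is not set`);
    }
  }
  // The two burrow tokens are the two ends of one channel and must match.
  const runtimeToken = env["BURROW_API_TOKEN"];
  const clientToken = env["WARREN_BURROW_TOKEN"];
  if (runtimeToken && clientToken && runtimeToken !== clientToken) {
    problems.push("WARREN_BURROW_TOKEN must equal BURROW_API_TOKEN");
  }
  return problems;
}
```

Under Bun or Node this could be called as `checkEnv(process.env)` before starting the container.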
Deploying to Fly.io uses the same image, same volume layout, and same security flags:
fly launch # uses ./fly.toml
fly volumes create warren_data --size 50 --region sjc
fly secrets set \
WARREN_API_TOKEN=... \
BURROW_API_TOKEN=... \
WARREN_BURROW_TOKEN=... \
ANTHROPIC_API_KEY=... \
GITHUB_TOKEN=...
# Optional: attach a managed Postgres instead of the on-volume SQLite.
# Without this, warren falls back to sqlite:///data/warren.db.
# fly secrets set WARREN_DB_URL=postgres://user:pw@host/db
fly deploy

Once warren is live on Fly, wiring tag-driven auto-deploy is two commands:
fly tokens create deploy -a <your-warren-app> --name "github-actions" --expiry 8760h \
  | gh secret set FLY_API_TOKEN -R <your-org>/<your-fork>

Then add a deploy job to your release workflow that runs after release/tag:
deploy:
  needs: release
  if: needs.release.outputs.release == 'true'
  runs-on: ubuntu-latest
  concurrency:
    group: fly-deploy-<your-warren-app>
    cancel-in-progress: false
  steps:
    - uses: actions/checkout@v6
    - uses: superfly/flyctl-actions/setup-flyctl@master
    - env: { FLY_API_TOKEN: "${{ secrets.FLY_API_TOKEN }}" }
      run: flyctl deploy --remote-only --app <your-warren-app>

The deploy-scoped token is bound to a single app and cannot list secrets, ssh, or touch other apps, so it's safe to live in CI. See .github/workflows/release.yml for the reference shape used by warren-deployed.fly.dev.
Warren bundles a small set of os-eco tools as built-in features. They're not required for a basic run. Each lights up when you use it and stays silent when you don't.
The built-in claude-code, sapling, and pi agents cover the common case. To define custom agents as versioned prompts (with inheritance, mixins, and per-agent sandbox config), point warren at a canopy repo:
fly secrets set CANOPY_REPO_URL=https://github.com/<you>/agents.git

Library agents override built-ins by name. See SPEC §4.2 for the agent-as-prompt schema.
If a project has a .mulch/ directory, every run gets that expertise primed into context on spawn. As the agent learns conventions, patterns, and failure modes, it records them with ml record; reap merges the new records back to the project's persistent .mulch/ with last-write-wins by timestamp. Memory accumulates across runs without a database, just files in the repo. See mulch.
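The "last-write-wins by timestamp" merge can be sketched as follows. This illustrates the merge rule only. The `MemoryRecord` shape (an `id`, an ISO-8601 `recordedAt`, and a `body`) is hypothetical and is not mulch's on-disk format.

```typescript
// Hypothetical record shape, for illustration only.
interface MemoryRecord {
  id: string;
  recordedAt: string; // ISO-8601 timestamp
  body: string;
}

/** Merges records from a run into the persistent set; the newest timestamp per id wins. */
function mergeLastWriteWins(
  persistent: MemoryRecord[],
  fromRun: MemoryRecord[],
): MemoryRecord[] {
  const byId = new Map<string, MemoryRecord>();
  for (const record of persistent.concat(fromRun)) {
    const existing = byId.get(record.id);
    // On equal timestamps the later-seen record (the run's copy) wins.
    if (
      existing === undefined ||
      Date.parse(record.recordedAt) >= Date.parse(existing.recordedAt)
    ) {
      byId.set(record.id, record);
    }
  }
  return Array.from(byId.values());
}
```

Because the merge is keyed on record identity and timestamp alone, it needs no coordination beyond the files themselves, which fits the "no database, just files in the repo" claim.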
If a project has a .seeds/ directory, agents can sd ready for unblocked work, claim it with sd update, file follow-ups with sd create, and close completed seeds with sd close. Reap closes any seeds the agent marked done. The trigger scheduler can also fire on past-due extensions.scheduledFor seed timestamps (SPEC §11.I). See seeds.
.seeds/ also enables plan-run dispatch: POST /plan-runs { project, planId, agent } against a seeds plan walks its children sequentially, one warren run per child, gating each step on the previous PR merging before the next dispatches. Children whose seeds are already closed are skipped, so re-dispatching the same plan after partial completion resumes from the next open child. PlanRun is a dispatch mode on top of the existing single-run primitive — same spawn path, same sandbox, same event stream. Tune the coordinator with WARREN_PLAN_RUN_TICK_MS (default 10s) or disable it with WARREN_PLAN_RUN_DISABLED=1. See SPEC §11.P.
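The resume behavior described above (closed children are skipped, and dispatch continues from the next open child) reduces to a simple selection rule. The sketch below assumes a hypothetical `PlanChild` shape. It does not reflect the seeds schema or warren's coordinator, and it leaves out the wait for the previous PR to merge.

```typescript
// Hypothetical view of one child of a seeds plan.
interface PlanChild {
  seedId: string;
  closed: boolean;
}

/** Children that still need a run, in plan order; closed seeds are skipped. */
function remainingChildren(children: PlanChild[]): PlanChild[] {
  return children.filter((child) => !child.closed);
}

/** The child a plan-run would dispatch next, or undefined when every child is closed. */
function nextChild(children: PlanChild[]): PlanChild | undefined {
  return remainingChildren(children)[0];
}
```

Re-dispatching a plan whose first two children are already closed would therefore start at the third, which is the "resumes from the next open child" behavior.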
The built-in sapling agent is a headless coding harness with proactive context management. Use it the same way you'd use claude-code. See sapling.
If a project has a .plot/ directory, runs dispatched with a plot_id get PLOT_ID + PLOT_ACTOR=agent:<name>:<run-id> injected into the sandbox. The agent inside reads context with plot get and appends decision_made / question_posed / artifact_produced events with plot append. Warren appends a run_dispatched event to the originating Plot on spawn and merges the workspace .plot/ back at reap, mirroring agent events into the run's event stream tagged with plot_id. Projects without .plot/ are byte-identical to the pre-change behavior. See plot and SPEC §11.O.
When a project ships both .plot/ and .seeds/, plan-runs compose onto Plot. A POST /plan-runs { plot_id } emits one plan_run_dispatched event on the bound Plot at start, threads plot_id through every child so each gets PLOT_ID + PLOT_ACTOR in its sandbox and emits its own run_dispatched event, and auto-transitions the Plot from active → done when the final child merges. Plan-runs dispatched without plot_id, or against a project without .plot/, are byte-identical to the standalone plan-run baseline. See SPEC §11.P.Plot.
After a successful run, warren opens a PR with a generated body (summary, run link, commits, files-changed, prompt, etc.). Projects override individual sections by shipping a .warren/pr-template.md file: every ## <fragment_name> heading replaces the default body for that fragment. Unspecified fragments keep the built-in defaults, so you can override just one piece.
## trailer
Reviewed-by: @platform-team
Please follow our [PR checklist](https://example.com/checklist) before merging.

Recognized fragment names: title, summary, run, seeds, preview_url_or_placeholder, commits, files_changed, prompt, trailer. A whitespace-only body removes the fragment entirely. Unknown names + unbalanced preview markers surface via warren doctor so typos are loud. See SPEC §11.L for the full fragment contract.
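To make the override rules concrete, here is a rough sketch of how a .warren/pr-template.md could be split into fragments. It is not warren's parser. The heading regex, the `ParsedTemplate` shape, and the handling of edge cases are assumptions; SPEC §11.L remains the authoritative contract.

```typescript
// Fragment names listed in this README.
const KNOWN_FRAGMENTS = [
  "title", "summary", "run", "seeds", "preview_url_or_placeholder",
  "commits", "files_changed", "prompt", "trailer",
];

interface ParsedTemplate {
  overrides: Map<string, string>; // fragment name -> replacement body
  removed: string[]; // fragments with a whitespace-only body
  unknown: string[]; // headings that match no known fragment
}

interface Section {
  name: string;
  lines: string[];
}

function parsePrTemplate(source: string): ParsedTemplate {
  // Pass 1: split the file into "## <name>" sections.
  const sections: Section[] = [];
  let open: Section | undefined;
  for (const line of source.split("\n")) {
    const heading = /^##\s+(\S+)\s*$/.exec(line);
    if (heading !== null) {
      open = { name: heading[1] ?? "", lines: [] };
      sections.push(open);
    } else if (open !== undefined) {
      open.lines.push(line);
    }
  }
  // Pass 2: classify each section.
  const result: ParsedTemplate = { overrides: new Map(), removed: [], unknown: [] };
  for (const section of sections) {
    const body = section.lines.join("\n").trim();
    if (KNOWN_FRAGMENTS.indexOf(section.name) === -1) result.unknown.push(section.name);
    else if (body === "") result.removed.push(section.name);
    else result.overrides.set(section.name, body);
  }
  return result;
}
```

Fragments absent from all three collections would keep their built-in defaults.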
When a project ships a .warren/preview.yaml, warren launches preview.command as a sidecar inside the same burrow workspace after a successful run, allocates a port, and exposes the running app at https://run-<runId>.<WARREN_PREVIEW_HOST>. Reviewers click the URL instead of git checkout-ing the branch. Idle sessions are reaped automatically; the run-detail page surfaces a status badge and a manual teardown button. Opt in with two pieces:
- Operator side. Set WARREN_PREVIEW_HOST=preview.<your-host> and point a wildcard CNAME at the warren box (see Per-run previews: operator setup below). Without WARREN_PREVIEW_HOST the launch sub-step is a no-op (the run still completes, the URL just has no listener).
- Project side. Ship .warren/preview.yaml with the preview block at the top level:

      type: server
      command: bun run dev
      port: 3000
      readiness_path: /healthz
      idle_ttl: 30m
      max_lifetime: 8h

Projects that don't opt in skip the preview sub-step entirely. See SPEC §11.L for the full contract.
Enable the preview proxy by giving warren a host suffix it can route on:
WARREN_PREVIEW_HOST=preview.warren.example.com

Warren then matches Host: run-<runId>.preview.warren.example.com as a preamble before its API/UI routes and forwards to the in-sandbox port allocated at reap time. The login route (GET /runs/:id/preview/login?token=…&redirect=…) accepts the warren bearer in the query and issues a domain-scoped signed cookie (warren_preview); the proxy rejects unauthenticated browser requests with 401 (not 502). The HMAC key is derived from WARREN_API_TOKEN, so there's no second secret to manage. warren doctor warns if the token is empty or matches a placeholder.
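The Host-header match described above can be sketched as a small pure function. This is an illustration of the routing rule only. The function name `matchPreviewHost` and its exact edge-case handling (port stripping, rejecting nested subdomains) are assumptions and do not describe warren's proxy.

```typescript
/**
 * Extracts <runId> from a Host header of the form run-<runId>.<previewHost>.
 * Returns null when the header does not address a preview.
 */
function matchPreviewHost(hostHeader: string, previewHost: string): string | null {
  const host = hostHeader.replace(/:\d+$/, ""); // drop an optional :port
  const suffix = "." + previewHost;
  // Hostnames are case-insensitive, so compare lowercased copies.
  if (!host.toLowerCase().endsWith(suffix.toLowerCase())) return null;
  const label = host.slice(0, host.length - suffix.length);
  // Only a single run-<id> label directly under the preview host is accepted.
  if (!label.toLowerCase().startsWith("run-") || label.includes(".")) return null;
  const runId = label.slice("run-".length);
  return runId.length > 0 ? runId : null;
}
```

A request whose Host header yields `null` would fall through to the normal API/UI routes.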
Wildcard DNS. Point a wildcard CNAME at the warren box so every run-* subdomain resolves:
*.preview.warren.example.com CNAME warren.example.com
TLS via Caddy with a wildcard cert. TLS stays on the operator's edge (SPEC §8.1 / §11.D). Use Caddy's DNS-01 challenge to issue *.preview.warren.example.com (HTTP-01 cannot issue wildcards). Minimal Caddyfile snippet:
*.preview.warren.example.com {
    tls {
        dns cloudflare {env.CLOUDFLARE_API_TOKEN}
    }
    reverse_proxy localhost:8080
}

Caddy's DNS-01 plugin supports Cloudflare, Route 53, DigitalOcean, Hetzner, Linode, OVH, Vultr, and others. See caddy-dns for the current list. If your provider isn't on it, an operator-controlled per-project subdomain pattern is the alternative.
Lifecycle knobs. Tune for scale via WARREN_PREVIEW_IDLE_TTL (default 30m), WARREN_PREVIEW_MAX_LIFETIME (8h), WARREN_PREVIEW_MAX_LIVE (20), WARREN_PREVIEW_PORT_RANGE (30000-31000), and WARREN_PREVIEW_EVICTION_TICK_MS (60000). Per-project overrides for idle_ttl and max_lifetime live in .warren/preview.yaml. /readyz surfaces port-allocator saturation warnings.
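How the idle TTL and max lifetime interact can be shown with a short sketch. The duration grammar (an integer followed by s, m, or h) is inferred from the defaults "30m" and "8h" and may not match what warren accepts. `PreviewSession` and `evictionReason` are hypothetical names, not warren internals.

```typescript
// Assumed duration grammar: an integer followed by s, m, or h (e.g. "30m", "8h").
function parseDuration(text: string): number {
  const match = /^(\d+)(s|m|h)$/.exec(text.trim());
  if (match === null) throw new Error(`unrecognized duration: ${text}`);
  const amount = Number(match[1]);
  switch (match[2]) {
    case "s":
      return amount * 1000;
    case "m":
      return amount * 60000;
    default:
      return amount * 3600000; // "h" is the only remaining unit
  }
}

// Hypothetical session bookkeeping, for illustration only.
interface PreviewSession {
  startedAtMs: number;
  lastRequestAtMs: number;
}

type EvictionReason = "idle" | "max_lifetime" | null;

/** Decides whether a preview should be torn down at `nowMs`, and why. */
function evictionReason(
  session: PreviewSession,
  nowMs: number,
  idleTtlMs: number,
  maxLifetimeMs: number,
): EvictionReason {
  if (nowMs - session.startedAtMs >= maxLifetimeMs) return "max_lifetime";
  if (nowMs - session.lastRequestAtMs >= idleTtlMs) return "idle";
  return null;
}
```

An eviction tick (every WARREN_PREVIEW_EVICTION_TICK_MS) would run a check like this over the live sessions.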
Cross-host routing for runs landing on remote workers is in progress as R-12. Until then, the proxy returns 501 for off-host runs.
See SPEC §11.L for the full design.
┌──────────────── container (bwrap-friendly host) ────────────────┐
│ supervisor ─┬─► sandbox runtime (unix socket: /var/run/...) │
│ (Bun parent) └─► warren (Bun.serve :8080, SPA + API)│
│ │
│ /data/ │
│ ├── canopy-repo/ ← optional cloned agent library │
│ ├── projects/<o>/<n>/ ← cloned project repos │
│ ├── burrow/ ← runtime home (SQLite, workspaces) │
│ └── warren.db ← warren's SQLite (runs, events) │
└─────────────────────────────────────────────────────────────────┘
▲
│ HTTPS (terminated upstream)
[browser]
Under the hood, warren talks to burrow as the sandbox runtime. They are co-tenanted inside the container, share a unix socket, and share a bearer token (BURROW_API_TOKEN == WARREN_BURROW_TOKEN). See SPEC §10.3 for the full layout.
The warren (or wr) admin CLI is for ops; the web UI is for daily use.
| Command | Description |
|---|---|
| warren register-agent <name> | Refresh canopy + register one agent |
| warren add-project <git-url> | Clone a project under /data/projects |
| warren run <agent> <project> -p "..." | One-shot run, no UI |
| warren init | Scaffold a .warren/ directory in a project |
| warren doctor | Runtime reachable? Bwrap working? DB reachable? |
| warren serve | Start the HTTP server (default in entrypoint) |
| warren db migrate-to-postgres --from <sqlite> --to <pg-url> | One-shot SQLite → Postgres porter (R-13) |
warren run claude-code <project> -p "..." does the full composition end-to-end: resolves the agent (built-in or library), provisions the sandbox, dispatches the run, streams events back, then pushes the branch. If the project has .mulch/ or .seeds/, those round-trip too.
GET /agents list registered agents
POST /agents/refresh re-clone the optional canopy library
GET /agents/:name rendered agent JSON
GET /projects list cloned projects
POST /projects { gitUrl, defaultBranch? } → clone
POST /projects/:id/refresh git fetch + reset to upstream HEAD
DELETE /projects/:id remove project
GET /projects/:id/warren-config parsed .warren/ envelope
GET /projects/:id/triggers scheduler state per trigger
POST /projects/:id/triggers/:tid/run dispatch a trigger inline
POST /runs { agent, project, prompt } → spawn
GET /runs list (filter by status / agent / project)
GET /runs/:id detail incl. rendered_agent_json
GET /runs/:id/events?follow=1 NDJSON tail (warren log + live)
POST /runs/:id/steer proxy to runtime inbox
POST /runs/:id/cancel proxy to runtime cancel
GET /runs/:id/preview/login issue signed-cookie + 302 (auth-exempt, ?token=)
POST /runs/:id/preview/teardown manual preview teardown (idempotent)
POST /plan-runs { project, planId, agent } → serial dispatch (.seeds/ only)
GET /plan-runs list (filter by project / state)
GET /plan-runs/:id detail + fanned-out child runs[]
POST /plan-runs/:id/cancel cancel; aborts the in-flight child run
GET /plan-runs/:id/events NDJSON tail union over every child run
GET /healthz liveness (no auth)
GET /readyz runtime + first-render check
Authorization: Bearer ${WARREN_API_TOKEN} is required on every non-/healthz route. Warren does not terminate TLS; front it with Caddy on a home server, or rely on Fly's edge.
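For HTTP clients, the two recurring chores are attaching the bearer token and splitting the NDJSON event tail into individual events. Here is a minimal client-side sketch of both. `authHeaders` and `NdjsonBuffer` are illustrative names rather than part of warren, and the sketch deliberately assumes nothing about the fields inside each event.

```typescript
/** Header required on every route except /healthz. */
function authHeaders(token: string): Record<string, string> {
  return { Authorization: `Bearer ${token}` };
}

/**
 * Incremental NDJSON splitter: feed it decoded text chunks as they arrive
 * and it returns the events completed by that chunk.
 */
class NdjsonBuffer {
  private pending = "";

  push(chunk: string): unknown[] {
    this.pending += chunk;
    const lines = this.pending.split("\n");
    // The last element is an incomplete line (or ""); keep it for the next chunk.
    this.pending = lines.pop() ?? "";
    const events: unknown[] = [];
    for (const line of lines) {
      if (line.trim() !== "") events.push(JSON.parse(line));
    }
    return events;
  }
}
```

A client would request GET /runs/:id/events?follow=1 with `authHeaders(token)`, decode the response body as UTF-8, and pass each chunk to `push`. Buffering the trailing partial line matters because chunk boundaries need not align with event boundaries.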
Requires Bun v1.1+.
bun install
bun test # all unit tests
bun run lint # biome check --error-on-warnings
bun run typecheck # tsc --noEmit
bun test && bun run lint && bun run typecheck # all quality gates

UI development (separate from the server build):
bun run ui:install
bun run ui:dev

The acceptance harness in scripts/acceptance/ drives 27 scenarios against a live container. See ACCEPTANCE.md for the runbook.
See CONTRIBUTING.md for branch naming, testing conventions, and PR expectations.
src/
├── index.ts library entry (currently VERSION constant only)
├── core/ types, errors, id minting (ag_*, prj_*, run_*)
├── registry/ agent definition resolution (built-in + library)
├── projects/ GitHub clone management
├── runs/ spawn / stream / reap composition flow (SPEC §4.3)
├── triggers/ cron + scheduled-for dispatcher (SPEC §11.I)
├── warren-config/ .warren/ per-project config loader + cache (SPEC §11.H)
├── burrow-client/ facade over the sandbox runtime's HttpClient
├── supervisor/ container entrypoint (spawns warren + runtime)
├── server/ Bun.serve HTTP API + static UI serving
├── db/ drizzle schema + bun:sqlite repos
├── cli/ warren admin commands
└── ui/ React + Vite + shadcn SPA
How the current release is scoped. Full details in SPEC §11.D:
- Single bearer token. Rotation, expiry, and scopes are not supported; rotate by editing .env (or fly secrets set) and bouncing the container. Per-user identity is on the roadmap (R-09).
- TLS is upstream's job. Direct HTTP on a non-loopback bind is a misconfiguration; warren doctor warns.
- Trust-the-socket between warren and the runtime inside the container, which are co-tenanted by design.
- No CSRF, single-user. UI calls warren's API with the bearer; CORS is strict.
- SQLite by default; Postgres optional. Run history and scheduler state live in /data/warren.db on the local volume out of the box. Org-scale deploys can attach a managed Postgres by setting WARREN_DB_URL=postgres://user:pw@host/db; burrow's per-run SQLite stays untouched either way.
- One host is the concurrency ceiling. Horizontal scale-out across machines is in flight as R-12.
The active direction is org-readiness, extending warren from "one team, one box" to "50-engineer org, their own infra":
- Remote sandbox workers (R-12): one warren dispatching across many runtime workers; lifts the single-host ceiling.
- SSO / per-user identity (R-09): OIDC login replacing the shared bearer. The bearer stays as a service-account path for CI.
- MCP support (R-15): agents declare mcp_servers in their prompt frontmatter; warren plumbs credentials into the sandbox.
- Cross-project activity UI + stable OpenAPI (R-14): a "what is every agent doing right now" view, plus a versioned API contract.
- Audit log (R-16) and cost / concurrency guardrails (R-17): security review and budget control once real user identity lands.
- GitHub App auth (R-18): installation-scoped, short-lived per-run tokens replacing the shared PAT.
All items are additive: none change current behavior when unconfigured. See ROADMAP.md for design sketches and sequencing.
Found a vulnerability? Please follow the disclosure process in SECURITY.md.
Warren is part of the os-eco AI agent tooling ecosystem.
MIT. See LICENSE.