Smithers

Run long-horizon coding-agent work as durable workflows.

Tell your coding agent to do real, multi-step work, then Smithers runs it for minutes or days with crash recovery, retries, human approvals, and full observability. The same workflow runs across Claude Code, Codex, Pi, AI SDK models, and remote sandboxes.

A workflow run is a list of steps you can watch, pause, approve, and rewind. The run above shows several in flight at once.

What you get

🛡️ Durable runs that survive crashes: every completed step is persisted the moment it finishes, so a run resumes from where it stopped instead of starting over.
🔌 Any agent, any model: Claude Code, Codex, Pi, Antigravity, and more, plus any model through the AI SDK. Swap the harness without rewriting the workflow.
🛠️ Higher-quality output: review loops, human approvals, and evals give agents the structure that real work demands.
🧩 Dozens of ready-to-run workflows: planning, implementation, review, debugging, tickets, audits, and long-horizon missions. Your agent can author new ones.

When to use Smithers

You want to…	Smithers?
Get one answer from one prompt	No, call the model directly
Ship a chatbot or a simple tool-calling app	Usually no
Let a coding agent change a repo across many steps	Yes
Pause for a human approval, then resume later	Yes
Run several agents that review, retry, and converge	Yes
Survive crashes and replay, fork, or rewind a run	Yes
Run durable backend jobs unrelated to agents	Consider Temporal or Inngest
Host stateful per-user session agents	Consider Cloudflare Agents

Smithers is the durable runtime for coding-agent work: when the unit of work is an agent editing a real repository over many steps, and you need that work to be inspectable, approvable, and recoverable.

Prompt your agent

Smithers is driven by your coding agent, not a GUI you click. Your agent runs Smithers on your behalf: it scaffolds workflows, kicks off runs, watches them, and handles approvals.

The fastest way to make your agent fluent is two fan-out commands. They install the Smithers skill and register the MCP server into every coding agent on your machine (Claude Code, Codex, Cursor, Copilot, Pi, Hermes, OpenClaw, and ~20 more):

bunx smithers-orchestrator skills add   # install the skill set into every detected agent
bunx smithers-orchestrator mcp add      # register Smithers as an MCP server everywhere

Then just ask:

"orchestrate an agent to add rate limiting and keep iterating until the tests pass."

Your agent picks the right workflow, starts the run, and keeps going through retries and review loops until the work is actually done.

See Agent Support for the per-agent setup (skill, MCP, instructions) for Claude Code, Codex, Cursor, Copilot, Pi, Hermes, and OpenClaw.

Wire one agent by hand? If you'd rather drop the curated onboarding skill into a single agent directly:

mkdir -p ~/.claude/skills/smithers
curl -fsSL https://raw.githubusercontent.com/smithersai/smithers/main/skills/smithers/SKILL.md \
  -o ~/.claude/skills/smithers/SKILL.md
curl -fsSL https://smithers.sh/llms-full.txt \
  -o ~/.claude/skills/smithers/llms-full.txt

See skills/smithers/ for the onboarding skill.

Quick start

Prefer to drive it yourself from the CLI? Start here.

# scaffold the workflow pack into .smithers/
bunx smithers-orchestrator init

# turn a request into a practical implementation plan
bunx smithers-orchestrator workflow run plan --prompt "add rate limiting, audit logging, and API key rotation"

init scaffolds a .smithers/ folder preloaded with production-ready workflows. Once that's in place, you can chain a request from tickets to implementation:

# break a request into ticket files under .smithers/tickets/
bunx smithers-orchestrator workflow run tickets-create --prompt "add rate limiting, audit logging, and API key rotation"

# implement the tickets, each in its own worktree branch
bunx smithers-orchestrator workflow run kanban

Run bunx smithers-orchestrator starters to browse plain-English starters, and bunx smithers-orchestrator workflow list to see what's installed.

Watch your runs

Whether your agent started a run or you did, you can see exactly what's happening:

bunx smithers-orchestrator ps              # list active, paused, and recently completed runs
bunx smithers-orchestrator inspect RUN_ID  # steps, agents, approvals, and outputs for one run
bunx smithers-orchestrator logs RUN_ID     # tail the event log
bunx smithers-orchestrator chat RUN_ID     # read the agent's chat output

ps shows you what needs attention (a paused approval, a recent failure); inspect drills into a single run so you can follow each step and agent as it works.

Durable by default

Durability is the differentiator. Runs survive crashes, restarts, and flaky tools because every completed step is persisted to SQLite the moment it finishes. The runtime always knows what's done and what to run next. Approvals, human questions, retries, and replay are first-class.

prompt → render workflow → run task → validate output → persist to SQLite → re-render → resume · inspect · replay

That loop is the whole model: a task runs, its output is validated against a schema and written down, then the workflow re-renders from persisted state to decide the next task. A crash at any point resumes from the last write, not from the top.

bunx smithers-orchestrator up workflow.tsx --input '{"description":"Fix bug"}'
bunx smithers-orchestrator up workflow.tsx --run-id abc123 --resume true   # resume after a crash
bunx smithers-orchestrator rewind abc123 --frame 4                          # time-travel to an earlier frame
bunx smithers-orchestrator fork abc123                                      # branch an alternate timeline
bunx smithers-orchestrator replay abc123                                    # replay from a checkpoint

Any agent, any model

Smithers doesn't bet on one lab or one harness. Point a task at whichever agent is best for the job, mix several in one workflow, and switch freely. The workflow doesn't change when the model does, so a frontier model can plan, a fast model can fan out, and a specialized harness can do the edits.

Agents that run tasks

Agent	How it runs
Claude Code	CLI harness
Codex	CLI harness
Pi	CLI harness
Antigravity	CLI harness
Any AI SDK model	SDK agent, with tools, structured output, and MCP

Sandboxes that isolate them

The same <Sandbox> primitive runs an agent locally or on a remote provider with no change to the workflow:

Target	Notes
Local	default, runs on your machine
gVisor	syscall-isolated containers
Kubernetes	your own cluster
Freestyle · Daytona · Cloudflare	managed remote sandboxes

To drive Smithers itself, the skills add / mcp add commands also wire Cursor, Copilot, Hermes, OpenClaw, and ~20 more coding agents.

Built-in workflows

bunx smithers-orchestrator init installs a pack of ready-to-run workflows. Point your agent at one and go via bunx smithers-orchestrator workflow run WORKFLOW_ID --prompt "...":

Build

Workflow	What it does
`implement`	Implement a focused change with validation and review feedback loops.
`research-plan-implement`	Research a request, produce a plan, then implement it with validation and review.
`ticket-create` / `tickets-create`	Turn a request into one or many structured implementation tickets.
`kanban`	Implement ticket files in worktree branches, board-style.

Plan

Workflow	What it does
`plan`	Create a practical implementation plan before code changes begin.
`research`	Gather repository and external context before planning or building.
`grill-me`	Ask targeted questions until vague requirements become actionable.
`mission`	Run long-horizon work as approved milestones with focused workers and validation.

Quality

Workflow	What it does
`review`	Review current repository changes with one or more configured agents.
`debug`	Reproduce, fix, validate, and review a reported bug.
`improve-test-coverage`	Find and add high-impact missing tests for the repository.
`audit`	Audit feature groups for tests, docs, observability, and maintainability gaps.
`feature-enum`	Build or refine a code-backed feature inventory for a repository.
`ralph`	Keep working continuously on an open-ended maintenance prompt.

See docs/workflows/ for the full pack.

Examples

The examples/ folder has 100+ runnable workflows, one per orchestration pattern. Copy one as a starting point:

Example	Pattern
`code-review-loop`	Implement → review → fix, looped until approved.
`coverage-loop`	Run tests, measure coverage, write tests, repeat to target.
`panel`	N specialist agents review in parallel, a moderator synthesizes.
`supervisor`	A boss agent plans and delegates to workers dynamically.
`fan-out-fan-in`	Split work across N parallel agents, aggregate results.
`parallel-tickets`	Triage, run waves of work in parallel, merge-queue the results.
`migration`	Plan → transform files → validate → report.
`pr-shepherd`	Watch a PR, gather context, leave structured review comments.
`playwright-test-agent`	Plan E2E flows, generate Playwright tests, run/heal until stable.
`adaptive-rag-citation-loop`	Retrieve evidence, draft cited answers, grade grounding, retry.
`sql-analyst-dashboard`	Discover schema, check read-only SQL, execute, summarize with a chart.
`rfp-response-room`	Extract RFP requirements, draft cited answers, review, package.
`calendar-negotiator-with-approval`	Rank meeting slots and write calendar state only after approval.
`document-exception-queue`	Extract document packets, reconcile them, route exceptions.

That's fourteen of 100+. The folder also covers debates, canary judging, SLO-breach explainers, repo janitors, and dozens more. Browse the full set in examples/.

Author your own

The built-in workflows are normal Smithers TSX files: run them as-is, have your agent adapt them to your repo, or have it write new ones from the same primitives. A workflow is a JSX tree of tasks:

import { createSmithers, Sequence } from "smithers-orchestrator";
import { z } from "zod";

const { Workflow, Task, smithers, outputs } = createSmithers({
  analyze: z.object({
    summary: z.string(),
    severity: z.enum(["low", "medium", "high"]),
  }),
  fix: z.object({
    patch: z.string(),
    explanation: z.string(),
  }),
});

export default smithers((ctx) => (
  <Workflow name="bugfix">
    <Sequence>
      <Task id="analyze" output={outputs.analyze} agent={analyzer}>
        {`Analyze the bug: ${ctx.input.description}`}
      </Task>

      <Task id="fix" output={outputs.fix} agent={fixer}>
        {`Fix this issue: ${ctx.latest("analyze").summary}`}
      </Task>
    </Sequence>
  </Workflow>
));

Each task output is validated against its Zod schema and persisted to SQLite. If the process crashes, Smithers resumes from the last completed node without re-running finished work.

Components

Component	Purpose
`<Workflow>`	Root container
`<Task>`	AI or static task node
`<Sequence>`	Ordered execution
`<Parallel>`	Concurrent execution
`<Branch>`	Conditional execution
`<Ralph>`	Loop until a condition is met

<Ralph until={ctx.latest("validate")?.approved} maxIterations={5}>
  <Task id="implement" output={outputs.implement} agent={coder}>
    Fix based on feedback
  </Task>

  <Task id="validate" output={outputs.review} agent={reviewer}>
    Review the implementation
  </Task>
</Ralph>

There are many more: approvals, merge queues, sub-workflows, signals, timers, sagas, and composite patterns. See Components.

Deeper capabilities

Observability: every run emits Prometheus metrics and OpenTelemetry traces. Bring up the local stack with bunx smithers-orchestrator observability up (Grafana, Prometheus, Tempo, OTLP collector) and serve metrics with bunx smithers-orchestrator up workflow.tsx --serve --metrics.
Evals: run repeatable workflow regressions from JSON/JSONL cases with bunx smithers-orchestrator eval workflow.tsx --cases evals/smoke.jsonl --suite smoke; the command exits non-zero when any case fails.
Prompt optimization: run GEPA-style optimization against an eval suite with bunx smithers-orchestrator optimize, which writes an optimized prompt artifact only when the score improves.
Hot reload: edit prompts, config, agent settings, or JSX structure mid-run with bunx smithers-orchestrator up workflow.tsx --hot. In-flight tasks finish on their original code; only newly scheduled tasks pick up changes.
Scale across machines: run agents in remote sandboxes via the <Sandbox> primitive (provider table above) with no change to the workflow. Start from examples/freestyle-sandbox-provider and the Sandbox component.

Security and safety

Smithers is built for agents that modify real repositories, so control is wired into the runtime:

Approvals: gate destructive or risky steps behind a human approve / deny before they run.
Inspectable: every step, tool call, and output is persisted and replayable, so you can see exactly what an agent did and why.
Reversible: rewind, fork, or replay any run from a checkpoint instead of living with whatever the agent left behind.
Isolated: run agents in a sandbox (gVisor, Kubernetes, Freestyle, Daytona, Cloudflare) so edits never touch your host.
Auditable: every run emits Prometheus metrics and OpenTelemetry traces.

Docs

Full documentation lives at smithers.sh.

License

MIT

Name		Name	Last commit message	Last commit date
Latest commit History 2,863 Commits
.github/workflows		.github/workflows
.smithers		.smithers
apps		apps
benchmarks		benchmarks
deploy/reference		deploy/reference
docs		docs
e2e		e2e
examples		examples
marketing		marketing
observability		observability
packages		packages
poc		poc
research		research
scripts		scripts
skills		skills
workflows		workflows
.dockerignore		.dockerignore
.gitignore		.gitignore
CLAUDE.md		CLAUDE.md
LICENSE		LICENSE
README.md		README.md
bun.lock		bun.lock
bunfig.toml		bunfig.toml
package.json		package.json
pnpm-lock.yaml		pnpm-lock.yaml
pnpm-workspace.yaml		pnpm-workspace.yaml
preload.js		preload.js
tsconfig.json		tsconfig.json

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Smithers

What you get

When to use Smithers

Prompt your agent

Quick start

Watch your runs

Durable by default

Any agent, any model

Built-in workflows

Examples

Author your own

Components

Deeper capabilities

Security and safety

Read next

Docs

License

About

Uh oh!

Releases 76

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Smithers

What you get

When to use Smithers

Prompt your agent

Quick start

Watch your runs

Durable by default

Any agent, any model

Built-in workflows

Examples

Author your own

Components

Deeper capabilities

Security and safety

Read next

Docs

License

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 76

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages