2026.01.20

Loop

Run an agent in a loop. Guardrails decide when to stop.

AI coding sessions start fresh. No memory of the last run. No awareness of what failed before.

loop works around this by writing progress and errors to files in the repo. Each iteration reads what the last one left behind. The repo is the memory.

Setup

npx @acoyfellow/loop init

This creates:

AGENTS.md              # how to build/test (north star)
tasks.md               # what to do (north star)
loop.json              # config
.loop/
  progress.md          # what's done (append-only)
  errors.md            # what went wrong (append-only)
  PAUSED               # kill switch
.github/workflows/
  loop.yml             # CI trigger

Run

npx @acoyfellow/loop enter    # one iteration
npx @acoyfellow/loop watch    # loop until paused

Each iteration:

1. reads the north star files (AGENTS.md, tasks.md)
2. reads recent progress and errors
3. builds a context window
4. calls the agent (claude, opencode, aider, whatever)
5. checks guardrails
6. records what happened
7. exits

The next iteration sees what the last one did. Errors persist. Progress persists.

Guardrails

A guardrail is a command. Exit 0 = pass. Exit 1 = fail. stderr gets appended to errors.md so the next iteration knows what went wrong.

Default: something must change. No diff = failure.

{
  "agent": "claude",
  "guardrail": "npm test"
}

Or use gateproof for structured assertions:

{
  "guardrail": "bun gate.ts"
}

What I write vs. what the agent writes

I write AGENTS.md (how to build and test), tasks.md (what to do), and guardrails (how to verify). The agent writes code. The guardrails decide if the code is acceptable.

The harder part isn't getting the agent to write code. It's writing guardrails that specify what I actually want.

Costs

  • Context bloat. If context grows unbounded, mistakes compound. Smaller iterations with good notes for the next one. That's what the progress/errors files are for.
  • Review asymmetry. Prompting takes a minute. Reviewing the output takes much longer. Guardrails help the author. They don't help the reviewer.
  • False productivity. It feels productive. The guardrails prevent slop, but they don't prevent building things you don't need.
  • Token pricing. Current pricing may be subsidized. These patterns may not stay cheap.

Try it

npx @acoyfellow/loop init
# edit AGENTS.md and tasks.md
npx @acoyfellow/loop enter

github.com/acoyfellow/loop

Related

  • deja .. the memory experiment that failed because the gates were weak.
  • gate-review .. asking what bad implementation can pass a test.
  • preflight .. startup work that burned context before building.
  • loop-demo .. using the loop on its own demo until it worked.