agent-foreman

Long Task Harness for AI agents - feature-driven development with external memory

Chinese documentation

Problem

AI coding agents face three common failure modes when working on long-running tasks:

Doing too much at once - Trying to complete everything in one session, resulting in messy, incomplete code
Premature completion - Declaring victory before all features actually work
Superficial testing - Not thoroughly validating implementations

Solution

agent-foreman provides a structured harness that enables AI agents to:

Maintain external memory via structured files
Work on one feature at a time with clear acceptance criteria
Hand off cleanly between sessions via progress logs
Track impact of changes on other features

Why It Works

The core insight is simple: AI agents need the same tooling that makes human engineering teams effective.

Human engineers don't rely on memory either. We use:

Git for version history
Issue trackers for task management
Documentation for handoffs
Tests for verification

agent-foreman brings these same patterns to AI:

Human Practice	AI Equivalent
Scrum board	`ai/feature_list.json`
Sprint notes	`ai/progress.md`
CI/CD pipeline	`./ai/init.sh check`
Code review	Acceptance criteria

Why JSON Instead of Markdown?

From Anthropic's research:

"Models are more likely to respect and accurately update JSON structures than markdown checklists."

When features are stored as JSON with explicit status fields, AI agents:

Don't accidentally delete items
Update status correctly
Respect the schema

In practice, this helps keep features from silently disappearing between sessions.
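
As a rough illustration of why an explicit status field is easy to update safely, here is a minimal TypeScript sketch. It is not taken from the agent-foreman source: the `FeatureEntry` shape is a trimmed-down assumption, and `setFeatureStatus` is a hypothetical helper. The status values are the ones listed under Feature Status Values.

```typescript
type FeatureStatus = "failing" | "passing" | "blocked" | "needs_review" | "deprecated";

// Hypothetical, trimmed-down feature record; the real file has more fields.
interface FeatureEntry {
  id: string;
  status: FeatureStatus;
}

// Return a new list in which only the target feature's status changes.
// Throws on an unknown id, so a typo cannot silently add or drop entries.
function setFeatureStatus(
  entries: FeatureEntry[],
  id: string,
  newStatus: FeatureStatus
): FeatureEntry[] {
  if (!entries.some((e) => e.id === id)) {
    throw new Error(`Unknown feature id: ${id}`);
  }
  return entries.map((e) => (e.id === id ? { ...e, status: newStatus } : e));
}

// Round-trip through JSON text, as an agent editing the file would.
const beforeText = JSON.stringify([
  { id: "auth.login", status: "passing" },
  { id: "auth.oauth", status: "failing" },
]);
const updatedEntries = setFeatureStatus(
  JSON.parse(beforeText) as FeatureEntry[],
  "auth.oauth",
  "passing"
);
const afterText = JSON.stringify(updatedEntries, null, 2);
```

The update touches one field of one record, and the entry count is the same before and after. A markdown checklist offers no comparable structure to check against.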

Installation

# Global installation
npm install -g agent-foreman

# Or use with npx
npx agent-foreman --help

Claude Code Plugin

agent-foreman is available as a Claude Code plugin:

# Install plugin
/plugin marketplace add mylukin/agent-foreman
/plugin install agent-foreman

Using with Claude Code

Initializing Projects

Empty Project

For a brand new project with no existing code:

mkdir my-project && cd my-project
agent-foreman init "Build a REST API for task management" --mode new

Prompt for Claude Code:

Use foreman to initialize this project.
Goal: Build a REST API for task management

Existing Project

For projects with existing code:

agent-foreman survey
agent-foreman init "Your project goal"

Prompt for Claude Code:

Use foreman to initialize this project.

Task Loop Prompts

Single Task Completion

Use foreman to get the next task, implement it, and mark it complete.

Continuous Task Loop

The Magic Prompt - Auto-complete all tasks:

Use foreman to check the project status, then continuously work through
all tasks one by one until everything is complete. For each task:
1. Run `agent-foreman step` to get the next task
2. Implement the feature according to acceptance criteria
3. Run tests to verify
4. Run `agent-foreman complete <feature_id>` to mark done
5. Repeat until all tasks are passing
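
The README does not say how `agent-foreman step` chooses the next task. The sketch below shows one plausible selection rule, under two loudly stated assumptions: a feature is ready only when everything in its `dependsOn` list is `passing`, and a lower `priority` number means more urgent. `pickNextFeature` and the `reports.export` id are hypothetical.

```typescript
interface Candidate {
  id: string;
  priority: number;
  status: string;
  dependsOn: string[];
}

// Pick the most urgent failing feature whose dependencies are all passing.
// Assumption: a lower priority number means more urgent.
function pickNextFeature(all: Candidate[]): Candidate | undefined {
  const passingIds = all.filter((f) => f.status === "passing").map((f) => f.id);
  const ready = all.filter(
    (f) =>
      f.status === "failing" &&
      f.dependsOn.every((dep) => passingIds.indexOf(dep) !== -1)
  );
  // `ready` is a fresh array from filter, so sorting does not mutate `all`.
  ready.sort((a, b) => a.priority - b.priority);
  return ready[0];
}

const backlog: Candidate[] = [
  { id: "auth.login", priority: 1, status: "passing", dependsOn: [] },
  { id: "reports.export", priority: 1, status: "failing", dependsOn: ["payment.stripe"] },
  { id: "payment.stripe", priority: 2, status: "blocked", dependsOn: [] },
  { id: "api.users.create", priority: 3, status: "failing", dependsOn: [] },
  { id: "auth.oauth", priority: 5, status: "failing", dependsOn: ["auth.login"] },
];
const nextFeature = pickNextFeature(backlog);
```

Here `reports.export` is skipped despite its priority because it depends on a blocked feature, so `api.users.create` is selected.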

Quick Status Check

Use foreman to check the current project status.

Analyze and Plan

Use foreman to analyze this project and give me a comprehensive status report.

Managing Tasks

Adding New Tasks

Edit `ai/feature_list.json` directly, or ask Claude Code with a prompt like:

Add a new feature to the task list:
- ID: auth.oauth
- Description: Implement OAuth2 authentication with Google
- Module: auth
- Priority: 5
- Acceptance criteria: User can login with Google account

Feature JSON Structure:

{
  "id": "auth.oauth",
  "description": "Implement OAuth2 authentication with Google",
  "module": "auth",
  "priority": 5,
  "status": "failing",
  "acceptance": [
    "User can click 'Login with Google' button",
    "System redirects to Google OAuth flow",
    "User is authenticated and redirected back"
  ],
  "dependsOn": ["auth.login"],
  "tags": ["oauth", "google"],
  "version": 1,
  "origin": "manual",
  "notes": ""
}
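
For readers who prefer types to examples, the structure above can be written as a TypeScript interface. This is a sketch inferred from the sample, not the project's actual type definitions: the field types are guesses from the values shown, and `createFeature` with its defaults (`failing`, version 1, origin `manual`) is a hypothetical helper that mirrors the sample.

```typescript
type FeatureStatus = "failing" | "passing" | "blocked" | "needs_review" | "deprecated";

// Field names mirror the sample JSON; the types are inferred from it.
interface Feature {
  id: string;
  description: string;
  module: string;
  priority: number;
  status: FeatureStatus;
  acceptance: string[];
  dependsOn: string[];
  tags: string[];
  version: number;
  origin: string;
  notes: string;
}

// Fields a person must supply; everything else gets a default.
interface NewFeatureInput {
  id: string;
  description: string;
  module: string;
  priority: number;
  acceptance: string[];
  dependsOn?: string[];
  tags?: string[];
}

// Hypothetical helper: build a complete record for a manually added feature.
function createFeature(input: NewFeatureInput): Feature {
  if (input.acceptance.length === 0) {
    throw new Error(`Feature ${input.id} needs at least one acceptance criterion`);
  }
  return {
    id: input.id,
    description: input.description,
    module: input.module,
    priority: input.priority,
    status: "failing", // new work starts as not yet implemented
    acceptance: input.acceptance.slice(),
    dependsOn: input.dependsOn ? input.dependsOn.slice() : [],
    tags: input.tags ? input.tags.slice() : [],
    version: 1, // assumed starting version, as in the sample
    origin: "manual",
    notes: "",
  };
}

const oauthFeature = createFeature({
  id: "auth.oauth",
  description: "Implement OAuth2 authentication with Google",
  module: "auth",
  priority: 5,
  acceptance: ["User can login with Google account"],
  dependsOn: ["auth.login"],
});
```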

Changing Task Goals

Update the project goal to: "Build a full-stack task management app with React frontend"
Also update relevant features to align with the new goal.

Modifying Existing Tasks

Update feature 'api.users.create':
- Change description to: "Create user with email verification"
- Add acceptance criteria: "Send verification email after registration"
- Set priority to 3

Marking Tasks as Blocked

Mark feature 'payment.stripe' as blocked with note: "Waiting for Stripe API keys"

Auto-Complete All Tasks

Method 1: Continuous Loop Prompt

The most effective prompt for fully automated task completion:

I want you to act as an autonomous developer. Use the agent-foreman
harness to continuously complete all remaining tasks:

1. Check status with `agent-foreman status`
2. Get next task with `agent-foreman step`
3. Implement the feature completely
4. Run tests with `./ai/init.sh check`
5. Mark complete with `agent-foreman complete <id>`
6. Commit the changes
7. Loop back to step 2 until all tasks pass

Do not stop until all features are passing. Ask me only if you
encounter a blocker that requires my input.

Method 2: Using the Foreman Agent

Use the foreman agent to automatically complete all pending tasks
in this project. Work through them one by one until 100% complete.

Method 3: Batch Completion (for implemented features)

If features are already implemented but not yet marked as passing:

All features in this project are already implemented and tested.
Use foreman to mark each one as complete, going through them
one by one until all are passing.

Workflow Summary

┌─────────────────────────────────────────────────────────────┐
│                    AGENT-FOREMAN WORKFLOW                   │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  ┌──────────┐    ┌──────────┐    ┌──────────┐              │
│  │  status  │───▶│   step   │───▶│implement │              │
│  └──────────┘    └──────────┘    └──────────┘              │
│       │                               │                     │
│       │                               ▼                     │
│       │                         ┌──────────┐               │
│       │                         │   test   │               │
│       │                         └──────────┘               │
│       │                               │                     │
│       │                               ▼                     │
│       │    ┌──────────┐        ┌──────────┐               │
│       │◀───│   next   │◀───────│ complete │               │
│       │    └──────────┘        └──────────┘               │
│       │                               │                     │
│       ▼                               ▼                     │
│  ┌─────────────────────────────────────────┐               │
│  │  🎉 All features passing! (100%)        │               │
│  │  📊 PROJECT_SURVEY.md auto-updated      │               │
│  └─────────────────────────────────────────┘               │
│                                                             │
└─────────────────────────────────────────────────────────────┘
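
The loop in the diagram can be sketched in TypeScript as follows. This is an illustrative model only: the `Harness` interface is hypothetical, with each method standing in for a CLI call or an agent action (`step` for `agent-foreman step`, `check` for `./ai/init.sh check`, `complete` for `agent-foreman complete <feature_id>`). It does not call the real CLI.

```typescript
// Hypothetical stand-ins for the harness commands and the agent's own work.
interface Harness {
  step(): string | undefined; // next feature id, or undefined when nothing is left
  implement(id: string): void; // the agent writes the code
  check(): boolean; // true when tests pass
  complete(id: string): void; // mark the feature as passing
}

// Work through features one at a time; never mark unverified work complete.
function runLoop(h: Harness, maxIterations: number): string[] {
  const completed: string[] = [];
  for (let i = 0; i < maxIterations; i++) {
    const id = h.step();
    if (id === undefined) break; // all features passing
    h.implement(id);
    if (!h.check()) break; // stop and hand off instead of declaring victory
    h.complete(id);
    completed.push(id);
  }
  return completed;
}

// In-memory fake so the sketch runs without the real tool.
const pendingIds = ["auth.login", "auth.oauth"];
const fakeHarness: Harness = {
  step: () => pendingIds[0],
  implement: () => {},
  check: () => true,
  complete: (id) => {
    const index = pendingIds.indexOf(id);
    if (index !== -1) pendingIds.splice(index, 1);
  },
};
const finishedIds = runLoop(fakeHarness, 10);
```

The `maxIterations` cap and the early exit on a failed check are design choices of this sketch. They keep one session bounded and avoid the premature-completion failure mode described under Problem.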

Commands Reference

Command	Description
`survey`	Generate project survey report
`init <goal>`	Initialize or upgrade the harness
`step`	Show next feature to work on
`status`	Show current project status
`impact <feature_id>`	Analyze impact of changes
`complete <feature_id>`	Mark a feature as complete
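
The README does not describe how `impact` computes its result. One plausible approach, sketched here as an assumption, is to walk the `dependsOn` links in reverse and collect every feature that depends on the changed one, directly or transitively. `findDependents` and the `profile.avatar` id are hypothetical.

```typescript
interface DepNode {
  id: string;
  dependsOn: string[];
}

// Collect ids of all features that depend on `changedId`, directly or
// through a chain of dependsOn links (breadth-first, cycle-safe).
function findDependents(all: DepNode[], changedId: string): string[] {
  const affected: string[] = [];
  const queue: string[] = [changedId];
  while (queue.length > 0) {
    const current = queue.shift() as string;
    for (const f of all) {
      const dependsOnCurrent = f.dependsOn.indexOf(current) !== -1;
      const alreadySeen = affected.indexOf(f.id) !== -1 || f.id === changedId;
      if (dependsOnCurrent && !alreadySeen) {
        affected.push(f.id);
        queue.push(f.id);
      }
    }
  }
  return affected;
}

const dependencyGraph: DepNode[] = [
  { id: "auth.login", dependsOn: [] },
  { id: "auth.oauth", dependsOn: ["auth.login"] },
  { id: "profile.avatar", dependsOn: ["auth.oauth"] },
  { id: "api.users.create", dependsOn: [] },
];
const affectedIds = findDependents(dependencyGraph, "auth.login");
```

Features returned this way would be natural candidates for the `needs_review` status.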

Init Modes

Mode	Description
`--mode merge`	Merge with existing (default)
`--mode new`	Create new, fail if exists
`--mode scan`	Scan only, no AI features
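
To make the difference between `merge` and `new` concrete, here is a small sketch of how the two modes might treat an existing feature list. It is an assumption based only on the one-line descriptions above, not the tool's actual behavior; `planFeatureList` is hypothetical, and `scan` is left out because the table does not say enough about it.

```typescript
interface ListedFeature {
  id: string;
  status: string;
}

type InitMode = "merge" | "new";

// Decide what the feature list should contain after init.
// Assumption: merge keeps existing entries untouched and only adds new ids.
function planFeatureList(
  mode: InitMode,
  existing: ListedFeature[] | undefined,
  discovered: ListedFeature[]
): ListedFeature[] {
  if (mode === "new") {
    if (existing !== undefined) {
      throw new Error("Feature list already exists; use --mode merge");
    }
    return discovered.slice();
  }
  const current = existing === undefined ? [] : existing;
  const knownIds = current.map((f) => f.id);
  const additions = discovered.filter((f) => knownIds.indexOf(f.id) === -1);
  return current.concat(additions);
}

const mergedList = planFeatureList(
  "merge",
  [{ id: "auth.login", status: "passing" }],
  [
    { id: "auth.login", status: "failing" },
    { id: "auth.oauth", status: "failing" },
  ]
);
```

In this sketch a merge never overwrites recorded progress: `auth.login` stays `passing` even though the fresh scan lists it as `failing`.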

Core Files

File	Purpose
`ai/feature_list.json`	Feature backlog with status
`ai/progress.md`	Session handoff audit log
`ai/init.sh`	Environment bootstrap script
`docs/PROJECT_SURVEY.md`	AI-generated project survey
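
The README does not specify the layout of `ai/progress.md`. As a sketch of what an append-only handoff log could look like, the example below formats one entry per line. The entry fields, the line format, and both helper functions are hypothetical.

```typescript
// Hypothetical log entry; the real progress.md format may differ.
interface ProgressEntry {
  timestamp: string; // ISO 8601, passed in so output is deterministic
  featureId: string;
  action: "step" | "complete" | "blocked";
  summary: string;
}

function formatProgressEntry(entry: ProgressEntry): string {
  return `- ${entry.timestamp} [${entry.action}] ${entry.featureId}: ${entry.summary}`;
}

// Append-only: existing log text is never rewritten, only extended.
function appendProgress(log: string, entry: ProgressEntry): string {
  const needsNewline = log.length > 0 && log.charAt(log.length - 1) !== "\n";
  return log + (needsNewline ? "\n" : "") + formatProgressEntry(entry) + "\n";
}

let progressLog = "";
progressLog = appendProgress(progressLog, {
  timestamp: "2025-01-01T10:00:00Z",
  featureId: "auth.login",
  action: "complete",
  summary: "All acceptance criteria verified",
});
progressLog = appendProgress(progressLog, {
  timestamp: "2025-01-01T11:30:00Z",
  featureId: "payment.stripe",
  action: "blocked",
  summary: "Waiting for Stripe API keys",
});
```

An append-only log suits handoffs because a new session can read it top to bottom without trusting any earlier session to have rewritten it correctly.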

Feature Status Values

Status	Meaning
`failing`	Not yet implemented
`passing`	Acceptance criteria met
`blocked`	External dependency blocking
`needs_review`	May be affected by changes
`deprecated`	No longer needed
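
A status report can be derived directly from these values. The sketch below counts features per status and computes a completion percentage. It assumes `deprecated` features are excluded from the denominator, which the README does not confirm; `summarizeStatuses` is a hypothetical helper.

```typescript
type FeatureStatus = "failing" | "passing" | "blocked" | "needs_review" | "deprecated";

interface StatusSummary {
  counts: Record<FeatureStatus, number>;
  percentPassing: number;
}

// Count features per status and compute the share that is passing.
// Assumption: deprecated features do not count toward completion.
function summarizeStatuses(statuses: FeatureStatus[]): StatusSummary {
  const counts: Record<FeatureStatus, number> = {
    failing: 0,
    passing: 0,
    blocked: 0,
    needs_review: 0,
    deprecated: 0,
  };
  for (const s of statuses) {
    counts[s] += 1;
  }
  const active = statuses.length - counts.deprecated;
  // With no active features there is nothing to report as complete.
  const percentPassing = active === 0 ? 0 : Math.round((counts.passing / active) * 100);
  return { counts, percentPassing };
}

const statusSummary = summarizeStatuses([
  "passing",
  "passing",
  "failing",
  "blocked",
  "deprecated",
]);
```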

Supported Tech Stacks

Language	Frameworks
Node.js/TypeScript	Express, Vue, React, Astro, Next.js, Nuxt
Go	Echo, Gin, Fiber
Python	FastAPI, Flask, Django

Best Practices

One feature at a time - Complete before switching
Update status promptly - Mark passing when criteria met
Review impact - Run impact analysis after changes
Clean commits - One feature = one atomic commit
Read first - Always check feature list and progress log

Development

# Install dependencies
npm install

# Run in development mode
npm run dev

# Build
npm run build

# Run tests
npm test

License

MIT

Author

Lukin (@mylukin)

Inspired by Anthropic's blog post: Effective harnesses for long-running agents
