Long Task Harness for AI agents - feature-driven development with external memory
AI coding agents face three common failure modes when working on long-running tasks:
- Doing too much at once - Trying to complete everything in one session, resulting in messy, incomplete code
- Premature completion - Declaring victory before all features actually work
- Superficial testing - Not thoroughly validating implementations
agent-foreman provides a structured harness that enables AI agents to:
- Maintain external memory via structured files
- Work on one feature at a time with clear acceptance criteria
- Hand off cleanly between sessions via progress logs
- Track impact of changes on other features
The core insight is simple: AI agents need the same tooling that makes human engineering teams effective.
Human engineers don't rely on memory either. We use:
- Git for version history
- Issue trackers for task management
- Documentation for handoffs
- Tests for verification
agent-foreman brings these same patterns to AI:
| Human Practice | AI Equivalent |
|---|---|
| Scrum board | feature_list.json |
| Sprint notes | progress.md |
| CI/CD pipeline | init.sh check |
| Code review | Acceptance criteria |
From Anthropic's research:
"Models are more likely to respect and accurately update JSON structures than markdown checklists."
When features are stored as JSON with explicit status fields, AI agents:
- Don't accidentally delete items
- Update status correctly
- Respect the schema
This is the difference between projects that work and projects that mysteriously lose features between sessions.
# Global installation
npm install -g agent-foreman
# Or use with npx
npx agent-foreman --helpagent-foreman is available as a Claude Code plugin:
# Install plugin
/plugin marketplace add mylukin/agent-foreman
/plugin install agent-foremanFor a brand new project with no existing code:
mkdir my-project && cd my-project
agent-foreman init "Build a REST API for task management" --mode newPrompt for Claude Code:
Use foreman to initialize this project.
Goal: Build a REST API for task management
For projects with existing code:
agent-foreman survey
agent-foreman init "Your project goal"Prompt for Claude Code:
Use foreman to initialize this project.
Use foreman to get the next task, implement it, and mark it complete.
The Magic Prompt - Auto-complete all tasks:
Use foreman to check the project status, then continuously work through
all tasks one by one until everything is complete. For each task:
1. Run `agent-foreman step` to get the next task
2. Implement the feature according to acceptance criteria
3. Run tests to verify
4. Run `agent-foreman complete <feature_id>` to mark done
5. Repeat until all tasks are passing
Use foreman to check the current project status.
Use foreman to analyze this project and give me a comprehensive status report.
Edit ai/feature_list.json directly or use Claude Code:
Add a new feature to the task list:
- ID: auth.oauth
- Description: Implement OAuth2 authentication with Google
- Module: auth
- Priority: 5
- Acceptance criteria: User can login with Google account
Feature JSON Structure:
{
"id": "auth.oauth",
"description": "Implement OAuth2 authentication with Google",
"module": "auth",
"priority": 5,
"status": "failing",
"acceptance": [
"User can click 'Login with Google' button",
"System redirects to Google OAuth flow",
"User is authenticated and redirected back"
],
"dependsOn": ["auth.login"],
"tags": ["oauth", "google"],
"version": 1,
"origin": "manual",
"notes": ""
}Update the project goal to: "Build a full-stack task management app with React frontend"
Also update relevant features to align with the new goal.
Update feature 'api.users.create':
- Change description to: "Create user with email verification"
- Add acceptance criteria: "Send verification email after registration"
- Set priority to 3
Mark feature 'payment.stripe' as blocked with note: "Waiting for Stripe API keys"
The most effective prompt for fully automated task completion:
I want you to act as an autonomous developer. Use the agent-foreman
harness to continuously complete all remaining tasks:
1. Check status with `agent-foreman status`
2. Get next task with `agent-foreman step`
3. Implement the feature completely
4. Run tests with `./ai/init.sh check`
5. Mark complete with `agent-foreman complete <id>`
6. Commit the changes
7. Loop back to step 2 until all tasks pass
Do not stop until all features are passing. Ask me only if you
encounter a blocker that requires my input.
Use the foreman agent to automatically complete all pending tasks
in this project. Work through them one by one until 100% complete.
If features are already implemented but not marked:
All features in this project are already implemented and tested.
Use foreman to mark each one as complete, going through them
one by one until all are passing.
┌─────────────────────────────────────────────────────────────┐
│ AGENT-FOREMAN WORKFLOW │
├─────────────────────────────────────────────────────────────┤
│ │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│ │ status │───▶│ step │───▶│implement │ │
│ └──────────┘ └──────────┘ └──────────┘ │
│ │ │ │
│ │ ▼ │
│ │ ┌──────────┐ │
│ │ │ test │ │
│ │ └──────────┘ │
│ │ │ │
│ │ ▼ │
│ │ ┌──────────┐ ┌──────────┐ │
│ │◀───│ next │◀───────│ complete │ │
│ │ └──────────┘ └──────────┘ │
│ │ │ │
│ ▼ ▼ │
│ ┌─────────────────────────────────────────┐ │
│ │ 🎉 All features passing! (100%) │ │
│ │ 📊 PROJECT_SURVEY.md auto-updated │ │
│ └─────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────┘
| Command | Description |
|---|---|
survey |
Generate project survey report |
init <goal> |
Initialize or upgrade the harness |
step |
Show next feature to work on |
status |
Show current project status |
impact <feature_id> |
Analyze impact of changes |
complete <feature_id> |
Mark a feature as complete |
| Mode | Description |
|---|---|
--mode merge |
Merge with existing (default) |
--mode new |
Create new, fail if exists |
--mode scan |
Scan only, no AI features |
| File | Purpose |
|---|---|
ai/feature_list.json |
Feature backlog with status |
ai/progress.md |
Session handoff audit log |
ai/init.sh |
Environment bootstrap script |
docs/PROJECT_SURVEY.md |
AI-generated project survey |
| Status | Meaning |
|---|---|
failing |
Not yet implemented |
passing |
Acceptance criteria met |
blocked |
External dependency blocking |
needs_review |
May be affected by changes |
deprecated |
No longer needed |
| Language | Frameworks |
|---|---|
| Node.js/TypeScript | Express, Vue, React, Astro, Next.js, Nuxt |
| Go | Echo, Gin, Fiber |
| Python | FastAPI, Flask, Django |
- One feature at a time - Complete before switching
- Update status promptly - Mark passing when criteria met
- Review impact - Run impact analysis after changes
- Clean commits - One feature = one atomic commit
- Read first - Always check feature list and progress log
# Install dependencies
npm install
# Run in development mode
npm run dev
# Build
npm run build
# Run tests
npm testMIT
Lukin (@mylukin)
Inspired by Anthropic's blog post: Effective harnesses for long-running agents