A multi-agent orchestration system for autonomous software development. Ralph iterates through a PRD (JSON), picks one story per cycle, and hands it to an Opus conductor that delegates to specialist Sonnet sub-agents — design, build, review, test, and ship.
It is built on top of Ralph (the bash loop), with a Python orchestration layer that manages agent lifecycles, metrics, and cross-story learning.
┌──────────────────────────────────────────────────────┐
│ Ralph Loop (Bash) │
│ - Reads PRD JSON │
│ - Picks next open story (respects dependencies) │
│ - Renders prompt template │
│ - Launches orchestra pipeline │
│ - Checks for COMPLETE signal │
│ - Updates story status (open → in_progress → done) │
│ - Loops until all stories done or max iterations │
└───────────────────────┬──────────────────────────────┘
│
┌───────────────────────▼──────────────────────────────┐
│ Orchestra Pipeline (Python) │
│ - Classifies story type (mobile_ui, backend, etc.) │
│ - Creates isolated state directory per run │
│ - Launches Opus conductor with full context │
│ - Collects artifacts, metrics, and learnings │
│ - Updates guardrails from review feedback │
└───────────────────────┬──────────────────────────────┘
│
┌───────────────────────▼──────────────────────────────┐
│ Conductor (Opus 4.6) │
│ - Plans work decomposition │
│ - Spawns Sonnet sub-agents (design, build, review) │
│ - Reviews artifacts, runs builds, captures screenshots│
│ - Iterates on feedback (max 2 rounds) │
│ - Commits and emits COMPLETE signal │
│ │
│ Sub-agents (Sonnet 4.6): │
│ design_lead, frontend_builder, whimsy_injector, │
│ content_curator, tone_guardian, rag_architect, │
│ reality_checker, performance_tuner, visual_qa │
│ │
│ Review gate (GPT-5.4 via Codex): │
│ Cross-model code review for independent perspective │
└──────────────────────────────────────────────────────┘
# Clone into your project
cp -r ralph-orchestra/.agents/ your-project/.agents/

# Generate via agent
.agents/ralph/loop.sh prd "A todo app with auth, CRUD, and dark mode"
# Or create manually (see examples/prd-example.json)

Edit `.agents/ralph/config.sh`:
PRD_PATH=.agents/tasks/prd.json
AGENT_MODE=orchestra # multi-agent (or "single" for legacy)
MAX_ITERATIONS=25

# Single iteration
.agents/ralph/loop.sh build 1
# Run until all stories done (max 25 iterations)
.agents/ralph/loop.sh
# Dry run (no agent execution)
RALPH_DRY_RUN=1 .agents/ralph/loop.sh build 1

{
"version": 1,
"project": "my-project",
"qualityGates": ["npm run typecheck", "npm run build"],
"stories": [
{
"id": "FE-001",
"title": "User Profile Page",
"status": "open",
"type": "mobile_ui",
"dependsOn": [],
"goal": "Users can view their profile",
"description": "Create profile page at /profile...",
"acceptanceCriteria": [
"Profile renders at /profile",
"npm run typecheck passes"
]
}
]
}

Story ID prefixes determine pipeline type:

| Prefix | Type | Pipeline |
|---|---|---|
| `FE-`, `AV-`, `AX-`, `NF-`, `UI-` | `mobile_ui` | Full visual pipeline |
| `BE-`, `ND-`, `FN-`, `PF-`, `AN-` | `backend` | Code-only, skip visual QA |
| `WV-`, `WC-`, `NW-` | `website` | Web build, skip iOS |
| `CP-`, `RA-`, `PZ-` | `content` | Editorial pipeline |
| `CI-`, `XS-`, `MK-` | `cross_cutting` | Adaptive |

Story lifecycle: open → in_progress → done
Dependencies are respected — a story won't be picked until all dependsOn stories are done.
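
The selection rule can be sketched in a few lines of Python. This is a minimal illustration, not the actual Ralph implementation (which lives in the bash loop): the function name `pick_next_story` is hypothetical, and only the `id`, `status`, and `dependsOn` fields from the PRD example are assumed.

```python
from typing import Optional


def pick_next_story(prd: dict) -> Optional[dict]:
    """Return the first open story whose dependencies are all done, else None."""
    stories = prd.get("stories", [])
    done_ids = {s["id"] for s in stories if s.get("status") == "done"}
    for story in stories:
        if story.get("status") != "open":
            continue
        # A story is eligible only when every ID in dependsOn is already done.
        if all(dep in done_ids for dep in story.get("dependsOn", [])):
            return story
    return None


# Hypothetical PRD fragment: FE-001 is blocked until BE-002 is done.
prd = {
    "stories": [
        {"id": "BE-001", "status": "done", "dependsOn": []},
        {"id": "FE-001", "status": "open", "dependsOn": ["BE-002"]},
        {"id": "BE-002", "status": "open", "dependsOn": ["BE-001"]},
    ]
}
print(pick_next_story(prd)["id"])  # BE-002
```

Here `FE-001` appears first in the list but is skipped because `BE-002` is still open, so `BE-002` is picked instead.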
# In .agents/ralph/agents.sh:
AGENT_CODEX_CMD="codex exec --yolo --skip-git-repo-check -"
AGENT_CLAUDE_CMD="cat {prompt} | claude -p --model claude-sonnet-4-6 --dangerously-skip-permissions"
AGENT_DROID_CMD="droid exec --skip-permissions-unsafe -f {prompt}"
AGENT_OPENCODE_CMD="opencode run \"\$(cat {prompt})\""
DEFAULT_AGENT="claude"

| Agent | Model | Timeout | Role |
|---|---|---|---|
| `conductor` | Opus 4.6 | 90 min | Persistent orchestrator |
| `design_lead` | Sonnet 4.6 | 5 min | Visual specs |
| `frontend_builder` | Sonnet 4.6 | 20 min | Production code |
| `visual_qa` | Sonnet 4.6 | 30 min | Testing + screenshots |
| `review_gate` | GPT-5.4 | 15 min | Cross-model code review |
| 10+ more | Sonnet 4.6 | 5-20 min | Specialized agents |

.ralph/ # Ralph runtime state
├── progress.md # Append-only progress log
├── guardrails.md # Learned failure patterns ("Signs")
├── activity.log # Timestamped activity log
├── errors.log # Error tracking
└── runs/ # Per-iteration logs + summaries
.agents/orchestra/state/{run-id}/ # Per-run state (auto-cleaned)
├── plan.json # Story decomposition
├── prompt-{agent}.md # Rendered prompts
├── log-{agent}.txt # Agent output logs
├── design_spec.md # Design agent artifact
├── qa_report.json # Visual QA scores
├── review_scores.json # Code review scores
└── conductor_failure.json # Failure context (if failed)
.agents/orchestra/metrics/ # JSONL metrics
├── pipeline.jsonl # Per-story metrics
└── agents.jsonl # Per-agent metrics
.agents/orchestra/memory/ # Cross-story learnings
├── design_patterns.md
├── animation_patterns.md
└── perf_patterns.md
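
As a rough illustration of how the JSONL metrics files could be written and read back, here is a small sketch. The record fields (`story_id`, `wall_time_s`, `success`) and both function names are assumptions made for this example; the real schema of `pipeline.jsonl` is not documented here.

```python
import json
import tempfile
from pathlib import Path


def append_metric(path: Path, record: dict) -> None:
    """Append one JSON object as a single line (JSONL)."""
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")


def success_rate(path: Path) -> float:
    """Fraction of records whose (hypothetical) 'success' field is truthy."""
    with path.open(encoding="utf-8") as fh:
        records = [json.loads(line) for line in fh if line.strip()]
    if not records:
        return 0.0
    return sum(1 for r in records if r.get("success")) / len(records)


# Illustrative values only, written to a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    metrics = Path(tmp) / "pipeline.jsonl"
    append_metric(metrics, {"story_id": "FE-001", "wall_time_s": 412.5, "success": True})
    append_metric(metrics, {"story_id": "BE-002", "wall_time_s": 95.0, "success": False})
    print(success_rate(metrics))  # 0.5
```

One appended line per record keeps writes cheap and makes the file easy to tail or filter with standard tools.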
- Dependency-aware story selection: stories with unmet `dependsOn` are skipped
- Stale story recovery: `STALE_SECONDS` config reopens stuck `in_progress` stories
- File-locked PRD updates: safe for concurrent access (`fcntl`)
- Process group isolation: `os.setsid` + `os.killpg` for reliable timeout kills
- Cross-story memory: learnings from design/perf/a11y artifacts persist across stories
- Dynamic guardrails: repeated review failures auto-append warnings
- Metrics collection: wall time, success rate, review scores in JSONL
- Completion signal: `<promise>COMPLETE</promise>` in stdout marks story done

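
Two of the features above, process group isolation and the completion signal, can be sketched together. This is a simplified illustration under stated assumptions, not the project's code: `run_agent` is a hypothetical helper, and it uses `start_new_session=True`, which makes the child call `setsid()` in the same way `os.setsid` would. It is POSIX-only.

```python
import os
import signal
import subprocess

COMPLETE_SIGNAL = "<promise>COMPLETE</promise>"


def run_agent(cmd: list[str], timeout_s: float) -> tuple[bool, str]:
    """Run cmd in its own process group; return (completed, stdout)."""
    proc = subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
        start_new_session=True,  # child calls setsid() and leads a new group
    )
    try:
        stdout, _ = proc.communicate(timeout=timeout_s)
    except subprocess.TimeoutExpired:
        try:
            # The child is the group leader, so its PID is also the group ID.
            # Killing the group also stops any grandchildren it spawned.
            os.killpg(proc.pid, signal.SIGKILL)
        except ProcessLookupError:
            pass  # the group exited between the timeout and the kill
        stdout, _ = proc.communicate()
        return False, stdout
    # The story counts as done only if the signal appears in the output.
    return COMPLETE_SIGNAL in stdout, stdout


completed, output = run_agent(
    ["sh", "-c", "echo '<promise>COMPLETE</promise>'"], timeout_s=5
)
print(completed)  # True
```

Killing the group rather than the single PID matters because agent CLIs often spawn their own child processes, which would otherwise survive the timeout.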
To add a new agent:

- Create prompt: `.agents/orchestra/prompts/my_agent.md`
- Register in `config.py`: `"my_agent": _agent("my_agent", timeout=600),`
- Reference in conductor prompt or pipeline steps

Edit .agents/orchestra/classifier.py to add new prefixes:
_PREFIX_MAP["MY"] = StoryType.BACKEND

Edit `_PIPELINE_STEPS` in `pipeline.py` to change what runs per story type.
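
To make the prefix lookup concrete, here is a minimal sketch of how a prefix map like this might classify story IDs. Only `_PREFIX_MAP` and `StoryType.BACKEND` appear in the source; the other enum members, the `classify` function, and the fallback to `cross_cutting` are assumptions for illustration. The prefixes are taken from the prefix table earlier in this README.

```python
from enum import Enum


class StoryType(Enum):
    # Member names other than BACKEND are assumed from the type column.
    MOBILE_UI = "mobile_ui"
    BACKEND = "backend"
    WEBSITE = "website"
    CONTENT = "content"
    CROSS_CUTTING = "cross_cutting"


_PREFIX_MAP = {
    **dict.fromkeys(["FE", "AV", "AX", "NF", "UI"], StoryType.MOBILE_UI),
    **dict.fromkeys(["BE", "ND", "FN", "PF", "AN"], StoryType.BACKEND),
    **dict.fromkeys(["WV", "WC", "NW"], StoryType.WEBSITE),
    **dict.fromkeys(["CP", "RA", "PZ"], StoryType.CONTENT),
    **dict.fromkeys(["CI", "XS", "MK"], StoryType.CROSS_CUTTING),
}


def classify(story_id: str, default: StoryType = StoryType.CROSS_CUTTING) -> StoryType:
    """Map a story ID such as 'FE-001' to a StoryType via its prefix."""
    prefix = story_id.split("-", 1)[0].upper()
    # Falling back to a default for unknown prefixes is an assumption.
    return _PREFIX_MAP.get(prefix, default)


# Registering a custom prefix, as in the line above.
_PREFIX_MAP["MY"] = StoryType.BACKEND

print(classify("FE-001").value)  # mobile_ui
print(classify("MY-042").value)  # backend
```

How the real classifier treats a story's explicit `type` field, or an unknown prefix, may differ from this sketch.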
Multi-agent and autonomous coding loop systems:
| Project | Description | Link |
|---|---|---|
| Ralph | The original bash loop this builds on. Minimal, file-based, agent-agnostic. Ships as npm package. | github.com/iannuttall/ralph |
| Claude Code | Anthropic's official agentic coding CLI. The agent runtime Ralph orchestrates. | github.com/anthropics/claude-code |
| OpenAI Codex CLI | OpenAI's terminal coding agent. Supported as Ralph agent backend. | github.com/openai/codex |
| Aider | AI pair programming in the terminal. Git-aware, multi-model. | github.com/paul-gauthier/aider |
| SWE-agent | Princeton's autonomous SWE-bench solver. Agent-computer interface for coding. | github.com/princeton-nlp/SWE-agent |
| OpenHands | Multi-agent framework for software dev (formerly OpenDevin). | github.com/All-Hands-AI/OpenHands |
| Devon | Open-source pair programmer. Multi-step task execution. | github.com/entropy-research/Devon |
| Mentat | AI coding assistant that works with your whole codebase. | github.com/AbanteAI/mentat |
| GPT Engineer | Specify what you want, AI builds it. Full-project generation. | github.com/gpt-engineer-org/gpt-engineer |
| AutoCodeRover | Autonomous program improvement via reasoning + code search. | github.com/nus-apr/auto-code-rover |
| Plandex | AI coding engine for complex tasks. Multi-file, multi-step. | github.com/plandex-ai/plandex |
| Sweep | AI junior developer that turns issues into PRs. | github.com/sweepai/sweep |
| bolt.diy | Full-stack AI app generation in the browser. | github.com/stackblitz-labs/bolt.diy |
Most tools above are single-agent — one model does everything. Ralph Orchestra is a multi-model conductor architecture:
- Opus plans and orchestrates (expensive but strategic)
- Sonnet builds and reviews (fast and cheap)
- GPT-5.4 provides cross-model code review (independent perspective)
- Bash loop handles iteration, state, and recovery (no model needed)
The conductor pattern means the expensive model (Opus) only runs once per story, spending tokens on high-level decisions. Cheap models do the heavy lifting.
- Python 3.10+
- One or more agent CLIs installed:
  - `claude` (Anthropic Claude Code)
  - `codex` (OpenAI Codex CLI)
  - `droid` (Factory Droid)
  - `opencode` (OpenCode)
- Git
- Bash
MIT