PARLEY: TWO AIs DEBATE UNTIL THEY AGREE

A planning-loop prototype with two models and a human interrupt button.

Oss 2026.02.02

What I Built

I wanted to watch a proposer and a critic work on the same plan without copying text between chat windows. Parley streams both turns in one page and lets me interrupt with more context.

I have not measured whether agreement produces a better plan. This is an interface experiment, not evidence for multi-model reliability.

The Loop

Two model calls alternate. The run stops once both agents have emitted an agreement action, or when the human stops it.

User Prompt ──→ Agent A (Proposer)
                    │
                    ↓
              Initial Plan
                    │
                    ↓
              Agent B (Critic)
                    │
                    ↓
              Critique/Questions
                    │
                    ↓
              Agent A responds
                    │
                    ↓
              ... loop continues ...
                    │
                    ↓
           Both agents agree → DONE

A human can intervene at any time: add context, steer direction, or just watch the debate unfold.

The Architecture

Built on Cloudflare Workers with SvelteKit. Real-time streaming via Server-Sent Events. Model selection through OpenRouter (300+ models).
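To make the streaming part concrete, here is a minimal sketch of how model output could be framed as Server-Sent Events. It is not Parley's actual code: the function names, the event names ("token", "done"), and the payload shape are all assumptions made for illustration.

```typescript
// Hypothetical helper: serialise one Server-Sent Events frame.
// Event names and payload shape are assumptions, not Parley's wire format.
function formatSseFrame(event: string, data: unknown): string {
  // An SSE "data:" field is line-based, so the payload must not contain raw
  // newlines. JSON.stringify escapes them as \n inside string values.
  return `event: ${event}\ndata: ${JSON.stringify(data)}\n\n`;
}

// Turn a stream of text chunks from one agent into SSE frames,
// ending with a "done" frame so the page knows the turn is over.
async function* toSseFrames(
  agent: string,
  chunks: AsyncIterable<string>,
): AsyncGenerator<string> {
  for await (const text of chunks) {
    yield formatSseFrame("token", { agent, text });
  }
  yield formatSseFrame("done", { agent });
}
```

In a Worker, frames like these would typically be encoded with TextEncoder, written into a ReadableStream, and returned in a Response with the Content-Type header set to text/event-stream. The browser side can then read them with EventSource or a fetch reader.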

Agent A: The Proposer

## Your Role
You are the PRIMARY PROPOSER. Your job is to:
1. Propose initial plans and refinements
2. Respond constructively to Agent B's critiques
3. Ask clarifying questions when needed
4. Agree when the plan is complete

## Tools
- <think>...</think> .. Internal reasoning
- <propose_plan>...</propose_plan> .. Submit the plan
- <respond>...</respond> .. Address feedback
- <agree>...</agree> .. Accept the current plan

Agent B: The Critic

## Your Role
You are the CRITICAL REVIEWER. Your job is to:
1. Evaluate Agent A's proposals carefully
2. Identify gaps, risks, or improvements
3. Ask clarifying questions
4. Agree when the plan is truly complete

## Tools
- <think>...</think> .. Internal reasoning
- <critique>...</critique> .. Provide feedback
- <respond>...</respond> .. Answer questions
- <agree>...</agree> .. Accept the current plan
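Because both prompts express tools as tag pairs, the orchestrator has to pull those tags back out of free text. The sketch below shows one way that could work with a regular expression. The tool names come from the prompts above; the function name, the ToolCall shape, and the regex approach are assumptions, and the real parser may differ.

```typescript
// Tool names taken from the two prompts above.
const TOOL_NAMES = ["think", "propose_plan", "critique", "respond", "agree"] as const;
type ToolName = (typeof TOOL_NAMES)[number];

interface ToolCall {
  tool: ToolName;
  content: string;
}

// Hypothetical parser: collect every <tool>...</tool> block in order of appearance.
// Text outside recognised tags is ignored.
function parseToolCalls(response: string): ToolCall[] {
  // \1 forces the closing tag to match the opening tag name.
  const pattern = new RegExp(`<(${TOOL_NAMES.join("|")})>([\\s\\S]*?)</\\1>`, "g");
  const calls: ToolCall[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(response)) !== null) {
    calls.push({
      tool: match[1] as ToolName,
      content: (match[2] ?? "").trim(),
    });
  }
  return calls;
}
```

A regex is enough for a prototype, but it does not handle nested or unclosed tags. A model that forgets a closing tag simply produces no tool call for that block.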

The Planning Loop

Server-side orchestration

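// Planning continues until both agents have agreed.
// Agent A (the proposer) goes first; neither agent has agreed yet.
// Excerpt: appending each response to `messages` and the human stop check are not shown.
let currentAgent = 'agent-a';
let agentAAgreed = false;
let agentBAgreed = false;

while (!agentAAgreed || !agentBAgreed) {
  // Pick the model for whichever agent's turn it is
  const model = currentAgent === 'agent-a' ? modelA : modelB;

  // Stream the response
  const response = await streamCompletion(model, messages);

  // Parse tool usage
  const parsed = parseAgentResponse(response);

  // Record agreement for the agent that just spoke
  if (isAgreement(parsed)) {
    if (currentAgent === 'agent-a') agentAAgreed = true;
    else agentBAgreed = true;
  }

  // Switch turns
  currentAgent = currentAgent === 'agent-a' ? 'agent-b' : 'agent-a';
}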

Each agent's response is parsed for tool calls. The run ends only after both agents have emitted an explicit <agree> action; a single "looks good" in prose isn't enough.
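The turn-taking and agreement bookkeeping in the loop above can be separated from the model calls and tested on its own. The sketch below does that as a pure state update. All names here are hypothetical, and the maxTurns cap is my own assumption: the original loop has no such limit and relies on the human to stop a run that never converges.

```typescript
type AgentId = "agent-a" | "agent-b";

interface LoopState {
  currentAgent: AgentId;
  agreed: Record<AgentId, boolean>;
  turns: number;
}

// Agent A (the proposer) speaks first and nobody has agreed yet.
const initialState: LoopState = {
  currentAgent: "agent-a",
  agreed: { "agent-a": false, "agent-b": false },
  turns: 0,
};

// Hypothetical check: only an explicit <agree>...</agree> block counts.
function hasAgreeTag(response: string): boolean {
  return /<agree>[\s\S]*?<\/agree>/.test(response);
}

// Record one finished turn and hand the floor to the other agent.
// As in the loop above, an agreement flag stays set once raised.
function recordTurn(state: LoopState, response: string): LoopState {
  const agreed = { ...state.agreed };
  if (hasAgreeTag(response)) agreed[state.currentAgent] = true;
  return {
    currentAgent: state.currentAgent === "agent-a" ? "agent-b" : "agent-a",
    agreed,
    turns: state.turns + 1,
  };
}

// Done when both agents have agreed, or when the assumed turn cap is hit.
function isDone(state: LoopState, maxTurns: number): boolean {
  const bothAgreed = state.agreed["agent-a"] && state.agreed["agent-b"];
  return bothAgreed || state.turns >= maxTurns;
}
```

One design question this makes visible: because flags are sticky, an agent that agreed early stays "agreed" even if the plan changes afterwards. Resetting a flag whenever that agent's later turn lacks an agreement would be a stricter variant.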

Human in the Loop

This isn't a black box. You can intervene at any point:

Add context .. "Don't forget about the database migration"
Steer direction .. "Focus more on error handling"
Ask questions .. "What happens if the API is down?"
Stop planning .. End the run early when needed

Your input gets injected into both agents' context as a human intervention. They treat it as authoritative steering.
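One way to picture the injection step: the human's text is wrapped in a clearly labelled message and appended to each agent's history. This is a sketch under assumptions, not the code Parley runs. The per-agent history layout, the "user" role, and the human_intervention tag name are all invented for the example.

```typescript
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

// Assumption: each agent keeps its own message history.
interface AgentContexts {
  "agent-a": ChatMessage[];
  "agent-b": ChatMessage[];
}

// Hypothetical helper: append the same labelled note to both histories.
// Returns new arrays so the caller's existing state is not mutated.
function injectIntervention(contexts: AgentContexts, text: string): AgentContexts {
  const note: ChatMessage = {
    role: "user",
    // The tag name is an assumption; any unambiguous label would do.
    content: `<human_intervention>\n${text.trim()}\n</human_intervention>`,
  };
  return {
    "agent-a": [...contexts["agent-a"], note],
    "agent-b": [...contexts["agent-b"], note],
  };
}
```

For the agents to treat the note as authoritative, the system prompts would also need to say what the label means. The wrapper alone only makes the human's text easy to distinguish from the other agent's turns.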

What This Does Not Prove

Agreement is not verification. Two models can approve the same bad plan. Different providers can repeat the same assumption.

A useful next experiment would compare one-model and two-model plans on real implementations, cost, and failed gates. I have not run that experiment yet.

Try It

parley.coey.dev .. live demo

github.com/acoyfellow/parley .. source code

Deploy your own:

Quick start

git clone https://github.com/acoyfellow/parley
cd parley
bun install
echo "OPENROUTER_API_KEY=your_key" > .dev.vars
bun run dev

Inspiration

This builds on ideas from planning-with-files .. using the filesystem as extended memory for planning agents.

Also inspired by Manus AI's context engineering: persistent markdown files as checkpoints, hooks for attention manipulation, and treating the filesystem as unlimited storage for finite context windows.