AI Design Agent Harness

A project-level OpenCode multi-agent system that turns a natural-language design brief into a curated set of professional design deliverables.

Built for the AI for Design practical-training brief: "Make AI design like a team: build your own personal multi-agent creative system". Implements the full Planner / Designer / Critic + Research + Primary topology described in the project requirements and the GPT-5.5-Pro implementation guidelines.

The agents, tools, skills, and the /design command all live in .opencode/ at the project root. Apart from the OpenCode CLI itself, the only install step is pnpm install. Hand the folder to a teammate and /design works.

1. What it does

Given a user brief such as the following (in English: "Please create a brand identity design for 创智学院"):

/design 请为创智学院做一套品牌形象设计

the harness:

Asks (only if necessary) for clarification with structured options via the question tool.
Researches the subject online and writes evidence.json + brand_lock.md, and downloads reference images (official logo, campus photos, peer references) into research/assets/ via research_asset_fetch. The Designer later passes those binaries to image_edit so the final PNG set is grounded in real research, never duplicating any official identity.
Plans a deliverable manifest: a curated PNG image set (≥ 10 PNGs) plus a single self-contained gallery HTML, choosing method=image_edit vs method=image_generate for each deliverable based on the asset library.
Designs every PNG via either image_edit (with references from research/assets/) or image_generate, with copy / layout baked directly into the rendered pixels. Wraps the set in artifacts/00-gallery.html, which is the ONLY HTML deliverable.
Critiques against a 10-dimension rubric with hard-fail gates (non-duplication, reference grounding, gallery completeness, lint-clean), issues a precise patch list, and triggers up to 2 focused revision rounds.
Packages the final deliverable set into outputs/runs/<runId>/final/ with a clickable 00-index.html, an inline thumb-grid, the research-asset sidebar, a sha256 manifest, a revisions-history section (v1 vs. v2 PNGs preserved automatically) and a technical-notes report (with reference-grounding stats).

The user never sees the subagents directly — they are coordinated by design-primary through a typed JSONL message bus + structured file blackboard.
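To make the idea of a typed JSONL bus concrete, here is a minimal sketch of serializing one message per line and reading the log back. The BusMessage shape, the type values, and the function names are assumptions made for illustration; they are not the schema implemented in design_bus.ts.

```typescript
// Hypothetical message shape; the real design_bus.ts schema may differ.
type AgentName = "primary" | "research" | "planner" | "designer" | "critic";

interface BusMessage {
  ts: string; // ISO-8601 timestamp
  from: AgentName;
  to: AgentName | "all";
  type: "request" | "result" | "note";
  body: string;
}

// One JSON object per line. JSON.stringify escapes newlines inside strings,
// so every message is guaranteed to stay on a single line.
function encodeBusLine(msg: BusMessage): string {
  return JSON.stringify(msg) + "\n";
}

// Parse a whole log, skipping blank lines.
function parseBus(text: string): BusMessage[] {
  return text
    .split("\n")
    .filter((line) => line.trim().length > 0)
    .map((line) => JSON.parse(line) as BusMessage);
}

// Messages addressed to one agent, plus broadcasts.
function inboxFor(messages: BusMessage[], agent: AgentName): BusMessage[] {
  return messages.filter((m) => m.to === agent || m.to === "all");
}
```

In a real run the encoded lines would presumably be appended to bus.jsonl in the run directory; an append-only log keeps the full trace inspectable after the fact.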

2. Architecture at a glance

                      ┌──────────────────────────┐
                      │     user (TUI / CLI)     │
                      └────────────┬─────────────┘
                                   │  /design "<brief>"
                                   ▼
                      ┌──────────────────────────┐
                      │    design-primary        │   Claude Opus 4.7 (max)
                      │    (orchestrator)        │
                      └─┬──────┬──────┬──────┬───┘
                        │      │      │      │             task tool
              ┌─────────┘      │      │      └────────┐    (only Primary →)
              ▼                ▼      ▼               ▼
    ┌──────────────────┐ ┌────────────┐ ┌─────────────┐ ┌──────────────┐
    │ design-research  │ │ design-    │ │ design-     │ │ design-      │
    │ (GPT 5.5 high)   │ │ planner    │ │ designer    │ │ critic       │
    │ websearch/fetch  │ │ (Opus 4.7) │ │ (Opus 4.7)  │ │ (GPT 5.5)    │
    └────────┬─────────┘ └─────┬──────┘ └─────┬───────┘ └──────┬───────┘
             │                 │              │                │
             ▼                 ▼              ▼                ▼
   ┌──────────────────────────────────────────────────────────────────┐
   │       .design-harness/runs/<runId>/ (typed file blackboard)      │
   │   research/   plan/   artifacts/   review/   bus.jsonl           │
   └──────────────────────────────────────────────────────────────────┘
                                   │
                                   ▼  export_package
                      ┌────────────────────────────────────────┐
                      │ outputs/runs/<runId>/final/            │
                      │  00-index.html                         │
                      │  artifacts/                            │
                      │    edits/, generated-images/  (latest) │
                      │    _revisions/r<N>/  (v1, v2, …)       │
                      │  plan/, research/, review/             │
                      │  package-manifest.json                 │
                      └────────────────────────────────────────┘

Communication rules:

Only Primary may invoke subagents (task is denied for every subagent).
Subagents communicate bidirectionally through bus.jsonl and shared files — they read each other's outputs and post follow-up requests; they never call each other directly.
Revision loop: Designer → Critic → Designer revise → Critic round 2 (max 2 rounds).
Hard fail gates (non-duplication, missing deliverable, broken lint) stop the loop early.
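The revision rules above can be sketched as a small control loop. This is an illustrative sketch, not the orchestration code in design-primary: the Critique fields and the runRevisionLoop name are hypothetical, and it assumes that one "round" means one Designer pass followed by one Critic pass.

```typescript
// Hypothetical verdict shape; field names are illustrative only.
interface Critique {
  round: number;
  passed: boolean; // rubric thresholds met
  hardFail: boolean; // e.g. non-duplication, missing deliverable, broken lint
  patches: string[]; // patch list handed back to the Designer
}

type Outcome = "accepted" | "hard_fail" | "rounds_exhausted";

function runRevisionLoop(
  design: (patches: string[]) => void,
  critique: (round: number) => Critique,
  maxRounds = 2,
): { outcome: Outcome; rounds: number } {
  let patches: string[] = [];
  for (let round = 1; round <= maxRounds; round++) {
    design(patches); // first pass renders, later passes revise
    const verdict = critique(round);
    // A hard-fail gate stops the loop early, before any further revision.
    if (verdict.hardFail) return { outcome: "hard_fail", rounds: round };
    if (verdict.passed) return { outcome: "accepted", rounds: round };
    patches = verdict.patches;
  }
  return { outcome: "rounds_exhausted", rounds: maxRounds };
}
```

With the default of two rounds this reproduces the Designer, Critic, Designer revise, Critic round 2 sequence.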

Model routing principle. Designer (Opus 4.7) and Critic (GPT-5.5) are intentionally on different model families so the verifier doesn't inherit the builder's blind spots. The highest-leverage production role (Designer) gets the strongest structured-production model; the deterministic grading role (Critic) gets the rigorous, low-temperature cross-family verifier. Research (GPT-5.5) feeding Planner / Designer (both Opus) gives the same property for upstream evidence.
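A quick way to guard the cross-family property is to compare the provider prefixes of the two configured model IDs. The sketch below is a rough heuristic under stated assumptions: providerOf and isCrossFamily are hypothetical helpers, and a router prefix (for example openrouter or design_harness) can front several model families, in which case the prefix alone proves nothing.

```typescript
// Treat the provider prefix of a "provider/model" ID as its family.
// Simplification: a router prefix can serve models from several families.
function providerOf(modelId: string): string {
  const slash = modelId.indexOf("/");
  if (slash <= 0) {
    throw new Error(`expected "provider/model", got "${modelId}"`);
  }
  return modelId.slice(0, slash).toLowerCase();
}

// True when builder and verifier come from different providers.
function isCrossFamily(builderId: string, verifierId: string): boolean {
  return providerOf(builderId) !== providerOf(verifierId);
}
```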

See ARCHITECTURE.md for a deeper write-up.

3. Repository layout

ai-design-harness/
├── opencode.json                    # agents · permissions · command
├── package.json
├── tsconfig.json
├── README.md                        # this file
├── ARCHITECTURE.md                  # system design, data flow, failure modes
├── .env.example                     # image backend + concurrency config
│
├── .opencode/
│   ├── agents/
│   │   ├── design-primary.md
│   │   ├── design-research.md
│   │   ├── design-planner.md
│   │   ├── design-designer.md
│   │   └── design-critic.md
│   ├── tools/
│   │   ├── run_init.ts              # allocate runId + scaffold dirs
│   │   ├── design_bus.ts            # post + read (typed JSONL bus)
│   │   ├── image_generate.ts        # gpt-image-2 text-to-image client (with retry)
│   │   ├── image_edit.ts            # gpt-image-2 multipart /v1/images/edits client (with retry)
│   │   ├── research_fetch.ts        # record evidence entries (citations)
│   │   ├── research_asset_fetch.ts  # download reference images into research/assets/
│   │   ├── artifact_lint.ts         # validate Designer outputs + grounding stats
│   │   └── export_package.ts        # assemble final set + index.html + grounding report
│   ├── skills/
│   │   ├── design-harness-protocol/SKILL.md   # inter-agent protocol
│   │   ├── brand-identity/SKILL.md
│   │   ├── visual-composition/SKILL.md
│   │   ├── copywriting-bilingual/SKILL.md
│   │   ├── poster-layout/SKILL.md
│   │   ├── image-prompting/SKILL.md
│   │   └── critic-rubric/SKILL.md
│   └── commands/
│       └── design.md                # /design <brief>
│
├── harness/
│   └── run.ts                       # optional headless SDK runner
│
├── outputs/                         # final, shippable deliverable sets
│   ├── runs/                        # real production runs: outputs/runs/<runId>/final/
│   └── _tests/                      # test-script artifacts kept off the top level
└── .design-harness/                 # runtime scratch (per-run, gitignored)

4. Setup

4.1 Prerequisites

Node ≥ 20 and pnpm (or npm)
OpenCode CLI installed: npm install -g opencode-ai
An OpenAI-compatible image-generation endpoint reachable from your machine (defaults to https://api.openai.com/v1/images/* — swap to your own router via DESIGN_IMAGE_ENDPOINT / DESIGN_IMAGE_EDIT_ENDPOINT)
A reasoning LLM that OpenCode can reach. Two model IDs are needed — one BUILDER (used by Primary / Planner / Designer) and one VERIFIER (used by Research / Critic). They are configured purely via environment variables — no hard-coded model IDs anywhere in the repo.

4.2 Quickstart for graders (≈ 3 min)

# 1. Clone + install
git clone https://github.com/KevinWang676/ai-design-harness.git
cd ai-design-harness
pnpm install

# 2. Authenticate at least one LLM provider with OpenCode
opencode auth login        # interactive — pick anthropic / openai / openrouter
opencode models | head     # confirm at least one provider/model is listed

# 3. Configure the harness env vars
cp .env.example .env
# Then edit .env:
#   - DESIGN_LLM_BUILDER_MODEL   = one of the IDs from `opencode models`
#   - DESIGN_LLM_VERIFIER_MODEL  = a DIFFERENT ID (cross-family preferred)
#   - DESIGN_IMAGE_API_KEY       = an OpenAI key that can call
#                                  /v1/images/generations + /v1/images/edits
#                                  (or swap DESIGN_IMAGE_ENDPOINT to your
#                                   own gpt-image-2 compatible router)

# 4. Sanity-check that OpenCode resolves the configured models
opencode models | grep -E "$(grep -E '^DESIGN_LLM_(BUILDER|VERIFIER)_MODEL=' .env | cut -d= -f2- | tr -d '"' | sed 's|/|.|g' | paste -sd '|' -)"
# both BUILDER + VERIFIER IDs should print

# 5. Offline smoke test (no network)
DESIGN_IMAGE_BACKEND=mock npx tsx scripts/smoke-test.ts
# expected: [smoke] ALL CHECKS PASSED ✓

# 6. Run the real pipeline against the official brief
#    IMPORTANT: launch via the wrapper so .env is exported BEFORE opencode
#    parses opencode.json. Plain `opencode` does NOT pick up DESIGN_LLM_*
#    values from .env for {env:...} substitutions (you'd get
#    "/chat/completions" cannot be parsed as a URL).
./bin/start.sh              # macOS / Linux  — launches the TUI
# .\bin\start.ps1           # Windows PowerShell equivalent

# then inside the TUI:
#   /design 请为上海创智学院做一套品牌形象设计

The final deliverable lands in outputs/runs/<runId>/final/; open 00-index.html to inspect.

4.3 The four LLM env vars at a glance

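opencode.json reads these at startup through {env:...} substitutions. The bin/start.sh / bin/start.ps1 wrappers and the headless runner load them from .env; plain opencode does not: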

Variable	Purpose	Example
`DESIGN_LLM_BUILDER_MODEL`	Model ID for Primary / Planner / Designer	`anthropic/claude-opus-4-1-20250805`
`DESIGN_LLM_VERIFIER_MODEL`	Model ID for Research / Critic	`openai/gpt-5`
`DESIGN_LLM_BASE_URL`	(Setup B only) baseURL for the bundled `design_harness` custom provider	`http://127.0.0.1:8318/v1`
`DESIGN_LLM_API_KEY`	(Setup B only) apiKey for the `design_harness` custom provider	`sk-...`

Three ready-to-use setups (each fully documented inline in .env.example):

Setup A: built-in providers (easiest for graders) — opencode auth login to any provider OpenCode supports natively (anthropic, openai, openrouter, azure-openai, amazon-bedrock, google, …); set the model IDs to whatever opencode models lists. Do NOT set DESIGN_LLM_BASE_URL / DESIGN_LLM_API_KEY.
Setup B: custom OpenAI-compatible router — set DESIGN_LLM_BASE_URL=https://your-router/v1 + DESIGN_LLM_API_KEY=... and point both model IDs at the design_harness provider: DESIGN_LLM_BUILDER_MODEL=design_harness/<model-name-your-router-serves>. The shipped opencode.json pre-declares two model entries under design_harness (claude-opus-4-7(max) and gpt-5.5(high)); if your router serves different names, either edit the models: block in opencode.json or rename your router's models to match.
Setup C: any other OpenCode provider — just set the model IDs to any provider/model string that appears in opencode models.

4.4 Run modes

Why the wrapper script? OpenCode parses opencode.json (and resolves {env:...} substitutions like DESIGN_LLM_BUILDER_MODEL) at startup, reading from the OS process environment. Two gotchas combine to make plain opencode unreliable for this project:

OpenCode does NOT auto-load .env, so any {env:VAR} placeholder in opencode.json is replaced with an empty string at config-parse time.

Even with the vars exported, on OpenCode 1.15.x the {env:VAR} substitution does not always run on nested fields such as agent.<name>.model — the literal placeholder leaks through and you see Model not found: {env:DESIGN_LLM_BUILDER_MODEL}/..

The bin/start.sh (macOS/Linux) and bin/start.ps1 (Windows) wrappers in this repo work around both problems by:

source-ing .env into the shell process,

running bin/resolve-config.mjs (a tiny Node helper) to pre-substitute every {env:VAR} / {file:...} placeholder in opencode.json,

passing the fully-resolved JSON to OpenCode via the OPENCODE_CONFIG_CONTENT env var (OpenCode's highest non-managed config-source priority — bypasses on-disk parsing entirely),

exec'ing opencode.

The headless pnpm design:run / pnpm design:demo path does the same in-process: it calls process.loadEnvFile('.env') and then shells out to bin/resolve-config.mjs to set OPENCODE_CONFIG_CONTENT before createOpencodeServer() spawns the embedded server.
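The pre-substitution step can be pictured with the sketch below. It is a hypothetical helper written for illustration, not the code in bin/resolve-config.mjs: it handles only {env:VAR} placeholders (not {file:...}), and it chooses to fail loudly on unset variables instead of substituting an empty string.

```typescript
// Replace every {env:NAME} placeholder in a JSON config string.
// Hypothetical helper; not the code in bin/resolve-config.mjs.
function resolveEnvPlaceholders(
  configText: string,
  env: Record<string, string | undefined>,
): string {
  const missing: string[] = [];
  const resolved = configText.replace(
    /\{env:([A-Za-z_][A-Za-z0-9_]*)\}/g,
    (_match: string, name: string) => {
      const value = env[name];
      if (value === undefined || value === "") {
        missing.push(name);
        return "";
      }
      // Escape quotes and backslashes so the result is still valid JSON.
      return JSON.stringify(value).slice(1, -1);
    },
  );
  if (missing.length > 0) {
    throw new Error(`unset variables: ${missing.join(", ")}`);
  }
  return resolved;
}
```

The resolved string is what a wrapper would then hand to OpenCode through OPENCODE_CONFIG_CONTENT. Failing on unset variables surfaces a misconfigured .env immediately, rather than as a later "Model not found" error.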

A. OpenCode TUI — interactive, recommended for first run:

./bin/start.sh                        # macOS / Linux
# .\bin\start.ps1                     # Windows PowerShell
# then inside the TUI:
/design 请为创智学院做一套品牌形象设计

pnpm start is wired to the same wrapper as a shortcut.

B. Headless SDK runner — for CI / scripted runs (auto-loads .env):

pnpm design:demo
# or:
pnpm design:run "Design a campaign identity extension for ..."

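C. Manual env-var export, for when you don't want to use the wrapper (on OpenCode 1.15.x the nested {env:VAR} substitution problem described earlier in this section can still appear):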

set -a && source .env && set +a        # macOS / Linux (bash / zsh)
opencode

The harness writes:

per-run scratch under .design-harness/runs/<runId>/
final, shippable set under outputs/runs/<runId>/final/ (open 00-index.html)

Test scripts (pnpm smoke, live-probe, etc.) write into outputs/_tests/<category>/runs/<runId>/ instead, so test artifacts never crowd the production runs in outputs/runs/.

5. Image generation + editing

Configured for any OpenAI-compatible endpoint with two routes — text-to-image and image-edit. Defaults to OpenAI's public API so the harness works out of the box once you supply a key; swap the endpoints for any router (a local gpt-image-2 sandbox, Azure OpenAI, …) by setting the env vars below.

Variable	Default
`DESIGN_IMAGE_BACKEND`	`codex`
`DESIGN_IMAGE_ENDPOINT`	`https://api.openai.com/v1/images/generations`
`DESIGN_IMAGE_EDIT_ENDPOINT`	`https://api.openai.com/v1/images/edits`
`DESIGN_IMAGE_API_KEY`	(required for `codex` backend; set in `.env`)
`DESIGN_IMAGE_MODEL`	`gpt-image-1`
`DESIGN_IMAGE_DEFAULT_SIZE`	`1024x1024`

An optional second backend, gemini-3-pro-image-preview, routes both text-to-image and image-edit through an OpenAI-compatible /v1/chat/completions URL. Enable it by setting DESIGN_IMAGE_PROVIDER=gemini-3-pro-image-preview and supplying DESIGN_IMAGE_REMOTE_ENDPOINT + DESIGN_IMAGE_REMOTE_API_KEY in .env.

image_generate (text-to-image):

Sends POST <endpoint> JSON with model, prompt, n, size, optional background / quality.
Accepts either b64_json (base64-decoded) or url (downloaded) in the response.
Writes the PNG plus a sidecar JSON (tool: "image_generate", prompt, model, sha256) into artifacts/generated-images/.
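The request and response handling described above might look roughly like this. The sketch is not the code in image_generate.ts: buildGenerateRequest and pickImageSource are hypothetical names, and the defaults are simply the ones listed in the variable table in this section. The response shape (a data array whose items carry b64_json or url) follows the OpenAI images API.

```typescript
interface GenerateRequest {
  model: string;
  prompt: string;
  n: number;
  size: string;
  background?: string;
  quality?: string;
}

// A response item carries either inline base64 or a URL to download.
interface ImageItem { b64_json?: string; url?: string }
interface GenerateResponse { data: ImageItem[] }

type ImageSource =
  | { kind: "inline"; base64: string }
  | { kind: "remote"; url: string };

// Defaults mirror the documented DESIGN_IMAGE_MODEL / DESIGN_IMAGE_DEFAULT_SIZE.
function buildGenerateRequest(
  prompt: string,
  opts: Partial<Omit<GenerateRequest, "prompt">> = {},
): GenerateRequest {
  return { model: "gpt-image-1", n: 1, size: "1024x1024", ...opts, prompt };
}

// Prefer inline bytes; otherwise return the URL for a follow-up download.
function pickImageSource(res: GenerateResponse): ImageSource {
  const first = res.data[0];
  if (first?.b64_json) return { kind: "inline", base64: first.b64_json };
  if (first?.url) return { kind: "remote", url: first.url };
  throw new Error("response contained neither b64_json nor url");
}
```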

image_edit (compose references into a new PNG):

Sends POST <edit_endpoint> multipart/form-data with model, prompt, size, n, one or more image parts (the reference binaries from research/assets/) and an optional mask.
Designer prefers this whenever a research asset can ground a deliverable; the sidecar records tool: "image_edit" plus the sha256 of every reference so Critic can verify the official logo was never regenerated.
Writes the PNG into artifacts/edits/.
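As an illustration of how a sidecar could record reference hashes, the sketch below builds one and re-checks it against the current reference bytes. The EditSidecar layout and both helper names are assumptions; only the tool, prompt, model, and sha256 fields are named in this README, and the check shown is one possible verification, not necessarily the one Critic performs.

```typescript
import { createHash } from "node:crypto";

function sha256Hex(bytes: Uint8Array): string {
  return createHash("sha256").update(bytes).digest("hex");
}

// Hypothetical sidecar layout for an image_edit output.
interface EditSidecar {
  tool: "image_edit";
  prompt: string;
  model: string;
  sha256: string; // hash of the rendered PNG
  references: { path: string; sha256: string }[];
}

function buildEditSidecar(
  prompt: string,
  model: string,
  output: Uint8Array,
  references: Record<string, Uint8Array>,
): EditSidecar {
  return {
    tool: "image_edit",
    prompt,
    model,
    sha256: sha256Hex(output),
    references: Object.entries(references).map(([path, bytes]) => ({
      path,
      sha256: sha256Hex(bytes),
    })),
  };
}

// True when every recorded reference still matches the bytes on hand.
function referencesUnchanged(
  sidecar: EditSidecar,
  current: Record<string, Uint8Array>,
): boolean {
  return sidecar.references.every((ref) => {
    const bytes = current[ref.path];
    return bytes !== undefined && sha256Hex(bytes) === ref.sha256;
  });
}
```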

Both tools:

Validate size against the live backend's minimum (1024 × 1024); sub-1024 sizes are rejected at argument validation, not at upstream call time.
Retry transient 5xx / network failures with exponential backoff (an occasional "502 stream disconnected before completion" shows up under load).
Switch to a mock backend (which writes a real 1×1 PNG placeholder) when DESIGN_IMAGE_BACKEND=mock; pnpm smoke relies on this.
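The argument validation and retry behaviour can be sketched as follows. This is an illustration under assumptions, not the code in the tools: it reads the 1024 × 1024 minimum as a per-side limit, and the attempt count, base delay, and cap are made-up values.

```typescript
const MIN_SIDE = 1024; // assumed to apply to each side

// Reject malformed or sub-minimum sizes before any network call.
function validateSize(size: string): { width: number; height: number } {
  const m = /^(\d+)x(\d+)$/.exec(size);
  if (!m) throw new Error(`size must look like "1024x1024", got "${size}"`);
  const width = Number(m[1]);
  const height = Number(m[2]);
  if (width < MIN_SIDE || height < MIN_SIDE) {
    throw new Error(`size ${size} is below the ${MIN_SIDE}x${MIN_SIDE} minimum`);
  }
  return { width, height };
}

// Treat 5xx responses and errors without a status (network) as retryable.
function isTransient(status: number | undefined): boolean {
  return status === undefined || (status >= 500 && status <= 599);
}

// Delay doubles per attempt (0-based) up to a cap.
function backoffDelayMs(attempt: number, baseMs = 500, capMs = 8000): number {
  return Math.min(capMs, baseMs * 2 ** attempt);
}

async function withRetry<T>(
  call: () => Promise<T>,
  statusOf: (err: unknown) => number | undefined,
  maxAttempts = 4,
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await call();
    } catch (err) {
      const lastAttempt = attempt + 1 >= maxAttempts;
      if (lastAttempt || !isTransient(statusOf(err))) throw err;
      await new Promise((resolve) => setTimeout(resolve, backoffDelayMs(attempt)));
    }
  }
}
```

Validating before the call means a bad size fails immediately and never consumes a retry budget.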

Use pnpm live-probe to confirm both routes work end-to-end against a running endpoint without invoking the full agent pipeline.

6. Concurrency & safety

Setting	Default	Notes
`DESIGN_AGENT_CONCURRENCY`	`2`	Internal cap; agents are mostly sequenced.
`DESIGN_MAX_REVISION_ROUNDS`	`2`	Designer ↔ Critic revision rounds.
`task` permission on subagents	deny	Subagents cannot recursively spawn subagents.
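For illustration, the two numeric settings could be read from the environment with defaults like this. readIntSetting and loadLimits are hypothetical helpers, and the lower bounds are assumptions; the harness may parse these values differently.

```typescript
// Read an integer setting, falling back to a default when unset or blank.
function readIntSetting(
  env: Record<string, string | undefined>,
  name: string,
  fallback: number,
  min = 1,
): number {
  const raw = env[name];
  if (raw === undefined || raw.trim() === "") return fallback;
  const value = Number(raw);
  if (!Number.isInteger(value) || value < min) {
    throw new Error(`${name} must be an integer >= ${min}, got "${raw}"`);
  }
  return value;
}

// Defaults match the table: concurrency 2, revision rounds 2.
function loadLimits(env: Record<string, string | undefined>) {
  return {
    agentConcurrency: readIntSetting(env, "DESIGN_AGENT_CONCURRENCY", 2),
    // Assumption: 0 is allowed and would disable revisions.
    maxRevisionRounds: readIntSetting(env, "DESIGN_MAX_REVISION_ROUNDS", 2, 0),
  };
}
```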

Hard-fail gates prevent the most common multi-agent failure modes:

recursive subagent fan-out (denied at permission level)
mid-run hallucination of nonexistent files (artifact_lint)
Critic rubber-stamping low-scoring work (rubric thresholds)
single-file deliverable when the brief implies a set (Planner requires ≥ 10 items)
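The set-size gate in the last item could be checked along these lines. The Deliverable fields and checkManifest are hypothetical; the actual deliverable_manifest.json schema is not shown in this README, so only the documented constraints (at least 10 PNGs, exactly one gallery HTML, a method per PNG) are encoded.

```typescript
// Hypothetical manifest entry; the real schema may carry more fields.
interface Deliverable {
  id: string;
  format: "png" | "html";
  method?: "image_edit" | "image_generate";
}

// Returns a list of problems; an empty list means the manifest passes.
function checkManifest(items: Deliverable[], minPngs = 10): string[] {
  const problems: string[] = [];
  const pngs = items.filter((d) => d.format === "png");
  const htmls = items.filter((d) => d.format === "html");
  if (pngs.length < minPngs) {
    problems.push(`need at least ${minPngs} PNGs, found ${pngs.length}`);
  }
  if (htmls.length !== 1) {
    problems.push(`need exactly 1 gallery HTML, found ${htmls.length}`);
  }
  for (const png of pngs) {
    if (!png.method) problems.push(`${png.id}: PNG has no method`);
  }
  const ids = new Set(items.map((d) => d.id));
  if (ids.size !== items.length) problems.push("duplicate deliverable ids");
  return problems;
}
```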

7. Final deliverable shape

outputs/runs/<runId>/final/ always contains:

00-index.html               clickable navigation (with revisions-history section)
00-brief.json               user brief + resolved scope
package-manifest.json       sha256 inventory (includes archived_revision_rounds)
11-technical-notes.md       short report (verdict, scores, outstanding issues,
                            revision history)

research/
  research.md
  evidence.json
  brand_lock.md
  assets/
    manifest.json          {id, kind, do_not_replace, allowed_for_edit, ...}
    official-logo.png       protected reference; Designer MUST NOT regenerate
    campus-1.jpg            editable reference photo
    ...
plan/
  design_system.json          must-use design contract (palette, type, motif, …)
  design_plan.json
  acceptance_criteria.md
  task_breakdown.md
  deliverable_manifest.json   ≥ 10 PNGs + 1 gallery HTML
artifacts/
  00-gallery.html              the ONLY HTML deliverable; self-contained
  generated-images/            latest revision — output of image_generate
    02-campaign-poster-zh.png + .png.json
    ...
  edits/                       latest revision — output of image_edit (uses Research references)
    01-logo-application-poster.png + .png.json
    ...
  _revisions/                  prior-round versions, auto-archived
    r1/edits/01-logo-application-poster.png + .png.json
    r1/generated-images/02-campaign-poster-zh.png + .png.json
    ...                        (one r<N>/ subtree per rejected revision round)
  artifact-manifest.json
review/
  critique_round_1.json + critique_round_1.md
  critique_round_2.json + critique_round_2.md   (if a revision happened)
bus.jsonl                      full inter-agent message trace

The headline deliverable is a curated PNG image set + a single self-contained gallery HTML — not a single file, not a stack of HTML/SVG/JSON. Copy is baked into the rendered PNG via the image_edit / image_generate prompt.

When Critic rejects round N and Designer re-renders round N+1, the round-N PNGs are NOT overwritten: image_generate / image_edit move them into artifacts/_revisions/r<N>/ first. The live path (artifacts/edits/<id>.png, artifacts/generated-images/<id>.png) always holds the latest revision, so the gallery + index keep working; v1 (and any earlier) remain inspectable side-by-side under _revisions/ and are surfaced as thumbnails in 00-index.html's "Revisions history" section.
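The archive layout can be expressed as a pure path mapping, sketched below. The function names are hypothetical and the tools may compute these paths differently; the mapping itself follows the directory listing in this section.

```typescript
const ARTIFACTS_PREFIX = "artifacts/";

// Map a live artifact path to its archived location for a given round,
// e.g. artifacts/edits/x.png -> artifacts/_revisions/r1/edits/x.png
function archivePathFor(livePath: string, round: number): string {
  if (!livePath.startsWith(ARTIFACTS_PREFIX)) {
    throw new Error(`expected a path under ${ARTIFACTS_PREFIX}, got "${livePath}"`);
  }
  if (!Number.isInteger(round) || round < 1) {
    throw new Error(`round must be a positive integer, got ${round}`);
  }
  const relative = livePath.slice(ARTIFACTS_PREFIX.length);
  return `${ARTIFACTS_PREFIX}_revisions/r${round}/${relative}`;
}

// Sidecars sit next to the PNG with a ".json" suffix (x.png.json).
function sidecarPathFor(pngPath: string): string {
  return `${pngPath}.json`;
}
```

Because the live path never changes, the gallery and index can keep linking to it while earlier rounds stay reachable under _revisions/.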

8. License

MIT. The two reference markdown briefs at the project root (AI_Design_Harness_Project_Requirements.md, Harness_Implementation_Guidelines.md) are kept verbatim for traceability.
