A project-level OpenCode multi-agent system that turns a natural-language design brief into a curated set of professional design deliverables.
Built for the AI for Design practical-training brief "Let AI design like a team: build your own personal multi-agent creative system". Implements the full Planner / Designer / Critic + Research + Primary topology described in the project requirements and the GPT-5.5-Pro implementation guidelines.
Everything lives in `.opencode/` at the project root. There is no global install step beyond `pnpm install`. Hand the folder to a teammate and `/design` works.
Given a user brief such as the one below (in English: "Please create a brand identity design for 创智学院"):
/design 请为创智学院做一套品牌形象设计
the harness:
- Asks (only if necessary) for clarification with structured options via the `question` tool.
- Researches the subject online and writes `evidence.json` + `brand_lock.md`, and downloads reference images (official logo, campus photos, peer references) into `research/assets/` via `research_asset_fetch`. The Designer later passes those binaries to `image_edit` so the final PNG set is grounded in real research, never duplicating any official identity.
- Plans a deliverable manifest: a curated PNG image set (≥ 10 PNGs) plus a single self-contained gallery HTML, choosing `method=image_edit` vs `method=image_generate` for each deliverable based on the asset library.
- Designs every PNG via either `image_edit` (with references from `research/assets/`) or `image_generate`, with copy / layout baked directly into the rendered pixels. Wraps the set in `artifacts/00-gallery.html`, the ONLY HTML deliverable.
- Critiques against a 10-dimension rubric with hard-fail gates (non-duplication, reference grounding, gallery completeness, lint-clean), issues a precise patch list, and triggers up to 2 focused revision rounds.
- Packages the final deliverable set into `outputs/runs/<runId>/final/` with a clickable `00-index.html`, an inline thumb-grid, the research-asset sidebar, a sha256 manifest, a revisions-history section (v1 vs. v2 PNGs preserved automatically) and a technical-notes report (with reference-grounding stats).
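The Planner's choice between `method=image_edit` and `method=image_generate` can be pictured with a small sketch. The asset fields follow the `research/assets/manifest.json` shape listed later in this README; `PlannedDeliverable`, `wantedKinds`, `chooseMethod`, the `kind` values and the `05-pattern` ID are hypothetical names invented for illustration, and the real Planner may weigh other factors.

```typescript
// Asset fields mirror research/assets/manifest.json as described in this README.
interface ResearchAsset {
  id: string;
  kind: string;
  do_not_replace: boolean;
  allowed_for_edit: boolean;
}

// Hypothetical planning record: which asset kinds could ground this deliverable.
interface PlannedDeliverable {
  id: string;
  wantedKinds: string[];
}

interface MethodChoice {
  method: "image_edit" | "image_generate";
  referenceIds: string[];
}

// Prefer image_edit whenever at least one editable asset of a wanted kind exists.
function chooseMethod(deliverable: PlannedDeliverable, assets: ResearchAsset[]): MethodChoice {
  const referenceIds = assets
    .filter((a) => a.allowed_for_edit && deliverable.wantedKinds.indexOf(a.kind) !== -1)
    .map((a) => a.id);
  return referenceIds.length > 0
    ? { method: "image_edit", referenceIds }
    : { method: "image_generate", referenceIds: [] };
}

// Assumed sample library (values are illustrative, not taken from a real run).
const assets: ResearchAsset[] = [
  { id: "official-logo", kind: "logo", do_not_replace: true, allowed_for_edit: true },
  { id: "campus-1", kind: "campus_photo", do_not_replace: false, allowed_for_edit: true },
];

const poster = chooseMethod({ id: "01-logo-application-poster", wantedKinds: ["logo"] }, assets);
const pattern = chooseMethod({ id: "05-pattern", wantedKinds: ["texture"] }, assets);
console.log(poster.method, pattern.method); // prints "image_edit image_generate"
```

A deliverable with no matching reference falls back to text-to-image, which keeps the manifest complete even when Research finds few usable assets.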
The user never sees the subagents directly — they are coordinated by design-primary through a typed JSONL message bus + structured file blackboard.
                     ┌──────────────────────────┐
                     │     user (TUI / CLI)     │
                     └────────────┬─────────────┘
                                  │  /design "<brief>"
                                  ▼
                     ┌──────────────────────────┐
                     │      design-primary      │  Claude Opus 4.7 (max)
                     │      (orchestrator)      │
                     └─┬──────┬──────┬──────┬───┘
                       │      │      │      │           task tool
             ┌─────────┘      │      │      └────────┐  (only Primary →)
             ▼                ▼      ▼               ▼
┌──────────────────┐ ┌────────────┐ ┌─────────────┐ ┌──────────────┐
│ design-research  │ │  design-   │ │  design-    │ │  design-     │
│ (GPT 5.5 high)   │ │  planner   │ │  designer   │ │  critic      │
│ websearch/fetch  │ │ (Opus 4.7) │ │ (Opus 4.7)  │ │ (GPT 5.5)    │
└────────┬─────────┘ └─────┬──────┘ └─────┬───────┘ └──────┬───────┘
         │                 │              │                │
         ▼                 ▼              ▼                ▼
┌──────────────────────────────────────────────────────────────────┐
│  .design-harness/runs/<runId>/   (typed file blackboard)         │
│  research/  plan/  artifacts/  review/  bus.jsonl                │
└──────────────────────────────────────────────────────────────────┘
                                  │
                                  ▼  export_package
             ┌────────────────────────────────────────┐
             │ outputs/runs/<runId>/final/            │
             │   00-index.html                        │
             │   artifacts/                           │
             │     edits/, generated-images/ (latest) │
             │     _revisions/r<N>/  (v1, v2, …)      │
             │   plan/, research/, review/            │
             │   package-manifest.json                │
             └────────────────────────────────────────┘
Communication rules:
- Only Primary may invoke subagents (`task` is denied for every subagent).
- Subagents communicate bidirectionally through `bus.jsonl` and shared files: they read each other's outputs and post follow-up requests; they never call each other directly.
- Revision loop: `Designer → Critic → Designer revise → Critic round 2` (max 2 rounds).
- Hard fail gates (non-duplication, missing deliverable, broken lint) stop the loop early.
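To make the bus rule concrete, here is a minimal sketch of an append-only JSONL log that agents post to and read from. The message fields (`ts`, `from`, `to`, `type`, `body`) and the helper names `postToBus` / `readFromBus` are assumptions for illustration; the actual schema is whatever `design_bus.ts` and the protocol skill define.

```typescript
import { appendFileSync, existsSync, readFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Hypothetical message shape; the real bus schema may differ.
type AgentName = "primary" | "research" | "planner" | "designer" | "critic";

interface BusMessage {
  ts: string; // ISO timestamp, added when the message is posted
  from: AgentName;
  to: AgentName | "all";
  type: string; // e.g. "patch_request", "result", "status"
  body: string;
}

// Append one message as a single JSON line; existing lines are never rewritten.
function postToBus(busPath: string, msg: Omit<BusMessage, "ts">): BusMessage {
  const full: BusMessage = { ts: new Date().toISOString(), ...msg };
  appendFileSync(busPath, JSON.stringify(full) + "\n", "utf8");
  return full;
}

// Return every message addressed to `reader`, plus broadcasts to "all".
function readFromBus(busPath: string, reader: AgentName): BusMessage[] {
  if (!existsSync(busPath)) return [];
  return readFileSync(busPath, "utf8")
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line) as BusMessage)
    .filter((m) => m.to === reader || m.to === "all");
}

// Demo against a throwaway file rather than a real run directory.
const demoBus = join(tmpdir(), `bus-demo-${Date.now()}-${Math.random().toString(36).slice(2)}.jsonl`);
postToBus(demoBus, { from: "critic", to: "designer", type: "patch_request", body: "Increase logo clear space on 01." });
postToBus(demoBus, { from: "primary", to: "all", type: "status", body: "Round 1 review complete." });
postToBus(demoBus, { from: "designer", to: "critic", type: "result", body: "Re-rendered 01." });
const designerInbox = readFromBus(demoBus, "designer"); // the patch request + the broadcast
```

Because agents only append and filter, no agent ever calls another directly; the file itself is the full trace.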
Model routing principle. Designer (Opus 4.7) and Critic (GPT-5.5) are intentionally on different model families so the verifier doesn't inherit the builder's blind spots. The highest-leverage production role (Designer) gets the strongest structured-production model; the deterministic grading role (Critic) gets the rigorous, low-temperature cross-family verifier. Research (GPT-5.5) feeding Planner / Designer (both Opus) gives the same property for upstream evidence.
See ARCHITECTURE.md for a deeper write-up.
ai-design-harness/
├── opencode.json # agents · permissions · command
├── package.json
├── tsconfig.json
├── README.md # this file
├── ARCHITECTURE.md # system design, data flow, failure modes
├── .env.example # image backend + concurrency config
│
├── .opencode/
│ ├── agents/
│ │ ├── design-primary.md
│ │ ├── design-research.md
│ │ ├── design-planner.md
│ │ ├── design-designer.md
│ │ └── design-critic.md
│ ├── tools/
│ │ ├── run_init.ts # allocate runId + scaffold dirs
│ │ ├── design_bus.ts # post + read (typed JSONL bus)
│ │ ├── image_generate.ts # gpt-image-2 text-to-image client (with retry)
│ │ ├── image_edit.ts # gpt-image-2 multipart /v1/images/edits client (with retry)
│ │ ├── research_fetch.ts # record evidence entries (citations)
│ │ ├── research_asset_fetch.ts # download reference images into research/assets/
│ │ ├── artifact_lint.ts # validate Designer outputs + grounding stats
│ │ └── export_package.ts # assemble final set + index.html + grounding report
│ ├── skills/
│ │ ├── design-harness-protocol/SKILL.md # inter-agent protocol
│ │ ├── brand-identity/SKILL.md
│ │ ├── visual-composition/SKILL.md
│ │ ├── copywriting-bilingual/SKILL.md
│ │ ├── poster-layout/SKILL.md
│ │ ├── image-prompting/SKILL.md
│ │ └── critic-rubric/SKILL.md
│ └── commands/
│ └── design.md # /design <brief>
│
├── harness/
│ └── run.ts # optional headless SDK runner
│
├── outputs/ # final, shippable deliverable sets
│ ├── runs/ # real production runs: outputs/runs/<runId>/final/
│ └── _tests/ # test-script artifacts kept off the top level
└── .design-harness/ # runtime scratch (per-run, gitignored)
- Node ≥ 20 and pnpm (or npm)
- OpenCode CLI installed: `npm install -g opencode-ai`
- An OpenAI-compatible image-generation endpoint reachable from your machine (defaults to `https://api.openai.com/v1/images/*`; swap to your own router via `DESIGN_IMAGE_ENDPOINT` / `DESIGN_IMAGE_EDIT_ENDPOINT`)
- A reasoning LLM that OpenCode can reach. Two model IDs are needed: one BUILDER (used by Primary / Planner / Designer) and one VERIFIER (used by Research / Critic). They are configured purely via environment variables; no model IDs are hard-coded anywhere in the repo.
# 1. Clone + install
git clone https://github.com/KevinWang676/ai-design-harness.git
cd ai-design-harness
pnpm install
# 2. Authenticate at least one LLM provider with OpenCode
opencode auth login # interactive — pick anthropic / openai / openrouter
opencode models | head # confirm at least one provider/model is listed
# 3. Configure the harness env vars
cp .env.example .env
# Then edit .env:
# - DESIGN_LLM_BUILDER_MODEL = one of the IDs from `opencode models`
# - DESIGN_LLM_VERIFIER_MODEL = a DIFFERENT ID (cross-family preferred)
# - DESIGN_IMAGE_API_KEY = an OpenAI key that can call
# /v1/images/generations + /v1/images/edits
# (or swap DESIGN_IMAGE_ENDPOINT to your
# own gpt-image-2 compatible router)
# 4. Sanity-check that OpenCode resolves the configured models
opencode models | grep -E "$(grep -E '^DESIGN_LLM_(BUILDER|VERIFIER)_MODEL=' .env | cut -d= -f2- | tr -d '"' | sed 's|/|.|g' | paste -sd '|' -)"
# both BUILDER + VERIFIER IDs should print
# 5. Offline smoke test (no network)
DESIGN_IMAGE_BACKEND=mock npx tsx scripts/smoke-test.ts
# expected: [smoke] ALL CHECKS PASSED ✓
# 6. Run the real pipeline against the official brief
# IMPORTANT: launch via the wrapper so .env is exported BEFORE opencode
# parses opencode.json. Plain `opencode` does NOT pick up DESIGN_LLM_*
# values from .env for {env:...} substitutions (you'd get
# "/chat/completions" cannot be parsed as a URL).
./bin/start.sh # macOS / Linux — launches the TUI
# .\bin\start.ps1 # Windows PowerShell equivalent
# then inside the TUI:
# /design 请为上海创智学院做一套品牌形象设计

The final deliverable lands in `outputs/runs/<runId>/final/`; open `00-index.html` to inspect.
`opencode.json` reads these at startup. They must already be in the process environment, which the wrapper script and the headless runner populate from `.env`:
| Variable | Purpose | Example |
|---|---|---|
| `DESIGN_LLM_BUILDER_MODEL` | Model ID for Primary / Planner / Designer | `anthropic/claude-opus-4-1-20250805` |
| `DESIGN_LLM_VERIFIER_MODEL` | Model ID for Research / Critic | `openai/gpt-5` |
| `DESIGN_LLM_BASE_URL` | (Setup B only) baseURL for the bundled `design_harness` custom provider | `http://127.0.0.1:8318/v1` |
| `DESIGN_LLM_API_KEY` | (Setup B only) apiKey for the `design_harness` custom provider | `sk-...` |
Three ready-to-use setups (each fully documented inline in .env.example):
- Setup A: built-in providers (easiest for graders). Run `opencode auth login` for any provider OpenCode supports natively (anthropic, openai, openrouter, azure-openai, amazon-bedrock, google, …); set the model IDs to whatever `opencode models` lists. Do NOT set `DESIGN_LLM_BASE_URL` / `DESIGN_LLM_API_KEY`.
- Setup B: custom OpenAI-compatible router. Set `DESIGN_LLM_BASE_URL=https://your-router/v1` + `DESIGN_LLM_API_KEY=...` and point both model IDs at the `design_harness` provider: `DESIGN_LLM_BUILDER_MODEL=design_harness/<model-name-your-router-serves>`. The shipped `opencode.json` pre-declares two model entries under `design_harness` (`claude-opus-4-7(max)` and `gpt-5.5(high)`); if your router serves different names, either edit the `models:` block in `opencode.json` or rename your router's models to match.
- Setup C: any other OpenCode provider. Just set the model IDs to any `provider/model` string that appears in `opencode models`.
Why the wrapper script? OpenCode parses `opencode.json` (and resolves `{env:...}` substitutions like `DESIGN_LLM_BUILDER_MODEL`) at startup, reading from the OS process environment. Two gotchas combine to make plain `opencode` unreliable for this project:

- OpenCode does NOT auto-load `.env`, so any `{env:VAR}` placeholder in `opencode.json` is replaced with an empty string at config-parse time.
- Even with the vars exported, on OpenCode 1.15.x the `{env:VAR}` substitution does not always run on nested fields such as `agent.<name>.model`; the literal placeholder leaks through and you see `Model not found: {env:DESIGN_LLM_BUILDER_MODEL}/.`.

The `bin/start.sh` (macOS/Linux) and `bin/start.ps1` (Windows) wrappers in this repo work around both problems by:

- `source`-ing `.env` into the shell process,
- running `bin/resolve-config.mjs` (a tiny Node helper) to pre-substitute every `{env:VAR}` / `{file:...}` placeholder in `opencode.json`,
- passing the fully-resolved JSON to OpenCode via the `OPENCODE_CONFIG_CONTENT` env var (OpenCode's highest non-managed config-source priority, which bypasses on-disk parsing entirely),
- exec'ing `opencode`.

The headless `pnpm design:run` / `pnpm design:demo` path does the same in-process: it calls `process.loadEnvFile('.env')` (available in Node 20.12 and later) and then shells out to `bin/resolve-config.mjs` to set `OPENCODE_CONFIG_CONTENT` before `createOpencodeServer()` spawns the embedded server.
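The pre-substitution step can be sketched as a recursive walk over the parsed config. This is not the code in `bin/resolve-config.mjs`; `resolveEnvPlaceholders` is a hypothetical helper, it handles only `{env:VAR}` (not `{file:...}`), and failing loudly on a missing variable is a design assumption of this sketch.

```typescript
type Json = string | number | boolean | null | Json[] | { [key: string]: Json };

// Hypothetical helper: replace {env:VAR} inside every string of a parsed config,
// including nested fields such as agent.<name>.model.
function resolveEnvPlaceholders(value: Json, env: Record<string, string | undefined>): Json {
  if (typeof value === "string") {
    return value.replace(/\{env:([A-Za-z_][A-Za-z0-9_]*)\}/g, (_match: string, name: string) => {
      const resolved = env[name];
      if (resolved === undefined || resolved === "") {
        // Fail early instead of letting an empty model ID reach the provider.
        throw new Error(`Missing environment variable: ${name}`);
      }
      return resolved;
    });
  }
  if (Array.isArray(value)) {
    return value.map((item) => resolveEnvPlaceholders(item, env));
  }
  if (value !== null && typeof value === "object") {
    const out: { [key: string]: Json } = {};
    for (const key of Object.keys(value)) {
      out[key] = resolveEnvPlaceholders(value[key], env);
    }
    return out;
  }
  return value; // numbers, booleans and null pass through unchanged
}

// Illustrative config fragment; the real opencode.json has more fields.
const rawConfig: Json = {
  agent: { "design-designer": { model: "{env:DESIGN_LLM_BUILDER_MODEL}" } },
};
const resolvedConfig = resolveEnvPlaceholders(rawConfig, {
  DESIGN_LLM_BUILDER_MODEL: "anthropic/claude-opus-4-1-20250805",
});

// The string a wrapper would export as OPENCODE_CONFIG_CONTENT.
const configContent = JSON.stringify(resolvedConfig);
```

Resolving before OpenCode starts means the nested-field substitution gap never comes into play, since no placeholder is left for OpenCode to expand.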
A. OpenCode TUI (interactive, recommended for first run):

./bin/start.sh # macOS / Linux
# .\bin\start.ps1 # Windows PowerShell
# then inside the TUI:
/design 请为创智学院做一套品牌形象设计

`pnpm start` is wired to the same wrapper as a shortcut.

B. Headless SDK runner (for CI / scripted runs; auto-loads `.env`):

pnpm design:demo
# or:
pnpm design:run "Design a campaign identity extension for ..."

C. Manual env-var export (if you don't want to use the wrapper):

set -a && source .env && set +a # macOS / Linux (bash / zsh)
opencode

The harness writes:

- per-run scratch under `.design-harness/runs/<runId>/`
- final, shippable set under `outputs/runs/<runId>/final/` (open `00-index.html`)
Test scripts (pnpm smoke, live-probe, etc.) write into outputs/_tests/<category>/runs/<runId>/ instead, so test artifacts never crowd the production runs in outputs/runs/.
The image backend works with any OpenAI-compatible endpoint that exposes two routes: text-to-image and image-edit. It defaults to OpenAI's public API, so the harness works out of the box once you supply a key; to use another router (a local gpt-image-2 sandbox, Azure OpenAI, …), set the env vars below.
| Variable | Default |
|---|---|
| `DESIGN_IMAGE_BACKEND` | `codex` |
| `DESIGN_IMAGE_ENDPOINT` | `https://api.openai.com/v1/images/generations` |
| `DESIGN_IMAGE_EDIT_ENDPOINT` | `https://api.openai.com/v1/images/edits` |
| `DESIGN_IMAGE_API_KEY` | (required for `codex` backend; set in `.env`) |
| `DESIGN_IMAGE_MODEL` | `gpt-image-1` |
| `DESIGN_IMAGE_DEFAULT_SIZE` | `1024x1024` |
An optional second backend, gemini-3-pro-image-preview, routes both
text-to-image and image-edit through an OpenAI-compatible
/v1/chat/completions URL. Enable it by setting
DESIGN_IMAGE_PROVIDER=gemini-3-pro-image-preview and supplying
DESIGN_IMAGE_REMOTE_ENDPOINT + DESIGN_IMAGE_REMOTE_API_KEY in .env.
`image_generate` (text-to-image):
- Sends `POST <endpoint>` JSON with `model`, `prompt`, `n`, `size`, optional `background` / `quality`.
- Accepts either `b64_json` (base64-decoded) or `url` (downloaded) in the response.
- Writes the PNG plus a sidecar JSON (`tool: "image_generate"`, prompt, model, sha256) into `artifacts/generated-images/`.
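The three steps of `image_generate` can be sketched without touching the network. The helper names (`buildGenerateBody`, `decodeImage`, `buildSidecar`) and the injected `download` callback are assumptions of this sketch, not the contents of `image_generate.ts`; the `{ b64_json | url }` item shape follows the OpenAI images response format.

```typescript
import { createHash } from "node:crypto";

interface GenerateOptions {
  model: string;
  prompt: string;
  n?: number;
  size?: string;
  background?: string;
  quality?: string;
}

// Build the JSON body; optional fields are only sent when provided.
function buildGenerateBody(opts: GenerateOptions): Record<string, unknown> {
  const body: Record<string, unknown> = {
    model: opts.model,
    prompt: opts.prompt,
    n: opts.n ?? 1,
    size: opts.size ?? "1024x1024",
  };
  if (opts.background !== undefined) body.background = opts.background;
  if (opts.quality !== undefined) body.quality = opts.quality;
  return body;
}

interface ImageResponseItem {
  b64_json?: string;
  url?: string;
}

// Accept either encoding: decode base64 inline, or fetch the URL via the
// caller-supplied downloader (injected so the sketch stays offline-testable).
async function decodeImage(
  item: ImageResponseItem,
  download: (url: string) => Promise<Uint8Array>,
): Promise<Uint8Array> {
  if (item.b64_json !== undefined) return Buffer.from(item.b64_json, "base64");
  if (item.url !== undefined) return download(item.url);
  throw new Error("response item has neither b64_json nor url");
}

// Sidecar fields as listed above: tool, prompt, model, sha256.
function buildSidecar(opts: GenerateOptions, png: Uint8Array) {
  return {
    tool: "image_generate" as const,
    prompt: opts.prompt,
    model: opts.model,
    sha256: createHash("sha256").update(png).digest("hex"),
  };
}
```

In a real call the body would be sent with `fetch(endpoint, { method: "POST", headers, body: JSON.stringify(...) })` and each item of the response's `data` array passed to `decodeImage`.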
`image_edit` (compose references into a new PNG):
- Sends `POST <edit_endpoint>` multipart/form-data with `model`, `prompt`, `size`, `n`, one or more `image` parts (the reference binaries from `research/assets/`) and an optional `mask`.
- Designer prefers this whenever a research asset can ground a deliverable; the sidecar records `tool: "image_edit"` plus the sha256 of every reference so Critic can verify the official logo was never regenerated.
- Writes the PNG into `artifacts/edits/`.
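A rough sketch of the multipart request and the grounding sidecar, using the `FormData` and `Blob` globals that ship with Node 18+. The helper names, the `references` sidecar key, and the repeated `image` field name are assumptions; some OpenAI-compatible routers expect `image[]` for multiple files, so check your endpoint before reusing this.

```typescript
import { createHash } from "node:crypto";

interface ReferenceImage {
  filename: string; // e.g. a file under research/assets/
  bytes: Uint8Array;
}

function sha256Hex(bytes: Uint8Array): string {
  return createHash("sha256").update(bytes).digest("hex");
}

// Build the multipart body: scalar fields plus one "image" part per reference.
function buildEditForm(model: string, prompt: string, size: string, refs: ReferenceImage[]): FormData {
  if (refs.length === 0) throw new Error("image_edit needs at least one reference image");
  const form = new FormData();
  form.append("model", model);
  form.append("prompt", prompt);
  form.append("size", size);
  form.append("n", "1");
  for (const ref of refs) {
    // Copy into a fresh ArrayBuffer-backed view before wrapping in a Blob.
    form.append("image", new Blob([new Uint8Array(ref.bytes)], { type: "image/png" }), ref.filename);
  }
  return form;
}

// Sidecar recording the output hash plus the hash of every reference, so a
// reviewer can confirm which source binaries grounded the render.
function buildEditSidecar(prompt: string, model: string, output: Uint8Array, refs: ReferenceImage[]) {
  return {
    tool: "image_edit" as const,
    prompt,
    model,
    sha256: sha256Hex(output),
    references: refs.map((ref) => ({ filename: ref.filename, sha256: sha256Hex(ref.bytes) })),
  };
}
```

Comparing a reference's recorded sha256 against the file in `research/assets/` is one way a verifier could check that the protected logo was passed through rather than redrawn.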
Both tools:
- Validate `size` against the live backend's minimum (1024 × 1024); sub-1024 sizes are rejected at argument validation, not at upstream call time.
- Retry transient 5xx / network failures with exponential backoff (`502 stream disconnected before completion` is occasional under load).
- Fall back to `mock` (writes a real 1×1 PNG placeholder) when `DESIGN_IMAGE_BACKEND=mock`, which is what `pnpm smoke` uses.
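The validation and retry behaviour shared by both tools might look like the sketch below. `validateSize`, `isTransient` and `withRetry` are hypothetical names, and the retry count, base delay, and the rule "an error without a `status` is a network failure" are assumptions rather than the shipped policy.

```typescript
const MIN_SIDE = 1024;

// Reject malformed or sub-1024 sizes before any upstream call is made.
function validateSize(size: string): { width: number; height: number } {
  const match = /^(\d+)x(\d+)$/.exec(size);
  if (match === null) throw new Error(`size must look like "1024x1024", got "${size}"`);
  const width = Number(match[1]);
  const height = Number(match[2]);
  if (width < MIN_SIDE || height < MIN_SIDE) {
    throw new Error(`size ${size} is below the ${MIN_SIDE}x${MIN_SIDE} minimum`);
  }
  return { width, height };
}

// Assumption: errors carry an optional HTTP `status`; no status means the
// request never got a response (network failure), which is also retried.
function isTransient(err: unknown): boolean {
  const status = (err as { status?: number } | null | undefined)?.status;
  return status === undefined || status >= 500;
}

interface RetryOptions {
  maxRetries: number;
  baseDelayMs: number;
  sleep?: (ms: number) => Promise<void>; // injectable for tests
}

// Exponential backoff: wait baseDelayMs, then 2x, 4x, ... between attempts.
async function withRetry<T>(attempt: () => Promise<T>, opts: RetryOptions): Promise<T> {
  const sleep = opts.sleep ?? ((ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms)));
  for (let i = 0; ; i++) {
    try {
      return await attempt();
    } catch (err) {
      if (!isTransient(err) || i >= opts.maxRetries) throw err;
      await sleep(opts.baseDelayMs * 2 ** i);
    }
  }
}
```

A 4xx response is rethrown immediately, since repeating a request the server has rejected as invalid will not help.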
Use pnpm live-probe to confirm both routes work end-to-end against a running endpoint without invoking the full agent pipeline.
| Setting | Default | Notes |
|---|---|---|
| `DESIGN_AGENT_CONCURRENCY` | `2` | Internal cap; agents are mostly sequenced. |
| `DESIGN_MAX_REVISION_ROUNDS` | `2` | Designer ↔ Critic revision rounds. |
| `task` permission on subagents | `deny` | Subagents cannot recursively spawn subagents. |
Hard-fail gates prevent the most common multi-agent failure modes:
- recursive subagent fan-out (denied at permission level)
- mid-run hallucination of nonexistent files (artifact_lint)
- Critic rubber-stamping with bad scores (rubric thresholds)
- single-file deliverable when the brief implies a set (Planner requires ≥ 10 items)
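How the round cap and the hard-fail gates interact can be sketched as a small control loop. The `Critique` shape and the function names are hypothetical, and the sketch counts each Critic pass as one round, which is one reading of `DESIGN_MAX_REVISION_ROUNDS`; the orchestrator's actual bookkeeping may differ.

```typescript
// Hypothetical critique result; the real review JSON has more fields.
interface Critique {
  verdict: "pass" | "revise";
  hardFail: boolean; // e.g. non-duplication, missing deliverable, broken lint
  patchList: string[];
}

interface LoopResult {
  status: "accepted" | "hard_fail" | "rounds_exhausted";
  rounds: number; // number of Critic passes that ran
}

function runRevisionLoop(
  design: (patchList: string[]) => void,
  critique: (round: number) => Critique,
  maxRounds = 2,
): LoopResult {
  design([]); // initial render, no patches yet
  for (let round = 1; round <= maxRounds; round++) {
    const review = critique(round);
    if (review.hardFail) return { status: "hard_fail", rounds: round }; // stop early
    if (review.verdict === "pass") return { status: "accepted", rounds: round };
    if (round < maxRounds) design(review.patchList); // focused revision
  }
  return { status: "rounds_exhausted", rounds: maxRounds };
}

// Usage: round 1 asks for a revision, round 2 passes.
const designCalls: string[][] = [];
const outcome = runRevisionLoop(
  (patches) => { designCalls.push(patches); },
  (round) =>
    round === 1
      ? { verdict: "revise", hardFail: false, patchList: ["fix kerning on 02"] }
      : { verdict: "pass", hardFail: false, patchList: [] },
);
console.log(outcome.status, outcome.rounds); // prints "accepted 2"
```

Bounding the loop keeps a disagreeing Designer and Critic from cycling indefinitely, and a hard fail skips the remaining rounds altogether.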
outputs/runs/<runId>/final/ always contains:
00-index.html                          clickable navigation (with revisions-history section)
00-brief.json                          user brief + resolved scope
package-manifest.json                  sha256 inventory (includes archived_revision_rounds)
11-technical-notes.md                  short report (verdict, scores, outstanding issues,
                                       revision history)
research/
  research.md
  evidence.json
  brand_lock.md
  assets/
    manifest.json                      {id, kind, do_not_replace, allowed_for_edit, ...}
    official-logo.png                  protected reference; Designer MUST NOT regenerate
    campus-1.jpg                       editable reference photo
    ...
plan/
  design_system.json                   must-use design contract (palette, type, motif, …)
  design_plan.json
  acceptance_criteria.md
  task_breakdown.md
  deliverable_manifest.json            ≥ 10 PNGs + 1 gallery HTML
artifacts/
  00-gallery.html                      the ONLY HTML deliverable; self-contained
  generated-images/                    latest revision: output of image_generate
    02-campaign-poster-zh.png + .png.json
    ...
  edits/                               latest revision: output of image_edit (uses Research references)
    01-logo-application-poster.png + .png.json
    ...
  _revisions/                          prior-round versions, auto-archived
    r1/edits/01-logo-application-poster.png + .png.json
    r1/generated-images/02-campaign-poster-zh.png + .png.json
    ...                                (one r<N>/ subtree per rejected revision round)
  artifact-manifest.json
review/
  critique_round_1.json + critique_round_1.md
  critique_round_2.json + critique_round_2.md   (if a revision happened)
bus.jsonl                              full inter-agent message trace
The headline deliverable is a curated PNG image set + a single self-contained gallery HTML — not a single file, not a stack of HTML/SVG/JSON. Copy is baked into the rendered PNG via the image_edit / image_generate prompt.
When Critic rejects round N and Designer re-renders round N+1, the round-N PNGs are NOT overwritten: image_generate / image_edit move them into artifacts/_revisions/r<N>/ first. The live path (artifacts/edits/<id>.png, artifacts/generated-images/<id>.png) always holds the latest revision, so the gallery + index keep working; v1 (and any earlier) remain inspectable side-by-side under _revisions/ and are surfaced as thumbnails in 00-index.html's "Revisions history" section.
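The archive-before-overwrite behaviour can be sketched as follows. `writeRevision` is a hypothetical helper, not the code inside `image_generate` / `image_edit`, and it moves a single file; a fuller version would move the `.png.json` sidecar alongside it.

```typescript
import { existsSync, mkdirSync, mkdtempSync, readFileSync, renameSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { dirname, join } from "node:path";

// Before writing round N+1, move the round-N file into _revisions/r<N>/ so the
// live path always holds the latest revision and nothing is overwritten.
function writeRevision(artifactsDir: string, relPath: string, data: string | Uint8Array, round: number): void {
  const livePath = join(artifactsDir, relPath);
  if (round > 1 && existsSync(livePath)) {
    const archivedPath = join(artifactsDir, "_revisions", `r${round - 1}`, relPath);
    mkdirSync(dirname(archivedPath), { recursive: true });
    renameSync(livePath, archivedPath);
  }
  mkdirSync(dirname(livePath), { recursive: true });
  writeFileSync(livePath, data);
}

// Demo in a throwaway directory; string payloads stand in for PNG bytes.
const artifactsDir = mkdtempSync(join(tmpdir(), "artifacts-"));
const posterPath = "edits/01-logo-application-poster.png";
writeRevision(artifactsDir, posterPath, "v1-bytes", 1);
writeRevision(artifactsDir, posterPath, "v2-bytes", 2);

const liveCopy = readFileSync(join(artifactsDir, posterPath), "utf8");
const archivedCopy = readFileSync(join(artifactsDir, "_revisions", "r1", posterPath), "utf8");
console.log(liveCopy, archivedCopy); // prints "v2-bytes v1-bytes"
```

Keeping the same relative path under `r<N>/` means the index page can pair each archived file with its live counterpart by path alone.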
Licensed under MIT. The two reference markdown briefs at the project root (`AI_Design_Harness_Project_Requirements.md`, `Harness_Implementation_Guidelines.md`) are kept verbatim for traceability.