idvxlab/OpenInfoDesign


AI Design Agent Harness

A project-level OpenCode multi-agent system that turns a natural-language design brief into a curated set of professional design deliverables.

Built for the AI for Design practicum brief "Let AI design like a team: build your own personal multi-agent creative system". Implements the full Planner / Designer / Critic + Research + Primary topology described in the project requirements and the GPT-5.5-Pro implementation guidelines.

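All agents, tools, skills, and the /design command live in .opencode/ at the project root, next to opencode.json. Apart from the OpenCode CLI itself (see 4.1), there is no global install step beyond pnpm install: hand the folder to a teammate, have them fill in .env, and /design works.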


1. What it does

Given a user brief such as:

/design 请为创智学院做一套品牌形象设计    ("create a brand identity design for 创智学院")

the harness:

  1. Asks (only if necessary) for clarification with structured options via the question tool.
  2. Researches the subject online, writes evidence.json + brand_lock.md, and downloads reference images (official logo, campus photos, peer references) into research/assets/ via research_asset_fetch. The Designer later passes those binaries to image_edit, so the final PNG set is grounded in real research and never regenerates or duplicates protected official identity assets such as the logo.
  3. Plans a deliverable manifest: a curated PNG image set (≥ 10 PNGs) plus a single self-contained gallery HTML, choosing method=image_edit vs method=image_generate for each deliverable based on the asset library.
  4. Designs every PNG via either image_edit (with references from research/assets/) or image_generate, with copy and layout baked directly into the rendered pixels. Wraps the set in artifacts/00-gallery.html, the ONLY HTML deliverable.
  5. Critiques against a 10-dimension rubric with hard-fail gates (non-duplication, reference grounding, gallery completeness, lint-clean), issues a precise patch list, and triggers up to 2 focused revision rounds.
  6. Packages the final deliverable set into outputs/runs/<runId>/final/ with a clickable 00-index.html, an inline thumbnail grid, the research-asset sidebar, a sha256 manifest, a revisions-history section (v1 vs. v2 PNGs preserved automatically), and a technical-notes report (with reference-grounding stats).

The user never sees the subagents directly — they are coordinated by design-primary through a typed JSONL message bus + structured file blackboard.
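To make the "typed JSONL message bus" concrete, here is a minimal sketch assuming a hypothetical BusMessage shape (field names such as kind and files are illustrative; the actual schema lives in .opencode/tools/design_bus.ts and the design-harness-protocol skill and may differ). Each post appends one JSON object per line, and a reader filters the file by recipient.

```typescript
import { appendFileSync, existsSync, readFileSync } from "node:fs";

// Hypothetical message shape; the real schema in design_bus.ts may differ.
type Agent = "primary" | "research" | "planner" | "designer" | "critic";

interface BusMessage {
  ts: string;                                   // ISO-8601 timestamp
  from: Agent;
  to: Agent | "all";
  kind: "request" | "result" | "revision" | "note";
  body: string;
  files?: string[];                             // blackboard paths the message points at
}

// JSON.stringify escapes embedded newlines, so each message is exactly one line.
export function encode(msg: BusMessage): string {
  return JSON.stringify(msg) + "\n";
}

// Parse a JSONL bus and keep messages addressed to `agent` or broadcast to "all".
export function decodeFor(text: string, agent: Agent): BusMessage[] {
  return text
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line) as BusMessage)
    .filter((m) => m.to === agent || m.to === "all");
}

// Append-only post: earlier messages are never rewritten.
export function postMessage(busPath: string, msg: Omit<BusMessage, "ts">): BusMessage {
  const full: BusMessage = { ts: new Date().toISOString(), ...msg };
  appendFileSync(busPath, encode(full), "utf8");
  return full;
}

export function readMessages(busPath: string, agent: Agent): BusMessage[] {
  return existsSync(busPath) ? decodeFor(readFileSync(busPath, "utf8"), agent) : [];
}
```

Appending rather than rewriting is what lets bus.jsonl double as the full inter-agent trace that ships in the final package.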


2. Architecture at a glance

                      ┌──────────────────────────┐
                      │     user (TUI / CLI)     │
                      └────────────┬─────────────┘
                                   │  /design "<brief>"
                                   ▼
                      ┌──────────────────────────┐
                      │    design-primary        │   Claude Opus 4.7 (max)
                      │    (orchestrator)        │
                      └─┬──────┬──────┬──────┬───┘
                        │      │      │      │             task tool
              ┌─────────┘      │      │      └────────┐    (only Primary →)
              ▼                ▼      ▼               ▼
    ┌──────────────────┐ ┌────────────┐ ┌─────────────┐ ┌──────────────┐
    │ design-research  │ │ design-    │ │ design-     │ │ design-      │
    │ (GPT 5.5 high)   │ │ planner    │ │ designer    │ │ critic       │
    │ websearch/fetch  │ │ (Opus 4.7) │ │ (Opus 4.7)  │ │ (GPT 5.5)    │
    └────────┬─────────┘ └─────┬──────┘ └─────┬───────┘ └──────┬───────┘
             │                 │              │                │
             ▼                 ▼              ▼                ▼
   ┌──────────────────────────────────────────────────────────────────┐
   │       .design-harness/runs/<runId>/ (typed file blackboard)      │
   │   research/   plan/   artifacts/   review/   bus.jsonl           │
   └──────────────────────────────────────────────────────────────────┘
                                   │
                                   ▼  export_package
                      ┌────────────────────────────────────────┐
                      │ outputs/runs/<runId>/final/            │
                      │  00-index.html                         │
                      │  artifacts/                            │
                      │    edits/, generated-images/  (latest) │
                      │    _revisions/r<N>/  (v1, v2, …)       │
                      │  plan/, research/, review/             │
                      │  package-manifest.json                 │
                      └────────────────────────────────────────┘

Communication rules:

  • Only Primary may invoke subagents (task is denied for every subagent).
  • Subagents communicate bidirectionally through bus.jsonl and shared files — they read each other's outputs and post follow-up requests; they never call each other directly.
  • Revision loop: Designer → Critic → Designer revise → Critic round 2 (max 2 rounds).
  • Hard fail gates (non-duplication, missing deliverable, broken lint) stop the loop early.
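The revision loop can be sketched as follows. This is an illustration, not the harness's orchestration code: the Critique shape and the runRevisionLoop name are hypothetical, maxRounds mirrors DESIGN_MAX_REVISION_ROUNDS (default 2), and the sketch reads "stop the loop early" as returning the hard-fail result to Primary immediately instead of spending another round.

```typescript
// Hypothetical shapes; the real critique_round_<N>.json schema is not shown here.
interface Critique {
  round: number;
  passed: boolean;        // rubric thresholds met
  hardFails: string[];    // e.g. "non-duplication", "missing-deliverable", "lint"
  patches: string[];      // precise patch list for the Designer
}

type Outcome =
  | { status: "accepted"; rounds: number }
  | { status: "hard_fail"; rounds: number; gates: string[] }
  | { status: "max_rounds"; rounds: number };

export function runRevisionLoop(
  design: (patches: string[], round: number) => void,
  critique: (round: number) => Critique,
  maxRounds = 2,          // DESIGN_MAX_REVISION_ROUNDS
): Outcome {
  let patches: string[] = [];
  for (let round = 1; round <= maxRounds; round++) {
    design(patches, round);                 // round 1: first draft; later rounds: revisions
    const review = critique(round);
    if (review.hardFails.length > 0) {
      return { status: "hard_fail", rounds: round, gates: review.hardFails };
    }
    if (review.passed) return { status: "accepted", rounds: round };
    patches = review.patches;               // feed the patch list into the next round
  }
  return { status: "max_rounds", rounds: maxRounds };
}
```

In the harness, design and critique would be task-tool calls from Primary to design-designer and design-critic; here they are plain callbacks so the control flow is easy to see.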

Model routing principle. Designer (Opus 4.7) and Critic (GPT-5.5) are intentionally on different model families so the verifier doesn't inherit the builder's blind spots. The highest-leverage production role (Designer) gets the strongest structured-production model; the deterministic grading role (Critic) gets the rigorous, low-temperature cross-family verifier. Research (GPT-5.5) feeding Planner / Designer (both Opus) gives the same property for upstream evidence.

See ARCHITECTURE.md for a deeper write-up.


3. Repository layout

ai-design-harness/
├── opencode.json                    # agents · permissions · command
├── package.json
├── tsconfig.json
├── README.md                        # this file
├── ARCHITECTURE.md                  # system design, data flow, failure modes
├── .env.example                     # image backend + concurrency config
│
├── .opencode/
│   ├── agents/
│   │   ├── design-primary.md
│   │   ├── design-research.md
│   │   ├── design-planner.md
│   │   ├── design-designer.md
│   │   └── design-critic.md
│   ├── tools/
│   │   ├── run_init.ts              # allocate runId + scaffold dirs
│   │   ├── design_bus.ts            # post + read (typed JSONL bus)
│   │   ├── image_generate.ts        # gpt-image-2 text-to-image client (with retry)
│   │   ├── image_edit.ts            # gpt-image-2 multipart /v1/images/edits client (with retry)
│   │   ├── research_fetch.ts        # record evidence entries (citations)
│   │   ├── research_asset_fetch.ts  # download reference images into research/assets/
│   │   ├── artifact_lint.ts         # validate Designer outputs + grounding stats
│   │   └── export_package.ts        # assemble final set + index.html + grounding report
│   ├── skills/
│   │   ├── design-harness-protocol/SKILL.md   # inter-agent protocol
│   │   ├── brand-identity/SKILL.md
│   │   ├── visual-composition/SKILL.md
│   │   ├── copywriting-bilingual/SKILL.md
│   │   ├── poster-layout/SKILL.md
│   │   ├── image-prompting/SKILL.md
│   │   └── critic-rubric/SKILL.md
│   └── commands/
│       └── design.md                # /design <brief>
│
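├── bin/
│   ├── start.sh                     # macOS / Linux launcher (exports .env, resolves config)
│   ├── start.ps1                    # Windows PowerShell launcher
│   └── resolve-config.mjs           # pre-substitutes {env:...} / {file:...} placeholders
│
├── scripts/
│   └── smoke-test.ts                # offline smoke test (mock image backend)
│
├── harness/
│   └── run.ts                       # optional headless SDK runner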
│
├── outputs/                         # final, shippable deliverable sets
│   ├── runs/                        # real production runs: outputs/runs/<runId>/final/
│   └── _tests/                      # test-script artifacts kept off the top level
└── .design-harness/                 # runtime scratch (per-run, gitignored)

4. Setup

4.1 Prerequisites

  • Node ≥ 20.12 (the headless runner calls process.loadEnvFile, added in Node 20.12 / 21.7) and pnpm (or npm)
  • OpenCode CLI installed: npm install -g opencode-ai
  • An OpenAI-compatible image-generation endpoint reachable from your machine (defaults to https://api.openai.com/v1/images/* — swap to your own router via DESIGN_IMAGE_ENDPOINT / DESIGN_IMAGE_EDIT_ENDPOINT)
  • A reasoning LLM that OpenCode can reach. Two model IDs are needed: one BUILDER (used by Primary / Planner / Designer) and one VERIFIER (used by Research / Critic). Which model each agent uses is set purely through environment variables; no agent has a hard-coded model ID.

4.2 Quickstart for graders (≈ 3 min)

# 1. Clone + install
git clone https://github.com/KevinWang676/ai-design-harness.git
cd ai-design-harness
pnpm install

# 2. Authenticate at least one LLM provider with OpenCode
opencode auth login        # interactive — pick anthropic / openai / openrouter
opencode models | head     # confirm at least one provider/model is listed

# 3. Configure the harness env vars
cp .env.example .env
# Then edit .env:
#   - DESIGN_LLM_BUILDER_MODEL   = one of the IDs from `opencode models`
#   - DESIGN_LLM_VERIFIER_MODEL  = a DIFFERENT ID (cross-family preferred)
#   - DESIGN_IMAGE_API_KEY       = an OpenAI key that can call
#                                  /v1/images/generations + /v1/images/edits
#                                  (or swap DESIGN_IMAGE_ENDPOINT to your
#                                   own gpt-image-2 compatible router)

# 4. Sanity-check that OpenCode resolves the configured models
#    (fixed-string match, so IDs such as design_harness/claude-opus-4-7(max)
#     are not misread as regular expressions)
opencode models | grep -F "$(grep -E '^DESIGN_LLM_(BUILDER|VERIFIER)_MODEL=' .env | cut -d= -f2- | tr -d "\"'")"
# both BUILDER + VERIFIER IDs should print

# 5. Offline smoke test (no network)
DESIGN_IMAGE_BACKEND=mock npx tsx scripts/smoke-test.ts
# expected: [smoke] ALL CHECKS PASSED ✓

# 6. Run the real pipeline against the official brief
#    IMPORTANT: launch via the wrapper so .env is exported BEFORE opencode
#    parses opencode.json. Plain `opencode` does NOT pick up DESIGN_LLM_*
#    values from .env for {env:...} substitutions (you'd get
#    "/chat/completions" cannot be parsed as a URL).
./bin/start.sh              # macOS / Linux  — launches the TUI
# .\bin\start.ps1           # Windows PowerShell equivalent

# then inside the TUI (brief: a brand identity design for 上海创智学院):
#   /design 请为上海创智学院做一套品牌形象设计

The final deliverable lands in outputs/runs/<runId>/final/; open 00-index.html to inspect.

4.3 The four LLM env vars at a glance

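opencode.json references these through {env:...} placeholders that are resolved at startup. OpenCode does not read .env itself; the wrapper scripts and the headless runner (section 4.4) export it first: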

Variable                    Purpose                                                                  Example
DESIGN_LLM_BUILDER_MODEL    Model ID for Primary / Planner / Designer                                anthropic/claude-opus-4-1-20250805
DESIGN_LLM_VERIFIER_MODEL   Model ID for Research / Critic                                           openai/gpt-5
DESIGN_LLM_BASE_URL         (Setup B only) baseURL for the bundled design_harness custom provider    http://127.0.0.1:8318/v1
DESIGN_LLM_API_KEY          (Setup B only) apiKey for the design_harness custom provider             sk-...

Three ready-to-use setups (each fully documented inline in .env.example):

  • Setup A: built-in providers (easiest for graders) — opencode auth login to any provider OpenCode supports natively (anthropic, openai, openrouter, azure-openai, amazon-bedrock, google, …); set the model IDs to whatever opencode models lists. Do NOT set DESIGN_LLM_BASE_URL / DESIGN_LLM_API_KEY.
  • Setup B: custom OpenAI-compatible router — set DESIGN_LLM_BASE_URL=https://your-router/v1 + DESIGN_LLM_API_KEY=... and point both model IDs at the design_harness provider: DESIGN_LLM_BUILDER_MODEL=design_harness/<model-name-your-router-serves>. The shipped opencode.json pre-declares two model entries under design_harness (claude-opus-4-7(max) and gpt-5.5(high)); if your router serves different names, either edit the models: block in opencode.json or rename your router's models to match.
  • Setup C: any other OpenCode provider — just set the model IDs to any provider/model string that appears in opencode models.
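A hypothetical pre-flight check (not a script shipped with the repo) that encodes the rules above might look like this. It assumes only what this section states: Setup B needs both DESIGN_LLM_BASE_URL and DESIGN_LLM_API_KEY and routes both models through design_harness/, while Setups A and C leave those two variables unset.

```typescript
type Env = Record<string, string | undefined>;

// Pre-flight check for the four DESIGN_LLM_* variables (hypothetical helper).
export function checkLlmEnv(env: Env): string[] {
  const errors: string[] = [];
  const builder = env.DESIGN_LLM_BUILDER_MODEL?.trim() || undefined;
  const verifier = env.DESIGN_LLM_VERIFIER_MODEL?.trim() || undefined;
  const baseUrl = env.DESIGN_LLM_BASE_URL?.trim() || undefined;
  const apiKey = env.DESIGN_LLM_API_KEY?.trim() || undefined;

  if (!builder) errors.push("DESIGN_LLM_BUILDER_MODEL is not set");
  if (!verifier) errors.push("DESIGN_LLM_VERIFIER_MODEL is not set");
  if (builder && verifier && builder === verifier) {
    errors.push("BUILDER and VERIFIER should be different models");
  }
  for (const id of [builder, verifier]) {
    if (id && !id.includes("/")) errors.push(`"${id}" is not in provider/model form`);
  }

  const custom = [builder, verifier].filter((id) => id?.startsWith("design_harness/"));
  if (baseUrl) {
    // Setup B: custom OpenAI-compatible router behind the design_harness provider.
    if (!apiKey) errors.push("Setup B needs DESIGN_LLM_API_KEY as well");
    if (custom.length !== 2) errors.push("Setup B: point both model IDs at design_harness/<model>");
    try {
      new URL(baseUrl);
    } catch {
      errors.push(`DESIGN_LLM_BASE_URL "${baseUrl}" is not a valid URL`);
    }
  } else {
    // Setups A and C: built-in providers only.
    if (apiKey) errors.push("DESIGN_LLM_API_KEY is only used together with DESIGN_LLM_BASE_URL (Setup B)");
    if (custom.length > 0) errors.push("design_harness/* models require DESIGN_LLM_BASE_URL (Setup B)");
  }
  return errors;
}
```

Running checkLlmEnv(process.env) after loading .env catches an empty DESIGN_LLM_BASE_URL up front, rather than later as the "/chat/completions" cannot be parsed as a URL error.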

4.4 Run modes

Why the wrapper script? OpenCode parses opencode.json (and resolves {env:...} substitutions like DESIGN_LLM_BUILDER_MODEL) at startup, reading from the OS process environment. Two gotchas combine to make plain opencode unreliable for this project:

  1. OpenCode does NOT auto-load .env, so any {env:VAR} placeholder in opencode.json is replaced with an empty string at config-parse time.
  2. Even with the vars exported, on OpenCode 1.15.x the {env:VAR} substitution does not always run on nested fields such as agent.<name>.model — the literal placeholder leaks through and you see Model not found: {env:DESIGN_LLM_BUILDER_MODEL}/..

The bin/start.sh (macOS/Linux) and bin/start.ps1 (Windows) wrappers in this repo work around both problems by:

  1. source-ing .env into the shell process,
  2. running bin/resolve-config.mjs (a tiny Node helper) to pre-substitute every {env:VAR} / {file:...} placeholder in opencode.json,
  3. passing the fully-resolved JSON to OpenCode via the OPENCODE_CONFIG_CONTENT env var (OpenCode's highest non-managed config-source priority — bypasses on-disk parsing entirely),
  4. exec'ing opencode.
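bin/resolve-config.mjs is only described in outline here. The TypeScript sketch below shows one way to implement step 2, under stated assumptions: {env:VAR} becomes the variable's value (or an empty string when unset, matching gotcha 1), {file:path} becomes the file's contents with relative paths resolved against a base directory and surrounding whitespace trimmed, and opencode.json is plain JSON without comments. Substituting after JSON.parse, rather than on the raw text, means a value containing quotes or newlines cannot corrupt the JSON.

```typescript
import { readFileSync } from "node:fs";
import { isAbsolute, resolve } from "node:path";

type Json = null | boolean | number | string | Json[] | { [key: string]: Json };
type Env = Record<string, string | undefined>;

const PLACEHOLDER = /\{(env|file):([^}]+)\}/g;

// Resolve every {env:VAR} / {file:path} occurrence inside one string value.
function substitute(value: string, env: Env, baseDir: string): string {
  return value.replace(PLACEHOLDER, (_match: string, kind: string, arg: string) => {
    const key = arg.trim();
    if (kind === "env") return env[key] ?? "";        // unset -> "" (mirrors gotcha 1)
    const path = isAbsolute(key) ? key : resolve(baseDir, key);
    return readFileSync(path, "utf8").trim();         // assumption: whitespace trimmed
  });
}

// Walk the parsed config; JSON.stringify later re-escapes substituted values.
function walk(node: Json, env: Env, baseDir: string): Json {
  if (typeof node === "string") return substitute(node, env, baseDir);
  if (Array.isArray(node)) return node.map((child) => walk(child, env, baseDir));
  if (node !== null && typeof node === "object") {
    const out: { [key: string]: Json } = {};
    for (const [k, v] of Object.entries(node)) out[k] = walk(v, env, baseDir);
    return out;
  }
  return node;
}

// Returns the string one would place in OPENCODE_CONFIG_CONTENT.
export function resolveConfig(raw: string, env: Env, baseDir: string = process.cwd()): string {
  return JSON.stringify(walk(JSON.parse(raw) as Json, env, baseDir));
}
```

The result would then be exported before exec'ing opencode, for example by assigning resolveConfig(readFileSync("opencode.json", "utf8"), process.env) to process.env.OPENCODE_CONFIG_CONTENT. A stricter variant could throw on unset variables instead of substituting an empty string.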

The headless pnpm design:run / pnpm design:demo path does the same in-process: it calls process.loadEnvFile('.env') and then shells out to bin/resolve-config.mjs to set OPENCODE_CONFIG_CONTENT before createOpencodeServer() spawns the embedded server.

A. OpenCode TUI — interactive, recommended for first run:

./bin/start.sh                        # macOS / Linux
# .\bin\start.ps1                     # Windows PowerShell
# then inside the TUI (same brief: a brand identity design for 创智学院):
/design 请为创智学院做一套品牌形象设计

pnpm start is wired to the same wrapper as a shortcut.

B. Headless SDK runner — for CI / scripted runs (auto-loads .env):

pnpm design:demo
# or:
pnpm design:run "Design a campaign identity extension for ..."

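C. Manual env-var export, if you don't want to use the wrapper (on OpenCode 1.15.x this can still hit gotcha 2 above, since nested {env:...} fields may not be substituted):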

set -a && source .env && set +a        # macOS / Linux (bash / zsh)
opencode

The harness writes:

  • per-run scratch under .design-harness/runs/<runId>/
  • final, shippable set under outputs/runs/<runId>/final/ (open 00-index.html)

Test scripts (pnpm smoke, live-probe, etc.) write into outputs/_tests/<category>/runs/<runId>/ instead, so test artifacts never crowd the production runs in outputs/runs/.


5. Image generation + editing

Configured for any OpenAI-compatible endpoint with two routes — text-to-image and image-edit. Defaults to OpenAI's public API so the harness works out of the box once you supply a key; swap the endpoints for any router (a local gpt-image-2 sandbox, Azure OpenAI, …) by setting the env vars below.

Variable                     Default
DESIGN_IMAGE_BACKEND         codex
DESIGN_IMAGE_ENDPOINT        https://api.openai.com/v1/images/generations
DESIGN_IMAGE_EDIT_ENDPOINT   https://api.openai.com/v1/images/edits
DESIGN_IMAGE_API_KEY         (required for the codex backend; set in .env)
DESIGN_IMAGE_MODEL           gpt-image-1
DESIGN_IMAGE_DEFAULT_SIZE    1024x1024

An optional second backend, gemini-3-pro-image-preview, routes both text-to-image and image-edit through an OpenAI-compatible /v1/chat/completions URL. Enable it by setting DESIGN_IMAGE_PROVIDER=gemini-3-pro-image-preview and supplying DESIGN_IMAGE_REMOTE_ENDPOINT + DESIGN_IMAGE_REMOTE_API_KEY in .env.

image_generate (text-to-image):

  • Sends POST <endpoint> JSON with model, prompt, n, size, optional background / quality.
  • Accepts either b64_json (base64-decoded) or url (downloaded) in the response.
  • Writes the PNG plus a sidecar JSON (tool: "image_generate", prompt, model, sha256) into artifacts/generated-images/.
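A minimal sketch of the image_generate request/response handling, assuming OpenAI-style Bearer authentication and a response body of the form { data: [ { b64_json } or { url } ] }. The function names are hypothetical and retries are omitted; this is not the code in .opencode/tools/image_generate.ts.

```typescript
import { createHash } from "node:crypto";
import { writeFileSync } from "node:fs";

type ImageItem = { b64_json?: string; url?: string };

// One response item -> PNG bytes: decode b64_json, otherwise download url.
export async function imageBytes(item: ImageItem, fetchImpl: typeof fetch = fetch): Promise<Buffer> {
  if (item.b64_json) return Buffer.from(item.b64_json, "base64");
  if (item.url) {
    const res = await fetchImpl(item.url);
    if (!res.ok) throw new Error(`image download failed: HTTP ${res.status}`);
    return Buffer.from(await res.arrayBuffer());
  }
  throw new Error("response item has neither b64_json nor url");
}

export interface GenerateOptions {
  endpoint: string;      // e.g. DESIGN_IMAGE_ENDPOINT
  apiKey: string;        // DESIGN_IMAGE_API_KEY
  model: string;         // e.g. DESIGN_IMAGE_MODEL
  prompt: string;
  size: string;          // "1024x1024", ...
  outPath: string;       // artifacts/generated-images/<id>.png
  quality?: string;
  background?: string;
}

export async function generateImage(o: GenerateOptions): Promise<string> {
  const body = {
    model: o.model,
    prompt: o.prompt,
    n: 1,
    size: o.size,
    ...(o.quality ? { quality: o.quality } : {}),
    ...(o.background ? { background: o.background } : {}),
  };
  const res = await fetch(o.endpoint, {
    method: "POST",
    headers: { "Content-Type": "application/json", Authorization: `Bearer ${o.apiKey}` },
    body: JSON.stringify(body),
  });
  if (!res.ok) throw new Error(`image_generate: HTTP ${res.status}`);
  const json = (await res.json()) as { data: ImageItem[] };
  const png = await imageBytes(json.data[0]);
  const sha256 = createHash("sha256").update(png).digest("hex");
  writeFileSync(o.outPath, png);
  // Sidecar next to the PNG, named <id>.png.json as in the final package layout.
  const sidecar = { tool: "image_generate", prompt: o.prompt, model: o.model, sha256 };
  writeFileSync(`${o.outPath}.json`, JSON.stringify(sidecar, null, 2));
  return o.outPath;
}
```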

image_edit (compose references into a new PNG):

  • Sends POST <edit_endpoint> multipart/form-data with model, prompt, size, n, one or more image parts (the reference binaries from research/assets/) and an optional mask.
  • Designer prefers this whenever a research asset can ground a deliverable; the sidecar records tool: "image_edit" plus the sha256 of every reference so Critic can verify the official logo was never regenerated.
  • Writes the PNG into artifacts/edits/.
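The multipart body for image_edit can be sketched with Node's built-in FormData and Blob. Assumptions: multiple references are sent as repeated image[] parts, as in OpenAI's edits examples (verify against your router), MIME types are inferred from the file extension, and buildEditForm is a hypothetical name. The same loop computes the per-reference sha256 values that the sidecar records for Critic.

```typescript
import { createHash } from "node:crypto";
import { readFileSync } from "node:fs";
import { basename, extname } from "node:path";

export interface EditRequest {
  model: string;
  prompt: string;
  size: string;
  n?: number;
  references: string[];   // paths under research/assets/
  mask?: string;          // optional PNG mask
}

const MIME: Record<string, string> = {
  ".png": "image/png", ".jpg": "image/jpeg", ".jpeg": "image/jpeg", ".webp": "image/webp",
};

function filePart(bytes: Uint8Array, path: string): Blob {
  return new Blob([bytes], { type: MIME[extname(path).toLowerCase()] ?? "application/octet-stream" });
}

// Build the multipart body plus the sidecar that records each reference's sha256.
export function buildEditForm(req: EditRequest, read: (p: string) => Uint8Array = (p) => readFileSync(p)) {
  const form = new FormData();
  form.append("model", req.model);
  form.append("prompt", req.prompt);
  form.append("size", req.size);
  form.append("n", String(req.n ?? 1));
  const references: { path: string; sha256: string }[] = [];
  for (const path of req.references) {
    const bytes = read(path);
    // Assumption: multiple references go out as repeated "image[]" parts.
    form.append("image[]", filePart(bytes, path), basename(path));
    references.push({ path, sha256: createHash("sha256").update(bytes).digest("hex") });
  }
  if (req.mask) form.append("mask", filePart(read(req.mask), req.mask), basename(req.mask));
  return { form, sidecar: { tool: "image_edit", prompt: req.prompt, model: req.model, references } };
}
```

The form would be posted to DESIGN_IMAGE_EDIT_ENDPOINT with fetch and only an Authorization header; leaving Content-Type unset lets fetch add the multipart boundary itself.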

Both tools:

  • Validate size against the live backend's minimum (1024 × 1024); sub-1024 sizes are rejected at argument validation, not at upstream call time.
  • Retry transient 5xx / network failures with exponential backoff (under load the upstream occasionally returns 502 "stream disconnected before completion").
  • Switch to a mock backend that writes a real 1×1 PNG placeholder when DESIGN_IMAGE_BACKEND=mock; pnpm smoke uses this mode.
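The shared size check and retry policy can be sketched as below. The 1024-pixel minimum comes from the text above; the retry count (3), the 500 ms base delay, and the helper names are assumptions, not the tools' actual settings. Only 5xx responses and network-level failures are retried, so a 4xx such as a rejected prompt fails immediately.

```typescript
const MIN_SIDE = 1024;

// Reject "WIDTHxHEIGHT" sizes below the backend minimum before any network call.
export function validateSize(size: string): void {
  const m = /^(\d+)x(\d+)$/.exec(size);
  if (!m) throw new Error(`size must look like WIDTHxHEIGHT, got "${size}"`);
  const w = Number(m[1]);
  const h = Number(m[2]);
  if (w < MIN_SIDE || h < MIN_SIDE) {
    throw new Error(`size ${size} is below the ${MIN_SIDE}x${MIN_SIDE} minimum`);
  }
}

export class HttpError extends Error {
  status: number;
  constructor(status: number, message: string) {
    super(message);
    this.status = status;
  }
}

// Transient = any 5xx, or a network-level failure (Node's fetch rejects with TypeError).
function isTransient(err: unknown): boolean {
  if (err instanceof HttpError) return err.status >= 500 && err.status < 600;
  return err instanceof TypeError;
}

export interface RetryOptions {
  retries?: number;                        // extra attempts after the first one
  baseMs?: number;                         // first backoff delay
  sleep?: (ms: number) => Promise<void>;
}

export async function withRetry<T>(call: () => Promise<T>, opts: RetryOptions = {}): Promise<T> {
  const {
    retries = 3,
    baseMs = 500,
    sleep = (ms: number) => new Promise<void>((r) => setTimeout(r, ms)),
  } = opts;
  for (let attempt = 0; ; attempt++) {
    try {
      return await call();
    } catch (err) {
      if (attempt >= retries || !isTransient(err)) throw err;
      await sleep(baseMs * 2 ** attempt);  // 500 ms, 1 s, 2 s, ...
    }
  }
}
```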

Use pnpm live-probe to confirm both routes work end-to-end against a running endpoint without invoking the full agent pipeline.


6. Concurrency & safety

Setting                           Default   Notes
DESIGN_AGENT_CONCURRENCY          2         Internal cap; agents are mostly sequenced.
DESIGN_MAX_REVISION_ROUNDS        2         Designer ↔ Critic revision rounds.
task permission on subagents      deny      Subagents cannot recursively spawn subagents.

Hard-fail gates prevent the most common multi-agent failure modes:

  • recursive subagent fan-out (denied at permission level)
  • mid-run hallucination of nonexistent files (artifact_lint)
  • Critic rubber-stamping weak work (blocked by rubric score thresholds)
  • single-file deliverable when the brief implies a set (Planner requires ≥ 10 items)
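The "≥ 10 items" gate and the single-gallery rule could be checked mechanically along these lines. The DeliverableItem shape is a hypothetical reading of plan/deliverable_manifest.json, not its documented schema; the checks themselves (at least 10 PNGs, exactly one 00-gallery.html, every image_edit grounded in at least one research reference) come from this README.

```typescript
// Hypothetical entry shape for plan/deliverable_manifest.json.
interface DeliverableItem {
  id: string;
  file: string;                                   // relative to artifacts/
  method: "image_edit" | "image_generate" | "html";
  references?: string[];                          // research/assets ids (image_edit only)
}

export function checkManifest(items: DeliverableItem[], minPngs = 10): string[] {
  const problems: string[] = [];
  const seen = new Set<string>();
  for (const item of items) {
    if (seen.has(item.id)) problems.push(`${item.id}: duplicate id`);
    seen.add(item.id);
  }

  const pngs = items.filter((i) => i.file.endsWith(".png"));
  const htmls = items.filter((i) => i.file.endsWith(".html"));
  if (pngs.length < minPngs) problems.push(`only ${pngs.length} PNGs planned; need at least ${minPngs}`);
  if (htmls.length !== 1 || htmls[0].file !== "00-gallery.html") {
    problems.push("exactly one HTML deliverable, 00-gallery.html, is required");
  }
  for (const p of pngs) {
    if (p.method === "html") problems.push(`${p.id}: a PNG must use image_edit or image_generate`);
    if (p.method === "image_edit" && (p.references ?? []).length === 0) {
      problems.push(`${p.id}: image_edit needs at least one research/assets reference`);
    }
  }
  return problems;
}
```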

7. Final deliverable shape

outputs/runs/<runId>/final/ always contains:

00-index.html               clickable navigation (with revisions-history section)
00-brief.json               user brief + resolved scope
package-manifest.json       sha256 inventory (includes archived_revision_rounds)
11-technical-notes.md       short report (verdict, scores, outstanding issues,
                            revision history)

research/
  research.md
  evidence.json
  brand_lock.md
  assets/
    manifest.json          {id, kind, do_not_replace, allowed_for_edit, ...}
    official-logo.png       protected reference; Designer MUST NOT regenerate
    campus-1.jpg            editable reference photo
    ...
plan/
  design_system.json          must-use design contract (palette, type, motif, …)
  design_plan.json
  acceptance_criteria.md
  task_breakdown.md
  deliverable_manifest.json   ≥ 10 PNGs + 1 gallery HTML
artifacts/
  00-gallery.html              the ONLY HTML deliverable; self-contained
  generated-images/            latest revision — output of image_generate
    02-campaign-poster-zh.png + .png.json
    ...
  edits/                       latest revision — output of image_edit (uses Research references)
    01-logo-application-poster.png + .png.json
    ...
  _revisions/                  prior-round versions, auto-archived
    r1/edits/01-logo-application-poster.png + .png.json
    r1/generated-images/02-campaign-poster-zh.png + .png.json
    ...                        (one r<N>/ subtree per rejected revision round)
  artifact-manifest.json
review/
  critique_round_1.json + critique_round_1.md
  critique_round_2.json + critique_round_2.md   (if a revision happened)
bus.jsonl                      full inter-agent message trace

The headline deliverable is a curated PNG image set + a single self-contained gallery HTML — not a single file, not a stack of HTML/SVG/JSON. Copy is baked into the rendered PNG via the image_edit / image_generate prompt.

When Critic rejects round N and Designer re-renders round N+1, the round-N PNGs are NOT overwritten: image_generate / image_edit first move them into artifacts/_revisions/r<N>/. The live paths (artifacts/edits/<id>.png, artifacts/generated-images/<id>.png) always hold the latest revision, so the gallery and index keep working, while every earlier version stays inspectable side by side under _revisions/ and appears as a thumbnail in the "Revisions history" section of 00-index.html.
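The archive-before-overwrite step amounts to a path mapping plus a rename. A minimal sketch, with hypothetical function names, that moves both the PNG and its .png.json sidecar:

```typescript
import { existsSync, mkdirSync, renameSync } from "node:fs";
import { dirname, join, relative, sep } from "node:path";

// artifacts/edits/01-x.png  ->  artifacts/_revisions/r<round>/edits/01-x.png
export function archivePath(artifactsDir: string, livePath: string, round: number): string {
  const rel = relative(artifactsDir, livePath);
  if (rel === ".." || rel.startsWith(`..${sep}`)) {
    throw new Error(`${livePath} is not under ${artifactsDir}`);
  }
  return join(artifactsDir, "_revisions", `r${round}`, rel);
}

// Called before a round-(N+1) render: move the round-N PNG and its .png.json sidecar aside.
export function archiveBeforeOverwrite(artifactsDir: string, livePath: string, round: number): string[] {
  const moved: string[] = [];
  for (const src of [livePath, `${livePath}.json`]) {
    if (!existsSync(src)) continue;          // first render: nothing to archive
    const dest = archivePath(artifactsDir, src, round);
    mkdirSync(dirname(dest), { recursive: true });
    renameSync(src, dest);
    moved.push(dest);
  }
  return moved;
}
```

Each move is a rename rather than a copy, so an archived PNG is never left half-copied; the live path is simply free for the next render.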


8. License

MIT. The two reference markdown briefs at the project root (AI_Design_Harness_Project_Requirements.md, Harness_Implementation_Guidelines.md) are kept verbatim for traceability.
