A self-improving harness for multi-agent systems.
zicato wraps a multi-agent system you already have — a coordinator + specialists, a deep sub-agent tree, a single LlmAgent, whatever shape — and turns it into the inner harness of a learning loop. It runs your system against a board of tasks, watches what goes wrong via structured runtime telemetry, and rewrites the inner harness so the next generation goes less wrong.
zicato is the third member of an ecosystem:
- goldfive — orchestration scaffolding: goals, plans, per-turn drift analysis, an intervention ladder. Emits a typed event stream (`goldfive.v1.Event`) that names what went wrong in a run.
- harmonograf — the observability + HCI console: Gantt, graph, trajectory, intervention history. Renders the goldfive stream live and lets operators steer.
- zicato — the meta-loop: same telemetry stream, but consumed across many runs. zicato aggregates drift into loss patterns, proposes structured edits to the inner harness (agent instructions, tool descriptions, planner templates, role scopes), runs tournaments, and promotes the patches that reduce loss.
| Layer | Owner | Cadence |
|---|---|---|
| Single-turn refine (replan in response to drift) | goldfive | within one run |
| Operator-driven steering | harmonograf | within one run |
| Inner-harness rewrites across runs | zicato | across generations |
goldfive owns plans; zicato owns the prompts and structure that produce the plans. The two are complementary: goldfive handles "this run wandered, replan this run"; zicato handles "this kind of run keeps wandering the same way, rewrite the harness".
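
To make the cadence difference concrete, here is a minimal sketch of cross-run aggregation: counting in how many runs each kind of drift recurs, the sort of signal a loss pattern could be built from. The event shape (plain dicts with a `drift_kind` key), the drift names, and the function `recurring_drift` are assumptions for illustration only; they are not the `goldfive.v1.Event` schema or zicato's reducer.

```python
from collections import Counter
from typing import Iterable


def recurring_drift(runs: Iterable[list[dict]], min_runs: int = 2) -> dict[str, int]:
    """Map each drift kind to the number of runs it appeared in.

    Only kinds seen in at least `min_runs` distinct runs are kept, so a
    one-off wobble in a single run is not reported as a pattern.
    """
    runs_with_kind: Counter = Counter()
    for events in runs:
        # Deduplicate within a run: we count runs, not individual events.
        kinds = {e["drift_kind"] for e in events if "drift_kind" in e}
        runs_with_kind.update(kinds)
    return {kind: n for kind, n in runs_with_kind.items() if n >= min_runs}


# Hypothetical telemetry from three runs of the same board.
runs = [
    [{"drift_kind": "off_topic"}, {"drift_kind": "off_topic"}],
    [{"drift_kind": "off_topic"}, {"drift_kind": "tool_misuse"}],
    [{"drift_kind": "stalled"}],
]
print(recurring_drift(runs))  # {'off_topic': 2}
```

A single run can only tell you that it drifted; the count across runs is what separates a recurring failure worth a harness rewrite from noise worth a replan.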
Alpha. Design and surface are under active iteration — the public API will
break. The first reference adapter targets Google ADK (the framework goldfive
itself wires deepest into). The design is framework-agnostic at its core:
any inner harness that fronts a HarnessAdapter and emits goldfive telemetry
can participate. LangChain and plain-callable adapters land after ADK.
zicato calls LLMs only through a narrow call_llm(system, user, model) -> str
callable supplied by the caller. No vendor SDK is imported by the library
itself; bring whatever model you want.
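
As an illustration of that contract, the sketch below builds two callables matching `call_llm(system, user, model) -> str`: a canned stub for tests and a thin wrapper around whatever vendor client you already use. Only the three-argument signature comes from the text above; the names `CallLLM`, `make_stub_llm`, `wrap_vendor_client`, and the `send` parameter are hypothetical.

```python
from typing import Callable

# Alias for the signature described above (the alias name is hypothetical).
CallLLM = Callable[[str, str, str], str]


def make_stub_llm(canned: dict[str, str], default: str = "") -> CallLLM:
    """Return a deterministic call_llm for tests, keyed on the user prompt."""

    def call_llm(system: str, user: str, model: str) -> str:
        return canned.get(user, default)

    return call_llm


def wrap_vendor_client(send: Callable[..., str]) -> CallLLM:
    """Adapt any vendor call to the narrow signature.

    `send` stands in for your own client function; its keyword names here
    are placeholders, not a real SDK's API.
    """

    def call_llm(system: str, user: str, model: str) -> str:
        return send(model=model, system=system, prompt=user)

    return call_llm


stub = make_stub_llm({"ping": "pong"}, default="unknown")
print(stub("you are terse", "ping", "any-model"))  # pong
```

Keeping the boundary to a plain callable means the vendor SDK import lives in your code, and a stub can replace it in tests without patching.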

    uv sync --all-extras   # install package + dev tooling (ruff, mypy, pytest, pre-commit, ...)
    make install-hooks     # equivalent to `uv run pre-commit install`

Use `uv sync --all-extras` always; bare `uv sync` will drop the dev extras from
`.venv/`. `make install-hooks` writes a `.git/hooks/pre-commit` shim that runs
the project's own pre-commit (from `.venv/`) so `git commit` checks match
`uv run pre-commit run --all-files`.
The full design lives under docs/design/. Read
ARCHITECTURE.md first; everything else assumes it.
- `docs/design/ARCHITECTURE.md` — top-level: what zicato is, the meta-loop diagram, every component, the cadence comparison against goldfive and harmonograf.
- `docs/design/MUTATION-SURFACE.md` — annotated mutation points: span and file markers, AST resolution, the `MutationPoint` shape, validator constraints, the `zicato mutations` audit CLI.
- `docs/design/BOARD-FORMAT.md` — JSONL board entry schema: common fields, the three entry kinds (single-turn, multi-turn scripted, multi-turn emulated), the five expectation kinds.
- `docs/design/EPOCHS-AND-JOURNALING.md` — epoch lifecycle, the `Experiment` artifact (hypothesis + patches + outcome), `journal.md` and the closing analysis pass, cross-epoch lineage.
- `docs/design/TELEMETRY.md` — capturing goldfive's `goldfive.v1.Event` stream via its `JSONLPersistenceSink`, the post-run reducer, the `LossProfile` shape, the emulator's `zicato:emulator` audit lane.
- `docs/design/SCORING.md` — the weighted drift-loss formula, the pass-rate side, the tournament promotion gate (margin on drift + strict monotonicity on pass-rate), fast mode.
- `docs/design/TOURNAMENT.md` — the competition model: the king-of-the-hill gauntlet (champion vs successive challengers), the dashboard Tournament view (bracket + per-matchup detail), the tournament-detail analytics (verdict transparency, per-entry A/B grid, hypothesis ledger, optimization trajectory, mutation heat map, cost), and the harmonograf split — execution view vs competition view.
- `docs/design/SELECTION.md` — the decision theory under the tournament: how RL gating, racing, and bracket schedulers make the champion-vs-challenger decision; why zicato's gauntlet is a degenerate elitist iterated race; why brackets (single/double-elim, Swiss) are the wrong primitive here; and the phased path to replication-based racing (paired significance gate, winner's-curse confirmation, trust-region step bound). Diagrams + cited sources.
- `docs/design/EMULATOR.md` — the multi-turn user emulator: the two-callable rule (hard error on identity match), sealed context construction, answer-leak heuristic, audit-trail spans.
- `docs/design/DOGFOOD-TARGETS.md` — the three targets (presentation agent v0; goldfive's steering v0+1; zicato itself v0+2) and the v0 design commitments they force.
- `docs/design/RUNTIME.md` — `.zicato/runtime/` state file layout, the two processes `zicato evolve` auto-spawns (a Rust watchdog supervisor on :7920 and a separate Python dashboard service on :7892), heartbeat protocol, signal escalation, single-writer concurrency model.
- `docs/design/DASHBOARD.md` — the live console for an in-flight epoch: Starlette HTTP + SSE architecture, the home view's cross-epoch meta-loop ledger, the live racing hero (full-width scalar track + rung stepper, champion-gate rows, WHAT'S RUNNING / LIVE ACTIVITY), the first-class tournament Builder view (`#/builder`) and the routed Settings drawer (Contract tab reuses the builder's live preview), per-entry continuous score + precision/recall, the full GET API surface, and the control-file protocol for operator actions.
- `docs/design/PROPOSER.md` — the proposer as a first-class contract input: the default tool-using ADK agent (skill-composed is the explicit opt-in), the read-only proposer tool registry, the board-anonymized train-slice-only failure-mode feedback channel (`outcome_summarizer_spec`), and why a proposer/skills change rolls the epoch.
- `docs/design/ROBUSTNESS.md` — the six-layer defense model (asyncio timeouts → cancellation → subprocess workers → watchdog → circuit breaker → atomic writes), what each layer catches, failure-mode tables, the GIL discussion that makes subprocess isolation non-negotiable, phasing.
- `docs/design/LOOP-HEALTH.md` — loop-health diagnostics: detecting a running-but-meaningless loop (a degenerate, toothless evaluation), the five detectors and severities, the `LoopHealth` report, the `zicato health` CLI, and how the orchestrator surfaces critical findings.
- `docs/design/STORAGE.md` — the pluggable `StorageBackend` (file + memory backends) and the `GenerationStore` protocol with both directory and git backends shipping; the v0 directory-snapshot layout; the three-storage-concerns split; and the still-roadmap operator git CLI (`zicato repo/log/diff/show/bisect/blame`, `workspace migrate-to-git`).
- `docs/design/ANALYTICAL-INDEX.md` — the `.zicato/index.db` SQLite analytical index: why cross-run views are queries not file-walks, the eight-table schema, the files-canonical / index-derived discipline, `zicato reindex`, and the nine-table schema (`SCHEMA_VERSION` 2).
- `docs/design/CLI.md` — full CLI reference: every subcommand, every flag, exit codes, scripting hints.
- `docs/design/RATIONALE.md` — the "why" behind every major decision: annotated mutation points, per-epoch contract, mandatory hypothesis, collusion-proof emulator, drift taxonomy as features.
- `docs/design/VOCABULARY.md` — glossary of load-bearing terms (epoch, generation, run, round, experiment, hypothesis, outcome, loss profile, pattern, tournament, lineage, rubric).
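
The promotion gate and king-of-the-hill gauntlet summarized in the `SCORING.md` and `TOURNAMENT.md` entries can be sketched roughly as follows. This is an illustrative reading, not zicato's implementation: the `Score` type, the `margin` default of 0.05, and the interpretation of "strict monotonicity on pass-rate" as "the pass-rate must not decrease" are all assumptions, and the design docs remain authoritative.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Score:
    """Hypothetical per-candidate summary; field names are illustrative."""
    name: str
    drift_loss: float  # lower is better
    pass_rate: float   # higher is better


def beats(champion: Score, challenger: Score, margin: float = 0.05) -> bool:
    """Assumed gate: drift must improve by at least `margin`, and the
    pass-rate must not regress."""
    drift_ok = challenger.drift_loss <= champion.drift_loss - margin
    pass_ok = challenger.pass_rate >= champion.pass_rate
    return drift_ok and pass_ok


def gauntlet(champion: Score, challengers: Iterable[Score], margin: float = 0.05) -> Score:
    """King of the hill: each challenger faces whoever currently holds the title."""
    for challenger in challengers:
        if beats(champion, challenger, margin):
            champion = challenger
    return champion


winner = gauntlet(
    Score("gen0", drift_loss=1.00, pass_rate=0.80),
    [
        Score("a", drift_loss=0.97, pass_rate=0.85),  # drift gain below margin: rejected
        Score("b", drift_loss=0.80, pass_rate=0.75),  # pass-rate regressed: rejected
        Score("c", drift_loss=0.90, pass_rate=0.80),  # clears both checks: promoted
    ],
)
print(winner.name)  # c
```

Requiring a margin on drift keeps small, noisy improvements from dethroning the champion, while the pass-rate check stops a patch from lowering loss by making the system do less.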
The mark, wordmark, lockups, tile, and favicons live in
docs/brand/ — see docs/brand/README.md
for the construction story (golden logarithmic spiral · damped-sine sparkline ·
one plucked-note accent), the color tokens, and the theme-adaptive rule
(currentColor ink + a --zicato-accent custom property).
Apache-2.0. See LICENSE.