Releases · evo-hq/evo

evo 0.5.3

Two main changes this release, plus a security bump.

The optimize loop now runs in prose by default. On Claude Code it used to reach for the dynamic workflow first and fall back to prose; that's reversed. The prose loop drives every host, and the workflow is something you opt into with evo config set default-orchestrator workflow. If you never set that, the only change you'll notice is that the loop runs in prose.

The dashboard surfaces gate failures and tidies up the score chart.

When the accuracy gate rejects an experiment, it's obvious at a glance now: the node gets a muted red marker and a small gate ✗ tag, and the detail panel says "Accuracy gate failed". Before, a gate rejection looked identical to an experiment that was simply slower than its parent.
The score-over-time chart has two buttons in its header: collapse it to a thin bar when you want the canvas back, or expand it to fill the column when you want to read the trend. It remembers the size you left it at.

Security: urllib3 bumped to 2.7.0 (CVE-2026-44431).

Full diff: v0.5.2...v0.5.3

evo 0.5.2 upgrades the meta-controller in the optimize workflow (the default driver on Claude Code): it keeps notes across ticks, its prompt edits accumulate instead of overwriting each other, it can harden the verifier audits live, and model routing now follows your session model.

The meta keeps notes

Every meta tick is a fresh agent. Until now its only memory was a dedup list of findings it had already reported; the reasoning behind them was lost. Each tick can now leave a journal note (observations that aren't actionable yet, pending hypotheses with the evidence so far, watch-items to re-check), and recent notes are fed back into every subsequent tick. The full journal is returned in the workflow result as metaJournal, next to the harness edit log.
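A rough sketch of the journal idea in Python, assuming a simple "keep everything, replay the last few" policy. The class, method names, and the recent_window knob are all made up for illustration; the notes above don't say how many recent notes evo feeds back or how it stores them.

```python
from dataclasses import dataclass


@dataclass
class JournalNote:
    tick: int
    text: str


class MetaJournal:
    """Illustrative only: keeps every note, replays the most recent few."""

    def __init__(self, recent_window: int = 3) -> None:
        if recent_window < 1:
            raise ValueError("recent_window must be at least 1")
        self.recent_window = recent_window  # assumed knob, not an evo setting
        self.notes: list[JournalNote] = []

    def add(self, tick: int, text: str) -> None:
        # Each tick may leave one note; nothing is ever dropped from the journal.
        self.notes.append(JournalNote(tick, text))

    def context_for_next_tick(self) -> str:
        # Only the tail of the journal is handed to the next (fresh) agent.
        recent = self.notes[-self.recent_window:]
        return "\n".join(f"[tick {n.tick}] {n.text}" for n in recent)


journal = MetaJournal(recent_window=2)
journal.add(1, "Hypothesis: timeouts cluster on large inputs; evidence is thin.")
journal.add(2, "Watch: re-check the cache-hit rate after the next commit.")
journal.add(3, "Observation: two siblings regressed on the same task.")

print(journal.context_for_next_tick())  # notes from ticks 2 and 3 only
print(len(journal.notes))               # the full journal still holds all 3
```

The split between the full list and the replayed tail mirrors the description: the whole journal comes back in the result, while each new tick only sees recent notes.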

Prompt directives accumulate

When the meta edits a phase prompt with set-prompt, appended directives now stack as standing instructions instead of silently overwriting the previous one. A replace swaps the base prompt wholesale and keeps the accumulated appends on top. The meta also sees the full text of every standing directive each tick, so it neither clobbers nor repeats them.
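The append/replace behaviour can be pictured with a small Python model. This is a sketch under assumptions, not evo's code: PhasePrompt and its methods are hypothetical names, and skipping exact repeats is one guess at what "neither clobbers nor repeats" could look like if enforced in code rather than by the meta reading the directives.

```python
from dataclasses import dataclass, field


@dataclass
class PhasePrompt:
    """Hypothetical model of one phase prompt; not evo's data structure."""

    base: str
    appends: list[str] = field(default_factory=list)

    def append(self, directive: str) -> None:
        # Appended directives stack as standing instructions.
        # Skipping exact duplicates is an assumption made for this sketch.
        if directive not in self.appends:
            self.appends.append(directive)

    def replace(self, new_base: str) -> None:
        # A replace swaps the base wholesale and leaves the appends alone.
        self.base = new_base

    def render(self) -> str:
        # The effective prompt is the base with every standing directive on top.
        return "\n".join([self.base, *self.appends])


prompt = PhasePrompt(base="Implement the proposed change.")
prompt.append("Run the smoke test before committing.")
prompt.append("Do not touch the grader.")
prompt.append("Run the smoke test before committing.")  # repeat, ignored
prompt.replace("Implement the proposed change with a minimal diff.")

print(prompt.render())
```

After the replace, the rendered prompt starts with the new base and still carries both standing directives.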

The meta can harden the verifier

The two verifier gates (the pre-run design-time cheating audit and the post-run validity audit) are now set-prompt targets. When the meta spots a cheat pattern the audit missed, it can add checks to the audit prompts mid-run. The benchmark, grader, and scorer remain off-limits, so the score stays comparable across the tree.

Model routing follows your session

The meta and the implement/revise agents on hard briefs now inherit the session model instead of a pinned opus, so sessions on newer models (Claude Fable 5) are no longer routed down for the judgment-heavy work. Easy briefs stay on sonnet, as does the mechanical state reader.
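Read as a routing table, the rule looks roughly like the Python below. The role strings, the hard_brief flag, and the function itself are invented for this sketch; only the outcomes (session model for the meta and for hard implement/revise briefs, sonnet for easy briefs and the state reader) come from the description above. Roles the notes don't mention are rejected rather than guessed.

```python
def route_model(role: str, session_model: str, hard_brief: bool = False) -> str:
    """Hypothetical routing rule mirroring the 0.5.2 description."""
    if role == "state-reader":
        # The mechanical state reader stays on sonnet.
        return "sonnet"
    if role == "meta":
        # The meta inherits whatever model the session is running.
        return session_model
    if role in {"implement", "revise"}:
        # Hard briefs inherit the session model; easy briefs stay on sonnet.
        return session_model if hard_brief else "sonnet"
    raise ValueError(f"no routing rule sketched for role {role!r}")


session = "my-session-model"  # placeholder, not a real model id
print(route_model("meta", session))
print(route_model("implement", session, hard_brief=True))
print(route_model("revise", session, hard_brief=False))
print(route_model("state-reader", session))
```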

Install / upgrade

uv tool install --force evo-hq-cli && evo install claude-code --force   # or codex / cursor / openclaw / pi

Also published: evo-hq-agent 0.5.2 (PyPI), @evo-hq/evo-agent and @evo-hq/pi-evo 0.5.2 (npm).

Full diff: v0.5.1...v0.5.2

evo 0.5.1 is a reliability release for the hook pipeline. If you ever saw SessionStart hook (failed): exit 127 in Codex, this is the fix.

Hooks that stay fixed

Hosts rebuild their plugin caches from a fresh git snapshot whenever they feel like it (Codex does it at every session start). That used to delete the hook binary evo had staged, and every hook failed with exit 127 until you reinstalled. The binary now lives at ~/.evo/bin, outside anything the host manages, and the plugin ships a tiny fallback at the hook path that finds it. The host can re-stage all it wants; hooks keep working.
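The fallback idea can be sketched in Python as a shim that looks in ~/.evo/bin and hands off to whatever it finds there. Everything here is illustrative: the binary name evo-hook, the function names, and the choice to return a nonzero status when nothing is found are assumptions, and the shim evo actually ships may look nothing like this.

```python
from __future__ import annotations

import os
import sys
from pathlib import Path


def find_hook_binary(name: str, home: Path | None = None) -> Path | None:
    """Return the staged binary under <home>/.evo/bin, or None if it is missing."""
    candidate = (home or Path.home()) / ".evo" / "bin" / name
    if candidate.is_file() and os.access(candidate, os.X_OK):
        return candidate
    return None


def run_fallback(argv: list[str]) -> int:
    """Hand the hook invocation off to the binary outside the host's cache."""
    binary = find_hook_binary("evo-hook")  # hypothetical binary name
    if binary is None:
        print("evo hook binary not found under ~/.evo/bin", file=sys.stderr)
        return 1  # assumed behaviour for the sketch
    # Replace this process with the real hook, forwarding the arguments.
    os.execv(binary, [str(binary), *argv[1:]])
    return 0  # not reached: a successful execv never returns
```

Because the lookup path sits outside the plugin directory, the host can wipe and re-stage the shim without touching the binary it points to.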

Less to babysit

evo install codex now trusts evo's hooks for you. Untrusted hooks register but never fire, which silently broke evo direct. Pass --no-trust-hooks to review them in Codex via /hooks instead.
Every install and update now finishes by running evo doctor <host>, so a broken install fails loudly at install time instead of at hook-fire time.
evo doctor codex verifies hook trust, and catches the case where a plugin update changed hooks.json and silently un-trusted everything.

Install / upgrade

uv tool install --force evo-hq-cli && evo install codex --force   # or claude-code / cursor / openclaw / pi

Also published: evo-hq-agent 0.5.1 (PyPI), @evo-hq/evo-agent and @evo-hq/pi-evo 0.5.1 (npm).

evo 0.5.0 makes the loop optimize the whole system — the model weights and the harness — in one run, against one objective. Plus a new Claude Code workflow driver with a live meta-controller, subagents, and a much richer dashboard.

Optimize the model, not just the harness

evo can now fine-tune the base model (SFT / LoRA / RL) as a move inside the optimization loop, alongside the prompts, scaffold, and skills it already tuned. You hand it the whole stack and it decides what to spend the budget on.
New evo:finetuning skill: picks or diagnoses a training move (SFT, LoRA, DPO/KTO/ORPO, RFT, GRPO/PPO/RLOO) with a reward-shape decision tree, a smoke-run gate, and failure diagnostics. Warm-start from the parent policy by default (EVO_PARENT_POLICY).

Workflow driver + live meta-controller (Claude Code)

A dynamic-workflow driver for the optimize loop, now the default on Claude Code (opt out to fall back to prose orchestration).
A concurrent meta-controller that watches a run and can restructure the loop live: set knobs, toggle phases, rewrite prompts, inject steps — plus a STOP signal with a gated enforcer. The autonomous stop-nudge is suppressed under the workflow driver.
Scan clusters experiments by failure class; a context capsule loads category skills and known learnings; cross-history pattern recognition before proposing.

Subagents

evo:verifier and evo:ideator now run as subagents.
New benchmark-reviewer subagent; the discover baseline is gated on its review.

Dashboard

Live log tail, trackio link/sparkline in the node drawer, and per-experiment annotations.
Cleaner tabs/logs; committed-experiment trace handling improvements.
New EVO_DASHBOARD_HOST setting: set it to 0.0.0.0 to bind the dashboard on all interfaces for Modal/cloud.

CLI & hooks

evo wait gained process / log / GPU probes and a --for ideators selector so the loop can block on proposals.
--per-exp-timeout on init with a --timeout per-call override; a PostToolUse hint when the agent starts a long-running command.
evo abort now finds the subprocess tree cross-platform (Windows included), so detached benchmark/training children don't survive as orphans.

Integrity & config

task-skills config: discover resolves category skills and agents load them on demand.
Literature research is required before the first experiment; training on the benchmark set is banned.

Fixes

hook-drain staging honors CLAUDE_CONFIG_DIR and from-path installs (fixes the SessionStart exit-127 warning).

Install

uv tool install evo-hq-cli==0.5.0
evo install claude-code   # or codex / cursor / openclaw / pi

Also published: evo-hq-agent 0.5.0 (PyPI), @evo-hq/evo-agent and @evo-hq/pi-evo 0.5.0 (npm).

Full changelog: v0.4.5...v0.5.0

v0.5.0-alpha.13

What changed

feat(optimize): meta controller restructures the workflow live; workflow is the default driver on Claude Code
fix(host_install/claude-code): stage hook-drain into source tree for --from-path installs

Full diff: v0.5.0-alpha.12...v0.5.0-alpha.13

v0.5.0-alpha.12

What changed

chore: bump 0.5.0-alpha.11 → 0.5.0-alpha.12, sync npm/
test(assets): unit cover the new evo abstractions
docs(optimize/workflow): make analyst STOP examples category-agnostic
docs(finetuning): make device-placement a generic principle, not hardcoded HF/hardware specifics
feat(optimize/workflow): cluster-on-failure_class in scan + clean loop-resume after STOP (#7)
feat(optimize/workflow): analyst STOP signal + gated enforcer (#6)
test(optimize): task-skill loading parity + workflow-loads-when-instructed
feat(config): task-skills field — discover resolves category skills, agents load them
chore: refresh uv.lock
feat(assets): failure classifier, artifact reuse, mid-run circuit-breaker
feat(optimize/workflow): context capsule — load category skills + apply known learnings
fix(discard): preserve declared artifacts (#64)

Full diff: v0.5.0-alpha.11...v0.5.0-alpha.12

v0.5.0-alpha.9

What changed

chore: bump 0.5.0-alpha.8 → 0.5.0-alpha.9, sync npm/
feat(optimize): concurrent analyst thread for the workflow driver

Full diff: v0.5.0-alpha.8...v0.5.0-alpha.9

v0.5.0-alpha.8

What changed

chore: bump 0.5.0-alpha.7 → 0.5.0-alpha.8, sync npm/
feat(optimize): configurable scan batch size + compact scan-batch labels
feat(optimize): Claude Code dynamic-workflow driver for the optimize loop
docs(readme): codex exit-127 recovery in Upgrading (#62)
fix(codex): stage hook binary under owner marketplace name + bump 0.4.5 (#61)
feat(dashboard): render annotations + clean up tabs and logs
skills(finetuning,subagent): training-scale discipline
skills: post-commit per-task review + training observability
test: end-to-end coverage for directive delivery pipeline (#58)
skills/optimize: reframe description as structured autoresearch iteration
skills: reframe Evo surface as general guidance + skills-before-references principle

Full diff: v0.5.0-alpha.7...v0.5.0-alpha.8

v0.5.0-alpha.11

What changed

chore: bump 0.5.0-alpha.10 → 0.5.0-alpha.11, sync npm/
fix(optimize/workflow): run-lane must finish the build/train step before evo run

Full diff: v0.5.0-alpha.10...v0.5.0-alpha.11

v0.5.0-alpha.10

What changed

chore: bump 0.5.0-alpha.9 → 0.5.0-alpha.10, sync npm/

Full diff: v0.5.0-alpha.9...v0.5.0-alpha.10
