Releases: evo-hq/evo

v0.5.3

14 Jun 15:32

evo 0.5.3

Two main changes this release, plus a security bump.

The optimize loop runs in prose by default now. On Claude Code it used to reach for the dynamic workflow first and fall back to prose; that's reversed. The prose loop drives every host, and the workflow is something you opt into with evo config set default-orchestrator workflow. If you never set it, the only change you'll notice is that the loop now runs through prose orchestration.

The dashboard surfaces gate failures and tidies up the score chart.

  • When the accuracy gate rejects an experiment, it's obvious at a glance now: the node gets a muted red marker and a small gate ✗ tag, and the detail panel says "Accuracy gate failed". Before, a gate rejection looked identical to an experiment that was simply slower than its parent.
  • The score-over-time chart has two buttons in its header: collapse it to a thin bar when you want the canvas back, or expand it to fill the column when you want to read the trend. It remembers the size you left it at.

Security: urllib3 bumped to 2.7.0 (CVE-2026-44431).

Full diff: v0.5.2...v0.5.3

evo 0.5.2

11 Jun 10:38
df21053

evo 0.5.2 upgrades the meta controller in the optimize workflow (the default driver on Claude Code): it keeps notes across ticks, its prompt edits accumulate instead of overwriting each other, it can harden the verifier audits live, and model routing now follows your session model.

The meta keeps notes

Every meta tick is a fresh agent. Until now its only memory was a dedup list of findings it had already reported; the reasoning behind them was lost. Each tick can now leave a journal note (observations that aren't actionable yet, pending hypotheses with the evidence so far, watch-items to re-check), and recent notes are fed back into every subsequent tick. The full journal is returned in the workflow result as metaJournal, next to the harness edit log.
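
The note-keeping described above can be pictured with a small sketch. This is not evo's implementation: the MetaJournal class, the run_tick helper, and the window of three recent notes are all assumptions made for illustration. Only the metaJournal result key comes from the release notes.

```python
from dataclasses import dataclass, field


@dataclass
class MetaJournal:
    """Hypothetical journal: the full history plus a window of recent notes."""
    recent_window: int = 3  # assumed window size, not stated in the release notes
    notes: list[str] = field(default_factory=list)

    def add(self, note: str) -> None:
        self.notes.append(note)

    def recent(self) -> list[str]:
        # Only the most recent notes are fed back into the next tick.
        return self.notes[-self.recent_window:]


def run_tick(tick: int, journal: MetaJournal) -> str:
    # Each tick is a fresh agent; the journal window is its only carried context.
    context = journal.recent()
    note = f"tick {tick}: saw {len(context)} earlier note(s)"
    journal.add(note)
    return note


journal = MetaJournal()
for tick in range(1, 6):
    run_tick(tick, journal)

# The full journal travels with the workflow result under metaJournal.
result = {"metaJournal": journal.notes}
```

The point of the split is that the per-tick context stays bounded while the complete record is still available at the end of the run.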

Prompt directives accumulate

When the meta edits a phase prompt with set-prompt, appended directives now stack as standing instructions instead of silently overwriting the previous one. A replace swaps the base prompt wholesale and keeps the accumulated appends on top. The meta also sees the full text of every standing directive each tick, so it neither clobbers nor repeats them.
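
As a rough model of these semantics (a sketch only; the PhasePrompt class and its method names are hypothetical, and the example prompt text is made up), an append adds to a list of standing directives while a replace touches only the base text:

```python
from dataclasses import dataclass, field


@dataclass
class PhasePrompt:
    """Hypothetical phase prompt: a base text plus standing directives."""
    base: str
    directives: list[str] = field(default_factory=list)

    def append(self, directive: str) -> None:
        # Appends stack as standing instructions instead of overwriting.
        self.directives.append(directive)

    def replace(self, new_base: str) -> None:
        # A replace swaps the base wholesale; accumulated appends stay on top.
        self.base = new_base

    def render(self) -> str:
        return "\n".join([self.base, *self.directives])


prompt = PhasePrompt(base="Implement the brief.")
prompt.append("Run the smoke test before committing.")
prompt.append("Keep diffs minimal.")
prompt.replace("Implement the brief and explain each change.")
rendered = prompt.render()
```

After the replace, rendered starts with the new base and still carries both directives in the order they were added.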

The meta can harden the verifier

The two verifier gates (the pre-run design-time cheating audit and the post-run validity audit) are now set-prompt targets. When the meta spots a cheat pattern the audit missed, it can add checks to the audit prompts mid-run. The benchmark, grader, and scorer remain off-limits, so the score stays comparable across the tree.

Model routing follows your session

The meta and the implement/revise agents on hard briefs now inherit the session model instead of a pinned opus, so sessions on newer models (Claude Fable 5) are no longer routed down for the judgment-heavy work. Easy briefs stay on sonnet, as does the mechanical state reader.
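
The routing rule reads naturally as a small lookup. The sketch below is illustrative only: the route_model function, the role labels, and the "hard"/"easy" difficulty strings are assumptions, not evo's actual identifiers.

```python
def route_model(role: str, difficulty: str, session_model: str) -> str:
    """Hypothetical routing table mirroring the 0.5.2 description."""
    if role == "state-reader":
        return "sonnet"  # mechanical work stays on sonnet
    if role == "meta":
        return session_model  # judgment-heavy, inherits the session model
    if role in {"implement", "revise"}:
        # Hard briefs inherit the session model; easy briefs stay on sonnet.
        return session_model if difficulty == "hard" else "sonnet"
    raise ValueError(f"unknown role: {role}")
```

With this shape, a session on a newer model keeps that model for the meta and for hard implement/revise work, and nothing is pinned to opus.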

Install / upgrade

uv tool install --force evo-hq-cli && evo install claude-code --force   # or codex / cursor / openclaw / pi

Also published: evo-hq-agent 0.5.2 (PyPI), @evo-hq/evo-agent and @evo-hq/pi-evo 0.5.2 (npm).

Full diff: v0.5.1...v0.5.2

evo 0.5.1

11 Jun 03:55
934cb79

evo 0.5.1 is a reliability release for the hook pipeline. If you ever saw SessionStart hook (failed): exit 127 in Codex, this is the fix.

Hooks that stay fixed

Hosts rebuild their plugin caches from a fresh git snapshot whenever they feel like it (Codex does it at every session start). That used to delete the hook binary evo had staged, and every hook fired exit 127 until you reinstalled. The binary now lives at ~/.evo/bin, outside anything the host manages, and the plugin ships a tiny fallback at the hook path that finds it. The host can re-stage all it wants; hooks keep working.
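
The fallback idea can be sketched as a lookup that ignores the host-managed cache entirely. This is an illustration under stated assumptions, not the shipped fallback: the binary name evo-hook, the function names, and the choice to return 127 when the binary is missing are all hypothetical. Only the ~/.evo/bin location comes from the release notes.

```python
import os
from pathlib import Path
from typing import Optional

HOOK_NAME = "evo-hook"  # hypothetical binary name, not taken from the release notes


def find_hook_binary(home: Optional[Path] = None, name: str = HOOK_NAME) -> Optional[Path]:
    """Return the executable staged under ~/.evo/bin, or None if it is absent."""
    candidate = (home or Path.home()) / ".evo" / "bin" / name
    if candidate.is_file() and os.access(candidate, os.X_OK):
        return candidate
    return None


def run_hook(argv: list[str], home: Optional[Path] = None) -> int:
    """Hand the hook call over to the stable binary outside the plugin cache."""
    binary = find_hook_binary(home)
    if binary is None:
        # Assumed behavior: report a missing binary with the usual 127 code.
        return 127
    os.execv(binary, [str(binary), *argv])  # replaces this process on success
    return 0  # unreachable; keeps the return type consistent
```

Because the lookup never depends on files inside the plugin cache, the host re-staging that cache does not change what the fallback finds.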

Less to babysit

  • evo install codex now trusts evo's hooks for you. Untrusted hooks register but never fire, which broke evo direct invisibly. Pass --no-trust-hooks to review them in codex via /hooks instead.
  • Every install and update now finishes by running evo doctor <host>, so a broken install fails loudly at install time instead of at hook-fire time.
  • evo doctor codex verifies hook trust, and catches the case where a plugin update changed hooks.json and silently un-trusted everything.
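
One way to picture the un-trust case in the last bullet: if trust is recorded against the exact contents of hooks.json, any plugin update that rewrites the file invalidates it. The sketch below assumes that fingerprint model, which the release notes do not confirm; the fingerprint and hooks_trusted functions are hypothetical.

```python
import hashlib
from typing import Optional


def fingerprint(hooks_json: bytes) -> str:
    """Hash the hooks file contents (assumed trust key, for illustration)."""
    return hashlib.sha256(hooks_json).hexdigest()


def hooks_trusted(current: bytes, trusted_fingerprint: Optional[str]) -> bool:
    # Trust holds only if a fingerprint was recorded and still matches the file.
    return trusted_fingerprint is not None and fingerprint(current) == trusted_fingerprint


original = b'{"hooks": {"SessionStart": []}}'
updated = b'{"hooks": {"SessionStart": [], "PostToolUse": []}}'
recorded = fingerprint(original)
```

Under this model, hooks_trusted(original, recorded) holds while hooks_trusted(updated, recorded) does not, which is the kind of drift a doctor-style check would flag at install time.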

Install / upgrade

uv tool install --force evo-hq-cli && evo install codex --force   # or claude-code / cursor / openclaw / pi

Also published: evo-hq-agent 0.5.1 (PyPI), @evo-hq/evo-agent and @evo-hq/pi-evo 0.5.1 (npm).

evo 0.5.0

06 Jun 09:40
0090ce9

evo 0.5.0 makes the loop optimize the whole system (the model weights and the harness) in one run, against one objective. It also adds a new Claude Code workflow driver with a live meta-controller, subagents, and a much richer dashboard.

Optimize the model, not just the harness

  • evo can now fine-tune the base model (SFT / LoRA / RL) as a move inside the optimization loop, alongside the prompts, scaffold, and skills it already tuned. You hand it the whole stack and it decides what to spend the budget on.
  • New evo:finetuning skill: picks or diagnoses a training move (SFT, LoRA, DPO/KTO/ORPO, RFT, GRPO/PPO/RLOO) with a reward-shape decision tree, a smoke-run gate, and failure diagnostics. Warm-start from the parent policy by default (EVO_PARENT_POLICY).

Workflow driver + live meta-controller (Claude Code)

  • A dynamic-workflow driver for the optimize loop, now the default on Claude Code (prose orchestration remains as the fallback).
  • A concurrent meta-controller that watches a run and can restructure the loop live: set knobs, toggle phases, rewrite prompts, inject steps — plus a STOP signal with a gated enforcer. The autonomous stop-nudge is suppressed under the workflow driver.
  • Scan clusters experiments by failure class, a context capsule loads category skills and known learnings, and cross-history pattern recognition runs before proposing.
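
Clustering by failure class amounts to a group-by over experiment records. A minimal sketch, assuming each experiment is a dict with an id and an optional failure_class key (the record shape, the class names, and the "unclassified" bucket are illustrative assumptions):

```python
from collections import defaultdict


def cluster_by_failure_class(experiments: list[dict]) -> dict[str, list[str]]:
    """Group experiment ids by failure class, keeping input order per cluster."""
    clusters: dict[str, list[str]] = defaultdict(list)
    for exp in experiments:
        # Experiments without a class land in an assumed "unclassified" bucket.
        clusters[exp.get("failure_class", "unclassified")].append(exp["id"])
    return dict(clusters)


experiments = [
    {"id": "exp-1", "failure_class": "timeout"},
    {"id": "exp-2", "failure_class": "oom"},
    {"id": "exp-3", "failure_class": "timeout"},
    {"id": "exp-4"},
]
clusters = cluster_by_failure_class(experiments)
```

Seeing failures grouped this way is what lets a scan treat three timeouts as one problem rather than three unrelated ones.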

Subagents

  • evo:verifier and evo:ideator now run as subagents.
  • New benchmark-reviewer subagent; the discover baseline is gated on its review.

Dashboard

  • Live log tail, trackio link/sparkline in the node drawer, and per-experiment annotations.
  • Cleaner tabs/logs; committed-experiment trace handling improvements.
  • EVO_DASHBOARD_HOST to bind 0.0.0.0 for Modal/cloud.
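
For the bind setting, a sketch of how such an environment variable is typically resolved. The dashboard_host function is hypothetical, and the 127.0.0.1 default is an assumption; the release notes only say that EVO_DASHBOARD_HOST can be set to bind 0.0.0.0.

```python
import os
from typing import Mapping, Optional


def dashboard_host(env: Optional[Mapping[str, str]] = None) -> str:
    """Resolve the bind address, defaulting to loopback (assumed default)."""
    env = os.environ if env is None else env
    return env.get("EVO_DASHBOARD_HOST", "127.0.0.1")
```

On Modal or another cloud host, setting EVO_DASHBOARD_HOST=0.0.0.0 makes the server listen on all interfaces, so it is reachable from outside the container.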

CLI & hooks

  • evo wait gained process / log / GPU probes and a --for ideators selector so the loop can block on proposals.
  • --per-exp-timeout on init with a --timeout per-call override; a PostToolUse hint when the agent starts a long-running command.
  • evo abort now finds the subprocess tree cross-platform (Windows included), so detached benchmark/training children don't survive as orphans.
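
The orphan problem in the last bullet comes down to walking the process tree before killing anything. The sketch below shows only the traversal, over a plain pid-to-parent-pid map; how evo actually enumerates processes on each platform is not described in the release notes, and the descendants function is hypothetical.

```python
def descendants(root: int, parent_of: dict[int, int]) -> list[int]:
    """Return every descendant of root, children before their parents."""
    children: dict[int, list[int]] = {}
    for pid, ppid in parent_of.items():
        if pid != ppid:  # skip self-parented entries to avoid cycles
            children.setdefault(ppid, []).append(pid)

    ordered: list[int] = []

    def visit(pid: int) -> None:
        for child in sorted(children.get(pid, [])):
            visit(child)
            ordered.append(child)

    visit(root)
    return ordered


# Hypothetical snapshot: 100 is the run, 103 is a detached training child.
parent_of = {101: 100, 102: 100, 103: 101, 200: 1}
to_terminate = descendants(100, parent_of)
```

Terminating in this order (deepest first, then the root itself) means a child never outlives the parent that would otherwise have been the only handle on it.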

Integrity & config

  • task-skills config: discover resolves category skills and agents load them on demand.
  • Literature research is required before the first experiment; training on the benchmark set is banned.

Fixes

  • hook-drain staging honors CLAUDE_CONFIG_DIR and from-path installs (fixes the SessionStart exit-127 warning).

Install

uv tool install evo-hq-cli==0.5.0
evo install claude-code   # or codex / cursor / openclaw / pi

Also published: evo-hq-agent 0.5.0 (PyPI), @evo-hq/evo-agent and @evo-hq/pi-evo 0.5.0 (npm).

Full changelog: v0.4.5...v0.5.0

v0.5.0-alpha.13

05 Jun 22:10

v0.5.0-alpha.13 Pre-release

What changed

  • feat(optimize): meta controller restructures the workflow live; workflow is the default driver on Claude Code
  • fix(host_install/claude-code): stage hook-drain into source tree for --from-path installs

Full diff: v0.5.0-alpha.12...v0.5.0-alpha.13

v0.5.0-alpha.12

05 Jun 14:09

v0.5.0-alpha.12 Pre-release

What changed

  • chore: bump 0.5.0-alpha.11 → 0.5.0-alpha.12, sync npm/
  • test(assets): unit cover the new evo abstractions
  • docs(optimize/workflow): make analyst STOP examples category-agnostic
  • docs(finetuning): make device-placement a generic principle, not hardcoded HF/hardware specifics
  • feat(optimize/workflow): cluster-on-failure_class in scan + clean loop-resume after STOP (#7)
  • feat(optimize/workflow): analyst STOP signal + gated enforcer (#6)
  • test(optimize): task-skill loading parity + workflow-loads-when-instructed
  • feat(config): task-skills field — discover resolves category skills, agents load them
  • chore: refresh uv.lock
  • feat(assets): failure classifier, artifact reuse, mid-run circuit-breaker
  • feat(optimize/workflow): context capsule — load category skills + apply known learnings
  • fix(discard): preserve declared artifacts (#64)

Full diff: v0.5.0-alpha.11...v0.5.0-alpha.12

v0.5.0-alpha.9

04 Jun 21:08

v0.5.0-alpha.9 Pre-release

What changed

  • chore: bump 0.5.0-alpha.8 → 0.5.0-alpha.9, sync npm/
  • feat(optimize): concurrent analyst thread for the workflow driver

Full diff: v0.5.0-alpha.8...v0.5.0-alpha.9

v0.5.0-alpha.8

04 Jun 19:50

v0.5.0-alpha.8 Pre-release

What changed

  • chore: bump 0.5.0-alpha.7 → 0.5.0-alpha.8, sync npm/
  • feat(optimize): configurable scan batch size + compact scan-batch labels
  • feat(optimize): Claude Code dynamic-workflow driver for the optimize loop
  • docs(readme): codex exit-127 recovery in Upgrading (#62)
  • fix(codex): stage hook binary under owner marketplace name + bump 0.4.5 (#61)
  • feat(dashboard): render annotations + clean up tabs and logs
  • skills(finetuning,subagent): training-scale discipline
  • skills: post-commit per-task review + training observability
  • test: end-to-end coverage for directive delivery pipeline (#58)
  • skills/optimize: reframe description as structured autoresearch iteration
  • skills: reframe Evo surface as general guidance + skills-before-references principle

Full diff: v0.5.0-alpha.7...v0.5.0-alpha.8

v0.5.0-alpha.11

04 Jun 22:44

v0.5.0-alpha.11 Pre-release

What changed

  • chore: bump 0.5.0-alpha.10 → 0.5.0-alpha.11, sync npm/
  • fix(optimize/workflow): run-lane must finish the build/train step before evo run

Full diff: v0.5.0-alpha.10...v0.5.0-alpha.11

v0.5.0-alpha.10

04 Jun 21:42

v0.5.0-alpha.10 Pre-release

What changed

  • chore: bump 0.5.0-alpha.9 → 0.5.0-alpha.10, sync npm/

Full diff: v0.5.0-alpha.9...v0.5.0-alpha.10