Releases: evo-hq/evo

evo 0.5.2

11 Jun 10:38
df21053

evo 0.5.2 upgrades the meta controller in the optimize workflow (the default driver on Claude Code): it keeps notes across ticks, its prompt edits accumulate instead of overwriting each other, it can harden the verifier audits live, and model routing now follows your session model.

The meta keeps notes

Every meta tick is a fresh agent. Until now its only memory was a dedup list of findings it had already reported; the reasoning behind them was lost. Each tick can now leave a journal note (observations that aren't actionable yet, pending hypotheses with the evidence so far, watch-items to re-check), and recent notes are fed back into every subsequent tick. The full journal is returned in the workflow result as metaJournal, next to the harness edit log.
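
As a rough illustration of the journal behaviour described above, here is a minimal sketch. The class name `MetaJournal`, the `recent_limit` window, and the note format are all assumptions made for the example; only the `metaJournal` result key comes from the release notes.

```python
from dataclasses import dataclass, field


@dataclass
class MetaJournal:
    """Hypothetical journal: keeps every note, replays only the latest few."""

    recent_limit: int = 3  # assumed window size, not evo's actual value
    notes: list = field(default_factory=list)

    def add(self, tick: int, note: str) -> None:
        # Each tick may leave one free-form note.
        self.notes.append(f"tick {tick}: {note}")

    def context_for_next_tick(self) -> list:
        # Only the most recent notes are fed back to the next fresh agent.
        return self.notes[-self.recent_limit:]


journal = MetaJournal(recent_limit=2)
journal.add(1, "watch: score variance rising on long tasks")
journal.add(2, "hypothesis: verifier misses cached answers (weak evidence)")
journal.add(3, "re-check the tick 2 hypothesis after the next batch")

recent = journal.context_for_next_tick()   # what the next tick would see
result = {"metaJournal": journal.notes}    # the full journal is returned
```

The split between a bounded replay window and a complete returned journal is the point of the sketch; how evo actually selects "recent" notes is not stated in the notes.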

Prompt directives accumulate

When the meta edits a phase prompt with set-prompt, appended directives now stack as standing instructions instead of silently overwriting the previous one. A replace swaps the base prompt wholesale and keeps the accumulated appends on top. The meta also sees the full text of every standing directive each tick, so it neither clobbers nor repeats them.
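
The append/replace semantics can be pictured with a small model. This is a sketch under assumptions: `PhasePrompt` and its method names are hypothetical and are not evo's API; only the behaviour (appends stack, a replace swaps the base and keeps the appends) is taken from the paragraph above.

```python
from dataclasses import dataclass, field


@dataclass
class PhasePrompt:
    """Hypothetical phase prompt: a base text plus standing directives."""

    base: str
    appends: list = field(default_factory=list)

    def append(self, directive: str) -> None:
        # Appended directives stack instead of overwriting each other.
        self.appends.append(directive)

    def replace(self, new_base: str) -> None:
        # A replace swaps the base wholesale; accumulated appends survive.
        self.base = new_base

    def render(self) -> str:
        # The effective prompt is the base followed by every directive.
        return "\n".join([self.base, *self.appends])


prompt = PhasePrompt(base="Implement the brief.")
prompt.append("Run the unit tests before committing.")
prompt.append("Do not edit the benchmark.")
prompt.replace("Implement the brief with minimal diffs.")
rendered = prompt.render()
```

After the replace, `rendered` starts with the new base and still carries both directives, which is the difference from the earlier overwrite behaviour.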

The meta can harden the verifier

The two verifier gates (the pre-run design-time cheating audit and the post-run validity audit) are now set-prompt targets. When the meta spots a cheat pattern the audit missed, it can add checks to the audit prompts mid-run. The benchmark, grader, and scorer remain off-limits, so the score stays comparable across the tree.
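
One way to think about this boundary is an allowlist check on the edit target. The target names below are invented for illustration (the notes do not give evo's real identifiers); the sketch only shows the rule that audit prompts are editable while anything that defines the score is not.

```python
# Hypothetical target names; evo's real identifiers may differ.
EDITABLE_TARGETS = {"implement", "revise", "design_audit", "validity_audit"}
PROTECTED_TARGETS = {"benchmark", "grader", "scorer"}


def check_set_prompt_target(target: str) -> None:
    """Raise if an edit would touch something that defines the score."""
    if target in PROTECTED_TARGETS:
        raise PermissionError(f"{target} is off-limits to set-prompt")
    if target not in EDITABLE_TARGETS:
        raise ValueError(f"unknown set-prompt target: {target}")


check_set_prompt_target("validity_audit")  # allowed: an audit prompt

try:
    check_set_prompt_target("scorer")
    scorer_blocked = False
except PermissionError:
    scorer_blocked = True  # the scorer stays fixed, so scores stay comparable
```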

Model routing follows your session

The meta and the implement/revise agents on hard briefs now inherit the session model instead of a pinned opus, so sessions on newer models (Claude Fable 5) are no longer routed down for the judgment-heavy work. Easy briefs stay on sonnet, as does the mechanical state reader.
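
Read as a routing table, the rule above looks roughly like the following. The function, the role strings, and the difficulty labels are assumptions for the sketch; only the outcomes (session model for the meta and hard implement/revise briefs, sonnet for easy briefs and the state reader) come from the notes.

```python
def route_model(role: str, difficulty: str, session_model: str) -> str:
    """Hypothetical routing table mirroring the 0.5.2 description."""
    if role == "state_reader":
        return "sonnet"  # mechanical work stays on sonnet
    if role == "meta":
        return session_model  # judgment-heavy: inherit the session model
    if role in {"implement", "revise"}:
        # Hard briefs inherit the session model; easy briefs stay on sonnet.
        return session_model if difficulty == "hard" else "sonnet"
    raise ValueError(f"unknown role: {role}")


hard_choice = route_model("implement", "hard", session_model="my-session-model")
easy_choice = route_model("implement", "easy", session_model="my-session-model")
```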

Install / upgrade

uv tool install --force evo-hq-cli && evo install claude-code --force   # or codex / cursor / openclaw / pi

Also published: evo-hq-agent 0.5.2 (PyPI), @evo-hq/evo-agent and @evo-hq/pi-evo 0.5.2 (npm).

Full diff: v0.5.1...v0.5.2

evo 0.5.1

11 Jun 03:55
934cb79

evo 0.5.1 is a reliability release for the hook pipeline. If you ever saw SessionStart hook (failed): exit 127 in Codex, this is the fix.

Hooks that stay fixed

Hosts rebuild their plugin caches from a fresh git snapshot whenever they feel like it (Codex does it at every session start). That used to delete the hook binary evo had staged, and every hook fired exit 127 until you reinstalled. The binary now lives at ~/.evo/bin, outside anything the host manages, and the plugin ships a tiny fallback at the hook path that finds it. The host can re-stage all it wants; hooks keep working.
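
The fallback idea can be sketched as a tiny resolver: look in the stable location, hand off if the binary is there, and otherwise return the same 127 that the broken hooks produced. This is an illustration only. The shipped fallback is not shown in the notes and is probably not Python, and the assumption that the file under ~/.evo/bin is named `evo-hook-drain` is borrowed from the 0.4.5 notes.

```python
import os
import sys
from pathlib import Path
from typing import Optional

HOOK_NAME = "evo-hook-drain"  # assumed file name under ~/.evo/bin


def resolve_hook_binary(home: Path) -> Optional[Path]:
    """Return the stable hook binary under <home>/.evo/bin, or None."""
    candidate = home / ".evo" / "bin" / HOOK_NAME
    if candidate.is_file() and os.access(candidate, os.X_OK):
        return candidate
    return None


def main() -> int:
    """Hypothetical entry point for a fallback placed at the hook path."""
    binary = resolve_hook_binary(Path.home())
    if binary is None:
        return 127  # conventional "command not found" exit code
    # Replace this process with the real hook, forwarding all arguments.
    os.execv(binary, [str(binary), *sys.argv[1:]])
    return 0  # not reached: execv does not return on success
```

Because the resolver only depends on the home directory, re-staging the plugin cache cannot invalidate it, which is the property the release describes.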

Less to babysit

  • evo install codex now trusts evo's hooks for you. Untrusted hooks register but never fire, which broke evo direct invisibly. Pass --no-trust-hooks to review them in codex via /hooks instead.
  • Every install and update now finishes by running evo doctor <host>, so a broken install fails loudly at install time instead of at hook-fire time.
  • evo doctor codex verifies hook trust, and catches the case where a plugin update changed hooks.json and silently un-trusted everything.

Install / upgrade

uv tool install --force evo-hq-cli && evo install codex --force   # or claude-code / cursor / openclaw / pi

Also published: evo-hq-agent 0.5.1 (PyPI), @evo-hq/evo-agent and @evo-hq/pi-evo 0.5.1 (npm).

evo 0.5.0

06 Jun 09:40
0090ce9

evo 0.5.0 makes the loop optimize the whole system (the model weights and the harness) in one run, against one objective. It also adds a new Claude Code workflow driver with a live meta-controller, subagents, and a much richer dashboard.

Optimize the model, not just the harness

  • evo can now fine-tune the base model (SFT / LoRA / RL) as a move inside the optimization loop, alongside the prompts, scaffold, and skills it already tuned. You hand it the whole stack and it decides what to spend the budget on.
  • New evo:finetuning skill: picks or diagnoses a training move (SFT, LoRA, DPO/KTO/ORPO, RFT, GRPO/PPO/RLOO) with a reward-shape decision tree, a smoke-run gate, and failure diagnostics. Warm-start from the parent policy by default (EVO_PARENT_POLICY).

Workflow driver + live meta-controller (Claude Code)

  • A dynamic-workflow driver for the optimize loop, now the default on Claude Code (opt out to keep prose orchestration).
  • A concurrent meta-controller that watches a run and can restructure the loop live: set knobs, toggle phases, rewrite prompts, inject steps — plus a STOP signal with a gated enforcer. The autonomous stop-nudge is suppressed under the workflow driver.
  • Scan clusters experiments by failure class; a context capsule loads category skills and known learnings; cross-history pattern recognition before proposing.
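
Clustering by failure class, as mentioned in the last bullet above, amounts to a group-by. A minimal sketch, assuming each experiment is a dict with a hypothetical `id` field and an optional `failure_class` field (the field name follows the alpha.12 commit title; the record shape is invented):

```python
from collections import defaultdict


def cluster_by_failure_class(experiments: list) -> dict:
    """Group experiment ids under their failure class."""
    clusters = defaultdict(list)
    for exp in experiments:
        # Experiments without a class land in an "unclassified" bucket.
        clusters[exp.get("failure_class", "unclassified")].append(exp["id"])
    return dict(clusters)


experiments = [
    {"id": "exp_1", "failure_class": "timeout"},
    {"id": "exp_2", "failure_class": "oom"},
    {"id": "exp_3", "failure_class": "timeout"},
    {"id": "exp_4"},
]
clusters = cluster_by_failure_class(experiments)
```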

Subagents

  • evo:verifier and evo:ideator now run as subagents.
  • New benchmark-reviewer subagent; the discover baseline is gated on its review.

Dashboard

  • Live log tail, trackio link/sparkline in the node drawer, and per-experiment annotations.
  • Cleaner tabs/logs; committed-experiment trace handling improvements.
  • EVO_DASHBOARD_HOST to bind 0.0.0.0 for Modal/cloud.

CLI & hooks

  • evo wait gained process / log / GPU probes and a --for ideators selector so the loop can block on proposals.
  • --per-exp-timeout on init with a --timeout per-call override; a PostToolUse hint when the agent starts a long-running command.
  • evo abort now finds the subprocess tree cross-platform (Windows included), so detached benchmark/training children don't survive as orphans.
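
Finding a subprocess tree, as in the evo abort bullet above, reduces to walking a parent table once the pid-to-parent mapping is known. The sketch below assumes that mapping has already been collected by some platform-specific means (how evo gathers it is not stated) and only shows the traversal.

```python
from collections import defaultdict


def descendants(root_pid: int, parent_of: dict) -> list:
    """Return every pid below root_pid, children before grandchildren."""
    children = defaultdict(list)
    for pid, ppid in parent_of.items():
        children[ppid].append(pid)

    found, queue, seen = [], [root_pid], {root_pid}
    while queue:
        for child in children[queue.pop(0)]:
            if child not in seen:  # guard against self-parented entries
                seen.add(child)
                found.append(child)
                queue.append(child)
    return found


# pid -> parent pid; 200 belongs to an unrelated tree.
parent_of = {101: 100, 102: 100, 103: 101, 200: 1}
tree = descendants(100, parent_of)
```

Terminating the result in reverse order (deepest processes first) is one common way to avoid leaving detached children behind.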

Integrity & config

  • task-skills config: discover resolves category skills and agents load them on demand.
  • Literature research is required before the first experiment; training on the benchmark set is banned.

Fixes

  • hook-drain staging honors CLAUDE_CONFIG_DIR and from-path installs (fixes the SessionStart exit-127 warning).

Install

uv tool install evo-hq-cli==0.5.0
evo install claude-code   # or codex / cursor / openclaw / pi

Also published: evo-hq-agent 0.5.0 (PyPI), @evo-hq/evo-agent and @evo-hq/pi-evo 0.5.0 (npm).

Full changelog: v0.4.5...v0.5.0

v0.5.0-alpha.13

05 Jun 22:10

v0.5.0-alpha.13 Pre-release

What changed

  • feat(optimize): meta controller restructures the workflow live; workflow is the default driver on Claude Code
  • fix(host_install/claude-code): stage hook-drain into source tree for --from-path installs

Full diff: v0.5.0-alpha.12...v0.5.0-alpha.13

v0.5.0-alpha.12

05 Jun 14:09

v0.5.0-alpha.12 Pre-release

What changed

  • chore: bump 0.5.0-alpha.11 → 0.5.0-alpha.12, sync npm/
  • test(assets): unit cover the new evo abstractions
  • docs(optimize/workflow): make analyst STOP examples category-agnostic
  • docs(finetuning): make device-placement a generic principle, not hardcoded HF/hardware specifics
  • feat(optimize/workflow): cluster-on-failure_class in scan + clean loop-resume after STOP (#7)
  • feat(optimize/workflow): analyst STOP signal + gated enforcer (#6)
  • test(optimize): task-skill loading parity + workflow-loads-when-instructed
  • feat(config): task-skills field — discover resolves category skills, agents load them
  • chore: refresh uv.lock
  • feat(assets): failure classifier, artifact reuse, mid-run circuit-breaker
  • feat(optimize/workflow): context capsule — load category skills + apply known learnings
  • fix(discard): preserve declared artifacts (#64)

Full diff: v0.5.0-alpha.11...v0.5.0-alpha.12

v0.5.0-alpha.9

04 Jun 21:08

v0.5.0-alpha.9 Pre-release

What changed

  • chore: bump 0.5.0-alpha.8 → 0.5.0-alpha.9, sync npm/
  • feat(optimize): concurrent analyst thread for the workflow driver

Full diff: v0.5.0-alpha.8...v0.5.0-alpha.9

v0.5.0-alpha.8

04 Jun 19:50

v0.5.0-alpha.8 Pre-release

What changed

  • chore: bump 0.5.0-alpha.7 → 0.5.0-alpha.8, sync npm/
  • feat(optimize): configurable scan batch size + compact scan-batch labels
  • feat(optimize): Claude Code dynamic-workflow driver for the optimize loop
  • docs(readme): codex exit-127 recovery in Upgrading (#62)
  • fix(codex): stage hook binary under owner marketplace name + bump 0.4.5 (#61)
  • feat(dashboard): render annotations + clean up tabs and logs
  • skills(finetuning,subagent): training-scale discipline
  • skills: post-commit per-task review + training observability
  • test: end-to-end coverage for directive delivery pipeline (#58)
  • skills/optimize: reframe description as structured autoresearch iteration
  • skills: reframe Evo surface as general guidance + skills-before-references principle

Full diff: v0.5.0-alpha.7...v0.5.0-alpha.8

v0.5.0-alpha.11

04 Jun 22:44

v0.5.0-alpha.11 Pre-release

What changed

  • chore: bump 0.5.0-alpha.10 → 0.5.0-alpha.11, sync npm/
  • fix(optimize/workflow): run-lane must finish the build/train step before evo run

Full diff: v0.5.0-alpha.10...v0.5.0-alpha.11

v0.5.0-alpha.10

04 Jun 21:42

v0.5.0-alpha.10 Pre-release

What changed

  • chore: bump 0.5.0-alpha.9 → 0.5.0-alpha.10, sync npm/

Full diff: v0.5.0-alpha.9...v0.5.0-alpha.10

v0.4.5

04 Jun 08:26
4a45773

Codex hook fix

evo install codex staged the evo-hook-drain binary (and registered the plugin) under the marketplace name from marketplace.json (evo-hq-evo), but Codex 0.130+ loads the plugin under the repo-owner name (evo@evo-hq). Every hook resolved ${CLAUDE_PLUGIN_ROOT}/bin/evo-hook-drain to a cache directory the installer never populated and fired exit 127. evo doctor codex passed because it checked config, not the binary.

  • Stage the binary, copy the plugin, and register under the owner name (evo-hq), matching what Codex resolves and codex plugin marketplace add evo-hq/evo registers.
  • Legacy cleanup removes orphaned evo@evo-hq-evo registrations and their caches.
  • evo doctor codex now verifies evo-hook-drain exists and is executable at the resolved path; version-dir selection is numeric and comment-safe.
  • Uninstall no longer leaves a stray enabled = true line behind the removed section header.
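
The "numeric" version-directory selection mentioned in the list above matters because a plain string sort ranks 0.4.9 above 0.4.10. A minimal sketch of numeric selection follows; the function names and the directory-name format are assumptions, and the comment-handling part of the fix is not covered here.

```python
from typing import Optional


def version_key(name: str) -> tuple:
    """Turn '0.4.10' into (0, 4, 10) so it sorts above '0.4.9'."""
    return tuple(int(part) for part in name.split("."))


def latest_version_dir(names: list) -> Optional[str]:
    """Pick the highest dotted-numeric name, ignoring anything else."""
    numeric = [n for n in names if all(p.isdecimal() for p in n.split("."))]
    return max(numeric, key=version_key, default=None)


picked = latest_version_dir(["0.4.9", "0.4.10", "tmp"])
lexical = max(["0.4.9", "0.4.10"])  # string comparison, for contrast
```

Here `picked` is `"0.4.10"` while the lexical maximum is `"0.4.9"`, which is the kind of mismatch a numeric comparison avoids.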

Upgrading

Existing Codex installs hitting exit-127 hooks do not self-heal from evo update — the broken install reports unhealthy and is skipped. Recover explicitly:

uv tool install --force evo-hq-cli && evo install codex --force

This stages the binary into the cache directory Codex loads and clears the stale registration. Other hosts: evo update.

PR #61. Full diff: v0.4.4...v0.4.5