Releases: evo-hq/evo
evo 0.5.2
evo 0.5.2 upgrades the meta controller in the optimize workflow (the default driver on Claude Code): it keeps notes across ticks, its prompt edits accumulate instead of overwriting each other, it can harden the verifier audits live, and model routing now follows your session model.
The meta keeps notes
Every meta tick is a fresh agent. Until now its only memory was a dedup list of findings it had already reported; the reasoning behind them was lost. Each tick can now leave a journal note (observations that aren't actionable yet, pending hypotheses with the evidence so far, watch-items to re-check), and recent notes are fed back into every subsequent tick. The full journal is returned in the workflow result as metaJournal, next to the harness edit log.
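As a rough illustration of the journal mechanics described above, the sketch below models a journal that each tick appends to and whose most recent notes are handed to the next tick. MetaJournal, JournalNote, recent, the note kinds, and the window size are hypothetical names chosen for this example, not evo's actual API; only the metaJournal result key comes from the release notes.

```python
from dataclasses import asdict, dataclass, field


@dataclass
class JournalNote:
    tick: int
    kind: str  # hypothetical labels: "observation", "hypothesis", "watch-item"
    text: str


@dataclass
class MetaJournal:
    notes: list[JournalNote] = field(default_factory=list)

    def add(self, tick: int, kind: str, text: str) -> None:
        """Record a note left by one tick."""
        self.notes.append(JournalNote(tick, kind, text))

    def recent(self, limit: int = 3) -> list[JournalNote]:
        """Notes fed back into the next tick (the window size is an assumption)."""
        return self.notes[-limit:]


journal = MetaJournal()
journal.add(1, "hypothesis", "scores plateau when the revise phase is skipped")
journal.add(2, "watch-item", "re-check experiment timeouts after the next batch")

# A fresh tick has no memory except what the journal hands it.
context_for_tick_3 = [note.text for note in journal.recent()]

# The full journal travels with the workflow result.
result = {"metaJournal": [asdict(note) for note in journal.notes]}
```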
Prompt directives accumulate
When the meta edits a phase prompt with set-prompt, appended directives now stack as standing instructions instead of silently overwriting the previous one. A replace swaps the base prompt wholesale and keeps the accumulated appends on top. The meta also sees the full text of every standing directive each tick, so it neither clobbers nor repeats them.
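The append and replace semantics above can be sketched as a small data structure. This is a minimal model, assuming a prompt is a base string plus an ordered list of appended directives; PhasePrompt and its methods are hypothetical and do not reflect how set-prompt is implemented in evo.

```python
from dataclasses import dataclass, field


@dataclass
class PhasePrompt:
    base: str
    appends: list[str] = field(default_factory=list)

    def append(self, directive: str) -> None:
        """Stack a standing directive; earlier directives are kept."""
        self.appends.append(directive)

    def replace(self, new_base: str) -> None:
        """Swap the base prompt wholesale; accumulated appends survive."""
        self.base = new_base

    def render(self) -> str:
        """Base prompt followed by every standing directive, in order."""
        return "\n".join([self.base, *self.appends])


prompt = PhasePrompt("Implement the brief.")
prompt.append("Run the smoke test before committing.")
prompt.append("Do not edit the grader.")
prompt.replace("Implement the brief with minimal diffs.")
rendered = prompt.render()
```

After the replace, rendered still ends with both appended directives, which is the behavior the release describes: the base changes, the standing instructions stay on top.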
The meta can harden the verifier
The two verifier gates (the pre-run design-time cheating audit and the post-run validity audit) are now set-prompt targets. When the meta spots a cheat pattern the audit missed, it can add checks to the audit prompts mid-run. The benchmark, grader, and scorer remain off-limits, so the score stays comparable across the tree.
Model routing follows your session
The meta and the implement/revise agents on hard briefs now inherit the session model instead of a pinned opus, so sessions on newer models (Claude Fable 5) are no longer routed down for the judgment-heavy work. Easy briefs stay on sonnet, as does the mechanical state reader.
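The routing rule reads naturally as a small decision function. The sketch below is an illustration only: route_model, the role labels, and the difficulty labels are hypothetical, and the session model is passed in as an opaque string.

```python
def route_model(role: str, difficulty: str, session_model: str) -> str:
    """Pick a model for an agent, following the rule stated in the release notes."""
    if role == "state-reader":
        return "sonnet"  # mechanical work stays on sonnet
    if role == "meta":
        return session_model  # judgment-heavy: inherit the session model
    if role in {"implement", "revise"}:
        # Hard briefs inherit the session model; easy briefs stay on sonnet.
        return session_model if difficulty == "hard" else "sonnet"
    raise ValueError(f"unknown role: {role}")


hard_choice = route_model("implement", "hard", "session-model")
easy_choice = route_model("revise", "easy", "session-model")
```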
Install / upgrade
uv tool install --force evo-hq-cli && evo install claude-code --force # or codex / cursor / openclaw / pi
Also published: evo-hq-agent 0.5.2 (PyPI), @evo-hq/evo-agent and @evo-hq/pi-evo 0.5.2 (npm).
Full diff: v0.5.1...v0.5.2
evo 0.5.1
evo 0.5.1 is a reliability release for the hook pipeline. If you ever saw SessionStart hook (failed): exit 127 in Codex, this is the fix.
Hooks that stay fixed
Hosts rebuild their plugin caches from a fresh git snapshot whenever they feel like it (Codex does it at every session start). That used to delete the hook binary evo had staged, and every hook fired exit 127 until you reinstalled. The binary now lives at ~/.evo/bin, outside anything the host manages, and the plugin ships a tiny fallback at the hook path that finds it. The host can re-stage all it wants; hooks keep working.
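To make the fallback idea concrete, here is a minimal sketch of the lookup such a fallback might perform: resolve the binary under the user's home directory rather than inside the host-managed plugin cache. The function name is hypothetical, and the binary name evo-hook-drain is taken from the 0.4.5 notes and assumed to be unchanged; the actual fallback shipped with the plugin may work differently.

```python
import os
from pathlib import Path
from typing import Optional


def resolve_hook_binary(home: Path, name: str = "evo-hook-drain") -> Optional[Path]:
    """Return <home>/.evo/bin/<name> if it is an executable file, else None."""
    candidate = home / ".evo" / "bin" / name
    if candidate.is_file() and os.access(candidate, os.X_OK):
        return candidate
    return None


# Typical call site: look under the current user's home directory.
binary = resolve_hook_binary(Path.home())
```

Because the path does not depend on anything inside the plugin cache, a host re-staging the plugin does not affect the result of the lookup.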
Less to babysit
- evo install codex now trusts evo's hooks for you. Untrusted hooks register but never fire, which broke evo direct invisibly. Pass --no-trust-hooks to review them in codex via /hooks instead.
- Every install and update now finishes by running evo doctor <host>, so a broken install fails loudly at install time instead of at hook-fire time.
- evo doctor codex verifies hook trust, and catches the case where a plugin update changed hooks.json and silently un-trusted everything.
Install / upgrade
uv tool install --force evo-hq-cli && evo install codex --force # or claude-code / cursor / openclaw / pi
Also published: evo-hq-agent 0.5.1 (PyPI), @evo-hq/evo-agent and @evo-hq/pi-evo 0.5.1 (npm).
evo 0.5.0
evo 0.5.0 makes the loop optimize the whole system — the model weights and the harness — in one run, against one objective. Plus a new Claude Code workflow driver with a live meta-controller, subagents, and a much richer dashboard.
Optimize the model, not just the harness
- evo can now fine-tune the base model (SFT / LoRA / RL) as a move inside the optimization loop, alongside the prompts, scaffold, and skills it already tuned. You hand it the whole stack and it decides what to spend the budget on.
- New evo:finetuning skill: picks or diagnoses a training move (SFT, LoRA, DPO/KTO/ORPO, RFT, GRPO/PPO/RLOO) with a reward-shape decision tree, a smoke-run gate, and failure diagnostics. Warm-start from the parent policy by default (EVO_PARENT_POLICY).
Workflow driver + live meta-controller (Claude Code)
- A dynamic-workflow driver for the optimize loop — now the default on Claude Code (prose orchestration is opt-out).
- A concurrent meta-controller that watches a run and can restructure the loop live: set knobs, toggle phases, rewrite prompts, inject steps — plus a STOP signal with a gated enforcer. The autonomous stop-nudge is suppressed under the workflow driver.
- Scan clusters experiments by failure class; a context capsule loads category skills and known learnings; cross-history pattern recognition before proposing.
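Clustering by failure class amounts to a group-by over experiment records. The sketch below assumes each experiment is a dict with an id and a failure_class field; the field name follows the changelog entry for this feature, while the record shape, ids, and class labels are hypothetical.

```python
from collections import defaultdict


def cluster_by_failure_class(experiments: list[dict]) -> dict[str, list[str]]:
    """Group experiment ids under the failure class each one was tagged with."""
    clusters: dict[str, list[str]] = defaultdict(list)
    for exp in experiments:
        clusters[exp["failure_class"]].append(exp["id"])
    return dict(clusters)


experiments = [
    {"id": "exp-1", "failure_class": "timeout"},
    {"id": "exp-2", "failure_class": "oom"},
    {"id": "exp-3", "failure_class": "timeout"},
]
clusters = cluster_by_failure_class(experiments)
```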
Subagents
- evo:verifier and evo:ideator now run as subagents.
- New benchmark-reviewer subagent; the discover baseline is gated on its review.
Dashboard
- Live log tail, trackio link/sparkline in the node drawer, and per-experiment annotations.
- Cleaner tabs/logs; committed-experiment trace handling improvements.
- EVO_DASHBOARD_HOST to bind 0.0.0.0 for Modal/cloud.
CLI & hooks
- evo wait gained process / log / GPU probes and a --for ideators selector so the loop can block on proposals.
- --per-exp-timeout on init with a --timeout per-call override; a PostToolUse hint when the agent starts a long-running command.
- evo abort now finds the subprocess tree cross-platform (Windows included), so detached benchmark/training children don't survive as orphans.
Integrity & config
- task-skills config: discover resolves category skills and agents load them on demand.
- Literature research is required before the first experiment; training on the benchmark set is banned.
Fixes
- hook-drain staging honors CLAUDE_CONFIG_DIR and from-path installs (fixes the SessionStart exit-127 warning).
Install
uv tool install evo-hq-cli==0.5.0
evo install claude-code # or codex / cursor / openclaw / pi
Also published: evo-hq-agent 0.5.0 (PyPI), @evo-hq/evo-agent and @evo-hq/pi-evo 0.5.0 (npm).
Full changelog: v0.4.5...v0.5.0
v0.5.0-alpha.13
What changed
- feat(optimize): meta controller restructures the workflow live; workflow is the default driver on Claude Code
- fix(host_install/claude-code): stage hook-drain into source tree for --from-path installs
Full diff: v0.5.0-alpha.12...v0.5.0-alpha.13
v0.5.0-alpha.12
What changed
- chore: bump 0.5.0-alpha.11 → 0.5.0-alpha.12, sync npm/
- test(assets): unit cover the new evo abstractions
- docs(optimize/workflow): make analyst STOP examples category-agnostic
- docs(finetuning): make device-placement a generic principle, not hardcoded HF/hardware specifics
- feat(optimize/workflow): cluster-on-failure_class in scan + clean loop-resume after STOP (#7)
- feat(optimize/workflow): analyst STOP signal + gated enforcer (#6)
- test(optimize): task-skill loading parity + workflow-loads-when-instructed
- feat(config): task-skills field — discover resolves category skills, agents load them
- chore: refresh uv.lock
- feat(assets): failure classifier, artifact reuse, mid-run circuit-breaker
- feat(optimize/workflow): context capsule — load category skills + apply known learnings
- fix(discard): preserve declared artifacts (#64)
Full diff: v0.5.0-alpha.11...v0.5.0-alpha.12
v0.5.0-alpha.9
What changed
- chore: bump 0.5.0-alpha.8 → 0.5.0-alpha.9, sync npm/
- feat(optimize): concurrent analyst thread for the workflow driver
Full diff: v0.5.0-alpha.8...v0.5.0-alpha.9
v0.5.0-alpha.8
What changed
- chore: bump 0.5.0-alpha.7 → 0.5.0-alpha.8, sync npm/
- feat(optimize): configurable scan batch size + compact scan-batch labels
- feat(optimize): Claude Code dynamic-workflow driver for the optimize loop
- docs(readme): codex exit-127 recovery in Upgrading (#62)
- fix(codex): stage hook binary under owner marketplace name + bump 0.4.5 (#61)
- feat(dashboard): render annotations + clean up tabs and logs
- skills(finetuning,subagent): training-scale discipline
- skills: post-commit per-task review + training observability
- test: end-to-end coverage for directive delivery pipeline (#58)
- skills/optimize: reframe description as structured autoresearch iteration
- skills: reframe Evo surface as general guidance + skills-before-references principle
Full diff: v0.5.0-alpha.7...v0.5.0-alpha.8
v0.5.0-alpha.11
What changed
- chore: bump 0.5.0-alpha.10 → 0.5.0-alpha.11, sync npm/
- fix(optimize/workflow): run-lane must finish the build/train step before evo run
Full diff: v0.5.0-alpha.10...v0.5.0-alpha.11
v0.5.0-alpha.10
What changed
- chore: bump 0.5.0-alpha.9 → 0.5.0-alpha.10, sync npm/
Full diff: v0.5.0-alpha.9...v0.5.0-alpha.10
v0.4.5
Codex hook fix
evo install codex staged the evo-hook-drain binary (and registered the plugin) under the marketplace name from marketplace.json (evo-hq-evo), but Codex 0.130+ loads the plugin under the repo-owner name (evo@evo-hq). Every hook resolved ${CLAUDE_PLUGIN_ROOT}/bin/evo-hook-drain to a cache directory the installer never populated and fired exit 127. evo doctor codex passed because it checked config, not the binary.
- Stage the binary, copy the plugin, and register under the owner name (evo-hq), matching what Codex resolves and codex plugin marketplace add evo-hq/evo registers.
- Legacy cleanup removes orphaned evo@evo-hq-evo registrations and their caches.
- evo doctor codex now verifies evo-hook-drain exists and is executable at the resolved path; version-dir selection is numeric and comment-safe.
- Uninstall no longer leaves a stray enabled = true line behind the removed section header.
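The note about numeric version-dir selection is easiest to see with an example: plain string ordering ranks 0.9.0 above 0.10.0, while comparing the components as integers does not. The sketch below illustrates the numeric half only; latest_version_dir and the directory names are hypothetical, and it makes no claim about how evo doctor performs the selection or what comment-safe covers.

```python
from typing import Optional


def latest_version_dir(names: list[str]) -> Optional[str]:
    """Pick the highest dotted version by numeric comparison, skipping other entries."""
    def is_version(name: str) -> bool:
        return all(part.isdecimal() for part in name.split("."))

    def key(name: str) -> tuple[int, ...]:
        return tuple(int(part) for part in name.split("."))

    candidates = [name for name in names if is_version(name)]
    return max(candidates, key=key) if candidates else None


dirs = ["0.4.5", "0.9.0", "0.10.0", ".DS_Store"]
numeric_pick = latest_version_dir(dirs)
# String ordering over the same version names picks a different directory.
string_pick = max(name for name in dirs if name[0].isdecimal())
```

Here numeric_pick is 0.10.0 while string_pick is 0.9.0, which is the kind of mismatch a numeric comparison avoids.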
Upgrading
Existing Codex installs hitting exit-127 hooks do not self-heal from evo update — the broken install reports unhealthy and is skipped. Recover explicitly:
uv tool install --force evo-hq-cli && evo install codex --force
This stages the binary into the cache directory Codex loads and clears the stale registration. Other hosts: evo update.
PR #61. Full diff: v0.4.4...v0.4.5