evo: proposed experiment config #71
test installation -- evo platform integration moved to the private dry-run repo
🚩 Configuration file placed at repo root rather than in tests/fixtures
This evoc.yaml is placed at the repository root and references a test fixture harness (tests/fixtures/auto_harness_demo/benchmark.py). If this file is meant to be consumed by an external tool scanning the repo root, it would point that tool at a test fixture rather than production code. If it's only meant for demo/testing purposes, it might be more appropriate to place it alongside the fixture it references. No other code in the repo references this file.
metric: latency_ms
direction: min
🚩 Configured metric does not match harness output
The goal.metric is set to latency_ms (line 3), but the referenced harness at tests/fixtures/auto_harness_demo/benchmark.py:91 emits {"score": ..., "tasks": ...}, which has no latency_ms key. The notes on lines 23-24 explicitly acknowledge that this is an inferred guess ('goal INFERRED: benchmark script found, guessing latency_ms/min; rename to what the harness emits'), so the mismatch appears to be intentional scaffolding. However, if this config were activated as-is, the optimization tool would likely fail to find the expected metric in the harness output. Worth confirming that this is only scaffolding and not meant to be used directly.
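One way to catch this before the config is activated is to check the configured metric name against what the harness actually prints. This is a minimal sketch, assuming (hypothetically) that the harness prints its result as a JSON object on a line of its own; metric_present is an illustrative helper, not part of evo, and the sample output uses placeholder values.

```python
import json


def metric_present(harness_stdout: str, metric: str) -> bool:
    """Return True if the last JSON object line in harness_stdout has `metric` as a key.

    Assumption (hypothetical): the harness prints its result as a JSON object
    on its own line, shaped like {"score": ..., "tasks": ...}.
    """
    # Scan from the end so the final result line wins over earlier log lines.
    for line in reversed(harness_stdout.splitlines()):
        line = line.strip()
        if not line.startswith("{"):
            continue
        try:
            payload = json.loads(line)
        except json.JSONDecodeError:
            continue  # looked like JSON but was not; keep scanning upward
        return isinstance(payload, dict) and metric in payload
    return False  # no JSON object found at all


# Placeholder output shaped like the harness's {"score": ..., "tasks": ...}.
sample = 'running tasks...\n{"score": 0.5, "tasks": 10}\n'
print(metric_present(sample, "latency_ms"))  # False: the key the config expects is absent
print(metric_present(sample, "score"))       # True
```

Checking the key rather than the value keeps the guard independent of whatever the harness actually measures.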
evo: proposed experiment config
evo analyzed this repository and proposes the configuration below. Every detection cites its evidence; everything marked INFERRED is a guess for you to correct.
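The evidence format used in the list below (a finding, the reason, and a path:line citation) can be modelled as a small record. This is a minimal sketch, assuming that marker-file detection works on file names alone; MARKERS, Detection, detect_languages and render are hypothetical names, and evo's real rule table is not part of this PR.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath

# Hypothetical marker table covering only the three markers cited in the list.
MARKERS = {
    "Cargo.toml": "rust",
    "package.json": "javascript",
    "pyproject.toml": "python",
}


@dataclass(frozen=True)
class Detection:
    value: str              # what was detected, e.g. "rust"
    reason: str             # why, e.g. "Cargo.toml marker file"
    evidence: str           # citation in path:line form
    inferred: bool = False  # True marks a guess the reader should correct


def detect_languages(paths: list[str]) -> list[Detection]:
    """Emit one Detection per language, citing the first matching path in sorted order."""
    found: dict[str, Detection] = {}
    for path in sorted(paths):
        name = PurePosixPath(path).name
        lang = MARKERS.get(name)
        if lang is not None and lang not in found:
            # A marker file is evidence by its mere presence, so cite line 1.
            found[lang] = Detection(lang, f"{name} marker file", f"{path}:1")
    return list(found.values())


def render(d: Detection) -> str:
    """Format a detection as 'value -- reason (path:line)', flagging guesses."""
    prefix = "INFERRED: " if d.inferred else ""
    return f"{prefix}{d.value} -- {d.reason} ({d.evidence})"


repo_files = [
    "plugins/evo/pyproject.toml",
    "sdk/node/package.json",
    "plugins/evo/npm/package.json",
    "plugins/evo/bin/evo-hook-drain-rs/Cargo.toml",
]
for d in detect_languages(repo_files):
    print(render(d))
```

Sorting the paths first makes the cited evidence deterministic when several files share a marker name (here, two package.json files).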
What evo detected
- rust -- Cargo.toml marker file (plugins/evo/bin/evo-hook-drain-rs/Cargo.toml:1)
- javascript -- package.json marker file (plugins/evo/npm/package.json:1)
- python -- pyproject.toml marker file (plugins/evo/pyproject.toml:1)
- cargo test -- cargo manifest (plugins/evo/bin/evo-hook-drain-rs/Cargo.toml:1)
- npm test -- package.json test script: 'node --test test/*.test.js' (sdk/node/package.json:20)
- python -m pytest -q -- conftest.py present (tests/conftest.py:1)
- python tests/fixtures/auto_harness_demo/benchmark.py -- benchmark-named script (tests/fixtures/auto_harness_demo/benchmark.py:1)
- python tests/fixtures/release_smoke/repo/bench.py -- benchmark-named script (tests/fixtures/release_smoke/repo/bench.py:1)
- python tests/fixtures/tau3_demo/benchmark.py -- benchmark-named script (tests/fixtures/tau3_demo/benchmark.py:1)
- python scripts/rlm_eval/score.py -- evaluation-named script (scripts/rlm_eval/score.py:1)
- python scripts/rlm_eval/score_llm.py -- evaluation-named script (scripts/rlm_eval/score_llm.py:1)
- python3 scripts/check_versions.py -- workflow run step (.github/workflows/ci.yml:17)
- shell: bash -- workflow run step (.github/workflows/ci.yml:28)
- python -m pip install --upgrade pip build -- workflow run step (.github/workflows/ci.yml:35)
- python -m build -- workflow run step (.github/workflows/ci.yml:38)
- python - <<'PY' -- workflow run step (.github/workflows/ci.yml:42)
- python -m venv "$RUNNER_TEMP/smoke" -- workflow run step (.github/workflows/ci.yml:58)
- python -m pip install --upgrade pip build -- workflow run step (.github/workflows/ci.yml:80)
- python -m build -- workflow run step (.github/workflows/ci.yml:83)
- python -m pip install dist/*.whl -- workflow run step (.github/workflows/ci.yml:86)
- python sdk/python/test/test_run.py -- workflow run step (.github/workflows/ci.yml:88)
- npm test -- workflow run step (.github/workflows/ci.yml:103)
- shell: bash -- workflow run step (.github/workflows/ci.yml:114)
- cargo build --release -- workflow run step (.github/workflows/ci.yml:127)
- git config --global user.email "ci@evo.test" -- workflow run step (.github/workflows/ci.yml:130)
- python -m pip install -e plugins/evo -- workflow run step (.github/workflows/ci.yml:135)
- pytest tests/unit/ -q -- workflow run step (.github/workflows/ci.yml:138)
- git config --global user.email "ci@evo.test" -- workflow run step (.github/workflows/publish.yml:45)
- python3 scripts/check_versions.py -- workflow run step (.github/workflows/publish.yml:51)
- TAG="${GITHUB_REF##*/}" -- workflow run step (.github/workflows/publish.yml:55)
- python -m pip install --upgrade pip -- workflow run step (.github/workflows/publish.yml:63)
- pytest tests/unit/ -q -- workflow run step (.github/workflows/publish.yml:67)
- python tests/e2e/test_e2e.py -- workflow run step (.github/workflows/publish.yml:73)
- python -m pip install --upgrade pip build twine -- workflow run step (.github/workflows/publish.yml:92)
- python -m build -- workflow run step (.github/workflows/publish.yml:95)
- TAG="${GITHUB_REF##*/}" -- workflow run step (.github/workflows/publish.yml:99)
- twine check dist/* -- workflow run step (.github/workflows/publish.yml:112)
- twine upload --non-interactive dist/* -- workflow run step (.github/workflows/publish.yml:118)
- TAG="${GITHUB_REF##*/}" -- workflow run step (.github/workflows/publish.yml:138)
- npm test -- workflow run step (.github/workflows/publish.yml:151)
- # Pre-release semver (e.g. 0.4.0-alpha.1, 0.4.0-rc.1) ships under -- workflow run step (.github/workflows/publish.yml:157)
- bash plugins/evo/npm/scripts/sync-from-source.sh -- workflow run step (.github/workflows/publish.yml:184)
- TAG="${GITHUB_REF##*/}" -- workflow run step (.github/workflows/publish.yml:188)
- # Pre-release semver (e.g. 0.4.2-alpha.2) ships under the -- workflow run step (.github/workflows/publish.yml:204)
- python3 scripts/check_versions.py -- workflow run step (.github/workflows/publish.yml:229)
- python -m pip install --upgrade pip build twine -- workflow run step (.github/workflows/publish.yml:231)
- python -m build -- workflow run step (.github/workflows/publish.yml:234)
- TAG="${GITHUB_REF##*/}" -- workflow run step (.github/workflows/publish.yml:238)
- twine check dist/* -- workflow run step (.github/workflows/publish.yml:251)
- whl=$(ls dist/*.whl) -- workflow run step (.github/workflows/publish.yml:255)
- twine upload --non-interactive dist/* -- workflow run step (.github/workflows/publish.yml:266)
- TAG="${GITHUB_REF##*/}" -- workflow run step (.github/workflows/publish.yml:311)
- python -m pip install --upgrade pip -- workflow run step (.github/workflows/publish.yml:326)
- python - <<'PY' -- workflow run step (.github/workflows/publish.yml:337)
- if [ -z "$E2B_API_KEY" ]; then -- workflow run step (.github/workflows/publish.yml:362)
- if [ -n "${{ matrix.target }}" ]; then -- workflow run step (.github/workflows/publish.yml:429)
- base="plugins/evo/bin/evo-hook-drain-rs/target" -- workflow run step (.github/workflows/publish.yml:438)
- cargo build --release --target aarch64-apple-darwin -- workflow run step (.github/workflows/publish.yml:474)
- cargo build --release --target x86_64-apple-darwin -- workflow run step (.github/workflows/publish.yml:477)
- lipo -create -output evo-hook-drain-darwin \ -- workflow run step (.github/workflows/publish.yml:480)
- TAG="${GITHUB_REF##*/}" -- workflow run step (.github/workflows/publish.yml:508)
- ls -lh /tmp/hook-drain-bins/ -- workflow run step (.github/workflows/publish.yml:547)
- TAG="${{ steps.notes.outputs.tag }}" -- workflow run step (.github/workflows/publish.yml:552)
- plugins/evo/bin/evo-hook-drain-rs/Cargo.lock -- dependency lockfile (plugins/evo/bin/evo-hook-drain-rs/Cargo.lock:1)
- plugins/evo/uv.lock -- dependency lockfile (plugins/evo/uv.lock:1)

What evo proposes
evoc.yaml:

What accepting this PR means
- evoc.yaml: the committed file IS the approval, exactly as for a hand-written config.
- baseline: auto is resolved by calibration on the first run; no number here was invented.
- The triggers: section becomes a set of continuous monitors. When the metric degrades past the baseline over consecutive checks, evo starts an optimization run and proposes the fix as a PR. Monitors are detectors, not verdicts -- the statistics happen in the run they trigger.

Dashboard: https://pest-inserted-chronicle-indicators.trycloudflare.com/w/gh-evo-hq-evo · Workspace: gh-evo-hq-evo
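The baseline and trigger behavior described above can be sketched as follows. Everything here is an assumption for illustration: calibrate_baseline uses the median of first-run samples as a stand-in for evo's calibration, and should_trigger, consecutive and tolerance are hypothetical names, since the PR does not spell out evo's actual rule. direction follows the config's meaning: for min a larger value is a degradation, for max a smaller one.

```python
from collections.abc import Iterable
from statistics import median


def calibrate_baseline(first_run_samples: Iterable[float]) -> float:
    """Hypothetical stand-in for `baseline: auto`: the median of first-run measurements."""
    samples = list(first_run_samples)
    if not samples:
        raise ValueError("calibration needs at least one sample")
    return median(samples)


def should_trigger(
    checks: Iterable[float],
    baseline: float,
    direction: str = "min",
    consecutive: int = 3,
    tolerance: float = 0.0,
) -> bool:
    """Return True once `consecutive` checks in a row are worse than the baseline.

    For direction "min" (lower is better), worse means above baseline + tolerance;
    for "max" (higher is better), worse means below baseline - tolerance.
    This only detects a degradation; judging whether it is real is left to
    the optimization run it would start.
    """
    if direction not in ("min", "max"):
        raise ValueError(f"unknown direction: {direction!r}")
    streak = 0
    for value in checks:
        if direction == "min":
            worse = value > baseline + tolerance
        else:
            worse = value < baseline - tolerance
        streak = streak + 1 if worse else 0  # any non-degraded check resets the streak
        if streak >= consecutive:
            return True
    return False


baseline = calibrate_baseline([102.0, 98.0, 100.0])  # placeholder latencies
print(baseline)                                                      # 100.0
print(should_trigger([105.0, 107.0, 110.0], baseline, "min"))        # True
print(should_trigger([105.0, 99.0, 110.0, 111.0], baseline, "min"))  # False
```

Requiring a streak rather than a single bad reading keeps one noisy check from starting a run, which matches the "detectors, not verdicts" framing.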