evo: proposed experiment config by evo-core-dryrun[bot] · Pull Request #71 · evo-hq/evo

evo-core-dryrun · 2026-06-11T18:50:37Z

evo: proposed experiment config

evo analyzed this repository and proposes the configuration below. Every detection cites its evidence; everything marked INFERRED is a guess for you to correct.

What evo detected

Language: rust — Cargo.toml marker file (plugins/evo/bin/evo-hook-drain-rs/Cargo.toml:1)
Language: javascript — package.json marker file (plugins/evo/npm/package.json:1)
Language: python — pyproject.toml marker file (plugins/evo/pyproject.toml:1)
Test command: cargo test — cargo manifest (plugins/evo/bin/evo-hook-drain-rs/Cargo.toml:1)
Test command: npm test — package.json test script: 'node --test test/*.test.js' (sdk/node/package.json:20)
Test command: python -m pytest -q — conftest.py present (tests/conftest.py:1)
Benchmark candidate: python tests/fixtures/auto_harness_demo/benchmark.py — benchmark-named script (tests/fixtures/auto_harness_demo/benchmark.py:1)
Benchmark candidate: python tests/fixtures/release_smoke/repo/bench.py — benchmark-named script (tests/fixtures/release_smoke/repo/bench.py:1)
Benchmark candidate: python tests/fixtures/tau3_demo/benchmark.py — benchmark-named script (tests/fixtures/tau3_demo/benchmark.py:1)
Eval script: python scripts/rlm_eval/score.py — evaluation-named script (scripts/rlm_eval/score.py:1)
Eval script: python scripts/rlm_eval/score_llm.py — evaluation-named script (scripts/rlm_eval/score_llm.py:1)
CI command: python3 scripts/check_versions.py — workflow run step (.github/workflows/ci.yml:17)
CI command: shell: bash — workflow run step (.github/workflows/ci.yml:28)
CI command: python -m pip install --upgrade pip build — workflow run step (.github/workflows/ci.yml:35)
CI command: python -m build — workflow run step (.github/workflows/ci.yml:38)
CI command: python - <<'PY' — workflow run step (.github/workflows/ci.yml:42)
CI command: python -m venv "$RUNNER_TEMP/smoke" — workflow run step (.github/workflows/ci.yml:58)
CI command: python -m pip install --upgrade pip build — workflow run step (.github/workflows/ci.yml:80)
CI command: python -m build — workflow run step (.github/workflows/ci.yml:83)
CI command: python -m pip install dist/*.whl — workflow run step (.github/workflows/ci.yml:86)
CI command: python sdk/python/test/test_run.py — workflow run step (.github/workflows/ci.yml:88)
CI command: npm test — workflow run step (.github/workflows/ci.yml:103)
CI command: shell: bash — workflow run step (.github/workflows/ci.yml:114)
CI command: cargo build --release — workflow run step (.github/workflows/ci.yml:127)
CI command: git config --global user.email "ci@evo.test" — workflow run step (.github/workflows/ci.yml:130)
CI command: python -m pip install -e plugins/evo — workflow run step (.github/workflows/ci.yml:135)
CI command: pytest tests/unit/ -q — workflow run step (.github/workflows/ci.yml:138)
CI command: git config --global user.email "ci@evo.test" — workflow run step (.github/workflows/publish.yml:45)
CI command: python3 scripts/check_versions.py — workflow run step (.github/workflows/publish.yml:51)
CI command: TAG="${GITHUB_REF##*/}" — workflow run step (.github/workflows/publish.yml:55)
CI command: python -m pip install --upgrade pip — workflow run step (.github/workflows/publish.yml:63)
CI command: pytest tests/unit/ -q — workflow run step (.github/workflows/publish.yml:67)
CI command: python tests/e2e/test_e2e.py — workflow run step (.github/workflows/publish.yml:73)
CI command: python -m pip install --upgrade pip build twine — workflow run step (.github/workflows/publish.yml:92)
CI command: python -m build — workflow run step (.github/workflows/publish.yml:95)
CI command: TAG="${GITHUB_REF##*/}" — workflow run step (.github/workflows/publish.yml:99)
CI command: twine check dist/* — workflow run step (.github/workflows/publish.yml:112)
CI command: twine upload --non-interactive dist/* — workflow run step (.github/workflows/publish.yml:118)
CI command: TAG="${GITHUB_REF##*/}" — workflow run step (.github/workflows/publish.yml:138)
CI command: npm test — workflow run step (.github/workflows/publish.yml:151)
CI command: # Pre-release semver (e.g. 0.4.0-alpha.1, 0.4.0-rc.1) ships under — workflow run step (.github/workflows/publish.yml:157)
CI command: bash plugins/evo/npm/scripts/sync-from-source.sh — workflow run step (.github/workflows/publish.yml:184)
CI command: TAG="${GITHUB_REF##*/}" — workflow run step (.github/workflows/publish.yml:188)
CI command: # Pre-release semver (e.g. 0.4.2-alpha.2) ships under the — workflow run step (.github/workflows/publish.yml:204)
CI command: python3 scripts/check_versions.py — workflow run step (.github/workflows/publish.yml:229)
CI command: python -m pip install --upgrade pip build twine — workflow run step (.github/workflows/publish.yml:231)
CI command: python -m build — workflow run step (.github/workflows/publish.yml:234)
CI command: TAG="${GITHUB_REF##*/}" — workflow run step (.github/workflows/publish.yml:238)
CI command: twine check dist/* — workflow run step (.github/workflows/publish.yml:251)
CI command: whl=$(ls dist/*.whl) — workflow run step (.github/workflows/publish.yml:255)
CI command: twine upload --non-interactive dist/* — workflow run step (.github/workflows/publish.yml:266)
CI command: TAG="${GITHUB_REF##*/}" — workflow run step (.github/workflows/publish.yml:311)
CI command: python -m pip install --upgrade pip — workflow run step (.github/workflows/publish.yml:326)
CI command: python - <<'PY' — workflow run step (.github/workflows/publish.yml:337)
CI command: if [ -z "$E2B_API_KEY" ]; then — workflow run step (.github/workflows/publish.yml:362)
CI command: if [ -n "${{ matrix.target }}" ]; then — workflow run step (.github/workflows/publish.yml:429)
CI command: base="plugins/evo/bin/evo-hook-drain-rs/target" — workflow run step (.github/workflows/publish.yml:438)
CI command: cargo build --release --target aarch64-apple-darwin — workflow run step (.github/workflows/publish.yml:474)
CI command: cargo build --release --target x86_64-apple-darwin — workflow run step (.github/workflows/publish.yml:477)
CI command: lipo -create -output evo-hook-drain-darwin \ — workflow run step (.github/workflows/publish.yml:480)
CI command: TAG="${GITHUB_REF##*/}" — workflow run step (.github/workflows/publish.yml:508)
CI command: ls -lh /tmp/hook-drain-bins/ — workflow run step (.github/workflows/publish.yml:547)
CI command: TAG="${{ steps.notes.outputs.tag }}" — workflow run step (.github/workflows/publish.yml:552)
Lockfile: plugins/evo/bin/evo-hook-drain-rs/Cargo.lock — dependency lockfile (plugins/evo/bin/evo-hook-drain-rs/Cargo.lock:1)
Lockfile: plugins/evo/uv.lock — dependency lockfile (plugins/evo/uv.lock:1)

What evo proposes

evoc.yaml:

workspace: gh-evo-hq-evo
goal:
  metric: latency_ms
  direction: min
baseline: auto
rules:
  win_threshold: 0
  harm_threshold: 0
  min_n: 20
  max_n: 400
profile:
  blast_radius: sandbox
direction_source: inferred
triggers:
  - metric: latency_ms
    comparator: '>'
    baseline: auto
    source:
      kind: harness
    interval_minutes: 60
    action: optimize
notes:
  - 'harness INFERRED from tests/fixtures/auto_harness_demo/benchmark.py:1 (benchmark-named script); verify it emits JSON lines'
  - 'goal INFERRED: benchmark script found, guessing latency_ms/min; rename to what the harness emits'
  - 'optimize.on_accept: once this config is accepted, evo calibrates the harness, resolves baselines, and starts the first optimization run automatically (needs an optimize.proposer; set on_accept: false to disable)'
harness: python tests/fixtures/auto_harness_demo/benchmark.py
optimize:
  on_accept: true

harness INFERRED from tests/fixtures/auto_harness_demo/benchmark.py:1 (benchmark-named script); verify it emits JSON lines
goal INFERRED: benchmark script found, guessing latency_ms/min; rename to what the harness emits
optimize.on_accept: once this config is accepted, evo calibrates the harness, resolves baselines, and starts the first optimization run automatically (needs an optimize.proposer; set on_accept: false to disable)
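
The notes above ask the reader to verify that the harness emits JSON lines and that the metric name matches what it emits. A minimal sketch of such a harness follows. It assumes one JSON object per line keyed by the metric name; the exact schema evo expects is not stated in this thread, and `measure_latency_ms`, `emit_metric_line`, and the workload are hypothetical names used only for illustration.

```python
import json
import time


def measure_latency_ms(workload, repeats=5):
    """Return the mean wall-clock latency of workload() in milliseconds."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        workload()
        samples.append((time.perf_counter() - start) * 1000.0)
    return sum(samples) / len(samples)


def emit_metric_line(name, value):
    """Serialize one measurement as a single JSON line (assumed format)."""
    return json.dumps({name: value})


if __name__ == "__main__":
    # Hypothetical workload; replace with the code path being measured.
    latency = measure_latency_ms(lambda: sum(range(10_000)))
    print(emit_metric_line("latency_ms", latency))
```

If the real harness reports a different key, the key passed to `emit_metric_line` and `goal.metric` in evoc.yaml would need to agree.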

What accepting this PR means

Merging commits evoc.yaml: the committed file IS the approval, exactly as for a hand-written config.
baseline: auto is resolved by calibration on first run; no number here was invented.
The triggers: section becomes continuous monitors: when the metric degrades past baseline for consecutive checks, evo starts an optimization run and proposes the fix as a PR. Monitors are detectors, not verdicts -- the statistics happen in the run they trigger.
Nothing executes against this repository until this file is merged.
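
The monitor behaviour described above (fire only after the metric stays past baseline for consecutive checks) can be sketched as follows. This is an illustration under stated assumptions, not evo's implementation: the class name, the `required_consecutive` parameter, and its default of 3 are hypothetical, since the thread does not say how many checks are required. Degradation is taken to mean `value > baseline`, matching the `>` comparator and `direction: min` in the proposed config.

```python
from dataclasses import dataclass


@dataclass
class ConsecutiveDegradationMonitor:
    """Detector that fires after N consecutive checks above baseline."""

    baseline: float
    required_consecutive: int = 3  # hypothetical default; not from the thread
    _streak: int = 0

    def observe(self, value: float) -> bool:
        """Record one check; return True once the streak is long enough."""
        if value > self.baseline:
            self._streak += 1
        else:
            # A single healthy check resets the streak.
            self._streak = 0
        return self._streak >= self.required_consecutive
```

A detector like this only decides when to start a run; as the PR text says, the statistical comparison happens in the run it triggers.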

Dashboard: https://pest-inserted-chronicle-indicators.trycloudflare.com/w/gh-evo-hq-evo · Workspace: gh-evo-hq-evo

alokwhitewolf · 2026-06-11T18:51:51Z

test installation -- evo platform integration moved to the private dry-run repo

devin-ai-integration

Devin Review found 2 potential issues.

devin-ai-integration · 2026-06-11T18:51:53Z

🚩 Configuration file placed at repo root rather than in tests/fixtures

This evoc.yaml is placed at the repository root and references a test fixture harness (tests/fixtures/auto_harness_demo/benchmark.py). If this file is meant to be consumed by an external tool scanning the repo root, it would point that tool at a test fixture rather than production code. If it's only meant for demo/testing purposes, it might be more appropriate to place it alongside the fixture it references. No other code in the repo references this file.

devin-ai-integration · 2026-06-11T18:51:55Z

+  metric: latency_ms
+  direction: min


🚩 Configured metric does not match harness output

The goal.metric is set to latency_ms (line 3), but the referenced harness at tests/fixtures/auto_harness_demo/benchmark.py:91 emits {"score": ..., "tasks": ...} — there is no latency_ms key. The notes on lines 23-24 explicitly acknowledge this is an inferred guess ('goal INFERRED: benchmark script found, guessing latency_ms/min; rename to what the harness emits'), so this appears intentional as scaffolding. However, if this config were ever activated as-is without editing, the optimization tool would likely fail to find the expected metric. Worth confirming this is only scaffolding and not meant to be used directly.
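
One way to catch this kind of mismatch before a config is activated is to check captured harness output for the configured key. The sketch below assumes the harness prints one JSON object per line; `missing_metric_lines` is a hypothetical helper, not part of evo, and the sample values in the usage note are made up.

```python
import json


def missing_metric_lines(output: str, metric: str) -> list[int]:
    """Return 1-based numbers of non-empty lines that lack the metric key."""
    missing = []
    for number, line in enumerate(output.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            missing.append(number)  # not JSON at all
            continue
        if not isinstance(record, dict) or metric not in record:
            missing.append(number)
    return missing
```

For a line shaped like the output described above, for example `{"score": 0.8, "tasks": 10}`, calling `missing_metric_lines(line, "latency_ms")` returns `[1]`, while checking for `"score"` returns an empty list.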

evo: propose experiment config

e906791

devin-ai-integration Bot reviewed Jun 11, 2026

alokwhitewolf closed this Jun 11, 2026

alokwhitewolf deleted the evo/config branch June 11, 2026 18:54

evo-core-dryrun[bot] wants to merge 1 commit into main from evo/config
