evo: proposed experiment config#71

Closed
evo-core-dryrun[bot] wants to merge 1 commit into main from evo/config

Conversation

@evo-core-dryrun

evo: proposed experiment config

evo analyzed this repository and proposes the configuration below. Every detection cites its evidence; everything marked INFERRED is a guess for you to correct.

What evo detected

  • Language: rust — Cargo.toml marker file (plugins/evo/bin/evo-hook-drain-rs/Cargo.toml:1)
  • Language: javascript — package.json marker file (plugins/evo/npm/package.json:1)
  • Language: python — pyproject.toml marker file (plugins/evo/pyproject.toml:1)
  • Test command: cargo test — cargo manifest (plugins/evo/bin/evo-hook-drain-rs/Cargo.toml:1)
  • Test command: npm test — package.json test script: 'node --test test/*.test.js' (sdk/node/package.json:20)
  • Test command: python -m pytest -q — conftest.py present (tests/conftest.py:1)
  • Benchmark candidate: python tests/fixtures/auto_harness_demo/benchmark.py — benchmark-named script (tests/fixtures/auto_harness_demo/benchmark.py:1)
  • Benchmark candidate: python tests/fixtures/release_smoke/repo/bench.py — benchmark-named script (tests/fixtures/release_smoke/repo/bench.py:1)
  • Benchmark candidate: python tests/fixtures/tau3_demo/benchmark.py — benchmark-named script (tests/fixtures/tau3_demo/benchmark.py:1)
  • Eval script: python scripts/rlm_eval/score.py — evaluation-named script (scripts/rlm_eval/score.py:1)
  • Eval script: python scripts/rlm_eval/score_llm.py — evaluation-named script (scripts/rlm_eval/score_llm.py:1)
  • CI command: python3 scripts/check_versions.py — workflow run step (.github/workflows/ci.yml:17)
  • CI command: shell: bash — workflow run step (.github/workflows/ci.yml:28)
  • CI command: python -m pip install --upgrade pip build — workflow run step (.github/workflows/ci.yml:35)
  • CI command: python -m build — workflow run step (.github/workflows/ci.yml:38)
  • CI command: python - <<'PY' — workflow run step (.github/workflows/ci.yml:42)
  • CI command: python -m venv "$RUNNER_TEMP/smoke" — workflow run step (.github/workflows/ci.yml:58)
  • CI command: python -m pip install --upgrade pip build — workflow run step (.github/workflows/ci.yml:80)
  • CI command: python -m build — workflow run step (.github/workflows/ci.yml:83)
  • CI command: python -m pip install dist/*.whl — workflow run step (.github/workflows/ci.yml:86)
  • CI command: python sdk/python/test/test_run.py — workflow run step (.github/workflows/ci.yml:88)
  • CI command: npm test — workflow run step (.github/workflows/ci.yml:103)
  • CI command: shell: bash — workflow run step (.github/workflows/ci.yml:114)
  • CI command: cargo build --release — workflow run step (.github/workflows/ci.yml:127)
  • CI command: git config --global user.email "ci@evo.test" — workflow run step (.github/workflows/ci.yml:130)
  • CI command: python -m pip install -e plugins/evo — workflow run step (.github/workflows/ci.yml:135)
  • CI command: pytest tests/unit/ -q — workflow run step (.github/workflows/ci.yml:138)
  • CI command: git config --global user.email "ci@evo.test" — workflow run step (.github/workflows/publish.yml:45)
  • CI command: python3 scripts/check_versions.py — workflow run step (.github/workflows/publish.yml:51)
  • CI command: TAG="${GITHUB_REF##*/}" — workflow run step (.github/workflows/publish.yml:55)
  • CI command: python -m pip install --upgrade pip — workflow run step (.github/workflows/publish.yml:63)
  • CI command: pytest tests/unit/ -q — workflow run step (.github/workflows/publish.yml:67)
  • CI command: python tests/e2e/test_e2e.py — workflow run step (.github/workflows/publish.yml:73)
  • CI command: python -m pip install --upgrade pip build twine — workflow run step (.github/workflows/publish.yml:92)
  • CI command: python -m build — workflow run step (.github/workflows/publish.yml:95)
  • CI command: TAG="${GITHUB_REF##*/}" — workflow run step (.github/workflows/publish.yml:99)
  • CI command: twine check dist/* — workflow run step (.github/workflows/publish.yml:112)
  • CI command: twine upload --non-interactive dist/* — workflow run step (.github/workflows/publish.yml:118)
  • CI command: TAG="${GITHUB_REF##*/}" — workflow run step (.github/workflows/publish.yml:138)
  • CI command: npm test — workflow run step (.github/workflows/publish.yml:151)
  • CI command: # Pre-release semver (e.g. 0.4.0-alpha.1, 0.4.0-rc.1) ships under — workflow run step (.github/workflows/publish.yml:157)
  • CI command: bash plugins/evo/npm/scripts/sync-from-source.sh — workflow run step (.github/workflows/publish.yml:184)
  • CI command: TAG="${GITHUB_REF##*/}" — workflow run step (.github/workflows/publish.yml:188)
  • CI command: # Pre-release semver (e.g. 0.4.2-alpha.2) ships under the — workflow run step (.github/workflows/publish.yml:204)
  • CI command: python3 scripts/check_versions.py — workflow run step (.github/workflows/publish.yml:229)
  • CI command: python -m pip install --upgrade pip build twine — workflow run step (.github/workflows/publish.yml:231)
  • CI command: python -m build — workflow run step (.github/workflows/publish.yml:234)
  • CI command: TAG="${GITHUB_REF##*/}" — workflow run step (.github/workflows/publish.yml:238)
  • CI command: twine check dist/* — workflow run step (.github/workflows/publish.yml:251)
  • CI command: whl=$(ls dist/*.whl) — workflow run step (.github/workflows/publish.yml:255)
  • CI command: twine upload --non-interactive dist/* — workflow run step (.github/workflows/publish.yml:266)
  • CI command: TAG="${GITHUB_REF##*/}" — workflow run step (.github/workflows/publish.yml:311)
  • CI command: python -m pip install --upgrade pip — workflow run step (.github/workflows/publish.yml:326)
  • CI command: python - <<'PY' — workflow run step (.github/workflows/publish.yml:337)
  • CI command: if [ -z "$E2B_API_KEY" ]; then — workflow run step (.github/workflows/publish.yml:362)
  • CI command: if [ -n "${{ matrix.target }}" ]; then — workflow run step (.github/workflows/publish.yml:429)
  • CI command: base="plugins/evo/bin/evo-hook-drain-rs/target" — workflow run step (.github/workflows/publish.yml:438)
  • CI command: cargo build --release --target aarch64-apple-darwin — workflow run step (.github/workflows/publish.yml:474)
  • CI command: cargo build --release --target x86_64-apple-darwin — workflow run step (.github/workflows/publish.yml:477)
  • CI command: lipo -create -output evo-hook-drain-darwin \ — workflow run step (.github/workflows/publish.yml:480)
  • CI command: TAG="${GITHUB_REF##*/}" — workflow run step (.github/workflows/publish.yml:508)
  • CI command: ls -lh /tmp/hook-drain-bins/ — workflow run step (.github/workflows/publish.yml:547)
  • CI command: TAG="${{ steps.notes.outputs.tag }}" — workflow run step (.github/workflows/publish.yml:552)
  • Lockfile: plugins/evo/bin/evo-hook-drain-rs/Cargo.lock — dependency lockfile (plugins/evo/bin/evo-hook-drain-rs/Cargo.lock:1)
  • Lockfile: plugins/evo/uv.lock — dependency lockfile (plugins/evo/uv.lock:1)

What evo proposes

evoc.yaml:

workspace: gh-evo-hq-evo
goal:
  metric: latency_ms
  direction: min
baseline: auto
rules:
  win_threshold: 0
  harm_threshold: 0
  min_n: 20
  max_n: 400
profile:
  blast_radius: sandbox
direction_source: inferred
triggers:
  - metric: latency_ms
    comparator: '>'
    baseline: auto
    source:
      kind: harness
    interval_minutes: 60
    action: optimize
notes:
  - 'harness INFERRED from tests/fixtures/auto_harness_demo/benchmark.py:1 (benchmark-named script); verify it emits JSON lines'
  - 'goal INFERRED: benchmark script found, guessing latency_ms/min; rename to what the harness emits'
  - 'optimize.on_accept: once this config is accepted, evo calibrates the harness, resolves baselines, and starts the first optimization run automatically (needs an optimize.proposer; set on_accept: false to disable)'
harness: python tests/fixtures/auto_harness_demo/benchmark.py
optimize:
  on_accept: true
  • harness INFERRED from tests/fixtures/auto_harness_demo/benchmark.py:1 (benchmark-named script); verify it emits JSON lines
  • goal INFERRED: benchmark script found, guessing latency_ms/min; rename to what the harness emits
  • optimize.on_accept: once this config is accepted, evo calibrates the harness, resolves baselines, and starts the first optimization run automatically (needs an optimize.proposer; set on_accept: false to disable)
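
The first note asks the reader to verify that the harness emits JSON lines. A minimal sketch of such a check follows, assuming "JSON lines" means one JSON object per line of stdout. The function name `parse_json_lines` and the sample records are hypothetical and are not part of evo.

```python
import json


def parse_json_lines(output: str) -> list[dict]:
    """Parse harness stdout as JSON lines; raise ValueError on the first bad line."""
    records = []
    for lineno, line in enumerate(output.splitlines(), start=1):
        if not line.strip():
            continue  # tolerate blank lines between records
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            raise ValueError(f"line {lineno} is not valid JSON: {exc.msg}") from exc
        if not isinstance(record, dict):
            raise ValueError(f"line {lineno} is not a JSON object")
        records.append(record)
    return records


# Hypothetical harness output, used only to exercise the parser.
sample = '{"latency_ms": 12.5}\n{"latency_ms": 11.0}\n'
records = parse_json_lines(sample)
```

In practice the `output` string would come from running the harness command and capturing its stdout, for example with `subprocess.run(..., capture_output=True, text=True)`.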

What accepting this PR means

  • Merging commits evoc.yaml: the committed file IS the approval, exactly as for a hand-written config.
  • baseline: auto is resolved by calibration on first run; no number here was invented.
  • The triggers: section becomes continuous monitors: when the metric degrades past baseline for consecutive checks, evo starts an optimization run and proposes the fix as a PR. Monitors are detectors, not verdicts -- the statistics happen in the run they trigger.
  • Nothing executes against this repository until this file is merged.
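
The monitor behaviour described above (fire only after the metric is worse than baseline for consecutive checks) can be sketched as follows. This is an illustration under assumptions, not evo's implementation: the class name `ConsecutiveMonitor`, the `required` count of 3, and the sample values are all hypothetical, since the PR does not state how many consecutive checks are needed.

```python
from dataclasses import dataclass


@dataclass
class ConsecutiveMonitor:
    baseline: float
    required: int = 3  # hypothetical: degraded checks in a row before firing
    streak: int = 0

    def check(self, value: float) -> bool:
        """Record one check; return True once the streak reaches `required`."""
        # With comparator '>' and direction 'min', a higher value is worse.
        if value > self.baseline:
            self.streak += 1
        else:
            self.streak = 0  # any healthy check resets the streak
        return self.streak >= self.required


monitor = ConsecutiveMonitor(baseline=100.0)
fired = [monitor.check(v) for v in [120.0, 95.0, 110.0, 115.0, 130.0]]
# fired == [False, False, False, False, True]
```

The single healthy reading (95.0) resets the streak, so the monitor fires only on the third degraded check in a row. This matches the "detectors, not verdicts" framing: the monitor only decides when to start a run, and the run itself does the statistics.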

Dashboard: https://pest-inserted-chronicle-indicators.trycloudflare.com/w/gh-evo-hq-evo · Workspace: gh-evo-hq-evo

@alokwhitewolf

Collaborator

Test installation. The evo platform integration has moved to the private dry-run repo.

@devin-ai-integration devin-ai-integration Bot left a comment

Devin Review found 2 potential issues.

Comment thread evoc.yaml

🚩 Configuration file placed at repo root rather than in tests/fixtures

This evoc.yaml is placed at the repository root and references a test fixture harness (tests/fixtures/auto_harness_demo/benchmark.py). If this file is meant to be consumed by an external tool scanning the repo root, it would point that tool at a test fixture rather than production code. If it's only meant for demo/testing purposes, it might be more appropriate to place it alongside the fixture it references. No other code in the repo references this file.

Comment thread evoc.yaml
Comment on lines +3 to +4
metric: latency_ms
direction: min

🚩 Configured metric does not match harness output

The goal.metric is set to latency_ms (line 3), but the referenced harness at tests/fixtures/auto_harness_demo/benchmark.py:91 emits {"score": ..., "tasks": ...} — there is no latency_ms key. The notes on lines 23-24 explicitly acknowledge this is an inferred guess ('goal INFERRED: benchmark script found, guessing latency_ms/min; rename to what the harness emits'), so this appears intentional as scaffolding. However, if this config were ever activated as-is without editing, the optimization tool would likely fail to find the expected metric. Worth confirming this is only scaffolding and not meant to be used directly.
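
A mismatch like this one could be caught before the config is activated by comparing the configured metric against a sample line of harness output. The sketch below assumes the harness prints one JSON object per line. The function name `check_metric` and the sample values are hypothetical; only the key names `score` and `tasks` come from the comment above.

```python
import json
from typing import Optional


def check_metric(configured_metric: str, harness_line: str) -> Optional[str]:
    """Return an error message if the metric key is missing, else None."""
    record = json.loads(harness_line)
    if configured_metric in record:
        return None
    available = ", ".join(sorted(record))
    return f"metric '{configured_metric}' not found; harness emits: {available}"


# Keys taken from the review comment; the values are made up for illustration.
line = '{"score": 0.5, "tasks": 10}'
message = check_metric("latency_ms", line)
# message == "metric 'latency_ms' not found; harness emits: score, tasks"
```

Listing the keys that are available makes the fix obvious to whoever edits `goal.metric`, which is what the INFERRED note asks the reader to do.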

@alokwhitewolf alokwhitewolf deleted the evo/config branch June 11, 2026 18:54