cve-env

Agentic CVE → Docker environment builder.

Given a CVE ID, cve-env builds a Docker environment running the affected application at its pre-patch version and verifies the build is correct. The agent researches each CVE live (NVD + OSV + GitHub + container registries) and picks its own build path.

Version: 1.0 beta
Author: Gadi Evron (@gadievron)


What success means

status="success" requires both:

  1. Right version — a version-assertion exec_check (pip show, dpkg -l, openssl version, find / -name '*.jar', …) proving the deployed binaries fall in the CVE's affected range.
  2. Working app — functional-smoke checks proving normal operation on benign input (e.g. HTTP GET / + a content match, or a SELECT 1 roundtrip for a database).

If verify runs but that evidence is incomplete, the runtime records the honest fallback verified_partial instead. cve-env's job ends at a correct, working environment — it builds and verifies; it does not assess or attempt the vulnerability.

Every build runs in a hardened container: --cap-drop ALL, --security-opt no-new-privileges:true, ports bound to 127.0.0.1 only.
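
As a rough illustration of those flags, the sketch below assembles a `docker run` argument list in Python. The function name `hardened_run_args`, its parameters, and the example image are hypothetical and are not cve-env's actual code; only the three Docker options come from the text above.

```python
def hardened_run_args(image: str, host_port: int, container_port: int) -> list[str]:
    """Build a `docker run` argv with the hardening flags described above.

    Hypothetical helper for illustration; cve-env's real docker_run tool may differ.
    """
    return [
        "docker", "run", "--detach",
        "--cap-drop", "ALL",                         # drop every Linux capability
        "--security-opt", "no-new-privileges:true",  # processes cannot gain new privileges
        "--publish", f"127.0.0.1:{host_port}:{container_port}",  # loopback-only binding
        image,
    ]
```

Passing the result to `subprocess.run(..., check=True)` would start the container; binding to `127.0.0.1` keeps the published port off every other host interface.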


Requirements

  • Python 3.12+ and uv
  • A Docker daemon reachable via docker on PATH. Developed and tested against Colima; other Docker-compatible daemons (Docker Desktop, OrbStack, rootless Docker) should work but are untested.
  • The claude CLI, logged in — cve-env authenticates through your existing Claude Code session via claude-agent-sdk. No ANTHROPIC_API_KEY is consumed.

Install

git clone <this-repo> && cd cve-env
uv sync

Quick start

# Build + verify one CVE (prints a per-stage report at the end)
uv run cve-env build CVE-2014-0160

# Probe external services (NVD, OSV, GitHub, Docker Hub, registries)
uv run cve-env doctor

Per-CVE audit traces (every tool call, result, and cost) are written to output/agentic/<run-id>/<cve>.jsonl.
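
Because a trace is line-delimited JSON, it can be summarized with a few lines of Python. The sketch below assumes each line is a JSON object and that tool-call records carry a `tool` key; that field name is a guess made for illustration, so inspect a real trace before relying on it.

```python
import json
from collections import Counter
from pathlib import Path


def count_tool_calls(trace_path: Path, key: str = "tool") -> Counter:
    """Count trace records per tool name.

    `key` is an assumed field name, not a documented part of the trace schema.
    """
    counts: Counter = Counter()
    with trace_path.open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            record = json.loads(line)
            if isinstance(record, dict) and key in record:
                counts[record[key]] += 1
    return counts
```

Calling `count_tool_calls(Path("output/agentic/<run-id>/<cve>.jsonl"))` with the placeholders filled in would give a quick view of which tools dominated a run.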


How it works

The agent picks each tool from the CVE record and prior results — there's no fixed script. Work moves through five stages (the stage of each tool is defined by config.TOOL_TO_STAGE):

| Stage | Goal | Tools |
| --- | --- | --- |
| RESEARCH | Ground the CVE: product, affected version, source repo | nvd_lookup (OSV.dev fallback), github_fetch |
| RESOLVE | Find a pre-built image to pull | image_resolve (registry cascade) |
| ACQUIRE | Build an image when no pre-built one fits | dockerfile_gen, docker_build, source_build |
| LAUNCH | Start the environment | docker_run, docker_compose_up, run_in_container |
| VERIFY | Prove it's the right version and works | verify (7 check types); give_up ends the run |

RESOLVE and ACQUIRE are alternatives — pull an existing image, or build one. The decision cascade:

  1. Research → product + affected version + (for OSS) the source repo.
  2. Resolve: try to pull a pre-built image. image_resolve walks a registry cascade (Docker Hub → mirror.gcr.io → quay.io → ghcr.io → mcr.microsoft.com) with transport-class retry and an architecture check. If one fits → go to launch.
  3. Acquire (only when no pre-built image fits): build one, using one of:
    • source_build: clone the OSS repo at the pre-patch tag/commit and build it;
    • dockerfile_gen → docker_build: synthesize a Dockerfile (library / language CVEs);
    • plugin overlay: pull a clean host image (e.g. wordpress:5.6), then copy_ops the vulnerable plugin onto it;
    • forge cascade: fetch code hosted off GitHub (WP-SVN / OSDN / SourceForge).
  4. Launch: docker_run (single service) or docker_compose_up (multi-service vulhub stacks).
  5. Verify, then self-heal on failure — retry, pivot registry, pivot resolve→build, or give_up with a reason.
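
The resolve step of this cascade can be pictured as a loop over registries with a bounded retry on transport errors. This is a minimal sketch and not cve-env's image_resolve: the `try_pull` callback, the `TransportError` class, and the retry count are hypothetical, the architecture check is left out, and `docker.io` stands in for Docker Hub. The registry order is copied from step 2.

```python
from typing import Callable, Optional

REGISTRIES = ["docker.io", "mirror.gcr.io", "quay.io", "ghcr.io", "mcr.microsoft.com"]


class TransportError(Exception):
    """Hypothetical marker for retryable network failures (timeouts, resets)."""


def resolve_image(
    name: str,
    try_pull: Callable[[str], bool],
    retries: int = 2,
) -> Optional[str]:
    """Return the first registry reference that pulls, or None to signal "build instead".

    `try_pull(ref)` is an assumed callback: True if the image exists and fits,
    False if it does not, TransportError on a retryable failure.
    """
    for registry in REGISTRIES:
        ref = f"{registry}/{name}"
        for _attempt in range(retries + 1):
            try:
                if try_pull(ref):
                    return ref
                break  # definitive miss: move on to the next registry
            except TransportError:
                continue  # transient failure: retry the same registry
    return None  # caller falls through to the ACQUIRE stage
```

Returning `None` rather than raising keeps the pivot from resolve to build an ordinary control-flow decision, which matches the "pivot resolve→build" recovery described in step 5.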

The tools named in the stage table are cve-env's 11 custom MCP tools. The agent also uses the Claude Code SDK's built-in tools (Bash, Write, Read, …) for staging files, git clone, and direct shell steps; Bash is in fact its most-used tool. Recovery is the agent's job, not an orchestrator's: an empty nvd_lookup falls back to OSV.dev; a rate-limited image_resolve moves to the next registry (or to a generic base image plus a manual install); a source_build with no matching tag builds directly from a git clone in the Dockerfile.

Verification — proving the environment works

VERIFY builds evidence in layers, from "is it up" to "is it the right version doing real work." All HTTP/TCP probes hit the container's published port, bound to 127.0.0.1 only — a non-loopback target is rejected, so the agent can't probe the host network.
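
A loopback-only rule of this kind can be expressed with the standard `ipaddress` module. The sketch below is an assumption about how such a guard might look, not cve-env's code; in particular, accepting the literal name `localhost` and rejecting every other hostname without resolving it are choices made here for illustration.

```python
import ipaddress


def is_loopback_target(host: str) -> bool:
    """Return True only for loopback IP literals (127.0.0.0/8, ::1) or "localhost"."""
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        return False  # not an IP literal: reject rather than resolve it
```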

| Layer | Check type(s) | Proves |
| --- | --- | --- |
| Readiness | container_status, stability_wait, log_check | the container started and stayed up; an expected startup / health marker appears in docker logs |
| Networking | http_check | the service answers on its port, and the body isn't empty (a zero-byte 200 fails) |
| Actual usage | http_request_check (send a request body, match the response), tcp_probe_check (raw TCP send/receive: Redis / Postgres / SMTP / SSH / Memcached …), exec_check (run a command inside the container, assert stdout + exit code) | the application does real work on benign input |
| Version proof | exec_check (e.g. openssl version, pip show, dpkg -l, find / -name '*.jar') | the deployed binaries fall in the CVE's affected range |
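
To make the Networking row concrete, here is a small sketch of an HTTP check that treats a zero-byte 200 as a failure. The function names and the fixed expectation of status 200 are assumptions for illustration; the real http_check may accept other statuses or options.

```python
import urllib.error
import urllib.request
from typing import Optional


def http_check_passes(status: int, body: bytes, must_contain: Optional[str] = None) -> bool:
    """Pass only on a 200 with a non-empty body and, if requested, a content match."""
    if status != 200 or len(body) == 0:
        return False  # an empty 200 is not evidence that the app works
    if must_contain is not None:
        return must_contain.encode() in body
    return True


def probe(url: str, must_contain: Optional[str] = None, timeout: float = 5.0) -> bool:
    """Fetch `url` and apply http_check_passes; any network or HTTP error is a failure."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return http_check_passes(response.status, response.read(), must_contain)
    except (urllib.error.URLError, OSError):
        return False
```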

Functional smoke is matched to the application type, not just an HTTP ping. A web app gets a page fetch with a content match (and a deliberate 404); a database gets a query roundtrip (SELECT 1, INSERT/SELECT); a cache / wire-protocol service gets a protocol probe (e.g. a Redis PING → +PONG exchange over raw TCP, or redis-cli ping via exec_check); a library gets a trivial-use exec_check. The has_functional_smoke heuristic is the single gate that decides success vs verified_partial (it passes on ≥3 active checks, an http_check with a content match, or http_checks on ≥2 distinct paths).

success requires both a version-assertion and functional smoke. If the agent's plan is missing functional smoke, the runtime nudges it in real time and — for HTTP services — can auto-inject the missing checks; if the evidence is still incomplete, the honest fallback is verified_partial. (stability_wait auto-bumps to 120s for slow-booting JVM images; container_status is auto-prepended if omitted.)
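
The gate described above can be sketched as follows. This is an interpretation under stated assumptions, not the project's implementation: the `Check` dataclass, the `is_version_assertion` flag, and the reading of "active checks" as the four check types that exercise the service are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

# Assumption: "active" means checks that exercise the service, not readiness checks.
ACTIVE_TYPES = {"http_check", "http_request_check", "tcp_probe_check", "exec_check"}


@dataclass(frozen=True)
class Check:
    type: str
    path: Optional[str] = None            # URL path, for http_check
    content_match: Optional[str] = None   # expected body substring, if any
    is_version_assertion: bool = False    # hypothetical flag for version-proof checks


def has_functional_smoke(checks: list[Check]) -> bool:
    """Apply the three documented conditions; any one of them is enough."""
    active = [c for c in checks if c.type in ACTIVE_TYPES]
    http = [c for c in checks if c.type == "http_check"]
    return (
        len(active) >= 3
        or any(c.content_match for c in http)
        or len({c.path for c in http if c.path}) >= 2
    )


def final_status(checks: list[Check], all_passed: bool) -> str:
    """Map a finished verify run to one of the documented outcome statuses."""
    if not all_passed:
        return "verify_failed"
    has_version = any(c.is_version_assertion for c in checks)
    if has_version and has_functional_smoke(checks):
        return "success"
    return "verified_partial"
```

Under this reading, a passing run with only a version assertion, or only smoke checks, lands on verified_partial rather than success.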

Outcome statuses — every build writes exactly one:

| Status | Meaning |
| --- | --- |
| success | built + verified (right version and functional smoke) |
| verified_partial | built and verify passed, but the version/smoke evidence was incomplete |
| verify_failed | built and launched, but verification did not pass |
| launched_no_verify | container launched but the run ended before any verify check |
| unresolvable | gave up: no buildable target (proprietary, kernel/firmware, or no image and no source) |
| turn_cap | hit the max-turns cap before converging |
| budget_exhausted | hit the cost cap before converging |
| rate_limited | gave up due to external rate limiting (Docker Hub / NVD / API) |
| interrupted | the run was interrupted before completing |
| error | engine or API failure |

Credentials & rate limits

cve-env defaults to anonymous tiers everywhere — no credentials required. Setting any of these raises the corresponding limit and reduces CVEs lost to transient throttling:

| Service | Env var(s) | Anonymous | With credential |
| --- | --- | --- | --- |
| NVD API | NVD_API_KEY | 5 req/30s | 50 req/30s |
| GitHub API | GITHUB_TOKEN | 60 req/hr | 5,000 req/hr |
| Docker Hub | DOCKER_USERNAME + DOCKER_PASSWORD | 100 pulls/6h | 200 pulls/6h |

Set them via .env (copy .env.example) or your shell profile. The GitHub token is resolved in order: GITHUB_TOKEN env → gh auth token (if the GitHub CLI is installed and logged in) → anonymous; an existing docker login session is reused likewise. The gh CLI is optional — only a convenient token source, not a dependency. Tokens are sent as request headers (never in URLs) and are redacted from audit logs.
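
The GitHub token resolution order can be sketched like this. The function names are hypothetical and the code is not taken from cve-env; it only mirrors the documented order (GITHUB_TOKEN, then `gh auth token`, then anonymous) and sends the token as a header rather than in the URL.

```python
import os
import shutil
import subprocess
from typing import Mapping, Optional


def resolve_github_token(env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return a token from GITHUB_TOKEN, else from the gh CLI, else None (anonymous)."""
    env = os.environ if env is None else env
    token = env.get("GITHUB_TOKEN", "").strip()
    if token:
        return token
    if shutil.which("gh"):  # the gh CLI is optional
        try:
            result = subprocess.run(
                ["gh", "auth", "token"],
                capture_output=True, text=True, timeout=10, check=False,
            )
        except (OSError, subprocess.TimeoutExpired):
            return None
        if result.returncode == 0 and result.stdout.strip():
            return result.stdout.strip()
    return None  # fall back to the anonymous tier


def auth_headers(token: Optional[str]) -> dict[str, str]:
    """Build request headers; the token never appears in the URL."""
    return {"Authorization": f"Bearer {token}"} if token else {}
```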


Configuration

Settings resolve by precedence: CLI flag → env var (CVE_ENV_<UPPER_SNAKE>) → cve-env.toml → built-in default.
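
A minimal sketch of that precedence chain, assuming each source is available as a plain mapping. The function name, the flat TOML key layout, and the example values are hypothetical; only the lookup order and the `CVE_ENV_<UPPER_SNAKE>` naming come from the text above.

```python
from typing import Any, Mapping


def resolve_setting(
    name: str,
    cli: Mapping[str, Any],
    env: Mapping[str, str],
    toml: Mapping[str, Any],
    defaults: Mapping[str, Any],
) -> Any:
    """Resolve one setting: CLI flag, then env var, then TOML file, then default."""
    if cli.get(name) is not None:      # an unset flag is represented as None or absent
        return cli[name]
    env_key = "CVE_ENV_" + name.upper()
    if env_key in env:
        return env[env_key]            # note: env values arrive as strings, uncoerced here
    if name in toml:
        return toml[name]
    return defaults[name]
```

For example, with `name="model"` the environment lookup key is `CVE_ENV_MODEL`, and a CLI value, when present, wins over it.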

  • Per-CVE caps: set per run via CLI flags --max-turns, --max-cost-usd, --turn-extension-pct, --max-turn-extensions (built-in defaults: 24 turns, $0.60 soft cost, +20% × 2 extensions). The cost-extension knobs also have env vars (CVE_ENV_MAX_COST_EXTENSIONS, CVE_ENV_COST_EXTENSION_PCT); the model is set via CVE_ENV_MODEL.
  • Per-stage soft budgets: cve-env.toml [budget] block (copy cve-env.toml.example).
  • Behavior / safety knobs: CVE_ENV_DISALLOWED_TOOLS (disable built-in agent tools), the source_build size caps (CVE_ENV_MAX_TARBALL_BYTES, CVE_ENV_MAX_EXTRACT_BYTES, …), and the lifecycle hooks below.

For the complete surface: cve-env build --help lists every CLI flag; .env.example documents the env vars; cve-env.toml.example shows the TOML keys; config.py holds the defaults.

Lifecycle hooks (opt-in, default off)

| Env var | CLI flag | Action (post-build) |
| --- | --- | --- |
| CVE_ENV_AUTO_CLEANUP_CONTAINERS=1 | --auto-cleanup-containers | docker rm -f this run's labeled containers (concurrency-safe) |
| CVE_ENV_AUTO_PRUNE_IMAGES=1 | --auto-prune-images | docker image prune -f (dangling layers only) |
| CVE_ENV_AUTO_STOP_COLIMA=1 | --auto-stop-colima | colima stop if no other cve-env build is running |

Defaults are off so iterative use keeps containers and Colima warm.


Design principles

  1. Agentic, not corpus. No CVE → image dictionary; every run researches live and chooses its own path.
  2. Session auth, not API key. claude-agent-sdk uses your Claude Code session; setting_sources=[] + skills=[] keep the agent's context free of your global rules.
  3. Self-healing through retry, not orchestration. Verify failures, rate limits, and empty lookups are recovered by the agent + per-CVE runtime state (cooldowns, arch counters, refusal latches), reset between CVEs.
  4. Correctness gated at runtime. success requires both a version assertion and functional smoke; anything less is verified_partial, so a false-positive success is unreachable by construction.
  5. Every refusal is logged for post-run analysis.

Known limitations (declared, not bugs)

Figures below are from 1,838 benched runs across 33 benches on the dev corpus.

  • Kernel / firmware / hardware / non-Linux CVEs aren't buildable. A Docker container can't be a kernel, a firmware image, or another OS — so CVEs in kernel drivers (e.g. Arm Mali GPU), firmware/BIOS (e.g. coreboot SMM), or non-Linux systems (e.g. FreeBSD) can't be reproduced as an application environment. The engine detects these and gives up cleanly (arch_incompatible).
  • Architecture is handled, but the tested host-platform surface is narrow. cve-env detects the host arch (arm64 / amd64), pulls the native image, uses Rosetta to run amd64 images on Apple-Silicon macOS, and falls back to source-build when no compatible image exists; benches ran on both arm64 and amd64. However, it has only ever been run on a macOS host + Colima; Linux/Windows hosts, Windows containers, and non-Colima Docker daemons are untested.
  • Proprietary-vendor CVEs whose CPE vendor overlaps closed-source products can't be fast-rejected by vendor match; cve-env spends a small research budget, then gives up cleanly (proprietary was the dominant give-up in benches).
  • Run-to-run variance on borderline CVEs — a few oscillate success ↔ unresolvable from agent-reasoning variance + external state.
  • Cost — it isn't free. Each build spends Claude tokens. A successful build cost a median ~$1.00 (p90 $1.69, max $2.49 across 602 successes); runs that give up early are cheaper (≈$0.13 median overall). Per-CVE cost is bounded by a configurable cap — monitor it on large runs.
  • Caps cut off the hardest CVEs. Each CVE has a turn cap and a cost cap; ~7% of runs hit one before building (turn_cap 6.9%, budget_exhausted 0.3%). A CVE that genuinely needs more than its cap won't finish. Both are configurable (--max-turns, --max-cost-usd, env, cve-env.toml).
  • Multi-step CMS auth+seed flows (e.g. admin-authed plugin SQLi) don't always converge within budget.

Dependencies

The four runtime deps (claude-agent-sdk, pydantic, pyyaml, requests) carry floors set to the versions cve-env was validated against and upper bounds at the next major (next minor for claude-agent-sdk during its 0.x series), so an API drift can't silently break a build. uv.lock pins the full transitive tree; uv sync installs exactly that. Bump a dependency intentionally, then re-validate.

Project structure

The wheel ships only src/cve_env (pyproject.toml: packages = ["src/cve_env"]):

src/cve_env/
├── cli.py                  # `cve-env build <cve> | doctor`
├── config.py               # model, caps, paths, env overrides
├── models.py               # Outcome dataclass + status enum
├── policy.py / validators.py   # P14/P17/P18 build invariants
├── agent/
│   ├── llm.py              # claude-agent-sdk wrapper + retry
│   ├── loop.py             # turn loop, status mapping, audit write
│   ├── prompts.py          # system + user prompt rendering
│   ├── tools.py            # the 11 MCP tool registrations
│   ├── audit.py            # per-CVE JSONL writer (secret-redacting)
│   └── refusals.py         # refusal scanner + writer
├── tools/                  # nvd_lookup, github_fetch, image_resolve, source_build,
│                           # dockerfile_gen, docker_build, docker_run,
│                           # docker_compose_up, run_in_container, verify, web_fetch, arch
├── infra/service_health.py # `cve-env doctor`
└── utils/                  # run, lifecycle, safe_env, dockerfile_hygiene, …

Security posture

cve-env runs LLM-generated commands and fetches CVE research from the live internet — so it's built to contain that, and you should run it accordingly.

Built-in protections (automatic):

  • Builds and runs in a container — --cap-drop ALL, --security-opt no-new-privileges:true, ports bound to 127.0.0.1 only.
  • The in-process URL fetcher is SSRF-guarded — scheme allowlist + private/metadata-IP denylist + DNS-rebind re-resolution. Downloaded source archives are size-capped and path-traversal-/symlink-checked on extraction.
  • Audit logs redact some known secret-token shapes and are written owner-only (dir 0700, files 0600).
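
To illustrate the scheme allowlist and private/metadata-IP denylist, here is a minimal sketch built on the standard library. It is not cve-env's fetcher: the function names and the use of `is_global` as the denylist test are assumptions, and the DNS-rebind defence is only hinted at (a real guard would connect to the exact address it vetted instead of resolving the name a second time).

```python
import ipaddress
import socket
from typing import Callable, Iterable
from urllib.parse import urlsplit

ALLOWED_SCHEMES = {"http", "https"}


def _default_resolver(host: str) -> list[str]:
    """Resolve a hostname to its IP addresses using the system resolver."""
    infos = socket.getaddrinfo(host, None, proto=socket.IPPROTO_TCP)
    return [info[4][0] for info in infos]


def is_fetch_allowed(
    url: str,
    resolver: Callable[[str], Iterable[str]] = _default_resolver,
) -> bool:
    """Allow only http(s) URLs whose every resolved address is publicly routable."""
    parts = urlsplit(url)
    if parts.scheme not in ALLOWED_SCHEMES or not parts.hostname:
        return False
    try:
        addresses = list(resolver(parts.hostname))
    except OSError:
        return False  # resolution failure: refuse rather than guess
    if not addresses:
        return False
    for raw in addresses:
        try:
            ip = ipaddress.ip_address(raw)
        except ValueError:
            return False
        if not ip.is_global:  # rejects private, loopback, and link-local (169.254.0.0/16)
            return False
    return True
```

Requiring every resolved address to pass, rather than any one of them, closes the case where a hostname maps to both a public and an internal address.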

Recommended when you run it (operator's responsibility):

  • Point docker at a non-root, isolated context (e.g. a Colima VM), so a build escape lands in the VM, not your host.
  • Optional: set CVE_ENV_DISALLOWED_TOOLS=WebFetch,WebSearch to disable the agent's built-in general web tools (extra SSRF-surface reduction, at some loss of research reach).

Disclaimer

Provided under the MIT License (see LICENSE) with no warranty. Not fully validated; outputs may be incomplete or inaccurate, and costs must be monitored closely. Do not deploy in production — use only for defensive research in lab environments.

License

MIT — see LICENSE.
