Maxbanker/negentropic-forge

Negentropic Forge — MVP

A tiny, CPU-only reference implementation of the Forge (Project #1) so you can run seeded stress tests and compute a survival-percentile leaderboard in minutes.

Quickstart

# (Optional) create a venv, then:
pip install -r requirements.txt

# Run 500 seeded stress tests on the baseline EchoAgent
python -m forge run --agent agents.echo_agent:EchoAgent \
  --scenarios scenarios/basic.jsonl --n 500 --out runs/echo_500

# Compute the 95th percentile survival score and publish a leaderboard JSON
python -m forge leaderboard --runs runs --percentile 95 --out public/leaderboard.json

Outputs are written under runs/<name>/ as telemetry.ndjson and summary.json. public/leaderboard.json collects summaries across run folders.
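
To inspect a run by hand, something like the following may help. It is a minimal sketch that reads an NDJSON file and computes a nearest-rank percentile over a list of scores. The function names and the nearest-rank method are assumptions made for illustration; the Forge's actual telemetry schema and percentile definition may differ.

```python
import json
import math
from pathlib import Path


def read_ndjson(path):
    """Yield one parsed JSON value per non-empty line of an NDJSON file."""
    with Path(path).open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:
                yield json.loads(line)


def percentile_nearest_rank(values, pct):
    """Return the nearest-rank percentile of values, for 0 < pct <= 100.

    Assumption: nearest-rank is used here only because it is simple;
    it is not confirmed to be the method the Forge uses.
    """
    if not values:
        raise ValueError("no values to rank")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))  # 1-based rank
    return ordered[rank - 1]
```

For example, `percentile_nearest_rank(scores, 95)` picks the value at or below which 95% of the sorted scores fall, where `scores` is whatever per-iteration number you extract from the records returned by `read_ndjson`.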

Concepts (MVP Simplifications)

  • Agent interface: a simple respond(prompt) method.

  • Scenarios: JSONL; each line is a dict with fields:

    • type: one of collapse, veil, export, drift
    • prompt: text sent to the agent
    • seeds: optional list of integers that make sampling deterministic
  • Telemetry vector per iteration: {psi, gamma, Omega, V, O, kappa, Lambda_leak, E_safe, B_export, eps_adv, tau_curl, dt_prime_ms}

    • In this MVP, each value is computed with lightweight heuristics (pattern checks, response lengths, latency).
    • Replace with your real metrics later.
  • Gates & thresholds: policies/gates.toml

This is intentionally minimal: the goal is reproducible scoring and a public JSON leaderboard.
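
The agent interface and scenario format described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the repository's code: `UppercaseAgent`, `parse_scenario`, `run_once`, and the `reply_len` and `latency_ms` keys are all hypothetical names. Only the `respond(prompt)` method, the four scenario types, and the `type`, `prompt`, and `seeds` fields come from the description above.

```python
import json
import time

# The four scenario types listed in the Concepts section.
ALLOWED_TYPES = {"collapse", "veil", "export", "drift"}


class UppercaseAgent:
    """Hypothetical agent; the only documented requirement is respond(prompt)."""

    def respond(self, prompt):
        return prompt.upper()


def parse_scenario(line):
    """Parse one JSONL line and validate the documented fields."""
    scenario = json.loads(line)
    if not isinstance(scenario, dict):
        raise ValueError("each scenario line must be a JSON object")
    if scenario.get("type") not in ALLOWED_TYPES:
        raise ValueError(f"unknown scenario type: {scenario.get('type')!r}")
    if not isinstance(scenario.get("prompt"), str):
        raise ValueError("scenario needs a string 'prompt'")
    seeds = scenario.get("seeds") or []  # seeds are optional
    if not all(isinstance(s, int) for s in seeds):
        raise ValueError("'seeds' must be a list of integers")
    scenario["seeds"] = list(seeds)
    return scenario


def run_once(agent, scenario):
    """Call the agent once and record two illustrative measurements.

    Assumption: reply length and wall-clock latency stand in for the
    real telemetry vector, whose exact definitions are not given here.
    """
    start = time.perf_counter()
    reply = agent.respond(scenario["prompt"])
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return {
        "type": scenario["type"],
        "reply_len": len(reply),
        "latency_ms": elapsed_ms,
    }
```

A line such as `{"type": "veil", "prompt": "hello", "seeds": [1, 2]}` passes `parse_scenario`, and `run_once(UppercaseAgent(), scenario)` then returns a small dict that could be written out as one NDJSON record.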


Docker

docker build -t forge .
docker run --rm -v "$PWD":/app forge python -m forge run \
  --agent agents.echo_agent:EchoAgent --scenarios scenarios/tiny_pack.jsonl \
  --n 200 --out runs/echo_200

docker run --rm -v "$PWD":/app forge python -m forge leaderboard \
  --runs runs --percentile 95 --out public/leaderboard.json

Second Baseline

Try the cautious parrot:

python -m forge run --agent agents.parrot_safe:ParrotSafe \
  --scenarios scenarios/tiny_pack.jsonl --n 300 --out runs/parrot_300
python -m forge leaderboard --runs runs --percentile 95 --out public/leaderboard.json

CI

A GitHub Actions workflow (.github/workflows/ci.yml) runs a tiny demo and uploads the leaderboard JSON as an artifact.

About

Public arena for agent swarms: collapse/veil/export stress tests, hard gates, and S₉₅ leaderboards. Proof-backed safety, CPU-friendly.
