akm-eval

AKM Eval runs real benchmark packs through authoritative upstream harnesses and normalizes the outputs.

Part of the akm ecosystem — see also akm-stash, akm-plugins, akm-registry, and akm-bench.

Trust policy:

- no synthetic or heuristic success metrics
- no silent fallback when an official harness or evaluator is unavailable
- baseline and future AKM variants both use real model providers

Host requirements

For normal bin/... usage:

- bash
- docker with a running daemon
- uv
- one real model-provider setup:
  - opencode: config/opencode.json plus required env such as OPENCODE_API_KEY
  - or openai-compatible: a reachable endpoint plus its required env/config

Extra pack requirements still apply:

- beam: local vendor/BEAM checkout, prepared official datasets, and judge configuration
- terminal-bench: opencode provider path only

bun is only required for repo development tasks.
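
The host tools listed above can be checked before invoking anything under bin/. The following is a minimal sketch, not the logic of bin/doctor: the names `REQUIRED_TOOLS` and `missing_tools` are hypothetical, and the check only confirms that each executable is on PATH. It does not verify that the docker daemon is running or that provider credentials are set.

```python
import shutil
from typing import Callable, Iterable, Optional

# Executables named under "Host requirements"; the tuple itself is illustrative.
REQUIRED_TOOLS = ("bash", "docker", "uv")


def missing_tools(
    tools: Iterable[str] = REQUIRED_TOOLS,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> list[str]:
    """Return the tools that cannot be found on PATH, in the order given."""
    return [tool for tool in tools if which(tool) is None]
```

Calling `missing_tools()` with no arguments uses `shutil.which` against the real PATH; an empty list means all three executables were found. The `which` parameter exists only so the lookup can be swapped out in tests.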

Quick start

bin/build-image
bin/doctor --pack locomo
bin/eval --pack locomo --variant baseline --config config/common/locomo-smoke.json

Common runnable configs live under config/common/; see docs/running-evals.md for the current list.

- config/common/locomo-smoke.json
- config/common/longmemeval-smoke.json
- config/common/beam-smoke.json
- config/common/swe-bench-smoke.json
- config/common/swe-bench-smoke-openai-compatible.json
- config/common/tau-bench-smoke.json
- config/common/terminal-bench-smoke.json
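
The quick-start invocation follows one pattern: a pack name, a variant, and a config path. As a rough sketch, the helper below assembles that argument list for use from Python. The function names `eval_command` and `run_eval` are hypothetical and not part of this repo; only the `--pack`, `--variant`, and `--config` flags shown in the quick start are used, and the sketch assumes bin/eval reports failure through a non-zero exit status.

```python
import subprocess


def eval_command(pack: str, config: str, variant: str = "baseline") -> list[str]:
    """Build the argv for bin/eval using only the flags shown in the quick start."""
    return ["bin/eval", "--pack", pack, "--variant", variant, "--config", config]


def run_eval(pack: str, config: str, variant: str = "baseline") -> None:
    """Run bin/eval from the repo root.

    check=True makes subprocess raise CalledProcessError on a non-zero exit,
    so a failed run is never treated as a success.
    """
    subprocess.run(eval_command(pack, config, variant), check=True)
```

For example, `run_eval("locomo", "config/common/locomo-smoke.json")` would issue the same command as the last quick-start line.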

Supported packs

- locomo
- longmemeval
- beam
- swe-bench
- terminal-bench
- tau-bench
- akm-bench remains intentionally blocked

Runner support

| Pack | `opencode` | `openai-compatible` |
| --- | --- | --- |
| `locomo` | Yes | Yes |
| `longmemeval` | Partial | Yes |
| `beam` | Yes | Yes |
| `swe-bench` | Yes | Yes |
| `tau-bench` | No | Yes |
| `terminal-bench` | Yes | No |
| `akm-bench` | No | No |
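
The support matrix above can be expressed as a lookup so that an unsupported pack/runner pair is rejected before a run starts. This is an illustrative sketch only: `RUNNER_SUPPORT`, `support_level`, and `runners_for` are hypothetical names, the values are copied from the table, and nothing here reflects how the repo itself validates combinations. Unknown names raise an error instead of falling back to a default, in line with the trust policy.

```python
# Support matrix copied from the "Runner support" table.
RUNNER_SUPPORT = {
    "locomo": {"opencode": "Yes", "openai-compatible": "Yes"},
    "longmemeval": {"opencode": "Partial", "openai-compatible": "Yes"},
    "beam": {"opencode": "Yes", "openai-compatible": "Yes"},
    "swe-bench": {"opencode": "Yes", "openai-compatible": "Yes"},
    "tau-bench": {"opencode": "No", "openai-compatible": "Yes"},
    "terminal-bench": {"opencode": "Yes", "openai-compatible": "No"},
    "akm-bench": {"opencode": "No", "openai-compatible": "No"},
}


def support_level(pack: str, runner: str) -> str:
    """Return "Yes", "Partial", or "No"; raise ValueError for unknown names."""
    try:
        return RUNNER_SUPPORT[pack][runner]
    except KeyError:
        raise ValueError(f"unknown pack or runner: {pack!r} / {runner!r}") from None


def runners_for(pack: str) -> list[str]:
    """List the runners marked "Yes" for a pack (partial support is excluded)."""
    if pack not in RUNNER_SUPPORT:
        raise ValueError(f"unknown pack: {pack!r}")
    return [runner for runner, level in RUNNER_SUPPORT[pack].items() if level == "Yes"]
```

For example, `runners_for("tau-bench")` yields only `openai-compatible`, and `runners_for("akm-bench")` yields an empty list because that pack is blocked on both runners.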

Docs

- command flow: docs/running-evals.md
- operator caveats and exceptions: docs/operator-guide.md
- pack constraints: docs/benchmark-packs.md
- remaining external blockers: docs/operator-blockers.md
- normalized result contract: docs/result-schema.md
- contributor guide: docs/contributing.md
