Agency

A self-hosted engine for composing, evaluating, and evolving AI agents.

Agency treats AI agents not as monolithic system prompts but as composable, evaluable, evolvable entities — structured the way organisations have always structured human work. It decomposes agents into independent primitives (role components, desired outcomes, trade-off configurations), composes them into agents matched to specific tasks, evaluates their performance, and uses the results to improve future compositions.

Agency does not execute tasks. It composes agent descriptions and returns them to a task manager (Claude Code, Superpowers, or any MCP-compatible system) which executes the work. Agency handles composition, evaluation, and evolution; the task manager handles execution.

Why this exists

Current approaches to commissioning AI agents ignore nearly everything organisations have learnt about getting good work from agents (people). Three problems stand out:

No way to specify subjective trade-offs. You can tell an agent what to do, but not how to navigate the trade-offs inherent in any real task — speed vs thoroughness, cost vs quality, exploration vs reliability. Agency makes trade-off preferences explicit and controllable through structured trade-off configurations, based on the Boris methodology for trade-off articulation.
No inter-deployment memory. An agent performs a task well or poorly, then starts from zero on the next similar task. Agency records structured performance data across deployments, building a track record for each primitive and composition.
No controlled evolution. Agent configurations are static — a human writes them, a human edits them. Agency provides mechanisms for mutation, recombination, and selection of agent components, under human-defined fitness criteria.

The underlying model draws on research into how high-performing R&D teams structure roles under uncertainty: roles as modular assemblages of discrete components that can be independently proposed, tested, evaluated, and recombined. Agency applies this same structural logic to AI agents.

Core concepts

Role components — individual capabilities an agent brings to a task
Desired outcomes — what success looks like, specific enough for an evaluator to grade against
Trade-off configurations — acceptable and unacceptable trade-offs governing how work is done
Agents — compositions of role components + desired outcomes + trade-off configurations, matched to tasks by semantic similarity
Evaluators — specialised agents that grade task output against desired outcomes, building the performance data that drives evolution

Quick start

# Install (pipx recommended — keeps agency on PATH across sessions)
pipx install agency-engine

# If you use pip instead, the command is only available while the venv is active
# pip install agency-engine

# Set up
agency init

# Start the server
agency serve

# Verify
curl http://127.0.0.1:8000/health
# Returns: {"status": "ok", "version": "<your installed version>"}

Integration guides

Agency works with different task execution systems. Choose the guide that matches your setup:

You are using	Guide
Claude Code (MCP tools directly)	Using Agency as an MCP with Claude Code
Superpowers (brainstorming, plans, subagent dispatch)	Using Agency with Superpowers
Workgraph (shell-based batch task execution)	Using Agency with Workgraph

The Claude Code guide is the primary reference — it covers all 6 MCP tools, setup, response formats, and troubleshooting. The Superpowers guide explains how Agency fits into Superpowers workflows. The Workgraph guide covers the standalone shell-based integration.

MCP tools

Agency exposes six tools via the Model Context Protocol:

Tool	Purpose
`agency_assign`	Compose agents for tasks, receive rendered prompts
`agency_evaluator`	Get evaluation criteria and callback JWT for completed tasks
`agency_submit_evaluation`	Submit structured evaluation results
`agency_list_projects`	Discover available projects
`agency_create_project`	Create new projects
`agency_status`	Check instance health, task progress, primitive counts

The full caller protocol (assign → execute → evaluate) is documented in the Claude Code integration guide.

Licence

Code is licensed under the Elastic License 2.0. You may use, copy, distribute, and prepare derivative works of the software, but you may not provide it to third parties as a hosted or managed service.

Written content, specifications, and documentation are licensed under Creative Commons Attribution 4.0 International (CC BY 4.0). You are free to share and adapt them for any purpose, provided you give appropriate credit to Vaughn Tan.

Name		Name	Last commit message	Last commit date
Latest commit History 43 Commits
docs/integrations		docs/integrations
primitives		primitives
src/agency		src/agency
tests		tests
translators		translators
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
agency-status.json		agency-status.json
pyproject.toml		pyproject.toml
specification.md		specification.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Agency

Why this exists

Core concepts

Quick start

Integration guides

MCP tools

Licence

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Agency

Why this exists

Core concepts

Quick start

Integration guides

MCP tools

Licence

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages