AgentPack

Local context engine for AI coding agents.

AgentPack ranks relevant repository files and builds compact task-focused context packs for Claude Code, Codex, Cursor, Windsurf, Antigravity, MCP tools, CI jobs, and markdown-based LLM workflows.

It runs local/offline repo analysis, compresses selected files into a token budget, and keeps context fresh through CLI commands, MCP tools, hooks, and agent integrations. Use it when an AI coding agent needs a ranked starting map instead of burning tool calls rediscovering your repo.

AgentPack is a context preparation tool, not a coding agent.

One workflow matters:

route -> pack -> agent acts -> benchmark captures miss

First route the task to likely files, tests, rules, and skills:

pipx run --spec agentpack-cli agentpack route --task "fix auth token expiry"

Status: alpha (v0.3.21). Works, tested, and used in real sessions. Python and JavaScript/TypeScript are the best-supported languages. Current benchmarks are useful regression checks, not broad proof that AgentPack improves coding-agent success. API may change before 1.0.

Platform note: macOS, Linux, and Windows are supported. Windows support targets PowerShell plus Git for Windows. cmd.exe and bare Git setups are not a supported path yet.

Name note: PyPI package is agentpack-cli, npm package is @vishal2612200/agentpack, and the command is agentpack. This project is unrelated to AgentPack dataset papers or other repos with the same name.

What's New in 0.3.21

0.3.21 is a benchmark trust release. It keeps the current honest expanded public-suite baseline at 57.0% recall / 50.6% token precision, removes the legacy minimal-mode surface in favor of balanced, improves benchmark diagnostics and public-suite methodology, and documents the next release target: 65%+ recall while holding 50%+ token precision.

Core Workflow

1. Route

Use the read-only router when you want quick orientation without writing files:

agentpack route --task "fix auth token expiry"

2. Pack

agentpack task set "fix auth token expiry"
agentpack pack --task auto

AgentPack writes .agentpack/context.md with selected files, omitted-file receipts, task freshness, token stats, and suggested checks.

3. Agent Acts

Point the agent at the pack or use MCP tools. Agent still verifies code before editing; AgentPack is map, not correctness proof.

4. Benchmark Captures Miss

After a task, capture the files that actually changed:

agentpack benchmark capture --since main --task "fix auth token expiry"
agentpack benchmark --misses

Miss diagnostics show whether a required file was ignored, scored too low, ranked but cut by budget, or absent from scan.

Features

Route: read-only task map with relevant files, tests, rules, skills, commands, and warnings.
Pack: budgeted context with full, diff, symbols, skeleton, or summary file views.
Act: CLI, markdown, MCP, and agent integrations for Claude Code, Codex, Cursor, Windsurf, Antigravity, and generic agents.
Benchmark: expected-file recall, token precision, miss diagnostics, public commit suites, and E2E A/B reports.
Local: no cloud indexing, embeddings, or API calls for scan, summarize, rank, pack, stats, or benchmark.

Benchmark Proof

Latest published v0.3.20 release table: 8 pinned public commits from Pallets Click, ItsDangerous, and MarkupSafe, scored against files actually changed by each commit. The public manifest now also supports 100+ sampled historical commits across Python, TypeScript, Go, Java, and monorepo repos for broader release runs.

Metric	Result
Avg recall	79.2%
Avg token precision	51.2%
Pack p50	1,450 tokens
Pack p95	3,805 tokens

Full v0.3.20 table: benchmarks/results/2026-06-11-public.md. This is scoped benchmark evidence, not a universal quality claim. The expanded 109-case public suite is the current optimization baseline: 57.0% recall / 50.6% token precision. It is broader and harder than the 8-case published table, and recall remains the active improvement target. Reproduce the expanded public suite:

agentpack benchmark --public-suite --reproduce v0.3.20

Benchmark methodology lives under benchmarks/results/v0.3.20/.

Next Release Benchmark Target

The next benchmark release target is to raise the expanded public suite from the current 57.0% recall / 50.6% token precision baseline to 65%+ recall while keeping token precision at 50%+. The target should be measured on the same 100+ public historical-commit suite, with per-language slices published so precision gains are not hiding TypeScript, Go, Java, or monorepo regressions.

Decision gate for the next public table:

full-suite recall is at least 65.0%
full-suite token precision is at least 50.0%
no major language or task slice loses more than 2 recall points
Vite/TypeScript, Gin/Go, and NestJS monorepo misses are reported separately
any AgentPack-vs-no-AgentPack A/B claim includes task success, tool calls, token cost, and time-to-first-correct-file

Trust

AgentPack is MIT licensed, local-first, and uses PyPI Trusted Publishing plus npm provenance for release artifacts. See SECURITY.md, docs/privacy.md, docs/threat-model.md, and docs/data-flow.md.

Use Cases

Start with the docs index, or jump to guides for Claude Code, MCP, Cursor, token usage, AI coding agent context, and how AgentPack works.

Install

pipx install agentpack-cli
agentpack --version

Requires Python 3.10+ and is tested on Python 3.10-3.14. The PyPI package is agentpack-cli; the command is agentpack. Use pipx for normal installs because many macOS/Linux Python distributions block global pip install with PEP 668's externally-managed-environment error. If you prefer pip, install inside a virtual environment.

Install pipx first if needed:

# macOS
brew install pipx

# Ubuntu/Debian
sudo apt install pipx

# Fedora
sudo dnf install pipx

# Arch
sudo pacman -S python-pipx

pipx ensurepath

For JavaScript/TypeScript projects you can use the npm wrapper:

npx @vishal2612200/agentpack --version
npx @vishal2612200/agentpack init --yes
npx @vishal2612200/agentpack pack

Quickstart

cd your-repo
agentpack init --yes
agentpack work "fix auth token expiry"

Then read .agentpack/context.md or the agent-specific context file listed in the output. AgentPack also integrates with MCP so agents can call tools directly instead of reading markdown artifacts.

A good explicit workflow is:

agentpack task set "fix billing webhook retry handling in app/api/billing/route.ts" --guard
agentpack next --fix-all-safe
agentpack status
agentpack finish --since main

Use agentpack quickstart --task "..." --write when you want AgentPack to print the next commands and write the task file for you. Use agentpack start "..." --pack-only when you want only a fresh pack and not the guard path.

Learn from AI-assisted work

Generate local post-task learning artifacts from .agentpack/task.md and git changes:

agentpack learn
agentpack learn --today
agentpack learn --since main
agentpack learn --json
agentpack learn --llm-prompt --pr-comment
agentpack learn --provider-preview
agentpack learn --provider-command "python scripts/learn_provider.py"
agentpack learn --dashboard --team-export
agentpack learn --skills
agentpack learn --drills
agentpack learn --ci
agentpack learn --feedback helpful --feedback-target "skill:CLI design" --feedback-note "Useful review prompts"

AgentPack writes developer notes to .agentpack/learning.md or .agentpack/daily-summary.md, updates a local skill memory in .agentpack/skills-progress.json, writes ranked .agentpack/agent-lessons.md for future coding agents, and can emit .agentpack/learning.prompt.md, .agentpack/pr-learning-comment.md, .agentpack/learning-dashboard.html, or .agentpack/team-lessons.md. Learn is local-first by default: --provider-preview shows the bounded payload for optional external refinement without making a network call, --provider-command runs only the local command you provide, and feedback stays in .agentpack/learning-feedback.jsonl.

Use --dashboard when a developer wants an IDE-friendly review surface. Use --team-export when the useful lesson should be shared without publishing a personal skill history. Use --ci to fail a workflow when the generated learning is too generic or lacks changed-file evidence.

Agent Setup

AgentPack can install repo-local instructions and hooks for the coding agent you use. The installer is idempotent and merges with existing config where possible.

agentpack init --agent claude
agentpack init --agent codex
agentpack init --agent cursor
agentpack init --agent windsurf
agentpack init --agent antigravity
agentpack init --agent generic

What each integration writes:

Agent	Files
Claude Code	`CLAUDE.md`, `.claude/settings.json`, `.mcp.json`
Codex	`AGENTS.md`, `.codex/hooks.json`, git hooks
Cursor	`.cursorrules`, `.cursor/rules/agentpack.mdc`, `.vscode/tasks.json`, git hooks
Windsurf	`.windsurfrules`, `.vscode/tasks.json`, git hooks
Antigravity	`GEMINI.md`, `.vscode/tasks.json`, git hooks
Generic	no agent-specific files; use `context.md` directly

MCP-capable agents should prefer AgentPack MCP tools over reading markdown files directly. The markdown context files remain useful for manual review, CI, non-MCP agents, and logs.

Generated instructions keep thread mode explicit. They recommend AGENTPACK_THREAD_ID=<stable-id> agentpack ... --thread auto when you want scoped state, but plain installed hooks continue using global .agentpack/task.md and .agentpack/context.md.

Configuration Snapshot

agentpack init creates .agentpack/config.toml. Most projects can use the defaults:

[context]
default_mode = "balanced"
default_budget = 40000

[context_lite]
budget = 8000

[agents.generic]
output = ".agentpack/context.md"

Use agentpack pack --mode lite when you want a cheap ranked map before deeper file reads. Use the default balanced mode for normal agent work and benchmark claims. Use deep when the task needs broader docs and source context.

Use .agentignore to remove generated output, vendored code, large exports, or files that repeatedly appear as ranking noise. AgentPack imports obvious generated/noisy entries from gitignore sources during init, but repository-specific outputs should still be added by hand.

Use scoring weights only after measuring a real miss:

agentpack benchmark --misses
agentpack explain --file path/to/missed_file.py
agentpack diagnose-selection
agentpack ignore suggest

Configuration detail: docs/configuration.md.

Common Workflows

Debug Selection

agentpack explain --task auto
agentpack explain --file src/auth/session.py
agentpack explain --omitted
agentpack stats
agentpack diagnose-selection

MCP-First Agent Flow

Call start_task(task) when a new task begins. AgentPack writes .agentpack/task.md, packs context, and returns ranked markdown.
Call get_context() when you need the latest pack. It blocks for one refresh if .agentpack/task.md or the repo snapshot changed since the last pack.
Use route_task(task) for a lightweight route: relevant files, rules, skills, commands, and safety warnings without writing a full context file.

Multiple Agent Threads

Plain agentpack pack, agentpack status, and agentpack guard keep legacy global behavior and ignore ambient host thread env vars. Use thread mode only when you explicitly ask for it:

agentpack pack --thread codex-local
AGENTPACK_THREAD_ID=codex-local agentpack pack --thread auto
agentpack threads --active
agentpack state show --thread codex-local

Thread mode writes under .agentpack/threads/<id>/ and appends .agentpack/thread_index.jsonl. Same-worktree, same-branch overlap is warning-only; separate worktrees or branches remain the safest workflow for concurrent edits.

Release Validation

agentpack dev-check
agentpack release-check
agentpack verify-wheel
agentpack release prepare
agentpack ci init

The Makefile remains maintainer convenience, but the CLI commands above are the package-user source of truth. agentpack benchmark --release-gate runs the public proof path: public repos, proved target files, misses, and public table output. --sample-fixtures remains a regression smoke path, not the release gate.

Commands

Command	Purpose
`agentpack init --yes`	Create local config and ignore files
`agentpack work "task"`	Initialize if needed, start task, refresh context, show next steps
`agentpack start "task"`	Write task and run the guard/refresh workflow
`agentpack finish --since main`	Diagnose, capture benchmark case, run checks, mark done
`agentpack learn`	Generate developer learning notes, skill memory, feedback-aware drills, and future-agent lessons
`agentpack task show	set
`agentpack pack`	Generate a ranked context pack for `.agentpack/task.md`
`agentpack next --fix-all-safe`	Ask AgentPack what command or safe repair should happen next
`agentpack guard --repair-stale --refresh-context`	Check freshness, repair stale rules, refresh context
`agentpack status`	Show context freshness and git/task state
`agentpack stats`	Show pack size, token savings, and top files
`agentpack dashboard`	Local HTML control plane for context, skills, learning, and benchmark quality
`agentpack explain --task auto`	Debug selected and omitted files
`agentpack diagnose-selection`	Turn latest pack/benchmark signals into concrete tuning actions
`agentpack ignore suggest	apply`
`agentpack route --task "..."`	Get lightweight task routing without a full context file
`agentpack threads [--active] [--conflicts]`	Inspect thread-scoped context state
`agentpack state show	set
`agentpack benchmark capture --since <ref>`	Add a benchmark case from changed files
`agentpack benchmark --release-gate`	Run public benchmark evidence path
`agentpack dev-check`	Run docs, lint, pytest, and npm wrapper checks
`agentpack verify-wheel`	Install built wheel in a temp venv and run benchmark gate
`agentpack release-check`	Run version, changelog, tests, build, and benchmark checks
`agentpack release prepare`	Run full release prep, public table, and wheel verification
`agentpack ci init`	Generate a GitHub Actions workflow for AgentPack checks

Full command reference: docs/commands.md.

Make Shortcuts

Run make help for the current list. The main targets are:

Target	Wraps
`make test`	`pytest -q`
`make lint`	`python -m ruff check src tests`
`make npm-test`	npm wrapper/version tests
`make docs-check`	README/docs link smoke + `git diff --check`
`make benchmark`	`agentpack benchmark --release-gate --no-public-table`
`make benchmark-publish`	`agentpack benchmark --release-gate`
`make release-fast`	`agentpack release-check --skip-benchmark --skip-build`
`make release`	`agentpack release-check`
`make verify-wheel`	build wheel, install in temp venv, run benchmark gate

make context uses legacy global context. For scoped context, run:

THREAD=codex-local make context-thread
AGENTPACK_THREAD_ID=codex-local make context-thread

What A Pack Contains

Rendered packs are meant to be readable by humans and directly useful to agents. A typical pack includes:

freshness metadata with task source, generated time, git branch, SHA, and snapshot hash
execution state with task status, checklist counts, git dirty/ahead/behind counts, and Docker/Compose availability
concurrent-context warnings when another active thread overlaps files on the same branch and worktree
token stats and largest token consumers
semantic repo map grouped by subsystem
selected-file table with include mode and reason
compressed context receipts showing why files were included or excluded
file context in full, diff, symbols, skeleton, or summary mode

The pack is a starting map. Agents should still verify the selected files against actual code before editing.

Local Files

AgentPack writes local artifacts under .agentpack/:

Path	Purpose
`.agentpack/task.md`	current global task summary
`.agentpack/task_state.md`	optional global execution state
`.agentpack/context.md`	generic/Codex/Cursor/Windsurf fallback context
`.agentpack/context.claude.md`	Claude-flavored fallback context
`.agentpack/learning.md`	local post-task developer learning notes
`.agentpack/daily-summary.md`	local daily learning rollup from `agentpack learn --today`
`.agentpack/skills-progress.json`	local skill evidence map from task work
`.agentpack/agent-lessons.md`	compact repo-specific lessons injected into future packs
`.agentpack/learning.prompt.md`	optional source-backed prompt for external LLM refinement
`.agentpack/pr-learning-comment.md`	optional PR-comment-ready learning summary
`.agentpack/learning-dashboard.html`	optional static dashboard from `agentpack learn --dashboard`
`.agentpack/dashboard.html`	local project dashboard from `agentpack dashboard`
`.agentpack/team-lessons.md`	optional shared lesson export from `agentpack learn --team-export`
`.agentpack/learning-feedback.jsonl`	optional local helpful/not-helpful feedback records
`.agentpack/pack_metadata.json`	freshness and pack metadata
`.agentpack/cache/`	offline file summaries keyed by hash
`.agentpack/snapshots/`	repo snapshot hashes
`.agentpack/thread_index.jsonl`	append-only thread activity index
`.agentpack/threads/<id>/`	scoped task, state, context, and metadata for explicit thread mode

Generated context, cache, snapshots, and thread state are local operational files. The root .agentpack/config.toml and .agentignore are usually worth committing; generated context artifacts usually are not.

Troubleshooting

If AgentPack selects too much:

agentpack diagnose-selection
agentpack stats
agentpack explain --omitted
agentpack explain --budget-plan
agentpack ignore suggest

If a required file is missing:

agentpack explain --file path/to/file.py
agentpack benchmark --misses
agentpack benchmark capture --since HEAD~1 --task "describe the task"

If context looks stale:

agentpack status --deep
agentpack guard --agent auto --repair-stale --refresh-context

If multiple agents are editing the same project:

agentpack threads --active
agentpack threads --conflicts
agentpack state show --thread codex-local

If package release checks fail:

agentpack release-check --json
agentpack release-check --skip-benchmark

Use the skip flags only for local iteration. The final release proof should include the build and public benchmark gate.

Documentation

docs/architecture.md: pipeline, data flow, package layout, thread/execution state, rendered-budget accounting.
docs/commands.md: full CLI command reference.
docs/configuration.md: config, scoring weights, .agentignore, and git integration.
docs/integrations.md: agent setup, MCP workflow, hooks, and native integration status.
docs/benchmarking.md: quality bar, release gate, sample fixtures, and public artifacts.
docs/development.md: local development, release checklist, naming/ranking, and contribution notes.
docs/limitations.md: project scope, honest token framing, known limitations, and roadmap.

Limitations

AgentPack prepares context; it does not edit code, run tests for the agent, or prove correctness.
Ranking is deterministic and local, but it can still miss files. Use agentpack explain, agentpack benchmark --misses, and normal code review.
Thread coordination is warning-based, not locking-based. It helps agents notice collisions but does not enforce ownership.
Language support is strongest for Python and JavaScript/TypeScript. Other languages still benefit from filenames, git signals, and summaries.
Token estimates are approximate, even with rendered-budget accounting. Treat them as operational guidance, not provider billing truth.

More detail: docs/limitations.md.

License

MIT

Name		Name	Last commit message	Last commit date
Latest commit History 299 Commits
.claude		.claude
.github		.github
benchmarks		benchmarks
docs		docs
native-integrations		native-integrations
npm		npm
src/agentpack		src/agentpack
tests		tests
.agentignore		.agentignore
.gitignore		.gitignore
.mailmap		.mailmap
.mcp.json		.mcp.json
AGENTS.md		AGENTS.md
CHANGELOG.md		CHANGELOG.md
CLAUDE.md		CLAUDE.md
LICENSE		LICENSE
Makefile		Makefile
README.md		README.md
SECURITY.md		SECURITY.md
llms-full.txt		llms-full.txt
llms.txt		llms.txt
mkdocs.yml		mkdocs.yml
pyproject.toml		pyproject.toml

Folders and files

Latest commit

History

Repository files navigation

AgentPack

What's New in 0.3.21

Core Workflow

1. Route

2. Pack

3. Agent Acts

4. Benchmark Captures Miss

Features

Benchmark Proof

Next Release Benchmark Target

Trust

Use Cases

Install

Quickstart

Learn from AI-assisted work

Agent Setup

Configuration Snapshot

Common Workflows

Debug Selection

MCP-First Agent Flow

Multiple Agent Threads

Release Validation

Commands

Make Shortcuts

What A Pack Contains

Local Files

Troubleshooting

Documentation

Limitations

License

About

Topics

Resources

License

Security policy

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages