Run AI coding agents inside OS-level containment.
Open-source containment for Claude, Codex, OpenCode, Antigravity, Hermes, Qwen, Cursor Agent, Pi, and shell loops
Hazmat lets an agent edit your repo without running as your real system user account.
It shows the session contract first, then launches the agent inside OS-level containment. Your project can be writable. Your real home directory, SSH keys, cloud credentials, and global dev state do not become the agent's default world.
Use it when you want the productive version of Claude Code, Codex, OpenCode, or a custom loop: fewer approval interruptions, more autonomy, and a smaller host blast radius.
The loop is simple: preview what the agent can reach, run it, then inspect what changed.
Fast path for Claude Code on macOS. The dry run previews setup; it does not launch an agent.
brew install dredozubov/tap/hazmat
hazmat init --dry-run
hazmat init --bootstrap-agent claude
cd your-project
hazmat explain -C .
hazmat claude
Replace the last command for another supported harness:
hazmat codex
hazmat opencode
hazmat antigravity
hazmat exec -- ./my-agent-loop.sh
After a session:
hazmat diff
hazmat snapshots
hazmat restore
Using Hermes, Qwen, Cursor Agent, Pi, or a custom script instead? Start with docs/harnesses.md.
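The preview, run, inspect loop can be sketched as a small wrapper. This is a hypothetical helper, not part of Hazmat: the function name `contained_run` and the confirmation prompt are assumptions, and it only calls commands shown above (`hazmat explain -C`, a harness subcommand, and `hazmat diff`).

```shell
# Hypothetical wrapper around the preview -> run -> inspect loop.
# Usage: contained_run <project-dir> [harness]
contained_run() {
    project=$1
    harness=${2:-claude}

    # 1. Preview what the session can reach; stop if the preview fails.
    hazmat explain -C "$project" || return 1

    # 2. Require an explicit "y" before launching anything.
    printf 'Launch %s in %s? [y/N] ' "$harness" "$project"
    read -r answer
    [ "$answer" = "y" ] || { echo "aborted"; return 2; }

    # 3. Run the harness from the project directory, then show what changed.
    #    The diff runs even if the harness exits non-zero.
    (
        cd "$project" || exit 1
        hazmat "$harness"
        hazmat diff
    )
}
```

Calling `contained_run ~/workspace/my-app codex` would print the contract, wait for confirmation, and end with the diff against the pre-session snapshot.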
On the first real run, Hazmat gives you:
- A separate user: the agent runs as `agent`, not as your login account.
- A readable session contract: project access, integrations, network mode, and service access are printed before launch.
- Scoped filesystem access: project writes are explicit; common secret paths are denied.
- Network hardening: `pf` firewall rules and DNS blocks reduce common exfiltration paths.
- Recovery evidence: Hazmat snapshots before launch so you can diff and restore after the agent exits.
- The same agent workflow: use Claude, Codex, OpenCode, Antigravity, Hermes, Qwen, Cursor Agent, Pi, or `hazmat exec`.
Recent shipped work is in CHANGELOG.md. Read docs/testing.md for what is automated, approval-gated, and formally modeled. Read docs/overview.md before stretching native containment into Docker-heavy or hostile-repo workflows.
Hazmat is a fit when:
- you run agents with broad permissions, long tasks, or low supervision
- you want repo writes without handing over your real system user account
- you need a contract you can review before the agent starts
- you want a diff and rollback path after autonomous edits
It is not a magic prompt shield. If an agent gets prompt-injected, runs a poisoned dependency, or follows malicious repo instructions, the useful question is what the process can reach. Hazmat lowers that authority boundary.
Every session starts with a contract. No hidden widening, no vague "secure mode" label.
hazmat: session
Mode: Native containment
Why this mode: using native containment by default (Docker routing: none)
Project (read-write): /Users/dr/workspace/my-app
Integrations: go
Auto read-only: /Users/dr/go/pkg/mod
Read-only extensions: none
Read-write extensions: none
Service access: none
Pre-session snapshot: on
Snapshot excludes: vendor/
This is the screen to read before you let an agent run. It tells you what can change, what is read-only, and what is not available in the session.
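If you want to review the contract mechanically as well as by eye, a check along these lines is one option. It is a sketch under assumptions: the function name `check_contract` is hypothetical, it expects contract text on stdin in the `Field: value` layout shown above, and the choice of which fields must stay `none` is only an example policy.

```shell
# Hypothetical pre-launch check: read contract text on stdin and fail
# if a field that should stay "none" has been widened.
check_contract() {
    awk -F': *' '
        {
            key = $1; val = $2
            sub(/^[ \t]+/, "", key)   # tolerate indented lines
            sub(/[ \t]+$/, "", val)   # tolerate trailing spaces
        }
        # Example policy: these two fields must remain "none".
        key == "Read-write extensions" || key == "Service access" {
            if (val != "none") {
                printf "widened: %s = %s\n", key, val
                bad = 1
            }
        }
        END { exit bad + 0 }
    '
}
```

With the contract saved to a file, `check_contract <contract.txt` exits 0 when both fields are `none` and exits 1, naming the field, when either has been widened.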
Want proof before using it on a real repo? Run the small path: preview setup, preview a session, write one contained file, block one host-secret read, then inspect the diff.
hazmat init --dry-run
PROJECT=/tmp/hazmat-proof-demo
mkdir -p "$PROJECT"
printf '%s\n' '# Hazmat proof demo' >"$PROJECT/README.md"
hazmat explain -C "$PROJECT"
SECRET="${HAZMAT_PROOF_STACK_SECRET_PATH:-$HOME/.ssh/id_ed25519}"
test -r "$SECRET"
hazmat exec --docker=none --network none -C "$PROJECT" -- \
/bin/sh -eu -c 'printf "%s\n" contained >proof.txt; ! cat "$1" >/dev/null 2>&1' sh "$SECRET"
(cd "$PROJECT" && hazmat diff)
The live README smoke captures the same shape without printing secret bytes. In the contained session, the project write succeeds and the host secret probe fails closed:
proof-stack: project write ok
proof-stack: host secret unreadable from contained session: <host-secret-fixture>
The recovery pointer comes from the same demo repo:
Comparing <scratch-project> against snapshot from <timestamp>
Only in <scratch-project>: proof.txt
The saved redacted snippets are in docs/proofs. If Claude is not your harness, use the same containment path through docs/harnesses.md. For project health and breadth, see docs/compatibility.md, docs/recipes/README.md, docs/public-roadmap.md, and docs/testing.md.
Approval prompts are workflow controls, not an authority boundary. If a process can read secrets, modify global state, or call out to the network, a bad instruction can use that authority.
That is not theoretical:
- Agents actively reason about escaping. Ona showed Claude Code bypassing its own denylist via `/proc/self/root`, then trying to disable bubblewrap when that path was closed.
- The CVEs are not hypothetical. Hazmat tracks 16 Claude Code CVEs, including CVE-2025-59536 and CVE-2026-21852.
- Supply chain attacks are fast enough to beat human supervision. The 2026 axios compromise delivered a RAT through a `postinstall` hook in about two seconds.
The design goal is not "make the agent behave." The design goal is "make autonomous failure less catastrophic."
hazmat claude
hazmat codex
hazmat opencode
hazmat antigravity
hazmat hermes
hazmat qwen
hazmat cursor-agent
hazmat pi -- --mode rpc
hazmat exec -- ./my-agent-loop.sh
hazmat shell
| Layer | What it does |
|---|---|
| User isolation | Runs the agent as a dedicated agent macOS user, so your real home directory is structurally out of reach |
| Kernel sandbox | Generates a per-session seatbelt policy with explicit read-write and read-only scope |
| Credential deny | Blocks common secret paths at the kernel level, including credential paths inside agent home |
| Network firewall | Uses pf to block common exfiltration and tunneling protocols |
| DNS blocklist | Redirects known tunnel, paste, and capture domains to localhost |
| Supply chain hardening | Applies conservative defaults such as npm ignore-scripts=true |
| Snapshots and restore | Takes a pre-session Kopia snapshot so you can diff or roll back |
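The supply chain row names npm's `ignore-scripts=true` setting. As a rough way to confirm that an npmrc-style file carries it, something like the following could be used. The function name is hypothetical, and this README does not say which file Hazmat writes, so the path you pass in is your own assumption.

```shell
# Hypothetical check: succeed only if the given npmrc-style file
# contains a line setting ignore-scripts to true.
npmrc_ignores_scripts() {
    grep -Eq '^[[:space:]]*ignore-scripts[[:space:]]*=[[:space:]]*true[[:space:]]*$' "$1"
}
```

For the effective value across all config sources, `npm config get ignore-scripts` is the more direct query when npm is available in the session.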
Current state, not aspirational state:
- macOS agent-user native containment is the default supported path. Hazmat ships release artifacts for `darwin/arm64` and `darwin/amd64`.
- macOS current-user Seatbelt launch is experimental. `HAZMAT_EXPERIMENTAL_MACOS_CURRENT_USER=1 hazmat exec --provider=macos-current-user ...` runs same-UID contract sandboxing without the agent-user setup path; it is weaker than the dedicated-user boundary and is exec-only.
- Eight harnesses are supported in containment: Claude Code, Codex, OpenCode, Antigravity, Hermes, Qwen Code, Cursor Agent, and Pi. Details, tested versions, auth flows, and Phase 1 limits live in docs/harnesses.md.
- Linux native support is plan-only. Hazmat has Linux platform probes, a `linux-native` launch-spec compiler, Linux package tests, and Linux CI coverage. It does not yet have the native Linux launch helper, setup, rollback, or release artifacts needed to run agents directly on Linux.
- Experimental Apple Container exec exists for Linux workloads on macOS. `HAZMAT_EXPERIMENTAL_APPLE_CONTAINER=1 hazmat exec --backend=apple-container --image ... -- <command>` runs a command in a short-lived Linux VM on macOS 26 Apple silicon. It is exec-only, gated, and uses Hazmat-planned mounts; host file IO still occurs as the invoking macOS user, so it is not the same boundary as native dedicated-user containment.
- Harness lifecycle is managed. `hazmat harness status|update|uninstall` inspects agent-owned harness code, refreshes it through the bootstrap paths, and removes Hazmat-owned code artifacts without deleting auth/profile/session state by default.
- Docker support is real, but selective. Private-daemon Docker workflows can use Docker Sandbox mode through every harness entrypoint, plus `hazmat shell` and `hazmat exec`. Shared host-daemon workflows stay code-only by default. See docs/tier3-docker-sandboxes.md and docs/shared-daemon-projects.md.
- 27 built-in stack integrations. Full table in docs/STACKS.md; schema and trust-model rules in docs/integrations.md. Quick groupings:
  - Python: `python-uv`, `python-pip`, `python-poetry`. JS/TS: `node`, `pnpm`, `yarn`, `bun`, `deno`.
  - JVM and mobile: `java-gradle`, `java-maven`, `tla-java`, `android-gradle`, `swift`, `flutter`.
  - Systems: `go`, `rust`, `cmake`, `haskell-cabal`, `elixir-mix`, `ruby-bundler`, `php-composer`, `dotnet`.
  - Infra and build: `docker`, `kubernetes-render` (render/lint only), `terraform-plan`, `opentofu-plan`, `beads`.
- Repo-local Git hooks have a Hazmat-managed approval path. Repos can declare `pre-commit`, `commit-msg`, and `pre-push` in `.hazmat/hooks/hooks.yaml`; approval, install, drift review, and uninstall flow through `hazmat hooks ...`.
- Core behavior is tested and partially formally verified. The exact proof boundary is explicit in tla/VERIFIED.md. If something is not listed there, do not assume a proof exists.
Hazmat is useful because the boundaries are concrete. That also means the limitations should be concrete.
- No Linux-native launch yet. Linux support is real but bounded: plan/spec generation, platform reporting, tests, and experimental Apple Container exec. Native Linux agent launch is still blocked on the launch helper plus setup/rollback model work. See docs/testing.md.
- This is not a total network allowlist. HTTPS exfiltration to a brand-new domain is still not fully solved by Tier 2. See docs/threat-matrix.md.
- The DNS blocklist is exact-domain, not wildcard. It is based on `/etc/hosts`, not a full DNS filtering stack. See docs/design-assumptions.md.
- Shared `/tmp` stays shared. Hazmat does not pretend macOS temp space suddenly became private.
- MCP env inheritance and `SSH_AUTH_SOCK` abuse are still category-wide problems. Some of the hardest issues here are operational, not just architectural. They are called out directly in docs/threat-matrix.md.
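To make the exact-domain limitation concrete, here is an illustration of how a hosts-style list matches names. It is a sketch, not Hazmat's code: the file is a temporary stand-in for `/etc/hosts`, the domains are placeholders rather than real blocklist entries, and `is_listed` is a hypothetical helper.

```shell
# Illustration only: exact-name matching against a hosts-style file.
hosts_file=$(mktemp)
trap 'rm -f "$hosts_file"' EXIT
printf '127.0.0.1 paste.example\n' >"$hosts_file"

is_listed() {
    # Succeeds only when some hostname field equals the query exactly.
    awk -v host="$1" '
        /^[ \t]*#/ { next }
        { for (i = 2; i <= NF; i++) if ($i == host) found = 1 }
        END { exit !found }
    ' "$hosts_file"
}

is_listed paste.example && echo "paste.example: listed"
is_listed cdn.paste.example || echo "cdn.paste.example: not listed"
```

The subdomain is not covered by the parent entry, which is why an exact-domain list reduces common exfiltration paths without acting as a wildcard filter.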
If you are dealing with hostile repos, long unattended runs, or shared-daemon Docker workflows, the honest answer may be Tier 4, not stretching Tier 2 past what it does well. Start with docs/overview.md.
I want community help here, but I do not want to pretend every part of Hazmat is equally easy or equally safe to crowdsource.
- Integrations and stack coverage - new manifests, detection fixes, better snapshot excludes, compatibility reports
- Harness usability - bootstrap friction, auth/import bugs, first-run UX, docs for real setups
- Docs and onboarding - quickstart clarity, explain-mode examples, screenshots, diagrams, troubleshooting
- Research and evidence - CVE tracking, incident writeups, comparative analysis, drift checks
- Test matrix expansion - real repo validation, macOS version coverage, harness regression repros
Areas that are harder or riskier to crowdsource:
- Seatbelt policy changes
- `pf` firewall behavior
- setup / rollback ordering
- credential delivery and capability brokering
- anything covered by the TLA+ governance rules
If you want to contribute, CONTRIBUTING.md is the starting point. If you want to understand which claims are modeled versus just tested, read tla/VERIFIED.md and docs/design-assumptions.md first.
# Claude Code
hazmat claude
hazmat claude -p "refactor the auth module"
# Other supported harnesses
hazmat codex
hazmat opencode
hazmat antigravity
hazmat hermes
hazmat qwen
hazmat cursor-agent
hazmat pi -- --mode rpc
# Harness lifecycle
hazmat harness status
hazmat harness update codex
hazmat harness uninstall codex --dry-run
# Any command in containment
hazmat exec -- make test
hazmat exec -- /bin/zsh -lc 'uv run pytest -q'
# Interactive shell
hazmat shell
# Review and recovery
hazmat diff
hazmat snapshots
hazmat restore
You can expose more paths explicitly when you need them:
hazmat claude -R ~/reference-docs
hazmat claude -W ~/.venvs/my-app
hazmat config access add -C ~/workspace/my-app --read ~/reference-docs --write ~/.venvs/my-app
And if the repo needs integration hints:
hazmat integration list
hazmat integration show node
hazmat claude --integration node
hazmat config set integrations.pin "~/workspace/my-app:node,go"
# Repo-local Git hooks
hazmat hooks status
hazmat hooks install
hazmat hooks review
hazmat hooks uninstall
For the full module graph, invariants, and data/user-flow diagrams, read docs/architecture.md.
You (dr)                        Agent (agent)
--------                        -------------
~/                              /Users/agent/
~/.ssh, ~/.aws   <- denied ->   ~/.claude/
~/workspace/     <- shared ->   ~/workspace/ (symlink)

hazmat claude
  |
  |- snapshot project (Kopia)
  |- generate per-session seatbelt policy
  |- sudo -u agent hazmat-launch <policy>
  |    |- apply sandbox-exec
  |    `- exec harness
  |
  `- pf firewall already active
The important property is structural separation. The agent is not "forbidden from reading your SSH key while still running as you." It runs as a different user entirely.
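One way to observe that separation from inside a session is to compare the effective user with your login name. The helper below is a hypothetical sketch: `assert_not_user` is not a Hazmat command, and it only relies on the standard `id -un`.

```shell
# Hypothetical self-check: fail if the current process runs as the
# login user passed in $1, succeed if it runs as anyone else.
assert_not_user() {
    login_user=$1
    current_user=$(id -un)
    if [ "$current_user" = "$login_user" ]; then
        echo "running as login user $login_user: no user-level separation" >&2
        return 1
    fi
    echo "running as $current_user, not $login_user"
}
```

Saved in the project as a script that ends with `assert_not_user "$1"` (the file name `check-user.sh` is an assumption), it could be run as `hazmat exec -- ./check-user.sh dr`, following the same pattern as `hazmat exec -- ./my-agent-loop.sh`.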
| Doc | Why you would read it |
|---|---|
| docs/usage.md | Full user guide once you are past the first session |
| docs/overview.md | Which tier to use, and when |
| docs/architecture.md | Module graph, authority pipeline, invariants, and data/user-flow diagrams |
| docs/threat-matrix.md | Risk-by-risk coverage and documented caveats |
| docs/harnesses.md | Harness setup matrix for Claude, Codex, OpenCode, Antigravity, Hermes, Qwen, Cursor Agent, and Pi |
| docs/integrations.md | How integrations work, and what they are not allowed to do |
| docs/integration-contributor-flow.md | How users discover integrations and turn missing stack support into PR-shaped work |
| docs/integration-author-kit.md | How to propose integrations without turning them into policy escapes |
| docs/community.md | Support tiers, ownership model, sponsor lanes, and contribution surfaces |
| docs/discussions.md | GitHub Discussions categories, routing rules, and security-report boundaries |
| docs/incident-to-control-bulletin.md | Repeatable incident-to-control bulletin format and first control mapping |
| docs/public-roadmap.md | Curated public roadmap exported from beads issues |
| docs/compatibility.md | Compatibility status meanings, matrix shape, and reporting flow |
| docs/recipes/README.md | Community-expandable recipes for common harness + stack workflows |
| docs/testing.md | What is tested locally, in CI, and in destructive VM-backed flows |
| docs/git-hooks.md | Why Hazmat's repo-local hook flow is stricter than plain Git hooks |
| docs/manual-testing.md | Human-driven verification checklist (run before releases / after harness or seatbelt changes) |
| docs/design-assumptions.md | Non-obvious design decisions and known tradeoffs |
| docs/cve-audit.md | How Hazmat maps against known Claude Code CVEs |
| tla/VERIFIED.md | Exact formal verification scope and governance rules |
How I Made --dangerously-skip-permissions Safe in Claude Code
If you find a containment bypass, credential leak, sandbox escape, or other security issue, please use the private reporting path in SECURITY.md.
MIT
The Simpsons and all related characters are property of 20th Television and The Walt Disney Company. The Claude logo is property of Anthropic. We do not claim any rights to these properties. Their use here is purely for entertainment purposes.