claude-sandboxed-agent

A production support intelligence system where Claude operates inside a strict tool sandbox — it can only call explicitly permitted tools against live infrastructure, with credential isolation enforced at every boundary.

This is not a RAG chatbot. Claude has no access to a document store. Instead, it executes live tools — account state lookups, policy evaluations, audit trail fetches — and reasons over the results in real time.

The Core Idea

Most AI support systems work like this:

user question → embed → retrieve docs → LLM generates answer

This system works like this:

user question → Claude decides which tool to call → tool executes against live infra → Claude reasons over result → answer

The difference: answers reflect current system state, not stale documentation.

Sandbox Isolation

Claude is given exactly three tools. Nothing more.

Tool	What it can access	Credential scope
`lookup_account`	Account state + assignment history	MongoDB read-only
`lookup_policy`	Live policy evaluation for an account	AWS read-only (org profile)
`get_audit_trail`	Recent activity inside an account	STS assume-role → CloudTrail read

Claude cannot write to any system. It cannot assume arbitrary roles. Each tool has its own credential scope — compromise of one tool's credentials doesn't expose the others.

Architecture

┌─────────────────────────────────────────────────────┐
│                  Support Chat UI                     │
│            WebSocket  ·  Firebase Auth               │
└────────────────────┬────────────────────────────────┘
                     │ authenticated WS session
                     ▼
┌─────────────────────────────────────────────────────┐
│               FastAPI Backend                        │
│                                                     │
│  ┌─────────────┐    ┌──────────────────────────┐   │
│  │  Team Router│    │   Loopback Guard          │   │
│  │  JWT team   │    │   internal routes: 127.x  │   │
│  │  ↔ path     │    │   only                    │   │
│  │  validation │    └──────────────────────────┘   │
│  └──────┬──────┘                                    │
│         │                                           │
│         ▼                                           │
│  ┌─────────────────────────────────────────────┐   │
│  │           Claude Sandbox                     │   │
│  │                                             │   │
│  │   system prompt (role + constraints)        │   │
│  │   tool definitions (3 tools only)           │   │
│  │   streaming responses → WS                  │   │
│  └──────────────┬──────────────────────────────┘   │
│                 │                                   │
│    ┌────────────┼──────────────┐                   │
│    ▼            ▼              ▼                   │
│  [lookup_    [lookup_      [get_audit_             │
│  account]    policy]       trail]                  │
│    │            │              │                   │
│    ▼            ▼              ▼                   │
│  MongoDB    AWS Org API    STS assume-role         │
│  (read)     (read)         → CloudTrail            │
└─────────────────────────────────────────────────────┘

Memory Layer

The agent gets smarter per-user over time without retraining — through a two-path memory pipeline:

PATH A — Recall (warm path)

Previous observations for this user retrieved from memory
Cohort graph consulted for peer patterns
Merged into context before Claude responds

PATH B — Reflect (cold path, sparse observations)

Claude reflects on raw signals to generate initial observations
Used when a user is new or observations are too thin to trust

After every interaction, new signals are written back to memory asynchronously — deduplication and consolidation handled natively by the memory backend.

Auth Model

Firebase ID token
    → decoded + verified server-side
    → email looked up in allowlist (Firestore)
    → role (admin / support) + team (aws / azure / gcp) extracted
    → path team validated against JWT team claim
    → request proceeds or 403

A support engineer with team: aws cannot reach /api/azure/support/* even with a valid token.

Running Locally

# Install dependencies
pip install -r requirements.txt

# Set environment variables (see .env.example)
cp .env.example .env
# fill in ANTHROPIC_API_KEY, FIREBASE credentials

# Start backend
uvicorn server:app --port 8001 --reload

# Chat UI runs separately on :3006
# Admin UI runs separately on :3005

Key Design Decisions

Why three tools and not more? Blast radius control. Each tool added to the sandbox is a new attack surface. Three tools covering account state, policy, and audit trail answer 95% of support questions.

Why WebSocket and not HTTP? Claude's tool calls are streamed in real time — the support engineer sees Claude thinking, not just the final answer. This matters for trust: engineers can interrupt if Claude is going down the wrong path.

Why path-based team routing? A single backend serves multiple platform teams (AWS, Azure, GCP). Path routing means each team's URL space is isolated at the network level, not just the application level.

Why credential isolation per tool? If the policy lookup tool's read-only AWS credentials are leaked, the attacker cannot use them to read MongoDB or assume arbitrary roles. Each tool has the minimum credential needed for its job.

Project Structure

claude-sandboxed-agent/
├── server.py              # FastAPI app, CORS, loopback guard, router registration
├── agent/
│   ├── sandbox.py         # Claude client, tool binding, streaming
│   └── prompts.py         # System prompt templates
├── tools/
│   ├── lookup_account.py  # MongoDB account state tool
│   ├── lookup_policy.py   # AWS policy evaluation tool
│   └── get_audit_trail.py # CloudTrail fetch tool
├── memory/
│   ├── pipeline.py        # PATH A / PATH B routing
│   └── ingest.py          # Async write-back + consolidation scheduler
├── chat/
│   └── router.py          # WebSocket handler, Firebase auth, team routing
├── docs/
│   └── architecture.md    # Deep-dive on sandbox boundary decisions
├── .env.example
└── README.md

Related Work

Distilled from a production system governing cloud lab accounts across thousands of learners — validated across multiple platform teams (AWS, Azure, GCP).

Memory pipeline: mem0-pipeline
Multi-agent patterns: ai-sentinel-ecosystem

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

claude-sandboxed-agent

The Core Idea

Sandbox Isolation

Architecture

Memory Layer

Auth Model

Running Locally

Key Design Decisions

Project Structure

Related Work

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 1 Commit
agent		agent
chat		chat
docs		docs
memory		memory
tools		tools
.env.example		.env.example
README.md		README.md
requirements.txt		requirements.txt
server.py		server.py

Folders and files

Latest commit

History

Repository files navigation

claude-sandboxed-agent

The Core Idea

Sandbox Isolation

Architecture

Memory Layer

Auth Model

Running Locally

Key Design Decisions

Project Structure

Related Work

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages