A production support intelligence system where Claude operates inside a strict tool sandbox — it can only call explicitly permitted tools against live infrastructure, with credential isolation enforced at every boundary.
This is not a RAG chatbot. Claude has no access to a document store. Instead, it executes live tools — account state lookups, policy evaluations, audit trail fetches — and reasons over the results in real time.
Most AI support systems work like this:
user question → embed → retrieve docs → LLM generates answer
This system works like this:
user question → Claude decides which tool to call → tool executes against live infra → Claude reasons over result → answer
The difference: answers reflect current system state, not stale documentation.
Claude is given exactly three tools. Nothing more.
| Tool | What it can access | Credential scope |
|---|---|---|
lookup_account |
Account state + assignment history | MongoDB read-only |
lookup_policy |
Live policy evaluation for an account | AWS read-only (org profile) |
get_audit_trail |
Recent activity inside an account | STS assume-role → CloudTrail read |
Claude cannot write to any system. It cannot assume arbitrary roles. Each tool has its own credential scope — compromise of one tool's credentials doesn't expose the others.
┌─────────────────────────────────────────────────────┐
│ Support Chat UI │
│ WebSocket · Firebase Auth │
└────────────────────┬────────────────────────────────┘
│ authenticated WS session
▼
┌─────────────────────────────────────────────────────┐
│ FastAPI Backend │
│ │
│ ┌─────────────┐ ┌──────────────────────────┐ │
│ │ Team Router│ │ Loopback Guard │ │
│ │ JWT team │ │ internal routes: 127.x │ │
│ │ ↔ path │ │ only │ │
│ │ validation │ └──────────────────────────┘ │
│ └──────┬──────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────┐ │
│ │ Claude Sandbox │ │
│ │ │ │
│ │ system prompt (role + constraints) │ │
│ │ tool definitions (3 tools only) │ │
│ │ streaming responses → WS │ │
│ └──────────────┬──────────────────────────────┘ │
│ │ │
│ ┌────────────┼──────────────┐ │
│ ▼ ▼ ▼ │
│ [lookup_ [lookup_ [get_audit_ │
│ account] policy] trail] │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ MongoDB AWS Org API STS assume-role │
│ (read) (read) → CloudTrail │
└─────────────────────────────────────────────────────┘
The agent gets smarter per-user over time without retraining — through a two-path memory pipeline:
PATH A — Recall (warm path)
- Previous observations for this user retrieved from memory
- Cohort graph consulted for peer patterns
- Merged into context before Claude responds
PATH B — Reflect (cold path, sparse observations)
- Claude reflects on raw signals to generate initial observations
- Used when a user is new or observations are too thin to trust
After every interaction, new signals are written back to memory asynchronously — deduplication and consolidation handled natively by the memory backend.
Firebase ID token
→ decoded + verified server-side
→ email looked up in allowlist (Firestore)
→ role (admin / support) + team (aws / azure / gcp) extracted
→ path team validated against JWT team claim
→ request proceeds or 403
A support engineer with team: aws cannot reach /api/azure/support/* even with a valid token.
# Install dependencies
pip install -r requirements.txt
# Set environment variables (see .env.example)
cp .env.example .env
# fill in ANTHROPIC_API_KEY, FIREBASE credentials
# Start backend
uvicorn server:app --port 8001 --reload
# Chat UI runs separately on :3006
# Admin UI runs separately on :3005Why three tools and not more? Blast radius control. Each tool added to the sandbox is a new attack surface. Three tools covering account state, policy, and audit trail answer 95% of support questions.
Why WebSocket and not HTTP? Claude's tool calls are streamed in real time — the support engineer sees Claude thinking, not just the final answer. This matters for trust: engineers can interrupt if Claude is going down the wrong path.
Why path-based team routing? A single backend serves multiple platform teams (AWS, Azure, GCP). Path routing means each team's URL space is isolated at the network level, not just the application level.
Why credential isolation per tool? If the policy lookup tool's read-only AWS credentials are leaked, the attacker cannot use them to read MongoDB or assume arbitrary roles. Each tool has the minimum credential needed for its job.
claude-sandboxed-agent/
├── server.py # FastAPI app, CORS, loopback guard, router registration
├── agent/
│ ├── sandbox.py # Claude client, tool binding, streaming
│ └── prompts.py # System prompt templates
├── tools/
│ ├── lookup_account.py # MongoDB account state tool
│ ├── lookup_policy.py # AWS policy evaluation tool
│ └── get_audit_trail.py # CloudTrail fetch tool
├── memory/
│ ├── pipeline.py # PATH A / PATH B routing
│ └── ingest.py # Async write-back + consolidation scheduler
├── chat/
│ └── router.py # WebSocket handler, Firebase auth, team routing
├── docs/
│ └── architecture.md # Deep-dive on sandbox boundary decisions
├── .env.example
└── README.md
Distilled from a production system governing cloud lab accounts across thousands of learners — validated across multiple platform teams (AWS, Azure, GCP).
- Memory pipeline:
mem0-pipeline - Multi-agent patterns:
ai-sentinel-ecosystem