Production-ready Python sandbox with PostgreSQL-backed virtual filesystem and document processing helpers for LangGraph agents.
Mayflower Sandbox provides secure, isolated Python code execution with persistent file storage, designed for LangChain and LangGraph applications. Execute untrusted Python code, process documents (Word, Excel, PowerPoint, PDF), and maintain persistent state across sessions—all with complete thread isolation.
- ✅ Secure Python Execution - Pyodide WebAssembly sandbox with configurable network access
- ✅ Persistent Virtual Filesystem - PostgreSQL-backed storage (20MB limit per file)
- ✅ Document Processing Helpers - Built-in helpers for Word, Excel, PowerPoint, and PDF
- ✅ Stateful Execution - Variables and state persist across executions and restarts
- ✅ Thread Isolation - Complete isolation between users/sessions via thread_id
- ✅ LangChain Integration - All tools extend BaseTool for seamless LangGraph integration
- ✅ HITL Support - Human-in-the-Loop approval for destructive operations (CopilotKit integration)
- ✅ HTTP File Server - Download files via REST API
- ✅ Automatic Cleanup - Configurable session expiration (180 days default)
# Install Deno (required for Pyodide)
curl -fsSL https://deno.land/x/install/install.sh | sh
# Install package
pip install -e .
# Setup database
createdb mayflower_test
psql -d mayflower_test -f migrations/001_sandbox_schema.sql
See Installation Guide for detailed setup instructions.
import asyncpg
from mayflower_sandbox.tools import create_sandbox_tools
from langgraph.prebuilt import create_react_agent
from langchain_anthropic import ChatAnthropic
# Setup database (the awaits below must run inside an async function or an asyncio-aware REPL)
db_pool = await asyncpg.create_pool(
    host="localhost",
    database="mayflower_test",
    user="postgres",
    password="postgres",
)
# Create tools for a specific user
tools = create_sandbox_tools(db_pool, thread_id="user_123")
# Create LangGraph agent
llm = ChatAnthropic(model="claude-sonnet-4.5")
agent = create_react_agent(llm, tools)
# Use the agent
result = await agent.ainvoke({
    "messages": [("user", "Create a CSV file and calculate the sum")]
})
See Quick Start Guide for a complete tutorial.
- Installation Guide - Install and configure Mayflower Sandbox
- Quick Start - Get started in 5 minutes
- Examples - Complete working examples
- Tools Reference - Documentation for the 10 LangChain tools
- Helpers Reference - Document processing helpers (Word, Excel, PowerPoint, PDF)
- HITL Guide - Human-in-the-Loop approval for destructive operations
- Advanced Features - Stateful execution, file server, cleanup
- API Reference - Low-level API documentation
┌─────────────────────────────────────────────────────┐
│  LangGraph Agent                                    │
│  ├─ ExecutePythonTool (direct code execution)       │
│  ├─ RunPythonFileTool (run existing .py files)      │
│  ├─ ExecuteCodeTool (state-based for large code)    │
│  ├─ FileReadTool                                    │
│  ├─ FileWriteTool                                   │
│  ├─ FileEditTool (str_replace)                      │
│  ├─ FileListTool                                    │
│  ├─ FileDeleteTool                                  │
│  ├─ FileGlobTool (glob_files)                       │
│  └─ FileGrepTool (grep_files)                       │
└──────────────────┬──────────────────────────────────┘
                   │
┌──────────────────▼──────────────────────────────────┐
│  Mayflower Sandbox                                  │
│  ├─ SandboxExecutor (VFS + Pyodide integration)     │
│  ├─ VirtualFilesystem (PostgreSQL storage)          │
│  ├─ Helper Modules (auto-loaded into VFS)           │
│  ├─ SandboxManager (Session lifecycle)              │
│  └─ CleanupJob (Automatic expiration)               │
└──────────────────┬──────────────────────────────────┘
                   │
┌──────────────────▼──────────────────────────────────┐
│  Infrastructure                                     │
│  ├─ PostgreSQL (Persistent storage)                 │
│  └─ Deno + Pyodide (Python execution)               │
└─────────────────────────────────────────────────────┘
Mayflower Sandbox provides 10 LangChain tools:
- ExecutePythonTool (python_run) - Execute Python code directly via tool parameter
  - Best for: Small code snippets, simple calculations, quick operations
  - Code passed as tool parameter (subject to serialization limits)
- RunPythonFileTool (python_run_file) - Execute existing Python files from VFS
  - Best for: Re-running scripts, organized multi-file projects
  - Reads and executes .py files already stored in VFS
- ExecuteCodeTool (python_run_prepared) - Execute code from graph state (state-based extraction)
  - Best for: Large/complex code (20+ lines), subplots, multi-step analysis
  - Solves AG-UI/LangGraph tool parameter serialization issues
  - LLM generates code, stores in graph state, tool extracts and executes
  - Use this for complex visualizations and large code blocks
- FileReadTool (file_read) - Read files from PostgreSQL VFS
- FileWriteTool (file_write) - Write files to PostgreSQL VFS (20MB limit, HITL approval)
- FileEditTool (file_edit) - Edit files by replacing unique strings
- FileListTool (file_list) - List files with optional prefix filtering
- FileDeleteTool (file_delete) - Delete files from VFS (HITL approval required)
- FileGlobTool (file_glob) - Find files matching glob patterns
- FileGrepTool (file_grep) - Search file contents with regex
See Tools Reference for detailed documentation.
Use python_run for:
- Simple calculations and data processing
- Code under ~10 lines
- Quick operations where code fits comfortably in tool parameters
Use python_run_file for:
- Re-running previously created scripts
- Organized multi-file projects
- Scripts stored permanently in VFS
Use python_run_prepared for:
- Complex visualizations with subplots
- Large code blocks (20+ lines)
- Multi-step data analysis pipelines
- When you encounter "missing required parameter" errors with python_run
- Any code too large for tool parameter serialization
State-Based Code Execution Pattern (python_run_prepared):
The python_run_prepared tool solves a critical issue with LangGraph/AG-UI: when LLMs try to pass large code blocks through tool parameters, the serialization layer can drop or truncate them, causing "missing required parameter" errors.
How it works:
- LLM generates Python code (automatically stored in graph state's pending_code field)
- LLM calls python_run_prepared(file_path="/tmp/viz.py", description="Create subplot visualization")
- Tool extracts code from state, saves to VFS, and executes
- Code is cleared from state after successful execution
This pattern enables complex visualizations and large-scale data processing without serialization limits.
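The state-based flow above can be sketched in a few lines. This is a minimal illustration, not the library's implementation: the `run_prepared` function and the `vfs` dict are hypothetical stand-ins (a dict for the PostgreSQL VFS, `exec()` for the Pyodide sandbox). Only the `pending_code` field name comes from the description above.

```python
import contextlib
import io

# Hypothetical stand-ins: a plain dict plays the role of the PostgreSQL VFS,
# and exec() plays the role of the Pyodide sandbox.
vfs: dict[str, str] = {}


def run_prepared(state: dict, file_path: str) -> str:
    """Pull code from state["pending_code"], save it, run it, then clear it."""
    code = state.get("pending_code")
    if not code:
        return "Error: no pending code in state"

    vfs[file_path] = code  # persist the script so it can be re-run later

    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(code, {"__name__": "__main__"})  # NOT a sandbox; illustration only

    state["pending_code"] = None  # cleared only after a successful run
    return buffer.getvalue()


state = {"pending_code": "total = sum(range(5))\nprint(total)"}
output = run_prepared(state, "/tmp/viz.py")
```

The point of the design is that the tool call itself carries only a short `file_path` argument, so the size of the code never passes through tool-parameter serialization.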
Built-in helpers for document processing (automatically available in sandbox):
- Word (DOCX) - Extract text, read tables, find/replace, add comments, convert to markdown
- Excel (XLSX) - Read/write cells, convert to dictionaries, detect formulas
- PowerPoint (PPTX) - Extract text, replace content, inventory slides, generate HTML
- PDF - Merge, split, extract text, rotate pages, get metadata
See Helpers Reference for complete documentation.
Mayflower Sandbox supports Human-in-the-Loop approval for destructive operations, allowing users to confirm actions before they execute. This is particularly important for file deletions and modifications.
The HITL mechanism uses CopilotKit's renderAndWaitForResponse pattern to create a seamless approval workflow:
┌─────────┐      ┌─────────┐      ┌──────────┐      ┌──────┐
│   LLM   │─────▶│ Backend │─────▶│ Frontend │─────▶│ User │
└─────────┘      └─────────┘      └──────────┘      └──────┘
     │                │                  │               │
     │ 1. Call tool   │                  │               │
     │ (no approval)  │                  │               │
     ├───────────────▶│                  │               │
     │                │ 2. Return        │               │
     │                │ "WAIT_FOR_USER_  │               │
     │                │ APPROVAL"        │               │
     │◀───────────────┤                  │               │
     │                │ 3. Trigger       │               │
     │                │ approval dialog  │               │
     │                ├─────────────────▶│               │
     │                │                  │ 4. Show UI    │
     │                │                  ├──────────────▶│
     │                │                  │ 5. User       │
     │                │                  │ approves      │
     │                │                  │◀──────────────┤
     │                │ 6. Re-call with  │               │
     │                │ approved=true    │               │
     │                │◀─────────────────┤               │
     │ 7. Execute     │                  │               │
     │ & return       │                  │               │
     │◀───────────────┤─────────────────▶│──────────────▶│
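Backend (file_delete.py):
from langchain_core.callbacks import AsyncCallbackManagerForToolRun
from pydantic import BaseModel, Field

# SandboxTool and VirtualFilesystem are provided by mayflower_sandbox (imports omitted).

class FileDeleteInput(BaseModel):
    file_path: str = Field(description="Path to the file to delete")
    approved: bool = Field(
        default=False,
        description="User approval status for deletion",
    )

class FileDeleteTool(SandboxTool):
    async def _arun(
        self,
        file_path: str,
        approved: bool = False,
        run_manager: AsyncCallbackManagerForToolRun | None = None,
    ) -> str:
        # HITL: if not approved, return the special control-flow message
        if not approved:
            return "WAIT_FOR_USER_APPROVAL"
        # User approved: proceed with deletion.
        # Assumption: the tool keeps the thread_id it was created with on self.
        vfs = VirtualFilesystem(self.db_pool, self.thread_id)
        deleted = await vfs.delete_file(file_path)
        # Assumption: delete_file returns a falsy value when no file was removed.
        if not deleted:
            return f"File not found: {file_path}"
        return f"Successfully deleted: {file_path}"
Frontend (CopilotKit integration):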
import { useCopilotAction } from '@copilotkit/react-core';

// Call this hook inside a React component.
useCopilotAction({
  name: 'file_delete',
  description: 'Delete a file. Requires user approval.',
  parameters: [
    {
      name: 'file_path',
      type: 'string',
      required: true,
    },
    // NOTE: 'approved' parameter intentionally NOT defined here
    // CopilotKit detects it's missing and triggers approval flow
  ],
  renderAndWaitForResponse: ({ args, respond }) => {
    // Show confirmation dialog
    // When user approves: respond({ approved: true })
    // When user cancels: respond({ approved: false })
  },
});
- Parameter Omission Detection
  - Frontend omits the approved parameter from the tool definition
  - Backend declares the approved parameter with default=False
  - CopilotKit detects the mismatch and triggers the approval flow
- Special Return Value
  - "WAIT_FOR_USER_APPROVAL" signals that approval is needed
  - Not an error; it is a control flow signal
- Stateless Re-invocation
  - Frontend re-calls the tool with the approved parameter
  - No server-side state needed
- Security by Default
  - Default is always approved=False (safe)
  - Destructive operations require explicit user consent
- FileDeleteTool - Requires approval before deleting files
- FileWriteTool - Requires approval for overwriting existing files
To add HITL approval to any tool:
- Add approved: bool = Field(default=False) to input schema
- Check approval status at the start of _arun()
- Return "WAIT_FOR_USER_APPROVAL" if not approved
- Omit approved from frontend parameter definition
- Implement renderAndWaitForResponse in frontend
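The approval gate and the stateless re-invocation can be tried without any framework. The sketch below is illustrative only: `delete_file_tool`, `call_with_approval`, and the in-memory `files` dict are hypothetical names. `call_with_approval` plays the part that CopilotKit's frontend plays in the real flow.

```python
import asyncio

WAIT_FOR_USER_APPROVAL = "WAIT_FOR_USER_APPROVAL"

# Hypothetical in-memory file store standing in for the VFS.
files = {"/tmp/report.csv": b"a,b\n1,2\n"}


async def delete_file_tool(file_path: str, approved: bool = False) -> str:
    """Destructive tool: does nothing until called with approved=True."""
    if not approved:
        return WAIT_FOR_USER_APPROVAL  # control-flow signal, not an error
    if files.pop(file_path, None) is None:
        return f"File not found: {file_path}"
    return f"Successfully deleted: {file_path}"


async def call_with_approval(file_path: str, user_says_yes: bool) -> str:
    """Plays the frontend's role: first call, ask the user, then re-call."""
    first = await delete_file_tool(file_path)
    if first != WAIT_FOR_USER_APPROVAL:
        return first
    if not user_says_yes:
        return "Cancelled by user"
    # Stateless re-invocation: the same arguments plus approved=True.
    return await delete_file_tool(file_path, approved=True)


denied = asyncio.run(call_with_approval("/tmp/report.csv", user_says_yes=False))
still_there = "/tmp/report.csv" in files
granted = asyncio.run(call_with_approval("/tmp/report.csv", user_says_yes=True))
```

Because the first call returns before touching any data, the backend needs no record of pending approvals. The second call is an ordinary tool call that happens to carry `approved=True`.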
# Setup PostgreSQL in Docker and run migrations
make db-setup
# Install dependencies and run tests
uv venv
uv pip install -e ".[dev]"
POSTGRES_PORT=5433 uv run pytest -v
# When done, stop database
make db-down
# Start database
make db-up
# Run all tests
pytest -v
# Run specific test suites
pytest tests/test_executor.py -v
pytest tests/test_pptx_helpers.py -v
# Stop database
make db-down
Core Tests: ✅ All passing
- Executor: 12/12
- Filesystem: 12/12
- Manager: 9/9
- Tools: 10/10
- Session Recovery: 16/16
Helper Tests: ✅ All passing
- PPTX: 5/5
- XLSX: 4/4
- Word: 4/4
- PDF: 4/4
export POSTGRES_HOST=localhost
export POSTGRES_DB=mayflower_test
export POSTGRES_USER=postgres
export POSTGRES_PASSWORD=postgres
export POSTGRES_PORT=5432
- sandbox_sessions - Session tracking (180-day expiration)
- sandbox_filesystem - File storage (20MB per file limit)
- sandbox_session_bytes - Stateful execution support
See API Reference for complete schema.
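One way to consume the POSTGRES_* environment variables from Python is to collect them into keyword arguments for the connection pool. This is a sketch, not part of the package: `postgres_settings` is a hypothetical helper, and its fallback defaults simply mirror the example values shown above.

```python
import os


def postgres_settings(env=None) -> dict:
    """Collect asyncpg-style connection settings from POSTGRES_* variables."""
    env = os.environ if env is None else env
    return {
        "host": env.get("POSTGRES_HOST", "localhost"),
        "port": int(env.get("POSTGRES_PORT", "5432")),  # env values are strings
        "database": env.get("POSTGRES_DB", "mayflower_test"),
        "user": env.get("POSTGRES_USER", "postgres"),
        "password": env.get("POSTGRES_PASSWORD", "postgres"),
    }


# Passing a dict instead of os.environ makes the helper easy to try out.
settings = postgres_settings({"POSTGRES_PORT": "5433"})
```

The resulting dict can be unpacked into the pool constructor, for example `await asyncpg.create_pool(**postgres_settings())`.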
The worker pool reduces execution time by roughly 70-95% by keeping Pyodide loaded in memory:
| Operation | Without Pool | With Pool | Improvement |
|---|---|---|---|
| Simple code | 4.5s | 0.5s | 89% faster |
| With numpy | 4.5s | 0.2s | 96% faster |
| With matplotlib | 14s | 1.5s | 89% faster |
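The Improvement column is consistent with reading "faster" as the share of execution time removed, that is, 1 minus the ratio of pooled to unpooled time. A short check of that arithmetic, using the timings from the table (the function name is illustrative):

```python
def time_saved_percent(without_pool_s: float, with_pool_s: float) -> float:
    """Percentage of wall-clock time removed by the pool."""
    return (1 - with_pool_s / without_pool_s) * 100


# (without pool, with pool) in seconds, copied from the table above.
timings = {
    "Simple code": (4.5, 0.5),
    "With numpy": (4.5, 0.2),
    "With matplotlib": (14.0, 1.5),
}
improvements = {
    name: round(time_saved_percent(without, with_pool))
    for name, (without, with_pool) in timings.items()
}
```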
Enable worker pool:
import os
os.environ["PYODIDE_USE_POOL"] = "true" # Enable (recommended)
# Optional configuration
os.environ["PYODIDE_POOL_SIZE"] = "3" # Number of workers (default: 3)
os.environ["PYODIDE_WORKER_REQUEST_LIMIT"] = "1000" # Recycle after N requests
os.environ["PYODIDE_HEALTH_CHECK_INTERVAL"] = "30" # Health check seconds
How it works:
- 3 long-running Deno workers keep Pyodide + micropip loaded
- Round-robin load balancing across workers
- Automatic health monitoring and recovery
- Session state preserved between executions
See Worker Pool Documentation for details.
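Round-robin dispatch with request-count recycling can be sketched as follows. This is a simplified model under stated assumptions, not the actual pool: `Worker` and `RoundRobinPool` are hypothetical classes, each worker is a plain object rather than a Deno process, and health monitoring is left out.

```python
import itertools
from dataclasses import dataclass


@dataclass
class Worker:
    """Hypothetical stand-in for one long-running Deno/Pyodide process."""
    worker_id: int
    requests_served: int = 0
    generation: int = 0  # incremented each time the worker is recycled

    def run(self, code: str) -> str:
        self.requests_served += 1
        return f"worker {self.worker_id} ran {code!r}"


class RoundRobinPool:
    def __init__(self, size: int = 3, request_limit: int = 1000) -> None:
        self.workers = [Worker(i) for i in range(size)]
        self.request_limit = request_limit
        self._next = itertools.cycle(range(size))

    def execute(self, code: str) -> str:
        worker = self.workers[next(self._next)]
        if worker.requests_served >= self.request_limit:
            # Recycle: replace the worker with a fresh one in the same slot.
            worker = Worker(worker.worker_id, generation=worker.generation + 1)
            self.workers[worker.worker_id] = worker
        return worker.run(code)


# A small limit makes the recycling visible after a handful of requests.
pool = RoundRobinPool(size=3, request_limit=2)
results = [pool.execute("1 + 1") for _ in range(7)]
```

With three workers and a limit of two requests, the seventh request lands on worker 0 again, which has already served two requests and is replaced before running the code.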
When worker pool is disabled (PYODIDE_USE_POOL=false):
- File operations: < 50ms
- Python execution: ~4-5s per execution (loads Pyodide each time)
- Helper loading: < 100ms
- Thread isolation: 100% via PostgreSQL
- ✅ WebAssembly sandboxing (Pyodide)
- ✅ Path validation (prevents directory traversal)
- ✅ Size limits (20MB per file)
- ✅ Thread isolation (complete separation)
- ✅ Configurable network access
- ✅ Automatic session expiration
- ✅ HITL approval for destructive operations (file deletion, overwrites)
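Path validation and the per-file size limit can be illustrated with a small check. This is a sketch of the general technique, not the package's actual validation code: `validate_write` is a hypothetical function, and only the 20MB figure comes from the list above.

```python
import posixpath

MAX_FILE_SIZE = 20 * 1024 * 1024  # 20MB per file, as stated above


def validate_write(path: str, content: bytes) -> str:
    """Return a normalized absolute path or raise ValueError."""
    if not path.startswith("/"):
        raise ValueError(f"Path must be absolute: {path}")
    # Reject any ".." component outright rather than trying to resolve it.
    if ".." in path.split("/"):
        raise ValueError(f"Path traversal is not allowed: {path}")
    if len(content) > MAX_FILE_SIZE:
        raise ValueError(f"File exceeds {MAX_FILE_SIZE} bytes: {path}")
    return posixpath.normpath(path)


clean_path = validate_write("/tmp//data/./a.csv", b"x")
```

Rejecting `..` components before normalizing keeps the rule easy to audit: a path is either accepted as written (modulo redundant slashes and `.` segments) or refused.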
# Setup
git clone <repo>
cd mayflower-sandbox
pip install -e ".[dev]"
# Run linters
ruff check src/ tests/
ruff format src/ tests/
# Run tests
pytest -v
MIT
- Documentation: See docs/ directory
- Issues: GitHub Issues