If `CLAUDE.local.md` exists in this directory, read it first; it contains current session state.
AURA is a TOML-based configuration system for composing Rig.rs AI agents with MCP tools and RAG pipelines.
All major features complete:
- Bounded streaming with custom aura events
- Rig 0.28 upgrade with ProviderAgent architecture
- Configurable MCP header forwarding (`headers_from_request` with static TOML fallback)
- Request-scoped MCP progress and cancellation
- Client disconnect detection with MCP `notifications/cancelled`
- Multi-agent orchestration mode with coordinator/worker architecture and DAG execution
Pending: Upstream Rig PRs - StreamingPromptHook fix + Content-Type header fix
# Build
cargo build --release
# Start web server (default config.toml)
cargo run --bin aura-web-server
# Start with orchestration config
CONFIG_PATH=configs/example-math-orchestration.toml AURA_CUSTOM_EVENTS=true cargo run --bin aura-web-server
# Build and run CLI (HTTP mode — connects to aura-web-server)
cargo run -p aura-cli -- --api-url http://localhost:8080
# Build and run CLI (standalone mode — no server needed)
cargo run -p aura-cli --features standalone-cli -- --standalone --config configs/my-agent.toml
# Run integration tests (local, requires Docker)
make test-integration-local # base integration
make test-integration-orchestration-local # orchestration integration
make test-integration-sre-orchestration-local # SRE orchestration integration
aura/
├── crates/
│ ├── aura/ # Core library (agent builder + orchestration)
│ ├── aura-cli/ # Interactive terminal client (HTTP + standalone modes)
│ ├── aura-config/ # TOML parsing and configuration
│ ├── aura-events/ # Shared SSE event types (lightweight, no agent deps)
│ ├── aura-web-server/ # OpenAI-compatible API
│ └── aura-test-utils/ # Shared testing utilities
├── compose/ # Docker Compose (integration + orchestration overlays)
├── configs/ # Integration test and example configurations
├── deployment/ # Helm charts and K8s manifests
├── docs/ # Architecture and protocol documentation
├── examples/ # Example and reference configurations
└── .makefiles/ # Modular Make targets (rust, docker, node, aura)
- TOML-based declarative configuration
- Environment variable resolution (`{{ env.VAR }}`)
- Support for multiple LLM providers (OpenAI, Anthropic, Bedrock, Gemini, Ollama, OpenRouter)
- Dynamic tool registration
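
As a rough illustration of the `{{ env.VAR }}` resolution listed above, the sketch below substitutes placeholders in a string through a caller-supplied lookup. The function name `resolve_env_placeholders` and its error strings are hypothetical and not taken from `aura-config`; the real resolver may differ in syntax details and error handling.

```rust
use std::collections::HashMap;

/// Replace every `{{ env.NAME }}` placeholder in `input` using `lookup`.
/// Hypothetical helper: returns an error naming the first unresolved variable.
fn resolve_env_placeholders<F>(input: &str, lookup: F) -> Result<String, String>
where
    F: Fn(&str) -> Option<String>,
{
    let mut out = String::with_capacity(input.len());
    let mut rest = input;
    while let Some(start) = rest.find("{{") {
        // Copy the literal text before the placeholder.
        out.push_str(&rest[..start]);
        let after_open = &rest[start + 2..];
        let end = after_open
            .find("}}")
            .ok_or_else(|| "unterminated placeholder".to_string())?;
        let inner = after_open[..end].trim();
        let name = inner
            .strip_prefix("env.")
            .ok_or_else(|| format!("unsupported placeholder: {inner}"))?;
        let value = lookup(name).ok_or_else(|| format!("missing env var: {name}"))?;
        out.push_str(&value);
        rest = &after_open[end + 2..];
    }
    out.push_str(rest);
    Ok(out)
}

fn main() {
    // A map stands in for the process environment so the example is deterministic.
    let mut vars = HashMap::new();
    vars.insert("OPENAI_API_KEY".to_string(), "sk-test".to_string());
    let lookup = |name: &str| vars.get(name).cloned();
    let resolved =
        resolve_env_placeholders("api_key = \"{{ env.OPENAI_API_KEY }}\"", lookup).unwrap();
    println!("{resolved}");
}
```

In a real process the lookup would typically be `|name| std::env::var(name).ok()`.
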
- HTTP Transport: Full authentication and tool execution
- SSE Transport: AWS Knowledge Base integration
- STDIO Transport: Tool discovery
- Header Forwarding: `headers_from_request` mappings with static TOML `headers` as fallback
- Cancellation: `notifications/cancelled` propagation on client disconnect
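
The header-forwarding rule above (request-derived values first, static TOML `headers` as fallback) could be sketched as follows. The shape of the mapping (outgoing header name to incoming request header name) and the function name `resolve_mcp_headers` are assumptions made for illustration; the actual `headers_from_request` schema may be different.

```rust
use std::collections::HashMap;

/// Build the headers sent to an MCP server (hypothetical helper).
/// `static_headers` models the TOML `headers` table; `mappings` models
/// `headers_from_request` as outgoing-name -> incoming-request-name (assumed shape).
/// Incoming header names are assumed to be already lowercased.
fn resolve_mcp_headers(
    static_headers: &HashMap<String, String>,
    mappings: &HashMap<String, String>,
    request_headers: &HashMap<String, String>,
) -> HashMap<String, String> {
    // Start from the static fallback values.
    let mut resolved = static_headers.clone();
    for (outgoing, incoming) in mappings {
        // A value present on the request overrides the static fallback.
        if let Some(value) = request_headers.get(incoming) {
            resolved.insert(outgoing.clone(), value.clone());
        }
    }
    resolved
}

fn main() {
    let mut static_headers = HashMap::new();
    static_headers.insert("authorization".to_string(), "Bearer static".to_string());

    let mut mappings = HashMap::new();
    mappings.insert("authorization".to_string(), "x-user-token".to_string());

    let mut request = HashMap::new();
    request.insert("x-user-token".to_string(), "Bearer from-request".to_string());

    let resolved = resolve_mcp_headers(&static_headers, &mappings, &request);
    println!("authorization = {}", resolved["authorization"]);
}
```
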
- OpenAI-compatible SSE streaming (`/v1/chat/completions`)
- Custom `aura.*` events (opt-in via `AURA_CUSTOM_EVENTS=true`): `aura.session_info`, `aura.tool_requested`, `aura.tool_start`, `aura.tool_complete`, `aura.reasoning`, `aura.progress`, `aura.worker_phase`, `aura.tool_usage`, `aura.usage`, `aura.scratchpad_usage`
- Request cancellation on timeout or client disconnect
- Two-phase graceful shutdown: new requests rejected immediately (503), in-flight streams get configurable grace period (`SHUTDOWN_TIMEOUT_SECS`, default 30s)
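
The two-phase shutdown described above can be pictured with a small std-only state machine. This is a sketch under assumptions, not the server's implementation: `ShutdownState` and its methods are hypothetical names, the real server is async rather than thread-based, and the accept path here has a small check-then-increment race that production code would need to close.

```rust
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};
use std::time::{Duration, Instant};

/// Hypothetical shutdown tracker: a draining flag plus an in-flight counter.
struct ShutdownState {
    draining: AtomicBool,
    in_flight: AtomicUsize,
}

impl ShutdownState {
    fn new() -> Self {
        Self { draining: AtomicBool::new(false), in_flight: AtomicUsize::new(0) }
    }

    /// Phase 1 gate: once draining, new requests are refused with HTTP 503.
    fn try_accept(&self) -> Result<(), u16> {
        if self.draining.load(Ordering::SeqCst) {
            return Err(503);
        }
        self.in_flight.fetch_add(1, Ordering::SeqCst);
        Ok(())
    }

    /// Called when a stream finishes.
    fn finish(&self) {
        self.in_flight.fetch_sub(1, Ordering::SeqCst);
    }

    fn begin_shutdown(&self) {
        self.draining.store(true, Ordering::SeqCst);
    }

    /// Phase 2: wait for in-flight streams, giving up after `grace`.
    /// Returns true if everything drained before the deadline.
    fn wait_for_drain(&self, grace: Duration) -> bool {
        let deadline = Instant::now() + grace;
        while self.in_flight.load(Ordering::SeqCst) > 0 {
            if Instant::now() >= deadline {
                return false;
            }
            std::thread::sleep(Duration::from_millis(10));
        }
        true
    }
}

/// Parse a `SHUTDOWN_TIMEOUT_SECS`-style value, falling back to 30 seconds.
fn grace_period_from(raw: Option<&str>) -> Duration {
    Duration::from_secs(raw.and_then(|v| v.parse::<u64>().ok()).unwrap_or(30))
}

fn main() {
    let state = ShutdownState::new();
    let grace = grace_period_from(std::env::var("SHUTDOWN_TIMEOUT_SECS").ok().as_deref());
    state.begin_shutdown();
    println!("new request status: {:?}", state.try_accept());
    println!("drained cleanly: {}", state.wait_for_drain(grace));
}
```
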
- Intercepts large MCP tool outputs and saves them to disk instead of filling the context window
- Eight read-only exploration tools: `head`, `slice`, `grep`, `schema`, `item_schema`, `get_in`, `iterate_over`, `read`
- Per-tool token thresholds configured via `[mcp.servers.<name>.scratchpad]` TOML sections (`min_tokens`, default `5_120`). Keys are glob patterns matched against tool names at interception time; when multiple patterns match the same tool, the longest (most specific) wins, ties broken by smallest threshold
- Token counting uses tiktoken-rs (real BPE tokenization, not heuristics): `o200k_base` for GPT-5/4o/o-series, `cl100k_base` for older OpenAI models, `o200k_base` fallback for other providers
- Works in both single-agent and orchestration mode:
  - Single-agent: configure `[agent.scratchpad]` with top-level `memory_dir = "..."`; storage lands under `{memory_dir}/scratchpad/`, budget built from `[agent.llm].context_window`
  - Orchestration: `[agent.scratchpad]` provides defaults, `[orchestration.worker.<name>.scratchpad]` overrides per worker; top-level `memory_dir` also roots orchestration persistence (legacy `[orchestration.artifacts].memory_dir` still works as a fallback)
- Per-worker budgets: each worker gets a fresh `ContextBudget` scoped to its effective LLM (worker's `llm` override if set, otherwise `[agent.llm]`)
- Workers never share an "orchestrator-level" budget; budgets are created at `create_worker()` time and live on `Agent.scratchpad_budget`
- LLM-reported usage feedback (`input_tokens` + `output_tokens`) feeds into the budget as ground truth each turn: orchestration via `StreamItem::TurnUsage`, single-agent via the streaming hook's `on_stream_completion_response_finish`
- Per-call extraction limit (`max_extraction_tokens`, default 10k) prevents single reads from flooding context
- Auto-increased `turn_depth` when scratchpad is active (`turn_depth_bonus`, default 6), applied in both single-agent and worker contexts
- `aura.scratchpad_usage` SSE event emitted per-agent with `agent_id`, `tokens_intercepted`, `tokens_extracted`; fires in both single-agent and orchestration contexts (lives in base `aura.*` namespace, not `aura.orchestrator.*`)
- Storage (orchestration): `{memory_dir}/{run_id}/iteration-{n}/scratchpad/`
- Storage (single-agent): `{memory_dir}/scratchpad/`
- `memory_dir` is a top-level TOML field shared by single-agent scratchpad and orchestration persistence
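
The threshold-selection rule stated above (longest matching pattern wins, ties broken by the smallest threshold, `5_120` when nothing matches) might look roughly like this. The glob matcher is a deliberately minimal stand-in that only understands `*`, "longest" is assumed to mean pattern length in bytes, and `resolve_min_tokens` is a hypothetical name rather than the function used in the codebase.

```rust
/// Default `min_tokens` when no pattern matches (value taken from the text above).
const DEFAULT_MIN_TOKENS: usize = 5_120;

/// Minimal glob matcher supporting only `*` (any run of characters).
/// A simplified stand-in for whatever glob implementation the project uses.
fn glob_match(pattern: &str, name: &str) -> bool {
    let p: Vec<char> = pattern.chars().collect();
    let n: Vec<char> = name.chars().collect();
    let (mut pi, mut ni) = (0usize, 0usize);
    let mut star: Option<usize> = None;
    let mut mark = 0usize;
    while ni < n.len() {
        if pi < p.len() && p[pi] == '*' {
            // Remember the star so we can backtrack and let it absorb more input.
            star = Some(pi);
            mark = ni;
            pi += 1;
        } else if pi < p.len() && p[pi] == n[ni] {
            pi += 1;
            ni += 1;
        } else if let Some(s) = star {
            pi = s + 1;
            mark += 1;
            ni = mark;
        } else {
            return false;
        }
    }
    // Trailing stars match the empty string.
    while pi < p.len() && p[pi] == '*' {
        pi += 1;
    }
    pi == p.len()
}

/// Pick the threshold for `tool`: longest matching pattern wins,
/// ties are broken by the smallest threshold.
fn resolve_min_tokens(rules: &[(&str, usize)], tool: &str) -> usize {
    rules
        .iter()
        .filter(|(pattern, _)| glob_match(pattern, tool))
        // Order by pattern length, then by *reversed* threshold so the
        // smaller threshold compares as "greater" and is chosen by max_by.
        .max_by(|a, b| a.0.len().cmp(&b.0.len()).then(b.1.cmp(&a.1)))
        .map(|(_, min)| *min)
        .unwrap_or(DEFAULT_MIN_TOKENS)
}

fn main() {
    let rules = [("*", 10_000), ("logs_*", 2_000), ("logs_export", 8_000)];
    println!("logs_export -> {}", resolve_min_tokens(&rules, "logs_export"));
    println!("metrics     -> {}", resolve_min_tokens(&rules, "metrics"));
}
```
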
- Coordinator/worker architecture with DAG-based parallel task execution
- Per-worker LLM overrides: workers inherit `[agent.llm]` by default; `[orchestration.worker.<name>.llm]` overrides it (different model, same provider config). Resolved inline at worker construction (`worker.llm.as_ref().unwrap_or(&agent.llm)`)
- Dependency-aware multi-wave execution with iterative re-planning (`max_planning_cycles`)
- Three-way routing: direct answer, orchestrated plan, clarification
- `aura.orchestrator.*` SSE events for real-time visibility (see `docs/streaming-api-guide.md`)
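
To make "dependency-aware multi-wave execution" concrete, here is a small sketch that groups a task DAG into waves, where each wave contains only tasks whose dependencies finished in earlier waves (so tasks within a wave could run in parallel). The function `plan_waves` and the task names are hypothetical; the real orchestrator's planning, re-planning, and scheduling logic is not shown in this document.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Group tasks into waves. `deps` maps each task to the tasks it depends on.
/// Returns an error if no progress can be made (cycle or unknown dependency).
fn plan_waves<'a>(deps: &BTreeMap<&'a str, Vec<&'a str>>) -> Result<Vec<Vec<String>>, String> {
    let mut done: BTreeSet<&'a str> = BTreeSet::new();
    let mut waves: Vec<Vec<String>> = Vec::new();
    while done.len() < deps.len() {
        // A task is ready when it has not run yet and all its dependencies are done.
        let ready: Vec<&'a str> = deps
            .iter()
            .filter(|(task, needs)| {
                !done.contains(*task) && needs.iter().all(|d| done.contains(d))
            })
            .map(|(task, _)| *task)
            .collect();
        if ready.is_empty() {
            return Err("cycle or unknown dependency in task graph".to_string());
        }
        done.extend(ready.iter().copied());
        waves.push(ready.into_iter().map(String::from).collect());
    }
    Ok(waves)
}

fn main() {
    let mut deps = BTreeMap::new();
    deps.insert("fetch", vec![]);
    deps.insert("parse", vec!["fetch"]);
    deps.insert("stats", vec!["fetch"]);
    deps.insert("report", vec!["parse", "stats"]);
    for (i, wave) in plan_waves(&deps).unwrap().iter().enumerate() {
        println!("wave {i}: {wave:?}");
    }
}
```
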
- Interactive terminal client with REPL, one-shot mode, and conversation persistence
- One-shot output contract (`--query`): stdout is the raw assistant response only: no `●` markers, no markdown rendering, no tool-execution summaries, no response-summary header, no `backend.summarize` round-trip. Errors, permission prompts, and warnings go to stderr (with `error:`/`warning:` prefixes, no markers). Exit code 0 ⇒ stdout is the full response; non-zero ⇒ stderr explains and stdout is empty. The REPL retains rich formatting; the strict-output rules apply only to `--query` mode. See `crates/aura-cli/src/oneshot.rs`.
- Two backends: HTTP mode (default) and standalone mode (`--standalone --config`, builds agents in-process)
- Standalone mode requires `--features standalone-cli` at build time and explicit `--standalone` flag at runtime
- `--model` works in both modes: HTTP passes it as starting model; standalone matches against agent.name/agent.alias in configs
- `--system-prompt` works in both modes: standalone prompts for append/replace; HTTP prompts for AURA vs OpenAI-compatible service
- `--force` bypasses non-critical warnings (e.g. HTTP system-prompt in query mode)
- Local tool execution: Shell, Read, ListFiles, Update, SearchFiles, FindFiles, FileInfo
- CLI advertises local tools to the server with `--enable-client-tools`; the server attaches them only when `[agent].enable_client_tools = true` (filtered by `client_tool_filter` globs). Single-agent configs only; orchestrated configs drop the tools with a warning. No server-wide `--enable-client-tools` flag.
- USE AT YOUR OWN RISK. Enabling client-side tools is functionally equivalent to handing the LLM a shell prompt on the client machine: prompt injection, hallucination, and lack of sandboxing are real failure modes. See the prominent warnings in `README.md` and `crates/aura-cli/README.md` before enabling for any user-facing config.
- Permission system (`.aura/permissions.json`, formerly `settings.json`) with allow/deny glob rules. Discovered by walking up from `$PWD` to find the closest `.aura/`. Project-scoped only; no global `~/.aura/permissions.json`. Legacy `settings.json` is still read with a deprecation warning; new rules saved at the prompt land in `permissions.json` and migrate any existing legacy rules forward.
- CLI preferences live in `~/.aura/cli.toml` (global) and `<project>/.aura/cli.toml` (per-project override, walk-up discovered, merged on top of global per-field). Renamed from `~/.aura/config.toml` to avoid collision with AURA agent TOML configs; the old name is still read with a deprecation warning.
- `/model` command works in both modes: lists server models (HTTP) or loaded TOML configs (standalone)
- Env vars: `AURA_API_URL`, `AURA_API_KEY`, `AURA_MODEL`, `AURA_EXTRA_HEADERS`, `AURA_LOG_FILE`
- Diagnostic logs: opt-in via `--log-file <path>` / `AURA_LOG_FILE` / `cli.toml` `log_file` (precedence: CLI > env > project > global > none). Events are appended to the file (no rotation; user-managed) in both REPL and one-shot mode, so stdout stays a clean pipe. Default filter is `warn,aura=info,aura_cli=info,aura_config=info,rig::agent::prompt_request=info`; override with `RUST_LOG`.
- OpenTelemetry (standalone only): when built with `--features standalone-cli` and run with `--standalone`, the CLI installs an OTel layer when `OTEL_EXPORTER_OTLP_ENDPOINT` is set. Trace shape mirrors the web server: `agent.stream` root span via `direct.rs`, with `agent.turn` / `mcp.tool_call` / `orchestration.*` nesting under it. CLI omits the HTTP-infrastructure spans (`chat_completions`, `streaming_completion`) since it has no HTTP layer.
- Single shared tokio runtime: `main` owns one `tokio::runtime::Runtime` and threads it into `Backend::from_config`, `run_oneshot`, and `run_repl`. `logging::init` runs inside `rt.enter()` so the OTLP gRPC exporter can call `Handle::current()` during `with_tonic()` construction; the `BatchSpanProcessor` worker lives on the same runtime that handles every subsequent request. `main` calls `aura::logging::shutdown_tracer()` via `rt.block_on(...)` before returning to flush buffered spans.
- SSE event parsing uses shared types from `aura-events` crate (not in `default-members`, build explicitly with `cargo build -p aura-cli`)
- See `crates/aura-cli/README.md` for full documentation
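
The log-file precedence listed above (CLI > env > project > global > none) reduces to a chain of `Option::or` calls. The sketch below assumes each source has already been read into an `Option`; `resolve_log_file` is a hypothetical name, not the function in `aura-cli`.

```rust
/// Resolve the diagnostic log path using the documented precedence:
/// CLI flag > env var > project cli.toml > global cli.toml > none.
fn resolve_log_file(
    cli_flag: Option<&str>,
    env_var: Option<&str>,
    project_toml: Option<&str>,
    global_toml: Option<&str>,
) -> Option<String> {
    cli_flag
        .or(env_var)
        .or(project_toml)
        .or(global_toml)
        .map(str::to_owned)
}

fn main() {
    // The env var is set, but the CLI flag takes precedence.
    let chosen = resolve_log_file(Some("/tmp/cli.log"), Some("/tmp/env.log"), None, None);
    println!("{chosen:?}");
}
```
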
- Lightweight crate defining `AuraStreamEvent` and `OrchestrationStreamEvent` enums
- Both `Serialize + Deserialize`: used by the web server (producer) and CLI (consumer)
- No agent, MCP, or provider dependencies; only `serde` and `serde_json`
- `ProgressToken` type uses a local wire-compatible definition by default; enable the `rmcp-types` feature for direct rmcp interop (used by the `aura` crate)
export OPENAI_API_KEY="your-key"
export ANTHROPIC_API_KEY="your-key" # Optional
export OPENROUTER_API_KEY="your-key" # Optional
export MEZMO_API_KEY="your-key" # For Mezmo MCP
export AWS_PROFILE="your-profile" # For Knowledge Base
export AWS_REGION="your-region" # For Knowledge Base
- rig-core 0.28: ProviderAgent architecture (via fork for StreamingPromptHook fix)
- rmcp 0.12: MCP client with cancellation support
- Rig Fork: `mezmo/rig` branch `mshearer/LOG-23351-openai-reasoning`
- `provider_agent.rs` - Type-erased streaming across providers
- `stream_events.rs` - Custom aura SSE events
- `request_cancellation.rs` - Request lifecycle management
- `tool_event_broker.rs` - FIFO queue for tool_call_id correlation (see critical assumption below)
- `orchestration/` - Multi-agent coordinator, workers, DAG execution, orchestration SSE events
The tool_event_broker uses a FIFO queue for correlating tool_call_id between hook and MCP execution contexts. This relies on Rig 0.28 streaming mode executing tools sequentially.
If upgrading Rig, verify this assumption by reviewing:
- `rig-core/src/agent/prompt_request/streaming.rs`
  - Look for `.await` between `on_tool_call` and `on_tool_result` (ensures sequential)
  - Check for `FuturesUnordered` or parallel execution patterns (would break FIFO)
Confirmed sequential as of Rig 0.28: the streaming handler .awaits each tool call inline. See docs/rig-tool-execution-order.md.
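
The FIFO assumption can be illustrated with a toy broker: the hook side enqueues each `tool_call_id` as the call is announced, and the execution side dequeues the oldest id when a tool actually runs. The ids line up only while tools execute in the order they were announced, which is why parallel execution would break the correlation. `ToolEventBroker` and its method names here are hypothetical and simplified; the real `tool_event_broker.rs` may differ.

```rust
use std::collections::VecDeque;

/// Toy FIFO broker correlating tool_call_ids between two contexts.
#[derive(Default)]
struct ToolEventBroker {
    pending: VecDeque<String>,
}

impl ToolEventBroker {
    /// Hook side: record the id when the model announces a tool call.
    fn on_tool_call(&mut self, tool_call_id: &str) {
        self.pending.push_back(tool_call_id.to_owned());
    }

    /// Execution side: claim the oldest unclaimed id.
    /// Correct only if tools execute in announcement order.
    fn next_tool_call_id(&mut self) -> Option<String> {
        self.pending.pop_front()
    }
}

fn main() {
    let mut broker = ToolEventBroker::default();
    broker.on_tool_call("call_1");
    broker.on_tool_call("call_2");
    // Sequential execution: the first tool to run receives the first id.
    println!("{:?}", broker.next_tool_call_id());
    println!("{:?}", broker.next_tool_call_id());
}
```
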
Status: Jenkins/Makefile complete, Helm charts and K8s manifests in deployment/
make build # Build release binary
make test # Run all tests
make docker-build # Build Docker image
make lint # Run clippy + fmt check
- No AI co-authorship: Never add `Co-Authored-By` lines for Claude or any AI assistant. Claude cannot accept the CLA.
- Sign-off commits as the user: Always sign off commits as the human user, not as Claude.
- Commit message format: Conventional Commits. First line must be entirely lowercase, no trailing period, under 72 characters. Use the body to explain what and why.
Format: `<type>(<optional scope>): <description>`
Types: `feat`, `fix`, `doc`, `style`, `refactor`, `perf`, `test`, `chore`, `ci`
Breaking changes: add `!` after type/scope and include a `BREAKING CHANGE:` footer. If fixing an issue, include `Fixes: #<issue number>` in the footer.
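
The first-line rules above are mechanical enough to check in code. The following sketch validates a commit subject against them (allowed type, optional scope, optional `!`, all lowercase, no trailing period, under 72 characters). `validate_subject` is a hypothetical helper written for illustration; the repository may enforce these rules differently or not at all in tooling.

```rust
/// Commit types allowed by the format described above.
const TYPES: [&str; 9] = [
    "feat", "fix", "doc", "style", "refactor", "perf", "test", "chore", "ci",
];

/// Validate the first line of a commit message (hypothetical helper).
fn validate_subject(line: &str) -> Result<(), String> {
    if line.chars().count() >= 72 {
        return Err("subject must be under 72 characters".into());
    }
    if line.ends_with('.') {
        return Err("subject must not end with a period".into());
    }
    if line.chars().any(|c| c.is_uppercase()) {
        return Err("subject must be entirely lowercase".into());
    }
    let (prefix, description) = line
        .split_once(": ")
        .ok_or_else(|| "expected '<type>(<scope>): <description>'".to_string())?;
    if description.trim().is_empty() {
        return Err("description is empty".into());
    }
    // A trailing `!` marks a breaking change and is not part of the type.
    let prefix = prefix.strip_suffix('!').unwrap_or(prefix);
    let commit_type = match prefix.split_once('(') {
        Some((ty, scope)) => {
            // `scope` still carries the closing parenthesis, e.g. "cli)".
            if !scope.ends_with(')') || scope.len() < 2 {
                return Err("malformed or empty scope".into());
            }
            ty
        }
        None => prefix,
    };
    if !TYPES.contains(&commit_type) {
        return Err(format!("unknown type: {commit_type}"));
    }
    Ok(())
}

fn main() {
    for subject in ["feat(cli): add log file flag", "Fix: broken thing."] {
        println!("{subject:?} -> {:?}", validate_subject(subject));
    }
}
```
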
- `README.md` - User-facing documentation
- `crates/aura-cli/README.md` - CLI usage, backends, features, and build instructions
- `CHANGELOG.md` - Auto-generated version history
- `docs/streaming-api-guide.md` - SSE streaming, custom events, and orchestration events
- `docs/ollama-guide.md` - Ollama configuration, fallback tool parsing, and local model guidance
- `docs/request-lifecycle.md` - Request flow, lifecycle, timeout, cancellation, and shutdown
- `docs/rig-tool-execution-order.md` - Tool execution order analysis
- `docs/rig-fork-changes.md` - Rig fork changes and rationale
- `docs/orchestration-tickets.md` - Epic ticket table, dependency graph, research references, implementation plan