GitHub - elsium-ai/elsium-ai: Production-grade TypeScript AI runtime focused on reliability, governance, and reproducible LLM systems. Multi-provider gateway, agents, RAG, workflows, policy engine, audit trails, and deterministic testing — built for teams shipping AI in production.

Reliability. Governance. Reproducible AI.
The TypeScript framework for AI systems you can trust in production.

Website · Live Demos · GitHub

Every demo is interactive and runnable: elsiumai.com/#demos

AI systems must fail predictably. AI systems must be auditable. AI systems must be reproducible. AI systems must be governed by policy, not hope.

Every feature in ElsiumAI exists to serve one of these principles. If it doesn't, it doesn't ship.

The Problem

Every AI framework helps you call an LLM. None of them help you trust the result.

ElsiumAI is built on three pillars that most frameworks ignore entirely:

Pillar	The guarantee
Reliability	Your system stays up when providers break — circuit breakers, bulkhead isolation, request dedup, graceful shutdown
Governance	You control who does what, and you can prove it — policy engine, RBAC, approval gates, hash-chained audit trail, agent identity, runtime policy enforcement, memory integrity, MCP trust framework, compliance reporting (OWASP Agentic, EU AI Act, Colorado AI Act)
Reproducible AI	Tools to measure, pin, and trace AI outputs — seed propagation, output pinning, provenance tracking, determinism assertions (see caveats below)

It also does everything you'd expect — multi-provider gateway, agents, tools, RAG, workflows, MCP, streaming, cost tracking. But those are table stakes. The three pillars are what make ElsiumAI different.

Quick Start

npm install @elsium-ai/core @elsium-ai/gateway @elsium-ai/agents

import { gateway } from '@elsium-ai/gateway'
import { defineAgent } from '@elsium-ai/agents'
import { env } from '@elsium-ai/core'

const llm = gateway({
  provider: 'anthropic',
  model: 'claude-sonnet-4-6',
  apiKey: env('ANTHROPIC_API_KEY'),
})

const agent = defineAgent(
  { name: 'assistant', system: 'You are a helpful assistant.' },
  { complete: (req) => llm.complete(req) },
)

const result = await agent.run('What is TypeScript?')

Cross-Runtime Support

Since 0.13.0, every governance and reliability primitive in ElsiumAI loads on any modern JS runtime. The framework no longer depends on node:crypto for hash chains, agent identity, signed replay, idempotent checkpoints, or auth middleware — all use the Web Crypto API (globalThis.crypto) instead.
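To illustrate what depending only on `globalThis.crypto` looks like, here is a minimal sketch of a SHA-256 hash chain built on `crypto.subtle.digest`. It is not ElsiumAI's implementation. The `ChainEntry` shape and the `appendEntry` and `verifyChain` helpers are hypothetical, chosen only to show the technique.

```typescript
// Hypothetical sketch: a SHA-256 hash chain using only the Web Crypto API,
// so the same code can run on Node >= 20, Bun, Deno, Workers, and browsers.
interface ChainEntry {
  payload: string
  prevHash: string
  hash: string
}

// The first entry links to a fixed all-zero "genesis" hash.
const GENESIS = '0'.repeat(64)

async function sha256Hex(text: string): Promise<string> {
  const bytes = new TextEncoder().encode(text)
  const digest = await globalThis.crypto.subtle.digest('SHA-256', bytes)
  return Array.from(new Uint8Array(digest), (b) => b.toString(16).padStart(2, '0')).join('')
}

async function appendEntry(chain: ChainEntry[], payload: string): Promise<ChainEntry[]> {
  const prevHash = chain.length > 0 ? chain[chain.length - 1].hash : GENESIS
  // prevHash is always 64 hex chars, so plain concatenation is unambiguous.
  // Each hash covers the previous hash, so editing one entry breaks every later link.
  const hash = await sha256Hex(prevHash + payload)
  return [...chain, { payload, prevHash, hash }]
}

// Returns the index of the first tampered entry, or -1 if the chain is intact.
async function verifyChain(chain: ChainEntry[]): Promise<number> {
  let expectedPrev = GENESIS
  for (let i = 0; i < chain.length; i++) {
    const entry = chain[i]
    if (entry.prevHash !== expectedPrev) return i
    if ((await sha256Hex(entry.prevHash + entry.payload)) !== entry.hash) return i
    expectedPrev = entry.hash
  }
  return -1
}
```

Because nothing here imports `node:crypto`, the same module loads unchanged on edge runtimes.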

Runtime	Status	Notes
Node.js ≥ 20	✅ fully supported	The reference runtime
Bun ≥ 1	✅ fully supported	Toolchain target
Deno	✅ supported	Web Crypto native
Cloudflare Workers	✅ supported	Web Crypto native; SQLite memory store and the CLI stay Node-only by design
Vercel Edge	✅ supported	Same caveats as Workers
Browser	✅ supported for non-server modules	Use the gateway behind your own proxy for API-key safety

The published elsium-ai umbrella tarball is 412 KB (-67% vs 0.12.x — see #35) and contains zero node:* imports across the governance pillar. Edge deployments work out of the box.

Reliability

Providers go down. Rate limits hit. Costs spiral. ElsiumAI treats failure as a first-class concern.

import { createProviderMesh } from '@elsium-ai/gateway'
import { env } from '@elsium-ai/core'

const mesh = createProviderMesh({
  providers: [
    { name: 'anthropic', config: { apiKey: env('ANTHROPIC_API_KEY') } },
    { name: 'openai', config: { apiKey: env('OPENAI_API_KEY') } },
  ],
  strategy: 'fallback',
  circuitBreaker: {         // Provider failing? Circuit opens, traffic reroutes
    failureThreshold: 5,
    resetTimeoutMs: 30_000,
  },
})
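The `fallback` strategy combined with a circuit breaker can be pictured with the simplified sketch below. It is not the `createProviderMesh` internals: the `CircuitBreaker` class, the `breakerFor` and `pickProvider` helpers and the injectable clock are hypothetical. Only `failureThreshold: 5` and `resetTimeoutMs: 30_000` mirror the config above.

```typescript
// Hypothetical sketch of a per-(provider, model) circuit breaker with fallback routing.
type BreakerState = 'closed' | 'open' | 'half-open'

class CircuitBreaker {
  private state: BreakerState = 'closed'
  private failures = 0
  private openedAt = 0
  private readonly failureThreshold: number
  private readonly resetTimeoutMs: number
  private readonly now: () => number

  constructor(failureThreshold: number, resetTimeoutMs: number, now: () => number = Date.now) {
    this.failureThreshold = failureThreshold
    this.resetTimeoutMs = resetTimeoutMs
    this.now = now
  }

  canRequest(): boolean {
    // After the cool-down, let a trial request through (half-open).
    if (this.state === 'open' && this.now() - this.openedAt >= this.resetTimeoutMs) {
      this.state = 'half-open'
    }
    return this.state !== 'open'
  }

  recordSuccess(): void {
    this.failures = 0
    this.state = 'closed'
  }

  recordFailure(): void {
    this.failures++
    // A failed trial, or too many consecutive failures, (re)opens the circuit.
    if (this.state === 'half-open' || this.failures >= this.failureThreshold) {
      this.state = 'open'
      this.openedAt = this.now()
    }
  }
}

// One breaker per (provider, model) key, so a flaky model never trips a healthy peer.
const breakers = new Map<string, CircuitBreaker>()

function breakerFor(provider: string, model: string, now: () => number = Date.now): CircuitBreaker {
  const key = `${provider}:${model}`
  let breaker = breakers.get(key)
  if (!breaker) {
    breaker = new CircuitBreaker(5, 30_000, now)
    breakers.set(key, breaker)
  }
  return breaker
}

// Fallback strategy: route to the first candidate whose circuit is not open.
function pickProvider(
  candidates: Array<{ provider: string; model: string }>,
  now: () => number = Date.now,
): { provider: string; model: string } | undefined {
  return candidates.find((c) => breakerFor(c.provider, c.model, now).canRequest())
}
```

The half-open state is the usual design choice here: a single trial request decides whether traffic returns, instead of flooding a provider that may still be down.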

Feature	What it does
Circuit Breaker	Detects failing providers, stops sending traffic, auto-recovers. Scoped per `(provider, model)` so a flaky model never trips a healthy peer.
Bulkhead Isolation	Bounds concurrency — one slow consumer can't starve the rest
Fair Queuing Per Agent	Token-bucket rate limiter with per-agent buckets — one greedy agent can't drain a shared LLM quota
Request Dedup	Identical in-flight calls coalesce into one API request
Idempotent Checkpoints	Workflow steps with `idempotent: true` never re-run after a crash recovery. Failures are cached and replayed verbatim.
Graceful Shutdown	Drains in-flight operations before process exit
Retry with Backoff	Exponential backoff with jitter, respects `Retry-After` headers
Stream Failover	Provider stream fails mid-request? Automatically switches to next provider
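Fair queuing per agent can be sketched as one token bucket per agent ID. The `AgentRateLimiter` class, its `capacity` and `refillPerSec` parameters and the injectable clock are hypothetical, not ElsiumAI's API; they only illustrate how separate buckets stop one agent from draining a shared quota.

```typescript
// Hypothetical sketch: per-agent token buckets so one greedy agent cannot drain a shared quota.
interface Bucket {
  tokens: number
  lastRefill: number
}

class AgentRateLimiter {
  private readonly buckets = new Map<string, Bucket>()
  private readonly capacity: number
  private readonly refillPerSec: number
  private readonly now: () => number

  constructor(capacity: number, refillPerSec: number, now: () => number = Date.now) {
    this.capacity = capacity
    this.refillPerSec = refillPerSec
    this.now = now
  }

  // Spends `cost` tokens and returns true if the agent's bucket has enough; otherwise returns false.
  tryAcquire(agentId: string, cost = 1): boolean {
    const t = this.now()
    const bucket = this.buckets.get(agentId) ?? { tokens: this.capacity, lastRefill: t }
    // Refill in proportion to elapsed time, never above capacity.
    const elapsedSec = (t - bucket.lastRefill) / 1000
    bucket.tokens = Math.min(this.capacity, bucket.tokens + elapsedSec * this.refillPerSec)
    bucket.lastRefill = t
    this.buckets.set(agentId, bucket)
    if (bucket.tokens < cost) return false
    bucket.tokens -= cost
    return true
  }
}
```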

Governance

Who called which model? Did they have permission? Can you prove the audit log hasn't been tampered with?

import { gateway } from '@elsium-ai/gateway'
import { createPolicySet, policyMiddleware, modelAccessPolicy, costLimitPolicy, env } from '@elsium-ai/core'
import { createAuditTrail, auditMiddleware } from '@elsium-ai/observe'
import { createRBAC } from '@elsium-ai/app'

// Policy: what's allowed
const policies = createPolicySet([
  modelAccessPolicy(['claude-sonnet-4-6', 'gpt-4o-mini']),
  costLimitPolicy(5.00),
])

// Audit: what happened (hash-chained, tamper-evident)
const audit = createAuditTrail({ hashChain: true })

// RBAC: who can do it
const rbac = createRBAC({
  roles: [{ name: 'analyst', permissions: ['model:use:gpt-4o-mini'], inherits: ['viewer'] }],
})

const llm = gateway({
  provider: 'anthropic',
  apiKey: env('ANTHROPIC_API_KEY'),
  middleware: [policyMiddleware(policies), auditMiddleware(audit)],
})
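To make "policies as data" concrete, here is a minimal evaluator over a plain policy list parsed from JSON. It is a sketch under assumptions, not ElsiumAI's `createPolicySet` or `PolicyDocument` format: the `Policy` union, `PolicyRequest` shape and `evaluate` function are hypothetical, and treating the `5.00` limit as US dollars is our assumption. The allowed models mirror the example above.

```typescript
// Hypothetical sketch: policies as plain data, evaluated in order; the first denial wins.
type Policy =
  | { kind: 'modelAccess'; allowed: string[] }
  | { kind: 'costLimit'; maxUsd: number }

interface PolicyRequest {
  model: string
  estimatedCostUsd: number
}

type Decision = { allowed: true } | { allowed: false; reason: string }

function evaluate(policies: Policy[], req: PolicyRequest): Decision {
  for (const policy of policies) {
    if (policy.kind === 'modelAccess' && !policy.allowed.includes(req.model)) {
      return { allowed: false, reason: `model ${req.model} is not allowed` }
    }
    if (policy.kind === 'costLimit' && req.estimatedCostUsd > policy.maxUsd) {
      return { allowed: false, reason: `estimated cost exceeds $${policy.maxUsd.toFixed(2)}` }
    }
  }
  return { allowed: true }
}

// Because the policies are data, they can live in a JSON/YAML file versioned apart from code.
const policies: Policy[] = JSON.parse(`[
  { "kind": "modelAccess", "allowed": ["claude-sonnet-4-6", "gpt-4o-mini"] },
  { "kind": "costLimit", "maxUsd": 5.0 }
]`)
```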

Feature	What it does
Declarative Policy Engine	Policies as data (`PolicyDocument` YAML/JSON), not closures — hot-reload, version-control independent of code, compliance-team-readable
Policy Engine (legacy closures)	Original declarative rules — deny by model, cost, token count, or content pattern. Coexists with the data-driven form.
Runtime Policy Enforcement	Enforce policies inside the agent loop — check permissions before every tool call
RBAC	Role-based permissions with inheritance and wildcard matching
Multi-Stage Approval Chain	Sequential approval stages with role/user/callback approvers, per-stage timeouts, escalation, persistent state. Skipped stages, denied stages, timeouts — all auditable.
Approval Gates	Single-callback gate for high-stakes tool calls (legacy; new chains preferred)
Agent Identity	HMAC-SHA256 signed agent requests with replay protection and cross-agent verification
Memory Integrity	SHA-256 hash-chained message stores — detect tampering in agent memory
Audit Trail	SHA-256 hash-chained events with tamper-evident integrity verification, pluggable sinks (webhook, Splunk, Datadog)
Cost Attribution per Tenant	Eight cost dimensions (model / agent / user / feature / tenant / workflow / workflowStep / traceId), reserve/commit/release for concurrent writers
Jurisdiction Routing	PII classifier → JurisdictionRouter intersects allowed providers per data class. EU email never reaches a US-only model.
Compliance Reporting	Generate reports against OWASP Agentic, EU AI Act, Colorado AI Act frameworks
MCP Trust Framework	Server allowlists, tool filtering, output validation, manifest integrity for MCP
PII Detection	Auto-redacts emails, phones, addresses, API keys before they reach the model
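RBAC with inheritance and wildcard matching reduces to two steps: collect a role's permissions, including inherited roles, then match each against the requested permission segment by segment. The `Role` shape mirrors the `createRBAC` example above, but `collectPermissions`, `matches`, `hasPermission`, the `trace:read:*` permission and the exact `*` semantics are assumptions made for illustration.

```typescript
// Hypothetical sketch of RBAC permission resolution with role inheritance and `*` wildcards.
interface Role {
  name: string
  permissions: string[]
  inherits?: string[]
}

// Collect permissions from a role and everything it inherits (cycle-safe via `seen`).
function collectPermissions(roles: Map<string, Role>, name: string, seen = new Set<string>()): string[] {
  if (seen.has(name)) return []
  seen.add(name)
  const role = roles.get(name)
  if (!role) return []
  const inherited = (role.inherits ?? []).flatMap((parent) => collectPermissions(roles, parent, seen))
  return [...role.permissions, ...inherited]
}

// `*` matches any single segment; a trailing `*` matches all remaining segments (at least one).
function matches(pattern: string, permission: string): boolean {
  const p = pattern.split(':')
  const q = permission.split(':')
  for (let i = 0; i < p.length; i++) {
    if (p[i] === '*' && i === p.length - 1) return q.length > i
    if (i >= q.length) return false
    if (p[i] !== '*' && p[i] !== q[i]) return false
  }
  return p.length === q.length
}

function hasPermission(roles: Role[], roleName: string, permission: string): boolean {
  const byName = new Map(roles.map((r) => [r.name, r] as const))
  return collectPermissions(byName, roleName).some((p) => matches(p, permission))
}
```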

Reproducible AI

LLMs are non-deterministic by nature. ElsiumAI gives you the tools to constrain, measure, and track output consistency — but the framework cannot make a hosted model deterministic on its own. See the caveats below.

import { assertDeterministic } from '@elsium-ai/testing'
import { createProvenanceTracker } from '@elsium-ai/observe'

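// Measure: how stable is the output across N runs?
// `llm` is the gateway from Quick Start; `extractText` is a helper that returns the message's text.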
const result = await assertDeterministic(
  (seed) => llm.complete({
    messages: [{ role: 'user', content: [{ type: 'text', text: 'Classify: spam' }] }],
    temperature: 0,
    seed,  // Forwarded to provider API where supported
  }).then(r => extractText(r.message.content)),
  { runs: 5, seed: 42, tolerance: 0 },
)
// { deterministic: true | false, variance: number, uniqueOutputs: number }

// Prove: who/what/when produced this output
const provenance = createProvenanceTracker()
provenance.record({ prompt, model, config, input, output, traceId }) // values from the call being recorded
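The measurement behind a determinism assertion can be sketched in a few lines: call a seeded function N times with the same seed and count distinct outputs. This is not the `@elsium-ai/testing` implementation. `measureDeterminism` is hypothetical, and defining `variance` as the fraction of runs that differ from the most common output is our assumption, since the README does not define it.

```typescript
// Hypothetical sketch: run a seeded generator N times and summarize output stability.
interface DeterminismReport {
  deterministic: boolean
  variance: number // assumed definition: fraction of runs differing from the most common output
  uniqueOutputs: number
}

// Expects runs >= 1.
async function measureDeterminism(
  fn: (seed: number) => Promise<string>,
  runs: number,
  seed: number,
): Promise<DeterminismReport> {
  const counts = new Map<string, number>()
  for (let i = 0; i < runs; i++) {
    const out = await fn(seed) // same seed every run, so any difference is drift
    counts.set(out, (counts.get(out) ?? 0) + 1)
  }
  const modalCount = Math.max(...counts.values())
  return {
    deterministic: counts.size === 1,
    variance: (runs - modalCount) / runs,
    uniqueOutputs: counts.size,
  }
}
```

Against a hosted model, `fn` would wrap a completion call; against a mock provider, any nonzero variance points at your own code rather than provider noise.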

Feature	What it does
Seed Propagation	Forwards `seed` to providers that accept it (OpenAI, Google). Anthropic does not expose a seed parameter; calls without one rely on `temperature: 0` alone.
Output Pinning	Locks expected outputs — model update changes your classifier? CI catches it
Determinism Assertions	Runs N times and reports variance. Does not enforce determinism — it surfaces drift so you fail builds before users see it.
Drift Detection	Compare yesterday's model snapshot against today's: exact-match rate, length delta, tool-call divergence, semantic similarity (with pluggable provider). Runs in production against sampled traffic, not only in CI.
Audit-Grade Signed Replay	HMAC-SHA256 hash-chained recorder of every LLM call. `verifyReplay` returns the exact `invalidAtIndex` on tampering. Suitable as evidence for audits or legal review, provided the HMAC secret is managed properly.
Streaming Replay	Record and replay token-level `StreamEvent` sequences for deterministic UI / partial-result tests
Trace Replay With Overrides	Side-by-side cost / latency / contentChanged for the same inputs run under a different model / temperature / system prompt
Per-Case Regression Budgets	Each baseline case carries its own `tolerance` + `maxDelta`. Critical cases get tight budgets; open-ended cases get loose ones. CI fails on `critical` outcome.
Provenance Tracking	SHA-256 hashes every prompt/config/input/output — full lineage per traceId
Request-Matched Fixtures	Replay test fixtures by content hash, not sequence order
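Request-matched fixtures key each recorded response by a hash of the request content rather than by call order, so reordering calls in a test does not break replay. A minimal sketch, assuming a canonical JSON serialization with sorted keys and using 32-bit FNV-1a as a short, synchronous stand-in for the SHA-256 hashing described above; `canonicalize`, `fnv1a` and `FixtureStore` are hypothetical names.

```typescript
// Hypothetical sketch: replay fixtures looked up by a content hash of the request.
type Json = null | boolean | number | string | Json[] | { [key: string]: Json }

// Serialize with sorted object keys so logically equal requests produce identical strings.
function canonicalize(value: Json): string {
  if (Array.isArray(value)) return `[${value.map(canonicalize).join(',')}]`
  if (value !== null && typeof value === 'object') {
    const keys = Object.keys(value).sort()
    return `{${keys.map((k) => `${JSON.stringify(k)}:${canonicalize(value[k])}`).join(',')}}`
  }
  return JSON.stringify(value)
}

// 32-bit FNV-1a over UTF-16 code units: a stand-in for SHA-256, not collision-resistant.
function fnv1a(text: string): string {
  let hash = 0x811c9dc5
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i)
    hash = Math.imul(hash, 0x01000193) >>> 0
  }
  return hash.toString(16).padStart(8, '0')
}

class FixtureStore {
  private readonly byHash = new Map<string, string>()

  record(request: Json, response: string): void {
    this.byHash.set(fnv1a(canonicalize(request)), response)
  }

  replay(request: Json): string {
    const response = this.byHash.get(fnv1a(canonicalize(request)))
    if (response === undefined) throw new Error('no fixture recorded for this request')
    return response
  }
}
```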

Reproducibility caveats. True bit-exact reproducibility against a hosted provider requires temperature: 0 AND a stable system_fingerprint (OpenAI rotates these between deploys, so identical (prompt, seed, temperature) calls can still differ across days). Tool calls that hit external APIs are not seedable. For full reproducibility, pair these primitives with mockProvider from @elsium-ai/testing or replay-recorder fixtures — the determinism assertions then catch real drift instead of provider noise.

Everything Else

The three pillars are what make ElsiumAI unique. These are the fundamentals it also delivers:

Multi-provider gateway — X-Ray mode, middleware, smart routing (fallback, cost-optimized, latency-racing, capability-aware)
Agents — Memory, semantic guardrails, confidence scoring, state machines, multi-agent orchestration, ReAct reasoning loop
Multimodal — Text, image, audio, and document content across all providers
Structured output — Native JSON mode per provider (OpenAI json_schema, Anthropic tool-use, Google responseSchema)
RAG — Document loading, PDF loading, chunking, embeddings, vector search, PgVector store, plugin registries
Workflows — Retries, parallel execution, branching, checkpointing, resumable workflows
MCP — Bidirectional client/server bridge, resources, prompts
Custom providers — OpenAI-compatible adapter for Ollama, Groq, Together, any OpenAI-compatible API
Caching — LRU response cache with TTL, custom adapters, streaming bypass
Output guardrails — PII/secret detection in responses, content policy, block/redact/warn modes
Batch processing — Concurrent LLM requests with semaphore control, per-item retry, progress callbacks
Token counting & context management — Model-aware estimation, truncate/summarize/sliding-window strategies
SSE streaming — Server-Sent Events for HTTP endpoints, real-time response streaming
Multi-tenant — Tenant extraction, per-tenant rate limiting, tier-based access control
A/B experiments — Weight-based variant assignment, deterministic user hashing, metrics aggregation
Client SDK — TypeScript HTTP client with SSE parsing for consuming ElsiumAI servers
Persistent storage — SQLite memory stores for agents, PgVector for RAG
Cost intelligence — Budgets, projections, loop detection
Testing — Mock providers, evals, LLM-as-judge, prompt versioning, regression suites, dataset loading (JSON/CSV/JSONL), baseline comparison, multi-turn conversation testing, tool call assertions, automated red-teaming (44 adversarial probes including multi-turn), agent metrics (efficiency, recovery, cost), unified agent eval runner, CI reporters (JUnit XML, GitHub Actions, Markdown)
Structured extraction — Zod schema → typed output, auto-retry on validation failure
Dev Studio — Local web dashboard for live traces, X-Ray, costs, streaming events
AI Proxy — OpenAI-compatible proxy with cost tracking, caching, audit — any language, zero code changes
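Weight-based variant assignment with deterministic user hashing means the same user always lands in the same variant, with no stored state. A minimal sketch, assuming FNV-1a as the hash and integer weights; `assignVariant`, `hash32` and the `Variant` shape are hypothetical, not the `@elsium-ai/observe` API.

```typescript
// Hypothetical sketch: deterministic, weight-based A/B variant assignment.
interface Variant {
  name: string
  weight: number // relative integer weight, e.g. 90 / 10
}

// 32-bit FNV-1a; any stable hash works, the point is that it is not random.
function hash32(text: string): number {
  let h = 0x811c9dc5
  for (let i = 0; i < text.length; i++) {
    h ^= text.charCodeAt(i)
    h = Math.imul(h, 0x01000193) >>> 0
  }
  return h
}

function assignVariant(experiment: string, userId: string, variants: Variant[]): string {
  const total = variants.reduce((sum, v) => sum + v.weight, 0)
  // Salting with the experiment name keeps assignments independent across experiments.
  let point = hash32(`${experiment}:${userId}`) % total
  for (const v of variants) {
    if (point < v.weight) return v.name
    point -= v.weight
  }
  return variants[variants.length - 1].name // not reached when weights are non-negative integers
}
```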

Architecture

┌──────────────────────────────────────────────────────────────────┐
│                          @elsium-ai/app                           │
│                  HTTP server · RBAC · auth · routes               │
├────────────────────┬────────────────┬────────────────────────────┤
│  @elsium-ai/agents │ @elsium-ai/mcp │       @elsium-ai/cli       │
│  memory · approval │ client · server│  init · dev · eval · studio│
│  guardrails · multi│ resources      │  proxy                     │
│  ReAct             │ prompts        │                            │
├──────────┬─────────┼────────┬───────┼───────────┬────────────────┤
│  gateway │  tools  │observe │  rag  │ workflows │   client      │
│ providers│ define  │ trace  │ load  │   steps   │  HTTP+SSE     │
│   mesh   │ toolkit │ audit  │ chunk │  parallel │   parsing     │
│ security │         │ prove- │ embed │  branch   │               │
│ bulkhead │         │ nance  │vector │checkpoint │               │
│  cache   │         │ experi-│pgvect │ resumable │               │
│guardrail │         │  ment  │regist │           │               │
│  batch   │         │        │  PDF  │           │               │
│ openai-  │         │        │       │           │               │
│  compat  │         │        │       │           │               │
├──────────┴─────────┴────────┴───────┴───────────┴───────────────┤
│                         @elsium-ai/core                           │
│    types · errors · stream · logger · config · retry · result    │
│    circuit breaker · dedup · policy engine · shutdown manager     │
│    tokens · context manager · registry · schema · multimodal     │
└──────────────────────────────────────────────────────────────────┘
                          ·  ·  ·  ·  ·  ·
┌──────────────────────────────────────────────────────────────────┐
│                       @elsium-ai/testing                          │
│    mocks · evals · fixtures · pinning · determinism · snapshots  │
└──────────────────────────────────────────────────────────────────┘

Three Pillars — where each feature lives:

  Reliability             Governance              Determinism
  ───────────             ──────────              ───────────
  circuit breaker  [core] policy engine    [core] seed propagation [gw]
  request dedup    [core] RBAC             [app]  output pinning   [test]
  shutdown manager [core] approval gates   [agt]  determinism test [test]
  retry + backoff  [core] audit trail      [obs]  provenance       [obs]
  bulkhead         [gw]   PII detection    [gw]   req-match fixts  [test]
  provider mesh    [gw]   content classify [gw]   crypto hashing   [test]

Packages

Package	Description
`@elsium-ai/core`	Types, errors, streaming, circuit breaker, dedup, policy engine, shutdown, tokens, context manager, registry, schema
`@elsium-ai/gateway`	Multi-provider gateway, X-Ray, provider mesh, OpenAI-compatible provider, bulkhead, PII detection, caching, output guardrails, batch processing
`@elsium-ai/agents`	Agents, ReAct agent, memory, persistent stores (in-memory, SQLite), guardrails, approval gates, multi-agent
`@elsium-ai/tools`	Tool definitions with Zod validation
`@elsium-ai/rag`	Document loading, PDF loading, chunking, embeddings, BM25, hybrid search, vector search, PgVector store, plugin registries
`@elsium-ai/workflows`	DAG workflows, sequential, parallel, branching, checkpointing, resumable workflows
`@elsium-ai/observe`	Tracing, cost intelligence, audit trail, provenance tracking, A/B experiments
`@elsium-ai/mcp`	Bidirectional MCP client and server, resources, prompts
`@elsium-ai/app`	HTTP server, CORS, auth, rate limiting, RBAC, SSE streaming, multi-tenant
`@elsium-ai/client`	TypeScript HTTP client with SSE parsing for consuming ElsiumAI servers
`@elsium-ai/testing`	Mocks, evals, multi-turn agent testing, tool assertions, red-teaming (single + multi-turn), agent metrics, CI reporters
`@elsium-ai/cli`	Scaffolding, dev server, X-Ray inspection

Built-In Capabilities

Beyond agents, tools, RAG, and multi-provider routing, ElsiumAI ships production infrastructure out of the box:

Category	Feature
Reliability	Circuit Breaker, Bulkhead Isolation, Request Dedup, Graceful Shutdown, Retry with Backoff, Stream Failover
Governance	Policy Engine, Runtime Policy Enforcement, RBAC, Approval Gates, Agent Identity, Memory Integrity, Hash-Chained Audit, Compliance Reporting, MCP Trust Framework, PII Detection, Output Guardrails, Multi-Tenant
Determinism	Seed Propagation, Output Pinning, Determinism Assertions, Provenance Tracking, A/B Experiments
Performance	Response Caching, Batch Processing, Token Counting, Context Management
Multimodal	Text, Image, Audio, Document across Anthropic, OpenAI, Google
Structured Output	Native JSON mode per provider, Zod schema validation

Performance

Measured with a zero-latency mock provider to isolate framework overhead. Full methodology and reproduction steps are in benchmarks/.

Framework Cost (Isolated)

Metric	P50	P95	Conditions
Core completion path	2.3μs	5.5μs	Agent, no middleware
Full governance stack	6.2μs	9.5μs	Security + audit + policy + cost + xray + logging
Under concurrency	5.0μs	6.4μs	100 parallel requests, full stack

Real-World Context

Metric	Value
Typical LLM network latency	200–800ms
ElsiumAI overhead at P95	<10μs
Framework cost contribution	<0.01% of total request time
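The last row follows from the two above it. A worked check, using the low end of the typical latency range (the least favorable case for the framework) and the P95 overhead bound:

```typescript
// Worked check of the "framework cost contribution" row, using figures from the table.
const overheadMicros = 10 // ElsiumAI overhead at P95 (upper bound), in microseconds
const networkMicros = 200 * 1000 // low end of typical LLM network latency (200 ms), in microseconds

// Share of total request time spent in the framework, as a percentage.
const sharePercent = (overheadMicros / (networkMicros + overheadMicros)) * 100
console.log(`${sharePercent.toFixed(3)}%`) // prints "0.005%"
```

At the 800 ms end of the range the share drops further, so the "<0.01%" figure holds across the stated latency band.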

Resource Footprint

Metric	Value
Cold start	<3ms
Bundle size (minified)	349 KB
Memory per 10K requests	~10 MB (full stack + tracing + audit, all in-memory, capped)
Per-request heap growth	~1 KB
Circuit breaker throughput	>5M ops/sec

Baselines are frozen per release and checked for regressions in CI. See benchmarks/results/ for historical data.

Principles

Fail predictably — handle failure before you see it
Trust but verify — every call auditable, every output traceable
Reproducible by design — testable AI is trustworthy AI
Zero magic — createX(config) everywhere, no hidden behavior
Type safety end-to-end — from config to LLM output
Modular — use what you need, tree-shake the rest

