LLMSim

LLM Traffic Simulator - A lightweight, high-performance LLM API simulator for load testing, CI/CD, and local development.

Overview

LLMSim replicates realistic LLM API behavior without running actual models. It solves common challenges when testing LLM-integrated applications:

Cost: Real API calls during load tests are expensive
Rate Limits: Production API rate limits prevent realistic load testing
Reproducibility: Real models produce variable responses
Traffic Realism: LLM responses have unique characteristics (streaming, variable latency, token-based billing)

Features

Multi-Provider API Support - OpenAI Chat Completions, OpenResponses, and Anthropic Messages APIs
Realistic Latency Simulation - Time-to-first-token (TTFT) and inter-token delays drawn from a normal distribution
Streaming Support - Server-Sent Events (SSE) for OpenAI, OpenResponses, and Anthropic streaming formats
Image Generation - Simulated gpt-image ("ChatGPT Images") endpoint returning watermarked PNGs, with streaming partial images
Accurate Token Counting - Uses tiktoken-rs (a Rust port of OpenAI's tiktoken tokenizer)
Error Injection - Rate limits (429), server errors (500/503), timeouts
Multiple Response Generators - Lorem ipsum, echo, fixed, random, sequence
Model-Specific Profiles - GPT-5, GPT-4, Claude, Gemini latency profiles
Real-time Stats Dashboard - TUI dashboard with live metrics (requests, tokens, latency, errors)
Stats API - JSON endpoint for programmatic access to server metrics
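
The streaming feature can be exercised from a client with a small SSE reader. The sketch below is a minimal illustration, assuming the simulator emits the standard OpenAI Chat Completions streaming format (`data: {json}` lines carrying text under `choices[0].delta.content`, terminated by `data: [DONE]`). The function name `collect_stream_text` is hypothetical and not part of LLMSim.

```python
import json
from typing import Iterable


def collect_stream_text(lines: Iterable[str]) -> str:
    """Concatenate delta text from OpenAI-style SSE lines.

    Assumes the OpenAI Chat Completions streaming format; the Anthropic
    and OpenResponses streams use different event shapes.
    """
    parts = []
    for raw in lines:
        line = raw.strip()
        # SSE payload lines start with "data:"; skip blanks and other fields.
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":  # end-of-stream sentinel
            break
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            text = (choice.get("delta") or {}).get("content")
            if text:
                parts.append(text)
    return "".join(parts)
```

Any iterable of decoded text lines works as input, for example the lines of a streaming HTTP response body.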

Installation

cargo install llmsim

# Include the optional terminal dashboard
cargo install llmsim --features tui

Usage

CLI Server

# Start with defaults (port 8080, lorem generator)
llmsim serve

# Start with real-time stats dashboard (TUI)
# Requires installing/building with `--features tui`
llmsim serve --tui

# All options
llmsim serve \
  --port 8080 \
  --host 0.0.0.0 \
  --generator lorem \
  --target-tokens 150 \
  --tui

# Using config file
llmsim serve --config config.toml

Stats Dashboard

The --tui flag launches an interactive terminal dashboard showing real-time metrics:

Requests: Total, active, streaming vs non-streaming, requests/sec
Tokens: Prompt, completion, total, tokens/sec
Latency: Average, min, max response times
Errors: Total errors, rate limits (429), server errors (5xx), timeouts
Charts: RPS and token rate sparklines, model distribution

Controls: q to quit, r to force refresh.

As a Library

use llmsim::{
    openai::{ChatCompletionRequest, Message},
    generator::LoremGenerator,
    latency::LatencyProfile,
};

// Create a latency profile
let latency = LatencyProfile::gpt5();

// Count tokens
let tokens = llmsim::count_tokens("Hello, world!", "gpt-5").unwrap();

// Generate responses
// `request` is a `ChatCompletionRequest` built elsewhere; its construction is omitted here
let generator = LoremGenerator::new(100);
let response = generator.generate(&request);

Cargo features

The crate is split into optional features so library consumers only pull in what they use. The defaults (["cli"]) give the full binary, so cargo build, cargo run -- serve, and cargo test work out of the box.

Feature	Adds	Extra dependencies
`tokens`	`tokens` module (token counting)	`tiktoken-rs`
`server`	`cli` module (axum router, handlers, websockets); implies `tokens`	`axum`, `tower-http`
`cli`	the `llmsim` binary; implies `server`	`clap`, `tracing-subscriber`
`tui`	`serve --tui` dashboard; implies `cli`	`ratatui`, `crossterm`

To embed only the core library modules (types, generators, latency, streaming, stats, scripts) and shed axum, tower-http, tiktoken-rs, clap, websockets, and tracing-subscriber:

[dependencies]
llmsim = { version = "0.4", default-features = false }

Note that count_tokens and the tokens module are only available with the tokens feature enabled.

API Endpoints

OpenAI API (`/openai/v1/...`)

Endpoint	Method	Description
`/openai/v1/chat/completions`	POST	Chat completions (streaming & non-streaming)
`/openai/v1/models`	GET	List available models
`/openai/v1/models/{model_id}`	GET	Get specific model details
`/openai/v1/responses`	POST	Responses API (streaming & non-streaming)
`/openai/v1/images/generations`	POST	Image generation (gpt-image, streaming & non-streaming)

When using OpenAI SDKs, set the base URL to http://localhost:8080/openai/v1.
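
For clients that do not use an SDK, the same endpoint can be called over plain HTTP. The following is a minimal sketch using only the Python standard library, assuming a simulator running on the default port. `build_chat_request` and `send` are illustrative helper names, and the assumption that the simulator does not validate the API key is inferred from the SDK example, not confirmed.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8080/openai/v1"  # default llmsim port


def build_chat_request(prompt: str, model: str = "gpt-5") -> urllib.request.Request:
    """Build (but do not send) a POST to the chat completions endpoint."""
    body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumption: the simulator accepts any placeholder key.
            "Authorization": "Bearer not-needed",
        },
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 30.0) -> dict:
    """Send the request and decode the JSON body (needs a running server)."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)
```

With `llmsim serve` running, `send(build_chat_request("Hello"))` should return a dictionary; if the response follows the OpenAI format, the generated text sits under `["choices"][0]["message"]["content"]`.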

OpenResponses API (`/openresponses/v1/...`)

OpenResponses is an open-source specification for building multi-provider, interoperable LLM interfaces.

Endpoint	Method	Description
`/openresponses/v1/responses`	POST	Create response (streaming & non-streaming)

Anthropic API (`/anthropic/v1/...`)

Simulates the Anthropic Messages API with realistic Claude model profiles.

Endpoint	Method	Description
`/anthropic/v1/messages`	POST	Messages API (streaming & non-streaming)
`/anthropic/v1/models`	GET	List available Claude models
`/anthropic/v1/models/{model_id}`	GET	Get specific model details

When using Anthropic SDKs, set the base URL to http://localhost:8080/anthropic:

from anthropic import Anthropic

client = Anthropic(base_url="http://localhost:8080/anthropic", api_key="not-needed")
msg = client.messages.create(
    model="claude-opus-4-8",
    max_tokens=64,
    messages=[{"role": "user", "content": "Hello, Claude"}],
)
print(msg.content[0].text)

Runnable examples for Python, TypeScript, Go, curl, and LangChain live in examples/ (see examples/README.md).

LLMSim endpoints

Endpoint	Method	Description
`/health`	GET	Health check
`/llmsim/stats`	GET	Real-time server statistics (JSON)
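
The stats endpoint can be polled from a test harness. The sketch below assumes a simulator on the default port; `endpoint_url` and `fetch_stats` are hypothetical helper names. The field names in the JSON payload are not listed here, so the example returns the decoded object without relying on specific keys.

```python
import json
import urllib.request


def endpoint_url(base: str, path: str) -> str:
    """Join a server base URL and an endpoint path with exactly one slash."""
    return base.rstrip("/") + "/" + path.lstrip("/")


def fetch_stats(base: str = "http://localhost:8080", timeout: float = 5.0):
    """GET /llmsim/stats and return the decoded JSON (needs a running server)."""
    url = endpoint_url(base, "/llmsim/stats")
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)
```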

Configuration

TOML Config File

[server]
port = 8080
host = "0.0.0.0"

[latency]
profile = "gpt5"
# Custom values (optional):
# ttft_mean_ms = 600
# ttft_stddev_ms = 150
# tbt_mean_ms = 40
# tbt_stddev_ms = 12

[response]
generator = "lorem"
target_tokens = 100

[errors]
rate_limit_rate = 0.01
server_error_rate = 0.001
timeout_rate = 0.0
timeout_after_ms = 30000

[models]
available = [
  "gpt-5",
  "gpt-5-mini",
  "gpt-4o",
  "claude-opus",
]

Note: The config file format moved from YAML to TOML in this release. To migrate an existing config.yaml, replace section headers like server: with [server], change key: value to key = value, quote strings, and convert lists. See benchmarks/config/*.toml for working examples.
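
When the `[errors]` rates are above zero, clients will occasionally receive 429 and 5xx responses, so a test harness usually needs retry logic. The sketch below shows one common approach, exponential backoff. `call_with_retry` and its parameters are hypothetical and not part of LLMSim; `send` stands for any callable that performs one request and returns a status code and body.

```python
import time
from typing import Callable, Tuple

RETRYABLE = {429, 500, 503}  # statuses the simulator can inject


def call_with_retry(
    send: Callable[[], Tuple[int, str]],
    max_attempts: int = 4,
    base_delay_s: float = 0.1,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str, int]:
    """Call `send` until it returns a non-retryable status.

    Returns (status, body, attempts). The delay doubles after each
    failed attempt: base, 2 * base, 4 * base, ...
    """
    status, body = send()
    attempts = 1
    while status in RETRYABLE and attempts < max_attempts:
        sleep(base_delay_s * 2 ** (attempts - 1))
        status, body = send()
        attempts += 1
    return status, body, attempts
```

Passing `sleep` as a parameter keeps the helper testable without real waiting. Injected timeouts would surface as client-side exceptions instead of status codes and are not handled in this sketch.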

Supported Models

Family	Models
GPT-5	gpt-5, gpt-5-pro, gpt-5-mini, gpt-5-nano, gpt-5-codex, gpt-5.1, gpt-5.2, gpt-5.3-codex, gpt-5.4, gpt-5.5
O-Series	o1, o1-mini, o3, o3-mini, o4-mini
GPT-4	gpt-4, gpt-4-turbo, gpt-4o, gpt-4o-mini, gpt-4.1
Claude	claude-opus, claude-sonnet, claude-haiku (with 4.x versions through Opus 4.8 and Sonnet 4.6)
Gemini	gemini-2.0-flash, gemini-2.5-pro, gemini-3 and gemini-3.1 previews

The Anthropic endpoints (/anthropic/v1/...) use the real Anthropic API model IDs (dash-separated, e.g. claude-opus-4-8, claude-sonnet-4-6, claude-haiku-4-5, claude-fable-5), including dated-snapshot and -latest aliases. List them via GET /anthropic/v1/models.

Latency Profiles

Profile	TTFT Mean	TBT Mean
gpt-5	600ms	40ms
gpt-5-mini	300ms	20ms
gpt-4	800ms	50ms
gpt-4o	400ms	25ms
o-series	2000ms	30ms
claude-opus	1000ms	60ms
claude-sonnet	500ms	30ms
claude-haiku	200ms	15ms
instant	0ms	0ms
fast	10ms	1ms
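
To make the table concrete, the following sketch estimates how long a streamed response takes under a simple model: one TTFT draw plus one inter-token gap per remaining token, each sampled from a normal distribution and clamped at zero. This illustrates the idea only and is not LLMSim's actual implementation. The standard deviations default to the example values from the config comments (150 ms and 12 ms), and both function names are hypothetical.

```python
import random
from typing import Optional


def simulate_latency_ms(
    n_tokens: int,
    ttft_mean: float = 600.0,
    ttft_stddev: float = 150.0,
    tbt_mean: float = 40.0,
    tbt_stddev: float = 12.0,
    rng: Optional[random.Random] = None,
) -> float:
    """Total time to stream n_tokens: one TTFT plus (n_tokens - 1) gaps."""
    rng = rng or random.Random()
    # Clamp at zero: a normal draw can be negative, a delay cannot.
    total = max(0.0, rng.gauss(ttft_mean, ttft_stddev))
    for _ in range(max(0, n_tokens - 1)):
        total += max(0.0, rng.gauss(tbt_mean, tbt_stddev))
    return total


def expected_latency_ms(n_tokens: int, ttft_mean: float, tbt_mean: float) -> float:
    """Mean of the same model, ignoring the small effect of clamping."""
    return ttft_mean + max(0, n_tokens - 1) * tbt_mean
```

For the gpt-5 profile and a 100-token response, the expected total under this model is 600 + 99 × 40 = 4560 ms.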

Use Cases

Load Testing - Simulate thousands of concurrent LLM requests
CI/CD Pipelines - Fast, deterministic tests for LLM integrations
Local Development - Develop without API keys or costs
Chaos Engineering - Test behavior under failure scenarios
Cost Estimation - Estimate token usage before production

Requirements

Rust 1.83+ (for building from source)
OR Docker

License

MIT License - see LICENSE for details.

Contributing

See CONTRIBUTING.md for contribution guidelines.
