TokenDog

LLM gateway for vLLM / SGLang local inference engines — thin reverse proxy with load balancing and streaming support.

Packages

Package	Language	Description
router	Rust	HTTP gateway binary + library crate
router python	Python	PyO3 bindings, installable as a wheel

Quick Start

Rust binary

cd crates
cargo run -- \
    --worker-urls http://192.168.1.10:8000 http://192.168.1.20:8000 \
    --port 30000 \
    --log-level info

Python wheel

cd python/router-py
maturin build --release
pip install ../target/wheels/router-*.whl

Use from the command line:

router --port 30000 --worker-urls http://192.168.1.10:8000 http://192.168.1.20:8000

Or as a library in Python:

from router import Router

gateway = Router(
    worker_urls=["http://192.168.1.10:8000", "http://192.168.1.20:8000"],
    port=30000,
)
gateway.serve()  # blocks until Ctrl+C

See examples/ for more (including PD separation).

Prefill-Decode (PD) Separation

The gateway supports disaggregated prefill/decode mode for both vLLM and SGLang:

Runtime	Mode	Execution	KV Transfer
vLLM	Sequential	Prefill (`max_tokens=1`) → decode	`kv_transfer_params` (Nixl)
SGLang	Concurrent	Prefill + decode simultaneously	`bootstrap_host/port/room`

# vLLM PD mode
cargo run -- --pd-mode vllm \
    --prefill-urls http://prefill1:8000 http://prefill2:8000 \
    --decode-urls http://decode1:8000 http://decode2:8000 \
    --policy least-loaded

# SGLang PD mode
cargo run -- --pd-mode sglang \
    --prefill-urls http://prefill1:8000 \
    --decode-urls http://decode1:8000 \
    --policy round-robin

Architecture

Client                  tokendog                   Backend vLLM/SGLang
  │                        │                              │
  │  POST /v1/chat/...     │                              │
  │───────────────────────►│  next_worker() (round-robin) │
  │                        │─────────────────────────────►│
  │                        │                              │
  │                        │  SSE token stream            │
  │  Streamed response     │◄─────────────────────────────│
  │◄───────────────────────│                              │

Transparent: requests forwarded verbatim — no API coupling
Streaming-first: SSE frames forwarded without buffering (bytes_stream → from_stream)
Pluggable LB: LoadBalancer trait — 7 built-in policies including cache-aware routing (session affinity, prefix affinity, load-cache-aware scoring)

Configuration

All options via CLI args or env vars:

Option	Env	Default	Description
`--host`	`HOST`	`0.0.0.0`	Bind address
`--port`	`PORT`	`30000`	Bind port
`--worker-urls`	`WORKER_URLS`	(required)	Backend URLs (space-separated)
`--request-timeout-secs`	`REQUEST_TIMEOUT`	`300`	Worker timeout (seconds)
`--log-level`	`LOG_LEVEL`	`info`	Log filter: error, warn, info, debug
`--policy`	`POLICY`	`least-loaded`	Load-balancing policy (see router README)
`--pd-mode`	`PD_MODE`	(none)	PD separation mode: `vllm` or `sglang`
`--prefill-urls`	—	(none)	Prefill worker URLs for PD mode (space-separated)
`--decode-urls`	—	(none)	Decode worker URLs for PD mode (space-separated)

Development

# Rust workspace
cd crates
cargo build
cargo test
cargo clippy --all-targets

# Python bindings
cd python/router-py
maturin develop
python -c "from router import Router; print(Router(worker_urls=['http://localhost:8000']))"

License

Apache-2.0

Name		Name	Last commit message	Last commit date
Latest commit History 40 Commits
.claude		.claude
.github/workflows		.github/workflows
crates/router		crates/router
deployment/inference-sim		deployment/inference-sim
docker		docker
examples		examples
python/router-py		python/router-py
.gitignore		.gitignore
.pre-commit-config.yaml		.pre-commit-config.yaml
AGENTS.md		AGENTS.md
CLAUDE.md		CLAUDE.md
Cargo.toml		Cargo.toml
LICENSE		LICENSE
Makefile		Makefile
README.md		README.md
rustfmt.toml		rustfmt.toml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

TokenDog

Packages

Quick Start

Rust binary

Python wheel

Prefill-Decode (PD) Separation

Architecture

Configuration

Development

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

TokenDog

Packages

Quick Start

Rust binary

Python wheel

Prefill-Decode (PD) Separation

Architecture

Configuration

Development

License

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages