tokstream
中文 | English
A token streaming simulator powered by Hugging Face tokenizers. It downloads a tokenizer from HF Hub and generates tokens at a target rate, with live stats for target vs actual throughput.
Highlights
- Rust CLI with high‑precision pacing (sleep + spin)
- Web demo (WASM) and npx executable
- Random English / Chinese generation and text replay
- Configurable filtering strategy
- Target vs actual tokens/sec stats
- Workspace layout with reusable core
Project Layout
.
├── crates
│ ├── tokstream-core # tokenizer engine
│ ├── tokstream-cli # Rust CLI
│ └── tokstream-wasm # wasm-bindgen bindings
├── npm # npx CLI + web demo
├── bin # npm bin entry
├── Cargo.toml # workspace
├── justfile
├── package.json
├── README.md
└── README_ZH.md
Rust CLI
Quick Start
cargo run -p tokstream-cli -- --model gpt2 --mode english --rate 8
cargo run -p tokstream-cli -- --model gpt2 --mode chinese --rate 8
cargo run -p tokstream-cli -- --model gpt2 --mode text --text "Hello" --repeat 3
Install from crates.io
cargo install tokstream-cli
# or
cargo binstall tokstream-cli
Notes:
- The binary name is `tokstream` after installation.
- `cargo binstall` will compile from source unless you provide prebuilt release assets and set `repository` in the crate metadata.
Model & Auth
- `--model <id>`: HF Hub model id (default: `gpt2`)
- `--revision <rev>`: HF revision (default: `main`)
- `--hf-token <token>`: access token for private models
Modes
- `--mode <english|chinese|text>`: generation mode
- `--text <text>`: text mode input
- `--text-file <path>`: text mode input from file
- `--loop-text`: loop the text forever
- `--repeat <n>`: repeat the text n times
Rate Control
- `--rate <n>`: target tokens/sec
- `--rate-min <n>`: minimum rate for the random range
- `--rate-max <n>`: maximum rate for the random range
- `--rate-sample-interval <n>`: sampling interval for the rate range (seconds, default: 1)
- `--batch <n>`: tokens emitted per batch
- `--max-tokens <n>`: stop after n tokens
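One plausible way the rate range could work is to draw a fresh uniform rate from `[rate-min, rate-max]` once per sampling interval. The sketch below is an assumption about the semantics, not the crate's code; `RateSampler` is a made-up name, and a tiny seeded LCG stands in for a real RNG so the example has no dependencies (this also mirrors what `--seed` would control):

```rust
use std::time::Duration;

// Hypothetical sampler: one uniform draw per --rate-sample-interval window.
struct RateSampler {
    min: f64,
    max: f64,
    state: u64, // LCG state, initialized from the seed
}

impl RateSampler {
    fn new(min: f64, max: f64, seed: u64) -> Self {
        Self { min, max, state: seed }
    }

    fn sample(&mut self) -> f64 {
        // LCG step (constants from Numerical Recipes), then map the top
        // 53 bits to a float in [0, 1) and scale into [min, max).
        self.state = self
            .state
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        let unit = (self.state >> 11) as f64 / (1u64 << 53) as f64;
        self.min + unit * (self.max - self.min)
    }
}

fn main() {
    // Corresponds to --rate-min 6 --rate-max 12 --rate-sample-interval 2 --seed 42
    let mut sampler = RateSampler::new(6.0, 12.0, 42);
    let interval = Duration::from_secs(2);
    for _ in 0..3 {
        let rate = sampler.sample(); // target tokens/sec for the next window
        println!("next {:?}: target {:.1} tok/s", interval, rate);
    }
}
```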
Pacing & Throughput
- `--pace <strict|sleep>`: pacing mode (default: `strict`)
- `--spin-threshold-us <n>`: busy-spin threshold for `strict` mode
- `--no-throttle`: disable pacing (measure max throughput)
- `--no-output`: disable stdout output (closer to the tokenizer's upper bound)
Stats
- `--no-stats`: disable stats output (stderr)
- `--stats-interval <n>`: stats interval in seconds (default: 1)
Random Output Filters
- `--no-skip-special`: do not skip special tokens
- `--allow-digits` / `--allow-punct` / `--allow-space` / `--allow-non-ascii`: allow the corresponding character classes
- `--no-require-letter` / `--no-require-cjk`: drop the letter / CJK requirement
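A rough mental model for these filters: a randomly sampled token is emitted only if every character passes the enabled checks. The sketch below is an assumed model of the semantics, not the crate's actual implementation; the `Filter` type is illustrative and the CJK requirement is omitted for brevity:

```rust
// Hypothetical filter mirroring the flag semantics above. Each field is the
// positive form of a flag (e.g. allow_digits = true when --allow-digits is set,
// require_letter = false when --no-require-letter is set).
struct Filter {
    allow_digits: bool,
    allow_punct: bool,
    allow_space: bool,
    allow_non_ascii: bool,
    require_letter: bool,
}

impl Filter {
    fn accepts(&self, token: &str) -> bool {
        let mut has_letter = false;
        for ch in token.chars() {
            if ch.is_ascii_digit() && !self.allow_digits {
                return false;
            }
            if ch.is_ascii_punctuation() && !self.allow_punct {
                return false;
            }
            if ch.is_whitespace() && !self.allow_space {
                return false;
            }
            if !ch.is_ascii() && !self.allow_non_ascii {
                return false;
            }
            if ch.is_alphabetic() {
                has_letter = true;
            }
        }
        // With the default filters, a token must contain at least one letter.
        !self.require_letter || has_letter
    }
}

fn main() {
    // Defaults: nothing allowed, a letter required.
    let strict = Filter {
        allow_digits: false,
        allow_punct: false,
        allow_space: false,
        allow_non_ascii: false,
        require_letter: true,
    };
    println!("{}", strict.accepts("hello")); // true
    println!("{}", strict.accepts("42"));    // false: digits not allowed
}
```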
Seed
- `--seed <n>`: random seed
Examples
# Random rate range sampled every 2 seconds
cargo run -p tokstream-cli -- --model gpt2 --mode english --rate-min 6 --rate-max 12 --rate-sample-interval 2
# Text mode from file, repeat 5 times
cargo run -p tokstream-cli -- --model gpt2 --mode text --text-file ./sample.txt --repeat 5
# Infinite loop text
cargo run -p tokstream-cli -- --model gpt2 --mode text --text "Hello" --loop-text
# Throughput upper bound (no throttle, no output)
cargo run -p tokstream-cli -- --model gpt2 --mode english --no-throttle --no-output
npx CLI
Quick Start
npx tokstream@latest --model gpt2 --mode english --rate 8
npx tokstream@latest --web --port 8787
For local development in this repo:
npx . --model gpt2 --mode english --rate 8
Supported Flags (npx)
- `--model <id>`
- `--revision <rev>`
- `--hf-token <token>` (or env `HF_TOKEN` / `HUGGINGFACE_HUB_TOKEN`)
- `--mode <english|chinese|text>`
- `--text <text>`
- `--loop` (loop text forever)
- `--repeat <n>`
- `--rate <n>`
- `--rate-min <n>` / `--rate-max <n>`
- `--rate-sample-interval <n>`
- `--seed <n>`
- `--max-tokens <n>`
- `--no-skip-special`
- `--allow-digits` / `--allow-punct` / `--allow-space` / `--allow-non-ascii`
- `--no-require-letter` / `--no-require-cjk`
- `--no-stats` / `--stats-interval <n>`
- `--no-throttle` / `--no-output`
- `--web --port <n>`
Notes:
- `--loop-text`, `--text-file`, `--batch`, `--pace`, and `--spin-threshold-us` are Rust-CLI only.
Web Demo
npx tokstream@latest --web --port 8787
# open http://localhost:8787
While running, you can drag the rate slider or enable random rate range. The page shows target and actual throughput. The output pane is fixed‑height and scrolls independently.
Accuracy Notes
- Rust CLI: `strict` mode uses sleep plus a short busy-spin for high precision.
- Web / npx pacing is best-effort due to event-loop and I/O limits.
- If actual throughput stops rising as you raise the target rate, you have likely hit the tokenizer's limits.
- For maximum-throughput testing, use the Rust CLI with `--no-output --no-throttle`.
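The sleep-plus-spin idea can be sketched as follows: sleep for most of the wait (cheap but imprecise), then busy-spin the last stretch up to the deadline (precise but CPU-hungry). This is a minimal illustration, not the crate's implementation; the helper name and the 200 µs window (the role `--spin-threshold-us` plays) are illustrative:

```rust
use std::time::{Duration, Instant};

// Wait until `deadline`: sleep while far away, busy-spin once within
// `spin_threshold` of it. Trades a little CPU for much tighter timing
// than sleep alone, whose wakeups can overshoot by milliseconds.
fn pace_until(deadline: Instant, spin_threshold: Duration) {
    loop {
        let now = Instant::now();
        if now >= deadline {
            return;
        }
        let remaining = deadline - now;
        if remaining > spin_threshold {
            // Coarse phase: sleep, but stop short of the deadline.
            std::thread::sleep(remaining - spin_threshold);
        } else {
            // Fine phase: spin out the final microseconds.
            std::hint::spin_loop();
        }
    }
}

fn main() {
    let rate = 8.0; // target tokens/sec, as with --rate 8
    let interval = Duration::from_secs_f64(1.0 / rate);
    let start = Instant::now();
    for i in 1u32..=4 {
        pace_until(start + interval * i, Duration::from_micros(200));
        println!("token {i}");
    }
    println!("elapsed: {:?}", start.elapsed());
}
```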
Build WASM (optional refresh)
npm run build:wasm
WASM artifacts are committed and included in the npm package.
just Recipes
just
Tests
cargo clippy --workspace
cargo nextest run --workspace
License
MIT