xbid-ai-tokkit

  • C++ BPE counter compatible with .tiktoken (OpenAI) encodings
  • Quasi-parity: <1.5% error, with no chat-template handling
  • ~60% faster than OpenAI's official tiktoken (JS/WASM)
  • No external dependencies (standard C++20 toolchain)

Optional support for Google's SentencePiece binary models is also available; this path is a thin wrapper around the official library, so it gives full parity.

This library is used in the xbid.ai project, which needs a fast, low-overhead BPE counter that is accurate enough for billing estimates. xbid-ai-tokkit uses a greedy longest-match search without materializing token IDs and skips chat templates, trading exact parity (<1.5% error) for speed and simplicity.
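To illustrate the idea, here is a minimal sketch of greedy longest-match counting (not the library's actual code; all names are hypothetical): at each byte offset, take the longest vocabulary entry that matches, count one token, and advance. The single-byte fallback is safe because every individual byte is itself a token in .tiktoken base vocabularies.

#include <algorithm>
#include <cstddef>
#include <string>
#include <unordered_set>

// Count tokens by greedy longest match against the vocabulary.
// Only the count is produced; token IDs are never materialized.
size_t count_tokens_greedy(const std::string& text,
                           const std::unordered_set<std::string>& vocab,
                           size_t max_token_len) {
    size_t count = 0;
    size_t i = 0;
    while (i < text.size()) {
        size_t best = 1;  // fallback: a single byte is always a token
        const size_t limit = std::min(max_token_len, text.size() - i);
        for (size_t len = limit; len >= 2; --len) {
            if (vocab.count(text.substr(i, len))) { best = len; break; }
        }
        ++count;          // one token consumed, whatever its length
        i += best;
    }
    return count;
}

In exchange, exact BPE merge order is ignored, which is the source of the small parity gap reported in the benchmarks below.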

We may extend the project with support for additional LLM tokenizers and token utilities.

Accuracy Benchmarks

Evaluated on a corpus of 2,628 GPT-4o requests (16.08 MB) collected directly from xbid.ai live calls, with reference values from the OpenAI API usage counters.

Mean size: 1,877 tokens/request

Bias     MAE      MAPE (%)   Stdev   error
-11.93   11.93    1.48       2.04    1.000
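Bias, MAE, MAPE, and Stdev follow their standard definitions (a notational sketch; here $e_i$ is xbid-ai-tokkit's count and $r_i$ the API-reported count for request $i$ of $N$):

$$\mathrm{Bias}=\frac{1}{N}\sum_{i}(e_i-r_i),\qquad \mathrm{MAE}=\frac{1}{N}\sum_{i}\lvert e_i-r_i\rvert$$

$$\mathrm{MAPE}=\frac{100}{N}\sum_{i}\frac{\lvert e_i-r_i\rvert}{r_i},\qquad \mathrm{Stdev}=\sqrt{\frac{1}{N}\sum_{i}\bigl(e_i-r_i-\mathrm{Bias}\bigr)^{2}}$$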

By design, xbid-ai-tokkit achieves near-parity rather than exact parity: on average it is ~12 tokens off per request (MAPE < 1.5%), with a narrow, predictable error distribution.

Speed Benchmarks

On the same dataset (2,628 GPT-4o requests, 16.08 MB) xbid-ai-tokkit processed data at 10.87 MB/s, ~60% faster than OpenAI's official tiktoken JS/WASM (6.65 MB/s).

Build

Default (OpenAI BPE only, no external deps):

make clean
make

To build with SentencePiece support, install the library from google/sentencepiece, then:

make clean
make SENTENCEPIECE=1

Usage

See the xbid-ai project for an example client implementation of the server IPC mode.

# inline
./tokkit --provider openai --model /path/o200k_base.tiktoken --text "hello"

# file
./tokkit --provider openai --model /path/o200k_base.tiktoken --file prompt.txt

# stdin
echo -n "hello" | ./tokkit --provider openai --model /path/o200k_base.tiktoken --stdin

# server mode (binary IPC)
./tokkit --provider openai --model /path/o200k_base.tiktoken --serve
# protocol: client [u32 LE length][bytes] → server "<count>\n"
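
For illustration, a minimal C++ client for this framing (a sketch under assumptions: the transport is any connected socket or pipe file descriptor, and the function name is hypothetical; see the xbid-ai project for a real client):

#include <cstdint>
#include <cstdlib>
#include <string>
#include <unistd.h>

// Send [u32 LE length][bytes], then read back "<count>\n".
// Returns the parsed count, or -1 on I/O failure.
long count_via_server(int fd, const std::string& text) {
    const uint32_t len = static_cast<uint32_t>(text.size());
    const unsigned char hdr[4] = {   // explicit little-endian byte order
        static_cast<unsigned char>(len & 0xff),
        static_cast<unsigned char>((len >> 8) & 0xff),
        static_cast<unsigned char>((len >> 16) & 0xff),
        static_cast<unsigned char>((len >> 24) & 0xff),
    };
    if (write(fd, hdr, 4) != 4) return -1;
    if (write(fd, text.data(), len) != static_cast<ssize_t>(len)) return -1;

    char buf[32];                    // reply is "<count>\n"
    size_t used = 0;
    while (used < sizeof(buf) - 1) {
        if (read(fd, buf + used, 1) != 1) return -1;
        if (buf[used] == '\n') break;
        ++used;
    }
    buf[used] = '\0';
    return std::strtol(buf, nullptr, 10);
}

The length prefix is serialized byte by byte so it stays little-endian regardless of the host's byte order.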

Disclaimer

This software is experimental and provided as-is, without warranties or guarantees of any kind. Use at your own risk.

License

MIT License
