-
tokenizers
today's most used tokenizers, with a focus on performance and versatility
-
tiktoken-rs
encoding and decoding with the tiktoken library in Rust
-
bpe
Fast byte-pair encoding implementation
-
splintr
Fast Rust BPE tokenizer with Python bindings
-
smoltok-core
Byte-Pair Encoding tokenizer implementation in Rust
-
wordchipper
HPC Rust LLM Tokenizer Library
-
huggingface/tokenizers-python
💥 Fast State-of-the-Art Tokenizers optimized for Research and Production
-
bpe-openai
Prebuilt fast byte-pair encoders for OpenAI
-
bbpe
Binary byte pair encoding (BPE) trainer and CLI compatible with Hugging Face tokenizers
-
trustformers-tokenizers
Tokenizers for TrustformeRS
-
rustbpe
A BPE (Byte Pair Encoding) tokenizer written in Rust with Python bindings
-
quicktok
Minimal, fast, multi-threaded implementation of Byte Pair Encoding (BPE) for LLM tokenization
-
bpe-match
A pattern matching library for BPE tokenization, intended to replace regex-based approaches
-
bpetok
CLI for tokenizing text input using Byte Pair Encoding (BPE)
-
kitoken
Fast and versatile tokenizer for language models, supporting BPE, Unigram and WordPiece tokenization
-
unitoken
Fast BPE tokenizer/trainer with a Rust core and Python bindings
-
bpe-tokenizer
A BPE Tokenizer library
-
tiktokenx
A high-performance Rust implementation of OpenAI's tiktoken library
-
tokenizers-enfer
today's most used tokenizers, with a focus on performance and versatility
-
another-tiktoken-rs
encoding and decoding with the tiktoken library in Rust
-
smoltoken
A fast library for Byte Pair Encoding (BPE) tokenization
-
rust_transformers
High performance tokenizers for Rust
-
tokeneer
tokenizer crate
-
liendl_tokenizer
BPE tokenizer for Rust
-
gpt_tokenizer
Rust BPE Encoder Decoder (Tokenizer) for GPT-2 / GPT-3
-
gpt-encoder
Rust BPE Encoder Decoder for GPT-2 / GPT-3
-
tiktoken-rust
a fast BPE tokeniser for use with OpenAI's models
-
tokin
Experimental fast tokenizer
-
fastok
BPE in Rust with bindings to Python using PyO3