#bpe

  1. tokenizers

    Today's most used tokenizers, with a focus on performance and versatility

    v0.22.2 723K #tokenize #hugging-face #word-piece #bpe #tokenizer
  2. tiktoken-rs

    Encoding and decoding with the tiktoken library in Rust

    v0.9.1 810K #openai #bpe #gpt
  3. bpe

    Fast byte-pair encoding implementation

    v0.2.1 6.9K #tokenize #encoding #algorithm #tokenizer
  4. splintr

    Fast Rust BPE tokenizer with Python bindings

    v0.8.0 #tokenize #llm #tiktoken #bpe #gpt #tokenizer
  5. smoltok-core

    Byte-Pair Encoding tokenizer implementation in Rust

    v0.1.1 #bpe #encoding #text-processing #tokenizer
  6. wordchipper

    HPC Rust LLM Tokenizer Library

    v0.6.2 #bpe #gpt #tokenizer
  7. huggingface/tokenizers-python

    💥 Fast State-of-the-Art Tokenizers optimized for Research and Production

    GitHub 0.22.3-dev.0 #tokenize #bert #word-piece #language-model #training #byte-level #bpe #pad #state-of-the-art #py
  8. bpe-openai

    Prebuilt fast byte-pair encoders for OpenAI

    v0.3.0 6.7K #bpe #algorithm #tokenizer
  9. bbpe

    Binary byte pair encoding (BPE) trainer and CLI compatible with Hugging Face tokenizers

    v0.6.3 #malware #hugging-face #bpe #binary #tokenizer
  10. trustformers-tokenizers

    Tokenizers for TrustformeRS

    v0.1.0-alpha.2 #tokenize #word-piece #bpe #tokenizer #nlp-processing
  11. rustbpe

    A BPE (Byte Pair Encoding) tokenizer written in Rust with Python bindings

    v0.1.0 #python-bindings #tokenize #training #bpe #byte-pair #tiktoken #gpt-4 #regex
  12. quicktok

    Minimal, fast, multi-threaded implementation of the Byte Pair Encoding (BPE) for LLM tokenization

    v0.2.0 #byte-pair #multi-threading #bpe #tokenize #llm
  13. bpe-match

    A pattern matching library for BPE tokenization, intended to replace regex-based approaches

    v0.1.1 #pattern-matching #bpe
  14. bpetok

    CLI for tokenizing text input using Byte Pair Encoding (BPE)

    v0.1.2 #byte-pair #text-tokenization #bpe #text-tokenizer
  15. kitoken

    Fast and versatile tokenizer for language models, supporting BPE, Unigram and WordPiece tokenization

    v0.10.1 2.5K #tokenize #word-piece #unigram #bpe #tokenizer
  16. unitoken

    Fast BPE tokenizer/trainer with a Rust core and Python bindings

    v0.1.1 #tokenize #bpe #nlp #tokenizer
  17. bpe-tokenizer

    A BPE Tokenizer library

    v0.1.4 150 #byte-pair #tokenize #bpe #encoding #byte
  18. tiktokenx

    A high-performance Rust implementation of OpenAI's tiktoken library

    v0.1.0 #openai #bpe #gpt
  19. tokenizers-enfer

    Today's most used tokenizers, with a focus on performance and versatility

    v0.21.1 #tokenize #hugging-face #word-piece #bpe #tokenizer
  20. another-tiktoken-rs

    Encoding and decoding with the tiktoken library in Rust

    v0.1.2 #openai #bpe #gpt
  22. smoltoken

    A fast library for Byte Pair Encoding (BPE) tokenization

    v0.2.0 360 #artificial-intelligence #bpe #tokenizer
  23. rust_transformers

    High performance tokenizers for Rust

    v0.2.0 #tokenize #transformer-models #bert #byte-pair #py #bpe #word-piece #integration-tests #rust-nightly
  24. tokeneer

    tokenizer crate

    v0.1.0 340 #tokenize #bpe #tokenizer
  25. liendl_tokenizer

    BPE tokenizer for Rust

    v0.1.0 #tokenize #training #vocabulary #character #model #tokenize-text #bpe #csv #different-versions #convert-text
  26. gpt_tokenizer

    Rust BPE Encoder Decoder (Tokenizer) for GPT-2 / GPT-3

    v0.1.0 #chatgpt #gpt-3 #bpe #openai #tokenizer
  27. gpt-encoder

    Rust BPE Encoder Decoder for GPT-2 / GPT-3

    v0.1.1 #bpe #decoder #gpt
  28. tiktoken-rust

    A fast BPE tokeniser for use with OpenAI's models

    v0.2.1 #openai #model #tokeniser #bpe #python-interface
  29. tokin

    Experimental fast tokenizer

    v0.1.0 #word-piece #tokenize #nlp #tiktoken #bpe #tokenizer
  30. fastok

    BPE in Rust with bindings to Python using PyO3

    v0.0.1 #python-bindings #bpe #pyo3