Starred repositories
🍡 50x faster tokenization for every HuggingFace model
Yet another implementation of Rust's Result type, with type annotations and async support
A fast, helpful, and open-source document parser
Deepparse is a state-of-the-art library for parsing multinational street addresses using deep learning (see the sketch after this list)
Check what an AI agent can access before you run it
OpenAPI 3 and 3.1 schema generator and validator for Hono, itty-router and more!
Claude skills I'm experimenting with. Please review carefully before use.
Unified Schema-Based Information Extraction
bb25 is a fast, self-contained BM25 + Bayesian calibration implementation with a minimal Python API.
Hybrid search engine combining the best of full-text and semantic search
📐 Fast token estimation at 96% accuracy of a full tokenizer in a 2kB bundle
Next-generation Punkt sentence boundary detection with zero dependencies
A lightweight, local-first, and 🆓 experiment tracking library from Hugging Face 🤗
inline-snapshot speeds up test writing by generating the expected values as code, simplifying snapshot tests with pytest (see the sketch after this list)
bauwenst / PickyBPE
Forked from pchizhov/picky_bpe
PickyBPE as a Python package.
Ungreedy subword tokenizer and vocabulary trainer for Python, Go & JavaScript
Nearly Inference Free Embeddings: make your RAG queries 500x faster
Filter sensitive information from free text before sending it to external services or APIs, such as chatbots and LLMs.
High-performance FFI wrapper for Hugging Face tokenizers in Go
A score-based implementation of WordPiece tokenization training, compatible with HuggingFace tokenizers.
Cloud-native search engine for observability. An open-source alternative to Datadog, Elasticsearch, Loki, and Tempo.
🪢 Open source LLM engineering platform: LLM Observability, metrics, evals, prompt management, playground, datasets. Integrates with OpenTelemetry, Langchain, OpenAI SDK, LiteLLM, and more. 🍊YC W23
Simple-to-use scoring function for arbitrarily tokenized texts.
DSPy: The framework for programming—not prompting—language models
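For the DSPy entry above, a minimal sketch of the "programming, not prompting" idea. The model identifier and question are placeholders, and a configured API key for the chosen provider is assumed.

```python
import dspy

# Configure a language model (any LiteLLM-style identifier; placeholder shown).
dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))

# Declare what the module should do as a signature instead of hand-writing a prompt.
qa = dspy.ChainOfThought("question -> answer")

# DSPy turns the signature into a prompt behind the scenes and parses the output fields.
result = qa(question="What does BM25 score documents by?")
print(result.answer)
```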
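For the inline-snapshot entry, a small sketch of how snapshot generation works with pytest; the function under test and the value shown are made up for illustration.

```python
from inline_snapshot import snapshot

def normalize(name: str) -> str:
    # Toy function under test (illustrative only).
    return name.strip().lower()

def test_normalize():
    # Start with an empty snapshot(); running `pytest --inline-snapshot=create`
    # rewrites this file and inserts the observed value into the source.
    assert normalize("  Alice ") == snapshot("alice")
```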
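For the Deepparse entry, a rough sketch assuming the library exposes an AddressParser with pretrained model types (check the Deepparse docs for exact constructor arguments); the address is illustrative.

```python
from deepparse.parser import AddressParser

# Load a pretrained parser; "bpemb" is one of the lighter pretrained model types.
parser = AddressParser(model_type="bpemb")

# Parse a single free-form address into tagged components
# (street number, street name, municipality, postal code, ...).
parsed = parser("350 rue des Lilas Ouest Quebec Quebec G1L 1B6")
print(parsed)
```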