Starred repositories
A vector search SQLite extension that runs anywhere!
Python SDK for ProgramAsWeights: compile natural language specs into neural programs that run locally
30x faster tokenization for every HuggingFace model
Yet another implementation of Rust's Result type, with type annotations and async support
A fast, helpful, and open-source document parser
Deepparse is a state-of-the-art library for parsing multinational street addresses using deep learning
Check what an AI agent can access before you run it
OpenAPI 3 and 3.1 schema generator and validator for Hono, itty-router and more!
Claude skills I'm experimenting with. Please review carefully before use.
Unified Schema-Based Information Extraction
bb25 is a fast, self-contained BM25 + Bayesian calibration implementation with a minimal Python API.
Hybrid search engine, combining best features of text and semantic search worlds
Fast token estimation at 96% accuracy of a full tokenizer in a 2kB bundle
Next-generation Punkt sentence boundary detection with zero dependencies
A lightweight, local-first, and free experiment tracking library from Hugging Face
inline-snapshot boosts efficiency when writing tests by generating code with the expected values and simplifies snapshot tests with pytest.
bauwenst / PickyBPE
Forked from pchizhov/picky_bpe
PickyBPE as a Python package.
Ungreedy subword tokenizer and vocabulary trainer for Python, Go & Javascript
Nearly Inference Free Embeddings: make your RAG queries 500x faster
Filter sensitive information from free text before sending it to external services or APIs, such as chatbots and LLMs.
High-performance FFI wrapper for Hugging Face tokenizers in Go
A score-based implementation of WordPiece tokenization training, compatible with HuggingFace tokenizers.
Cloud-native search engine for observability. An open-source alternative to Datadog, Elasticsearch, Loki, and Tempo.
Open source LLM engineering platform: LLM Observability, metrics, evals, prompt management, playground, datasets. Integrates with OpenTelemetry, Langchain, OpenAI SDK, LiteLLM, and more. YC W23