Starred repositories

A vector search SQLite extension that runs anywhere!

C 7,508 306 Updated Apr 8, 2026

Python SDK for ProgramAsWeights: compile natural language specs into neural programs that run locally

Python 59 5 Updated Apr 18, 2026

Fast and Accurate Code Search for Agents

Python 223 21 Updated Apr 29, 2026
Python 3 1 Updated Mar 29, 2026

🍑 30x faster tokenization for every HuggingFace model

Rust 31 2 Updated Apr 22, 2026

Yet another implementation of Rust's Result type, with type annotations and async support

Python 22 2 Updated Mar 27, 2026

A fast, helpful, and open-source document parser

TypeScript 4,936 321 Updated Apr 28, 2026

Deepparse is a state-of-the-art library for parsing multinational street addresses using deep learning

Python 340 34 Updated Apr 24, 2026

Check what an AI agent can access before you run it

Go 26 2 Updated Mar 8, 2026

OpenAPI 3 and 3.1 schema generator and validator for Hono, itty-router and more!

TypeScript 736 68 Updated Apr 23, 2026

Claude skills I'm experimenting with. Please review carefully before use.

Python 233 21 Updated Feb 19, 2026

Unified Schema-Based Information Extraction

Python 1,450 137 Updated Apr 21, 2026

bb25 is a fast, self-contained BM25 + Bayesian calibration implementation with a minimal Python API.

Rust 144 22 Updated Mar 17, 2026

Hybrid search engine, combining best features of text and semantic search worlds

Scala 610 15 Updated Jan 6, 2026

πŸ“ Fast token estimation at 96% accuracy of a full tokenizer in a 2kB bundle

TypeScript 146 6 Updated Apr 16, 2026

Next-generation Punkt sentence boundary detection with zero dependencies

Python 30 1 Updated Nov 18, 2025

A lightweight, local-first, and 🆓 experiment tracking library from Hugging Face 🤗

Python 1,427 110 Updated Apr 28, 2026

inline-snapshot boosts efficiency when writing tests by generating code with the expected values and simplifies snapshot tests with pytest.

Python 724 25 Updated Apr 24, 2026

Fast Diversification for Search & Retrieval

Python 487 27 Updated Mar 29, 2026

PickyBPE as Python package.

Python 2 Updated Apr 13, 2026

Ungreedy subword tokenizer and vocabulary trainer for Python, Go & Javascript

Go 623 21 Updated Jul 2, 2024

Nearly Inference Free Embeddings: make your RAG queries 500x faster

Python 77 4 Updated Apr 27, 2026

Filter sensitive information from free text before sending it to external services or APIs, such as chatbots and LLMs.

Ruby 388 10 Updated Apr 22, 2026

High-performance FFI wrapper for Hugging Face tokenizers in Go

Makefile 5 Updated Sep 24, 2025

A score-based implementation of WordPiece tokenization training, compatible with HuggingFace tokenizers.

Python 5 Updated Jan 14, 2025

Efficient optimizers

Python 314 27 Updated Apr 27, 2026

Cloud-native search engine for observability. An open-source alternative to Datadog, Elasticsearch, Loki, and Tempo.

Rust 11,119 540 Updated Apr 29, 2026

🪢 Open source LLM engineering platform: LLM Observability, metrics, evals, prompt management, playground, datasets. Integrates with OpenTelemetry, Langchain, OpenAI SDK, LiteLLM, and more. 🍊YC W23

TypeScript 26,316 2,661 Updated Apr 29, 2026

benchmarks for LLM tokenizers

Python 18 1 Updated Mar 25, 2026