stephantul
🌳
Busy planting trees

Organizations

@clips

Starred repositories

Python · 3 stars · 1 fork · Updated Mar 29, 2026

🍡 50x faster tokenization for every HuggingFace model

Rust · 30 stars · 1 fork · Updated Mar 30, 2026

Yet another implementation of Rust's Result type, with type annotations and async support

Python · 22 stars · 2 forks · Updated Mar 27, 2026

A fast, helpful, and open-source document parser

TypeScript · 4,280 stars · 287 forks · Updated Apr 13, 2026

Deepparse is a state-of-the-art library for parsing multinational street addresses using deep learning

Python · 339 stars · 34 forks · Updated Apr 1, 2026

Check what an AI agent can access before you run it

Go · 26 stars · 2 forks · Updated Mar 8, 2026

OpenAPI 3 and 3.1 schema generator and validator for Hono, itty-router and more!

TypeScript · 729 stars · 68 forks · Updated Mar 30, 2026

Claude skills I'm experimenting with. Please review carefully before use.

Python · 204 stars · 16 forks · Updated Feb 19, 2026

Unified Schema-Based Information Extraction

Python · 1,373 stars · 128 forks · Updated Apr 8, 2026

bb25 is a fast, self-contained BM25 + Bayesian calibration implementation with a minimal Python API.

Rust · 142 stars · 22 forks · Updated Mar 17, 2026

Hybrid search engine, combining best features of text and semantic search worlds

Scala · 608 stars · 15 forks · Updated Jan 6, 2026

📐 Fast token estimation at 96% accuracy of a full tokenizer in a 2kB bundle

TypeScript · 140 stars · 6 forks · Updated Mar 31, 2026

Next-generation Punkt sentence boundary detection with zero dependencies

Python · 30 stars · 1 fork · Updated Nov 18, 2025

A lightweight, local-first, and 🆓 experiment tracking library from Hugging Face 🤗

Python · 1,392 stars · 106 forks · Updated Apr 15, 2026

inline-snapshot boosts efficiency when writing tests by generating code with the expected values and simplifies snapshot tests with pytest.

Python · 722 stars · 25 forks · Updated Apr 14, 2026

Fast Diversification for Search & Retrieval

Python · 486 stars · 27 forks · Updated Mar 29, 2026

PickyBPE as a Python package.

Python · 2 stars · Updated Apr 13, 2026

Ungreedy subword tokenizer and vocabulary trainer for Python, Go & JavaScript

Go · 622 stars · 21 forks · Updated Jul 2, 2024

Nearly Inference Free Embeddings: make your RAG queries 500x faster

Python · 75 stars · 4 forks · Updated Feb 20, 2026

Filter sensitive information from free text before sending it to external services or APIs, such as chatbots and LLMs.

Ruby · 376 stars · 9 forks · Updated Apr 15, 2026

High-performance FFI wrapper for Hugging Face tokenizers in Go

Makefile · 4 stars · Updated Sep 24, 2025

A score-based implementation of WordPiece tokenization training, compatible with HuggingFace tokenizers.

Python · 5 stars · Updated Jan 14, 2025

Efficient optimizers

Python · 311 stars · 27 forks · Updated Apr 4, 2026

Cloud-native search engine for observability. An open-source alternative to Datadog, Elasticsearch, Loki, and Tempo.

Rust · 11,083 stars · 537 forks · Updated Apr 15, 2026

🪢 Open source LLM engineering platform: LLM Observability, metrics, evals, prompt management, playground, datasets. Integrates with OpenTelemetry, Langchain, OpenAI SDK, LiteLLM, and more. 🍊YC W23

TypeScript · 24,953 stars · 2,536 forks · Updated Apr 15, 2026

benchmarks for LLM tokenizers

Python · 18 stars · 1 fork · Updated Mar 25, 2026

Simple-to-use scoring function for arbitrarily tokenized texts.

Python · 48 stars · 6 forks · Updated Feb 19, 2025

Jupyter Notebook · 6 stars · 3 forks · Updated Sep 12, 2024

DSPy: The framework for programming—not prompting—language models

Python · 33,715 stars · 2,793 forks · Updated Apr 14, 2026