Stephan Tulkens stephantul

🌳

Busy planting trees

154 followers · 39 following

Starred repositories

asg017 / sqlite-vec

A vector search SQLite extension that runs anywhere!

C 7,508 306 Updated Apr 8, 2026

programasweights / programasweights-python

Python SDK for ProgramAsWeights — compile natural language specs into neural programs that run locally

Python 59 5 Updated Apr 18, 2026

MinishLab / semble

Fast and Accurate Code Search for Agents

Python 223 21 Updated Apr 29, 2026

Ahmetcanyvz / UNILID

Python 3 1 Updated Mar 29, 2026

chonkie-inc / tokie

🍡 30x faster tokenization for every HuggingFace model

Rust 31 2 Updated Apr 22, 2026

AstraBert / better-result-py

Yet another implementation of Rust's Result type, with type annotations and async support

Python 22 2 Updated Mar 27, 2026

run-llama / liteparse

A fast, helpful, and open-source document parser

TypeScript 4,936 321 Updated Apr 28, 2026

GRAAL-Research / deepparse

Deepparse is a state-of-the-art library for parsing multinational street addresses using deep learning

Python 340 34 Updated Apr 24, 2026

Pringled / agentcheck

Check what an AI agent can access before you run it

Go 26 2 Updated Mar 8, 2026

cloudflare / chanfana

OpenAPI 3 and 3.1 schema generator and validator for Hono, itty-router and more!

TypeScript 736 68 Updated Apr 23, 2026

honnibal / claude-skills

Claude skills I'm experimenting with. Please review carefully before use.

Python 233 21 Updated Feb 19, 2026

fastino-ai / GLiNER2

Unified Schema-Based Information Extraction

Python 1,450 137 Updated Apr 21, 2026

instructkr / bb25

bb25 is a fast, self-contained BM25 + Bayesian calibration implementation with a minimal Python API.

Rust 144 22 Updated Mar 17, 2026

nixiesearch / nixiesearch

Hybrid search engine, combining best features of text and semantic search worlds

Scala 610 15 Updated Jan 6, 2026

johannschopplich / tokenx

📐 Fast token estimation at 96% accuracy of a full tokenizer in a 2kB bundle

TypeScript 146 6 Updated Apr 16, 2026

alea-institute / nupunkt

Next-generation Punkt sentence boundary detection with zero dependencies

Python 30 1 Updated Nov 18, 2025

gradio-app / trackio

A lightweight, local-first, and 🆓 experiment tracking library from Hugging Face 🤗

Python 1,427 110 Updated Apr 28, 2026

15r10nk / inline-snapshot

inline-snapshot boosts efficiency when writing tests by generating code with the expected values and simplifies snapshot tests with pytest.

Python 724 25 Updated Apr 24, 2026

Pringled / pyversity

Fast Diversification for Search & Retrieval

Python 487 27 Updated Mar 29, 2026

bauwenst / PickyBPE

Forked from pchizhov/picky_bpe

PickyBPE as Python package.

Python 2 Updated Apr 13, 2026

alasdairforsythe / tokenmonster

Ungreedy subword tokenizer and vocabulary trainer for Python, Go & Javascript

Go 623 21 Updated Jul 2, 2024

stephantul / pynife

Nearly Inference Free Embeddings: make your RAG queries 500x faster

Python 77 4 Updated Apr 27, 2026

thoughtbot / top_secret

Filter sensitive information from free text before sending it to external services or APIs, such as chatbots and LLMs.

Ruby 388 10 Updated Apr 22, 2026

takara-ai / go-tokenizers

High-performance FFI wrapper for Hugging Face tokenizers in Go

Makefile 5 Updated Sep 24, 2025

kacperlukawski / real-wordpiece

A score-based implementation of WordPiece tokenization training, compatible with HuggingFace tokenizers.

Python 5 Updated Jan 14, 2025

HomebrewML / HeavyBall

Efficient optimizers

Python 314 27 Updated Apr 27, 2026

quickwit-oss / quickwit

Cloud-native search engine for observability. An open-source alternative to Datadog, Elasticsearch, Loki, and Tempo.

Rust 11,119 540 Updated Apr 29, 2026

langfuse / langfuse

🪢 Open source LLM engineering platform: LLM Observability, metrics, evals, prompt management, playground, datasets. Integrates with OpenTelemetry, Langchain, OpenAI SDK, LiteLLM, and more. 🍊YC W23

TypeScript 26,316 2,661 Updated Apr 29, 2026

bgub / tokka-bench

benchmarks for LLM tokenizers

Python 18 1 Updated Mar 25, 2026

cimeister / tokenizer-analysis-suite

Python 45 10 Updated Feb 11, 2026

Starred topics

python3
