Tom Aarsen tomaarsen

Sentence Transformers, SetFit & NLTK maintainer, ML Engineer & Fellow @huggingface

969 followers · 0 following

Achievements

x4 x3 x3

Highlights

1 security advisory credit

Starred repositories

Rikka-Botan / Stable-Static-Embedding-Models

SSE (Stable Static Embedding): Unlocking the Potential of Static Embeddings, A Dynamic Tanh Normalization Approach without Speed Penalty

Python 3 Updated Jun 11, 2026

TusKANNy / awesome-multivector-retrieval

An extensive and commented list of resources on Late-Interaction Multivector Retrieval.

TeX 60 7 Updated Jun 17, 2026

agentic-in / elephant-agent

Personal-Model First Self Evolving AI Agent 🐘

Python 565 61 Updated Jun 1, 2026

justram / pi-serini

A Minimalistic Search Agent

TypeScript 75 5 Updated May 12, 2026

julien-c / hf-speedtest

How Fast can you pull from Hugging Face?

Python 24 Updated May 12, 2026

huggingface / ml-intern

🤗 ml-intern: an open-source ML engineer that reads papers, trains models, and ships ML models

Python 10,489 1,114 Updated Jun 18, 2026

teraflop-ai / teraflopai-python

Official Python library for the TeraflopAI API

Python 2 Updated Mar 9, 2026

x-tabdeveloping / turftopic

Robust and fast topic models with sentence-transformers.

Python 114 9 Updated Jun 11, 2026

huggingface / skills

Give your agents the power of the Hugging Face ecosystem

Python 10,693 703 Updated Jun 18, 2026

mishig25 / hf-autoresearch

Forked from karpathy/autoresearch

AI agents running research on Hugging Face infra

Python 175 17 Updated Mar 29, 2026

huggingface / hf-mount

Mount Hugging Face Buckets and repos as local filesystems. No download, no copy, no waiting.

Rust 749 54 Updated Jun 18, 2026

AlexsJones / llmfit

Hundreds of models & providers. One command to find what runs on your hardware.

Rust 28,265 1,725 Updated Jun 17, 2026

qdrant / sparse-finetune

Fine-tune SPLADE sparse embedding models for your product catalog. CLI, web dashboard, and Python API.

Python 11 Updated Jun 10, 2026

searchivarius / py_mtasklite

A missing piece of the Python multitask (both threads and processes) API: An extension that supports stateful worker pools & size-aware iterators.

Python 29 2 Updated Mar 8, 2026

rahulseetharaman / cross-encoder-scaling

Python 1 1 Updated Apr 5, 2026

tanishqkumar / ssd

A lightweight inference engine supporting speculative speculative decoding (SSD).

Python 956 72 Updated May 10, 2026

databricks / flashoptim

Python 255 10 Updated Apr 17, 2026

huggingface / kernels

Build compute kernels and load them from the Hub.

Python 697 105 Updated Jun 18, 2026

TRUMANCFY / Revela

Implementation for Revela: Dense Retriever Learning via Language Modeling - ICLR 2026 Oral

Python 20 3 Updated Mar 26, 2026

zilliztech / VectorDBBench

Benchmark for vector databases.

Python 1,127 395 Updated Jun 17, 2026

codefuse-ai / CodeFuse-Embeddings

Text and code embeddings research from CodeFuse: C2LLM, D2LLM, E2LLM, F2LLM, ML-Embed

Python 564 74 Updated May 22, 2026

huggingface / kernel-builder

👷 Build compute kernels

Nix 214 36 Updated Apr 6, 2026

aiexplorations / vajra_bm25

Fast BM25 search engine with category theory abstractions

Python 12 Updated Feb 22, 2026

xhluca / bm25-benchmarks

Python 24 10 Updated Apr 29, 2026

datologyai / luxical

Python 81 7 Updated Dec 12, 2025

UlrickBL / multimodal_reranker

Multimodal reranker training and benchmarks

Python 4 Updated Dec 1, 2025

hseb-benchmark / hseb

HSEB: Hybrid Search Engine Benchmark

Python 21 2 Updated Oct 5, 2025

stephantul / pynife

Nearly Inference Free Embeddings: make your RAG queries 500x faster

Python 78 4 Updated Apr 27, 2026

pydantic / pydantic-ai

AI Agent Framework, the Pydantic way

Python 17,844 2,230 Updated Jun 18, 2026

Pringled / pyversity

Fast Diversification for Search & Retrieval

Python 493 27 Updated May 24, 2026