Stars
MTEB: Massive Text Embedding Benchmark
RAG on Everything with LEANN. Enjoy 97% storage savings while running a fast, accurate, and 100% private RAG application on your personal device.
State-of-the-art paired encoder and decoder models (17M-1B params)
Open-source personal bookmark search engine
The first dense retrieval model that can be prompted like an LM
A holistic framework to construct realistic evaluation datasets
High-Performance Engine for Multi-Vector Search
Unified, efficient fine-tuning of RAG retrieval models, including embedding models, ColBERT, and rerankers.
Extract full next-token probabilities via language model APIs
Bringing BERT into modernity via both architecture changes and scaling
verl: Volcano Engine Reinforcement Learning for LLMs
Repository housing the open-source code for the Ai2 Scholar QA app and its corresponding library
XTR/WARP (SIGIR'25) is an extremely fast and accurate retrieval engine based on Stanford's ColBERTv2/PLAID and Google DeepMind's XTR.
A plug-and-play watermark for LLMs with no impact on text quality.
One-stop shop for running and fine-tuning transformer-based language models for retrieval
Schedule-Free Optimization in PyTorch
An LLM-powered knowledge curation system that researches a topic and generates a full-length report with citations.
The Batched API provides a flexible and efficient way to process multiple requests in a batch, with a primary focus on dynamic batching of inference workloads.
Use late-interaction multi-modal models such as ColPali in just a few lines of code.
Toolkit for creating, sharing, and using natural language prompts.
Easily use and train state of the art late-interaction retrieval methods (ColBERT) in any RAG pipeline. Designed for modularity and ease-of-use, backed by research.