Stars
MTEB: Massive Text Embedding Benchmark
RAG on Everything with LEANN. Enjoy 97% storage savings while running a fast, accurate, and 100% private RAG application on your personal device.
State-of-the-art paired encoder and decoder models (17M-1B params)
Open-source personal bookmark search engine
The first dense retrieval model that can be prompted like an LM
A holistic framework to construct realistic evaluation datasets
High-Performance Engine for Multi-Vector Search
Unified, efficient fine-tuning of RAG retrieval models, including embedding models, ColBERT, and rerankers.
Extract full next-token probabilities via language model APIs
Bringing BERT into modernity via both architecture changes and scaling
verl: Volcano Engine Reinforcement Learning for LLMs
Repository housing the open-source code for the Ai2 Scholar QA app and its corresponding library
XTR/WARP (SIGIR'25) is an extremely fast and accurate retrieval engine based on Stanford's ColBERTv2/PLAID and Google DeepMind's XTR.
A plug-and-play watermark for LLMs with no impact on text quality.
One-stop shop for running and fine-tuning transformer-based language models for retrieval
Schedule-Free Optimization in PyTorch
An LLM-powered knowledge curation system that researches a topic and generates a full-length report with citations.
The Batched API provides a flexible and efficient way to process multiple requests in a batch, with a primary focus on dynamic batching of inference workloads.
Use late-interaction multi-modal models such as ColPali in just a few lines of code.
Toolkit for creating, sharing, and using natural language prompts.
Easily use and train state of the art late-interaction retrieval methods (ColBERT) in any RAG pipeline. Designed for modularity and ease-of-use, backed by research.