MTEB: Massive Text Embedding Benchmark
Updated Dec 14, 2025 - Python
A heterogeneous benchmark for information retrieval. Easy to use; evaluate your models across 15+ diverse IR datasets.
⚡ Build your chatbot within minutes on your favorite device; offer SOTA compression techniques for LLMs; run LLMs efficiently on Intel Platforms⚡
The official implementation of RAPTOR: Recursive Abstractive Processing for Tree-Organized Retrieval
User Profile-Based Long-Term Memory for AI Chatbot Applications.
Fast, accurate, lightweight Python library for generating state-of-the-art embeddings
An official implementation for "CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval"
Data for the MTEB leaderboard
Implementation of RETRO, DeepMind's retrieval-based attention net, in PyTorch
Deep Recommenders
[Pytorch] Generative retrieval model using semantic IDs from "Recommender Systems with Generative Retrieval"
Fast lexical search implementing BM25 in Python using NumPy, Numba, and SciPy
Use ArXiv ChatGuru to talk to research papers. This app uses LangChain, OpenAI, Streamlit, and Redis as a vector database/semantic cache.
Grounded search engine (i.e. with source reference) based on LLM / ChatGPT / OpenAI API. It supports web search, file content search etc.
Parsing-free RAG supported by VLMs
The official implementation of the ICLR 2020 paper "Learning to Retrieve Reasoning Paths over Wikipedia Graph for Question Answering".