Pyserini is a Python toolkit for reproducible information retrieval research with sparse and dense representations.
Updated Dec 15, 2025 - Python
📚 Curate arXiv papers effectively using a modern AI approach with Retrieval-Augmented Generation to enhance your learning and research experience.
🔒 Enable secure federated autoregressive inference for multiple parties using a shared model while keeping private inputs confidential.
🔍 Search for similar academic papers using semantic search. Utilize local models or OpenAI API for high-quality results.
🧠 Boost research efficiency with Deep Research AI, an advanced multi-agent system that leverages cutting-edge reasoning techniques for smarter insights.
🤖 Enhance your research efficiency with an AI-powered assistant that analyzes documents and provides insights through a smart multi-agent system.
🗄️ Streamline data analysis with ConsciousDB, a vector database that integrates directly with your models for enhanced performance and ease of use.
🔍 Access multiple knowledge sources with this Streamlit chatbot powered by Groq LLM and LangChain for accurate and quick information retrieval.
🤖 Build and interact with Claude Agent using this Python SDK for seamless integration and efficient asynchronous querying.
🔍 Evaluate web search APIs with our framework, testing accuracy and relevance across multiple AI agents and benchmarks for better information retrieval.
🛠️ Build powerful search systems effortlessly with Haystack, a framework for developing end-to-end question answering and search applications.
🎯 Optimize retrieval with TriStage-RAG, a 3-stage pipeline that enhances document discovery while overcoming the limits of single-vector embeddings.
📄 Create a local, free Retrieval-Augmented Q&A system to easily extract answers from your personal documents in minutes.
🐙 AI Agent Pipeline routes queries by intent to docs, weather, or chat, with LangGraph, ChromaDB, and LangSmith for modular, observable workflows across CLI and UI.
Tevatron - Unified Document Retrieval Toolkit across Scale, Language, and Modality. Demo in SIGIR 2023, SIGIR 2025.
Fırat University Assistant: An offline Turkish question-answering and document search system built on local PDFs using FastAPI, pdfplumber, and BM25.
MTEB: Massive Text Embedding Benchmark
Data for the MTEB leaderboard