A complete RAG system that achieves 72.89% Recall@10 on MultiHop-RAG, surpassing RAPTOR's ~70%. This repository includes:

- **Full RAG implementation** (`ultimate_rag/`): RAPTOR + Graph + HyDE + BM25 + neural reranking
- **Benchmark suite** (`adapters/`, `scripts/`): evaluation harness for MultiHop-RAG and CRAG
- **Documentation** (`docs/`): blog post, technical report, architecture
```bash
# Clone the repo
git clone https://github.com/incidentfox/OpenRag.git
cd rag_benchmarking

# Create virtual environment
python3 -m venv .venv
source .venv/bin/activate

# Install requirements
pip install -r requirements.txt
```

```bash
export OPENAI_API_KEY="sk-..."
export COHERE_API_KEY="..."  # Optional but recommended for best performance
```

```bash
cd ultimate_rag
python -m api.server
```

Server runs at http://localhost:8000. Check health:

```bash
curl http://localhost:8000/health
```
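If you prefer calling the server from Python instead of curl, here is a minimal standard-library client sketch. The `/query` endpoint and its `query`/`top_k` fields come from the API examples in this README; the function names and the response schema handling are illustrative, not part of the codebase:

```python
import json
import urllib.request


def build_query_payload(query: str, top_k: int = 10) -> bytes:
    """Serialize the JSON body expected by the /query endpoint."""
    return json.dumps({"query": query, "top_k": top_k}).encode("utf-8")


def query_rag(query: str, top_k: int = 10,
              base_url: str = "http://localhost:8000"):
    """POST a query to a running server and return the parsed JSON response."""
    req = urllib.request.Request(
        f"{base_url}/query",
        data=build_query_payload(query, top_k),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```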
```bash
# MultiHop-RAG (2,556 queries)
python scripts/run_multihop_eval.py --queries 100  # Quick test

# Full benchmark
python scripts/run_multihop_eval.py
```

| Benchmark | Queries Tested | Our Result | SOTA | Notes |
|---|---|---|---|---|
| MultiHop-RAG | 2,556 (full) | 72.89% | ~70% | Beats RAPTOR baseline |
| SQuAD | 200+ (ongoing) | 99.0% | ~85-90% | Full benchmark running on EC2 |
| CRAG | 10 (sample) | 70% | ~50-60% | Per-query corpus test |
Note on SQuAD: Full 10,570-query benchmark running on EC2. After 200 queries: 99.0% Recall@10.
Note on CRAG: Tested 10 queries using each query's provided search results as corpus. Scaling requires per-query ingestion which is compute-intensive. CRAG is designed for API-augmented RAG, not static document retrieval.
| Component | Recall@10 | Δ from baseline |
|---|---|---|
| Semantic only | 55.2% | baseline |
| + RAPTOR hierarchy | 62.5% | +7.3% |
| + Cohere reranking | 71.8% | +16.6% |
| + BM25 hybrid | 72.4% | +17.2% |
| + HyDE + Query decomp | 72.89% | +17.7% |
Key insight: Cohere's neural reranker alone adds +9.3 percentage points.
```
┌─────────────────────────────────────────────────────┐
│                     Query Input                     │
└─────────────────────────────────────────────────────┘
                          │
                          ▼
┌─────────────────────────────────────────────────────┐
│            Parallel Retrieval Strategies            │
│  ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌─────────┐ │
│  │ Semantic │ │   HyDE   │ │   BM25   │ │  Query  │ │
│  │  Search  │ │ Expansion│ │  Hybrid  │ │ Decomp  │ │
│  └──────────┘ └──────────┘ └──────────┘ └─────────┘ │
└─────────────────────────────────────────────────────┘
                          │
                          ▼
┌─────────────────────────────────────────────────────┐
│               Cohere Neural Reranking               │
│                (rerank-english-v3.0)                │
└─────────────────────────────────────────────────────┘
                          │
                          ▼
┌─────────────────────────────────────────────────────┐
│                    Top-K Results                    │
└─────────────────────────────────────────────────────┘
```
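The README does not spell out how the parallel strategy outputs are merged before reranking. One common approach for combining ranked lists is reciprocal rank fusion (RRF); the sketch below is illustrative, not taken from the codebase (the function name and the `k=60` constant are assumptions, `k=60` being the value from the original RRF paper):

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of doc IDs into one fused ranking.

    Each document earns 1 / (k + rank) from every list it appears in,
    so documents ranked highly by multiple strategies rise to the top.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


# Example: three strategies return overlapping candidate lists.
fused = reciprocal_rank_fusion([
    ["d1", "d2", "d3"],   # semantic
    ["d2", "d1", "d4"],   # BM25
    ["d2", "d5"],         # HyDE
])
# "d2" wins: it is ranked by all three strategies.
```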
```
rag_benchmarking/
├── ultimate_rag/               # Full RAG implementation
│   ├── api/
│   │   └── server.py           # FastAPI server
│   ├── retrieval/
│   │   ├── retriever.py        # Main orchestration
│   │   ├── strategies.py       # HyDE, BM25, decomposition
│   │   └── reranker.py         # Cohere + cross-encoder
│   ├── raptor/
│   │   └── tree_building.py    # RAPTOR hierarchy
│   ├── graph/
│   │   └── graph.py            # Knowledge graph
│   ├── core/
│   │   └── node.py             # Tree/forest data structures
│   └── agents/
│       └── teaching.py         # Knowledge teaching interface
│
├── knowledge_base/             # RAPTOR core library
│   └── raptor/
│       ├── cluster_tree_builder.py
│       ├── EmbeddingModels.py
│       └── ...
│
├── adapters/                   # Benchmark adapters
│   └── ultimate_rag_adapter.py
│
├── scripts/                    # Evaluation scripts
│   ├── run_multihop_eval.py
│   └── run_crag_eval.py
│
├── docs/                       # Documentation
│   ├── blog_post.md            # Practitioner-friendly writeup
│   ├── technical_report.md     # Academic-style report
│   └── README.md
│
├── multihop_rag/               # MultiHop-RAG dataset
│   └── dataset/
│       ├── corpus.json         # 609 news articles
│       └── MultiHopRAG.json    # 2,556 queries
│
├── crag/                       # CRAG dataset
│   └── ...
│
└── requirements.txt            # Dependencies
```
```bash
# Health check
curl http://localhost:8000/health
```

```bash
# Query
curl -X POST http://localhost:8000/query \
  -H "Content-Type: application/json" \
  -d '{"query": "What was the outcome of the merger?", "top_k": 10}'
```

```bash
# Ingest documents
curl -X POST http://localhost:8000/ingest/batch \
  -H "Content-Type: application/json" \
  -d '{
    "tree": "default",
    "documents": [{"content": "Document text here..."}],
    "build_hierarchy": true
  }'
```

```bash
# Save
curl -X POST http://localhost:8000/persist/save \
  -H "Content-Type: application/json" \
  -d '{"tree": "default"}'

# Load
curl -X POST http://localhost:8000/persist/load \
  -H "Content-Type: application/json" \
  -d '{"tree": "default", "path": "trees/default.pkl"}'
```

| Mode | Strategies | Use Case |
|---|---|---|
| `fast` | Semantic only | Low latency, simple queries |
| `standard` | Semantic + HyDE + BM25 + Decomp | Balanced (default) |
| `thorough` | All strategies | Maximum recall, high latency |
```bash
OPENAI_API_KEY=sk-...       # Required for embeddings
COHERE_API_KEY=...          # Recommended for reranking (see privacy note below)
RETRIEVAL_MODE=standard     # fast|standard|thorough
DEFAULT_TOP_K=10            # Number of results
```

This system uses Cohere's rerank API for neural reranking, which provides the best benchmark results (+9.3 percentage points in the ablation). Please be aware:
- Data logging: By default, Cohere logs prompts and outputs on their SaaS platform (retained for 30 days)
- Training opt-out: You can disable data usage for training in your Cohere dashboard under "Data Controls"
- Zero retention: Enterprise customers can request zero data retention
- Cloud deployments: If using Cohere via AWS/GCP/Azure, Cohere does not receive your data
For privacy-sensitive use cases, consider these alternatives:
- Local cross-encoder: The system includes `CrossEncoderReranker` using `BAAI/bge-reranker-base` (runs locally, no external API)
- Remove Cohere: Don't set `COHERE_API_KEY` and the system falls back to local reranking
- LLM-as-reranker: Use a local/GDPR-compliant LLM for reranking
See Cohere's privacy policy and enterprise data commitments for details.
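Conceptually, the fallback amounts to picking a backend based on available credentials. The sketch below is illustrative only; the actual class names and wiring live in `retrieval/reranker.py`:

```python
import os


def select_reranker():
    """Pick a reranker backend based on available credentials.

    Illustrative sketch: with a Cohere key set, the hosted
    rerank-english-v3.0 model is used; otherwise the system falls
    back to the local cross-encoder (BAAI/bge-reranker-base).
    """
    if os.environ.get("COHERE_API_KEY"):
        return ("cohere", "rerank-english-v3.0")
    return ("local", "BAAI/bge-reranker-base")
```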
| Component | Cost per Query |
|---|---|
| OpenAI embeddings | $0.000007 |
| HyDE generation | $0.00018 |
| Query decomposition | $0.00027 |
| Cohere reranking | $0.002 |
| Total | ~$0.0025 |
Full benchmark (2556 queries): ~$6
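The totals are easy to verify from the table (figures copied from above; the per-query sum rounds to $0.0025, and 2,556 queries comes to roughly $6):

```python
# Per-query costs in USD, copied from the table above.
costs = {
    "openai_embeddings": 0.000007,
    "hyde_generation": 0.00018,
    "query_decomposition": 0.00027,
    "cohere_reranking": 0.002,
}

per_query = sum(costs.values())     # 0.002457, i.e. ~$0.0025
full_benchmark = per_query * 2556   # about $6 for the full run
```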
- Blog Post - Practitioner-friendly writeup
- Technical Report - Detailed analysis with ablations
- Architecture - System design
If you use this code, please cite:
```bibtex
@software{rag_benchmarking_2026,
  title  = {Multi-Strategy RAG for Multi-Hop Question Answering},
  author = {Anonymous},
  year   = {2026},
  url    = {https://github.com/incidentfox/OpenRag}
}
```

MIT License - see LICENSE for details.
- RAPTOR for hierarchical retrieval
- Cohere for neural reranking API
- MultiHop-RAG for benchmark dataset
- Built with Claude as AI pair programmer