A Retrieval-Augmented Generation (RAG) chatbot for document question-answering with hybrid search capabilities. Built with FastAPI, ChromaDB, Strands Agents, and React.

Features:

- Hybrid Search: Combines BM25 keyword matching with vector similarity search, merging the two rankings with Reciprocal Rank Fusion (RRF)
- Multi-LLM Support: Works with Anthropic Claude, OpenAI GPT, or local Ollama models
- Document Ingestion: Supports PDF, DOCX, TXT, Markdown, and HTML formats
- Streaming Responses: Real-time response streaming via Server-Sent Events
- Session Management: Persistent conversation history
- Single-Container Deployment: Self-contained Docker deployment
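
Hybrid search pairs a keyword ranker with vector search. To illustrate the keyword half, here is a minimal, self-contained sketch of standard Okapi BM25 scoring. It is not the code in `tools/retrieval.py`. The whitespace tokenizer, the `k1=1.5` and `b=0.75` parameters, and the `bm25_scores` helper name are all assumptions made for illustration.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Naive lowercase whitespace tokenizer (an assumption; real tokenizers differ).
    return text.lower().split()


def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Return one Okapi BM25 score per document for the given query (hypothetical helper)."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n_docs if n_docs else 0.0
    # Document frequency: the number of documents each term appears in.
    df = Counter(term for toks in tokenized for term in set(toks))

    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            # IDF variant with +1 inside the log, which keeps it non-negative.
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1)
            freq = tf[term]
            # Term-frequency saturation (k1) and document-length normalization (b).
            norm = freq + k1 * (1 - b + b * len(toks) / avgdl)
            score += idf * freq * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = ["the cat sat on the mat", "dogs chase cats", "the stock market fell"]
print(bm25_scores("cat mat", docs))
```

Note that exact-token matching means "cats" does not match "cat"; this kind of miss is one reason keyword scores are fused with vector similarity rather than used alone.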

Requirements:

- Python 3.11+
- Node.js 18+ (for frontend)
- Docker (optional, for containerized deployment)
- Ollama (optional, for local LLM)

Install the backend:

```bash
# Clone the repository
git clone <repository-url>
cd darksite-rag

# Create a virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Copy the environment template
cp .env.example .env
```

Edit `.env` with your settings:

```bash
# Choose the LLM provider: anthropic, openai, or ollama
LLM_PROVIDER=anthropic

# For Anthropic Claude
ANTHROPIC_API_KEY=sk-ant-your-key-here

# For OpenAI
OPENAI_API_KEY=sk-your-key-here

# For Ollama (local)
OLLAMA_BASE_URL=http://localhost:11434
OLLAMA_MODEL=llama3
```

Place your documents in a directory and run:
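
```bash
python -m ingestion.cli --path ./path/to/documents --stats
```

Start the API server:

```bash
uvicorn api.main:app --host 0.0.0.0 --port 8000 --reload
```

In a separate terminal, start the frontend:

```bash
cd frontend
npm install
npm run dev
```

Access the application:
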
- API: http://localhost:8000
- API Docs: http://localhost:8000/docs
- Frontend: http://localhost:5173

To run with Docker Compose:

```bash
# With a cloud LLM (Anthropic/OpenAI)
docker-compose up -d

# With local Ollama
docker-compose --profile local-llm up -d
```

Or build and run the image directly:

```bash
docker build -t darksite-rag .
docker run -p 8000:8000 \
  -v "$(pwd)/data:/app/data" \
  -e ANTHROPIC_API_KEY=your-key \
  darksite-rag
```

API endpoints:

| Method | Endpoint | Description |
|---|---|---|
| GET | `/health` | Health check with system status |
| GET | `/info` | Detailed system information |
| POST | `/chat` | Send a question, receive an answer |
| POST | `/chat/stream` | Stream the response via SSE |
| POST | `/ingest` | Ingest documents into the vector store |
| GET | `/sessions` | List all sessions |
| GET | `/sessions/{id}` | Get session info |
| GET | `/sessions/{id}/history` | Get conversation history |
| DELETE | `/sessions/{id}` | Delete a session |
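
`/chat/stream` returns Server-Sent Events. A minimal stdlib client could look like the sketch below. It assumes the endpoint accepts the same `{"message": ...}` body as `/chat`; the event payload format is not documented here, so the sketch yields raw `data:` payloads. `iter_sse_data` and `stream_chat` are hypothetical helper names, not part of the project.

```python
import json
import urllib.request
from collections.abc import Iterable, Iterator


def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the `data` payload of each complete Server-Sent Event.

    An event ends at a blank line; several `data:` lines in one event are
    joined with newlines. Other fields (`event:`, `id:`, `retry:`) and
    comment lines starting with ":" are ignored in this sketch.
    """
    buffer: list[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # Blank line: dispatch the buffered event if it carried any data.
            data = "\n".join(buffer)
            buffer = []
            if data:
                yield data
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # A single leading space after the colon is not part of the value.
            if value.startswith(" "):
                value = value[1:]
            buffer.append(value)
    # Per the SSE format, an event not terminated by a blank line is discarded.


def stream_chat(message: str, base_url: str = "http://localhost:8000") -> Iterator[str]:
    """POST a question to /chat/stream and yield each event's data payload."""
    # Assumption: /chat/stream accepts the same JSON body as /chat.
    request = urllib.request.Request(
        f"{base_url}/chat/stream",
        data=json.dumps({"message": message}).encode("utf-8"),
        headers={"Content-Type": "application/json", "Accept": "text/event-stream"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        # HTTPResponse iterates line by line, so events are parsed as they arrive.
        yield from iter_sse_data(line.decode("utf-8") for line in response)
```

With the API running, `for chunk in stream_chat("What topics are covered in the documents?"): print(chunk)` prints each payload as it arrives; depending on the server, a payload may itself be JSON.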

Example requests:

```bash
# Health check
curl http://localhost:8000/health

# Ask a question
curl -X POST http://localhost:8000/chat \
  -H "Content-Type: application/json" \
  -d '{"message": "What topics are covered in the documents?"}'

# Ingest documents
curl -X POST http://localhost:8000/ingest \
  -H "Content-Type: application/json" \
  -d '{"path": "/path/to/docs", "recursive": true}'
```

To use a local LLM, install Ollama:

```bash
# macOS
brew install ollama

# Linux
curl -fsSL https://ollama.ai/install.sh | sh
```

Pull a model:

```bash
ollama pull llama3   # Recommended for RAG
ollama pull mistral  # Lightweight alternative
ollama pull phi3     # Smallest option
```

Then point the application at it:

```bash
# In .env
LLM_PROVIDER=ollama
OLLAMA_BASE_URL=http://localhost:11434
OLLAMA_MODEL=llama3
```

Project structure:

```text
darksite-rag/
├── api/ # FastAPI application
│ ├── main.py # API endpoints
│ └── models.py # Request/response schemas
├── agents/ # LLM agent orchestration
│ ├── model_factory.py # Multi-provider LLM factory
│ ├── qa_agent.py # Document Q&A agent
│ └── session.py # Session management
├── config/ # Configuration
│ └── settings.py # Pydantic settings
├── ingestion/ # Document processing
│ ├── loaders.py # Format-specific loaders
│ ├── chunker.py # Text chunking
│ ├── pipeline.py # Ingestion orchestration
│ └── cli.py # CLI interface
├── tools/ # RAG tools
│ └── retrieval.py # Hybrid search implementation
├── vector_store/ # Vector database
│ └── chromadb_client.py
├── frontend/ # React frontend
│ └── src/
│ ├── App.tsx # Chat interface
│ └── App.css # Styling
├── tests/ # Test suite
├── data/ # Runtime data
│ ├── vector_store/ # ChromaDB storage
│ └── sessions/ # Session files
├── Dockerfile
├── docker-compose.yml
├── requirements.txt
└── CLAUDE.md # AI assistant guidance
```

Run the test suite:

```bash
# Run all tests
pytest

# Run with coverage
pytest --cov=. --cov-report=html

# Run a specific test file
pytest tests/test_ingestion.py -v
```

All settings can be set as environment variables or in `.env`.

LLM settings:

| Variable | Description | Default |
|---|---|---|
| `LLM_PROVIDER` | Provider: `anthropic`, `openai`, or `ollama` | `anthropic` |
| `ANTHROPIC_API_KEY` | Anthropic API key | - |
| `OPENAI_API_KEY` | OpenAI API key | - |
| `OLLAMA_BASE_URL` | Ollama server URL | `http://localhost:11434` |
| `OLLAMA_MODEL` | Ollama model name | `llama3` |
| `TEMPERATURE` | LLM sampling temperature | `0.7` |
| `MAX_TOKENS` | Maximum tokens per response | `2048` |

Retrieval settings:

| Variable | Description | Default |
|---|---|---|
| `VECTOR_STORE_PATH` | ChromaDB storage path | `data/vector_store` |
| `COLLECTION_NAME` | ChromaDB collection name | `documents` |
| `BM25_WEIGHT` | Weight given to BM25 results in hybrid search | `0.3` |
| `VECTOR_WEIGHT` | Weight given to vector results in hybrid search | `0.7` |
| `TOP_K_RESULTS` | Results returned per query | `5` |
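
Reciprocal Rank Fusion scores each document by summing `1 / (k + rank)` over the ranked lists it appears in. How `tools/retrieval.py` applies `BM25_WEIGHT` and `VECTOR_WEIGHT` is not stated here. One plausible reading, sketched below as an assumption, is weighted RRF, where each list's contribution is multiplied by its weight. The constant `k = 60` is the value commonly used with RRF and is also an assumption; `weighted_rrf` is a hypothetical name.

```python
from collections import defaultdict


def weighted_rrf(
    rankings: list[list[str]],
    weights: list[float],
    k: int = 60,
    top_k: int = 5,
) -> list[tuple[str, float]]:
    """Fuse ranked lists of document IDs with weighted Reciprocal Rank Fusion.

    Each list is ordered best-first. A document at 1-based rank r in list i
    contributes weights[i] / (k + r) to its fused score.
    """
    if len(rankings) != len(weights):
        raise ValueError("need exactly one weight per ranking")

    scores: dict[str, float] = defaultdict(float)
    for ranking, weight in zip(rankings, weights):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += weight / (k + rank)

    # Highest fused score first; ties broken by document ID for determinism.
    fused = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return fused[:top_k]


bm25_ranking = ["doc_a", "doc_b", "doc_c"]
vector_ranking = ["doc_c", "doc_a", "doc_d"]
print(weighted_rrf([bm25_ranking, vector_ranking], weights=[0.3, 0.7]))
```

Here `doc_c` edges out `doc_a`: both appear in both lists, but `doc_c` holds the top vector rank, which carries the larger weight. Because RRF uses only ranks, the BM25 and cosine scores never need to be put on a common scale.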

Ingestion settings:

| Variable | Description | Default |
|---|---|---|
| `INGESTION_CHUNK_SIZE` | Characters per chunk | `512` |
| `INGESTION_CHUNK_OVERLAP` | Characters shared by adjacent chunks | `50` |
| `INGESTION_BATCH_SIZE` | Batch size for indexing | `100` |
| `INGESTION_EMBEDDING_MODEL` | Embedding model | `all-MiniLM-L6-v2` |
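
`INGESTION_CHUNK_SIZE` and `INGESTION_CHUNK_OVERLAP` describe a sliding window over each document's text. The minimal character-based sketch below shows how the two parameters interact. The actual `ingestion/chunker.py` may split on sentence or paragraph boundaries instead, so treat `chunk_text` (a hypothetical name) as an illustration of the parameters only.

```python
def chunk_text(text: str, chunk_size: int = 512, overlap: int = 50) -> list[str]:
    """Split text into windows of at most chunk_size characters.

    Consecutive chunks share `overlap` characters, so the window advances by
    chunk_size - overlap characters each step.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")

    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window reaches the end of the text, so no tail chunk
        # lies entirely inside the previous one.
        if start + chunk_size >= len(text):
            break
    return chunks
```

With the defaults, a 1,000-character document yields three chunks starting at offsets 0, 462, and 924. The overlap keeps a sentence that straddles a boundary retrievable from at least one chunk.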

License: [Add your license here]

Contributing: [Add contribution guidelines here]