A production-ready Retrieval-Augmented Generation (RAG) system for healthcare document retrieval, built with FastAPI, ChromaDB, and OpenAI integration.
This project demonstrates a scalable RAG pipeline designed for healthcare applications. It combines semantic search using sentence transformers with LLM-based answer generation to provide accurate, context-aware responses to medical queries.
- Dual Vector Store Support: ChromaDB (primary) with FAISS fallback for high availability
- Production-Grade Architecture: FastAPI with async support, structured logging, health checks, and monitoring
- LLM Integration: OpenAI GPT-4o-mini for context-aware answer generation
- Containerized Deployment: Docker + Docker Compose with multi-stage builds
- Enterprise Configuration: Environment-based settings with Pydantic validation
- Performance Optimized: Response time tracking, efficient embeddings, automatic retries
- Framework: FastAPI 0.115.0
- Embeddings: Sentence Transformers (all-MiniLM-L6-v2)
- Vector Stores: ChromaDB 0.4.24, FAISS 1.9.0
- LLM: OpenAI GPT-4o-mini
- Server: Uvicorn with uvloop
- Containerization: Docker, Docker Compose
```
┌─────────────┐
│   Client    │
└──────┬──────┘
       │
       ▼
┌─────────────────────────────────┐
│         FastAPI Server          │
│  ┌──────────────────────────┐   │
│  │   Middleware & Logging   │   │
│  └──────────────────────────┘   │
│  ┌──────────────────────────┐   │
│  │      RAG Endpoint        │   │
│  │  - Query validation      │   │
│  │  - Embedding generation  │   │
│  └──────────────────────────┘   │
└────────┬────────────────┬───────┘
         │                │
         ▼                ▼
   ┌──────────┐     ┌──────────┐
   │ ChromaDB │     │  FAISS   │
   │(Primary) │     │(Fallback)│
   └──────────┘     └──────────┘
         │                │
         └───────┬────────┘
                 ▼
        ┌────────────────┐
        │   OpenAI API   │
        │  (GPT-4o-mini) │
        └────────────────┘
```
- Docker & Docker Compose
- Python 3.10+ (for local development)
- OpenAI API key (optional, for answer generation)
- Clone the repository

```bash
git clone <repository-url>
cd RAG_healthcare
```

- Configure environment variables

```bash
cp .env.example .env
# Edit .env and add your OPENAI_API_KEY
```

- Launch services

```bash
docker-compose up --build
```

- Test the API

```bash
curl -X POST "http://localhost:8000/rag" \
  -H "Content-Type: application/json" \
  -d '{"query": "What is metformin used for?", "k": 2}'
```

For local development without Docker:

```bash
# Create virtual environment
python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Run the application
uvicorn rag_app:app --reload --port 8000
```

`POST /rag`: Retrieve relevant documents and generate an answer.
Request:
```json
{
  "query": "What is metformin used for?",
  "k": 2
}
```

Response:
```json
{
  "query": "What is metformin used for?",
  "retrieved_contexts": [
    "Metformin is commonly prescribed as a first-line therapy for type 2 diabetes."
  ],
  "num_results": 1,
  "answer": "Metformin is commonly prescribed as a first-line therapy for type 2 diabetes. It helps control blood sugar levels in patients with this condition.",
  "processing_time_ms": 245.32,
  "timestamp": "2025-10-26T10:30:45.123456"
}
```
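Equivalently, from Python (a minimal client sketch using `requests`; assumes the service is reachable on `localhost:8000`):

```python
import requests

# Query the RAG endpoint and print the generated answer plus its sources.
resp = requests.post(
    "http://localhost:8000/rag",
    json={"query": "What is metformin used for?", "k": 2},
    timeout=30,
)
resp.raise_for_status()
body = resp.json()

print(body["answer"])
for context in body["retrieved_contexts"]:
    print("-", context)
```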
`GET /health`: Health check endpoint for monitoring.

Response:
```json
{
  "status": "healthy",
  "version": "1.0.0",
  "timestamp": "2025-10-26T10:30:45.123456",
  "index_size": 4
}
```

`GET /docs`: Interactive API documentation (Swagger UI).
`GET /redoc`: Alternative API documentation (ReDoc).
Environment variables are managed via `.env` and validated through `config.py`:
| Variable | Default | Description |
|---|---|---|
| `OPENAI_API_KEY` | - | OpenAI API key for answer generation |
| `EMBEDDING_MODEL` | `all-MiniLM-L6-v2` | Sentence transformer model |
| `OPENAI_MODEL` | `gpt-4o-mini` | OpenAI model for answers |
| `USE_CHROMADB` | `true` | Use ChromaDB (falls back to FAISS) |
| `CHROMA_HOST` | `chromadb` | ChromaDB service hostname |
| `CHROMA_PORT` | `8000` | ChromaDB service port |
| `DEBUG` | `false` | Enable debug mode |
| `LOG_LEVEL` | `INFO` | Logging level |
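A rough sketch of what `config.py` can look like with `pydantic-settings` (field names mirror the table above; the actual implementation may differ):

```python
from pydantic_settings import BaseSettings, SettingsConfigDict

class Settings(BaseSettings):
    """Settings loaded from the environment / .env and validated by Pydantic."""
    model_config = SettingsConfigDict(env_file=".env", env_file_encoding="utf-8")

    openai_api_key: str | None = None          # OPENAI_API_KEY
    embedding_model: str = "all-MiniLM-L6-v2"  # EMBEDDING_MODEL
    openai_model: str = "gpt-4o-mini"          # OPENAI_MODEL
    use_chromadb: bool = True                  # USE_CHROMADB
    chroma_host: str = "chromadb"              # CHROMA_HOST
    chroma_port: int = 8000                    # CHROMA_PORT
    debug: bool = False                        # DEBUG
    log_level: str = "INFO"                    # LOG_LEVEL

settings = Settings()
```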
```
RAG_healthcare/
├── rag_app.py           # Main FastAPI application
├── config.py            # Configuration management
├── requirements.txt     # Python dependencies
├── Dockerfile           # Multi-stage production build
├── docker-compose.yml   # Service orchestration
├── .env                 # Environment variables
├── data/                # Document storage (optional)
└── README.md            # This file
```
- ChromaDB: Primary choice for persistent, production-grade vector storage with HTTP client support
- FAISS: In-memory fallback for development and high-availability scenarios
- Automatic failover with retry logic ensures continuous operation
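A minimal sketch of that failover path (illustrative only; `faiss_index` and `documents` stand in for the app's pre-built index and document store, not the repository's actual names):

```python
import chromadb
import numpy as np

def search(query_embedding: np.ndarray, k: int = 2) -> list[str]:
    """Query ChromaDB first; fall back to the in-memory FAISS index on failure."""
    try:
        client = chromadb.HttpClient(host="chromadb", port=8000)
        collection = client.get_collection("healthcare_docs")
        result = collection.query(
            query_embeddings=[query_embedding.tolist()], n_results=k
        )
        return result["documents"][0]
    except Exception:
        # Fallback: FAISS similarity search over the same embeddings.
        # faiss_index / documents: assumed pre-built index and document list.
        _, indices = faiss_index.search(
            query_embedding.reshape(1, -1).astype("float32"), k
        )
        return [documents[i] for i in indices[0]]
```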
- Sentence transformers (all-MiniLM-L6-v2) for efficient, high-quality embeddings
- 384-dimensional vectors balance performance and accuracy
- Batch processing support for scaling to larger document sets
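For reference, producing these embeddings with `sentence-transformers` takes only a few lines:

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

# encode() accepts single strings or batches; batching amortizes model overhead.
embeddings = model.encode(
    ["What is metformin used for?", "Common side effects of statins"],
    batch_size=32,
    normalize_embeddings=True,  # unit vectors: cosine similarity == dot product
)
print(embeddings.shape)  # (2, 384)
```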
- Context-aware prompts with retrieved documents
- Temperature-controlled generation (0.7) for balanced creativity
- Graceful degradation: returns retrieval results even if LLM fails
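A hedged sketch of the generation step (prompt wording and the helper name are illustrative, not the repository's actual code):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def generate_answer(query: str, contexts: list[str]) -> str | None:
    """Build a context-aware prompt; return None so callers can degrade gracefully."""
    prompt = (
        "Answer the question using only the context below.\n\n"
        "Context:\n" + "\n".join(contexts) + f"\n\nQuestion: {query}"
    )
    try:
        response = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": prompt}],
            temperature=0.7,
        )
        return response.choices[0].message.content
    except Exception:
        # Graceful degradation: the endpoint still returns retrieval results.
        return None
```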
- Request/response timing middleware with `X-Process-Time` headers (see the sketch after this list)
- Global exception handling with debug-aware error messages
- Structured logging with timestamp, severity, and context
- Health checks compatible with Kubernetes/Docker Swarm
- Non-root container user for security
- Multi-stage Docker builds for minimal image size
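The timing middleware can be as small as this FastAPI sketch (illustrative, not necessarily the repository's exact code):

```python
import time
from fastapi import FastAPI, Request

app = FastAPI()

@app.middleware("http")
async def add_process_time_header(request: Request, call_next):
    """Time each request and expose the duration via an X-Process-Time header."""
    start = time.perf_counter()
    response = await call_next(request)
    elapsed_ms = (time.perf_counter() - start) * 1000
    response.headers["X-Process-Time"] = f"{elapsed_ms:.2f}"
    return response
```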
- Embedding Generation: ~50-100ms (sentence-transformers)
- Vector Search: ~5-20ms (ChromaDB/FAISS)
- LLM Generation: ~200-500ms (OpenAI API)
- Total Response Time: ~250-600ms end-to-end
- API keys loaded from environment (never committed)
- Non-root container execution
- CORS middleware with configurable origins
- Input validation with Pydantic (see the request-model sketch after this list)
- Request size limits and query length constraints
- Health check endpoints don't expose sensitive data
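A request model enforcing such constraints might look like this (the exact bounds are illustrative):

```python
from pydantic import BaseModel, Field

class RAGQuery(BaseModel):
    """Body for POST /rag; length and range limits reject malformed input early."""
    query: str = Field(..., min_length=3, max_length=512)
    k: int = Field(default=2, ge=1, le=10)
```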
- Stateless API design allows multiple replicas
- ChromaDB supports distributed deployment
- Load balancer (Nginx/Traefik) for traffic distribution
- Increase uvicorn workers (CPU-bound; see the sketch after this list)
- Optimize embedding model size vs. accuracy
- Batch processing for bulk queries
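Raising the worker count is a launch-time change; a minimal sketch, assuming the default `rag_app:app` module layout:

```python
import uvicorn

if __name__ == "__main__":
    # Multiple workers exploit several cores for the CPU-bound embedding step;
    # the stateless design means each worker serves requests independently.
    uvicorn.run("rag_app:app", host="0.0.0.0", port=8000, workers=4)
```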
- Add Pinecone/Milvus for cloud-native vector storage
- Implement caching layer (Redis) for frequent queries
- Add BM25 hybrid search for keyword matching
- Integrate reranking models for improved relevance
- Implement rate limiting and API key authentication
- Add unit tests (pytest + pytest-asyncio)
- Implement query caching
- Add document upload endpoint
- Support multiple document formats (PDF, DOCX)
- Implement user authentication
- Add metrics and monitoring (Prometheus)
- Create evaluation pipeline (ROUGE, BLEU)
MIT License - see LICENSE file for details
For questions or collaboration opportunities, please open an issue or reach out via GitHub.
Built with FastAPI • ChromaDB • OpenAI • Docker