
Health Atlas: a multi-agent AI platform that autonomously validates, enriches, and prioritizes healthcare provider data. Built with FastAPI, React, and LangGraph, it delivers real-time, VLM-ready automation for accuracy, compliance, and scale.


muskan-khushi/Health_Atlas

 
 


🩺 Health Atlas: Autonomous Provider Data Validation Service

An intelligent, full-stack AI system that autonomously verifies, corrects, and enriches healthcare provider data from diverse sources.

Features • Tech Stack • Getting Started • Architecture • API Documentation


🎯 The Problem

Healthcare organizations struggle with one of the industry's most persistent challenges: inaccurate and outdated provider data. Manual validation is time-consuming, error-prone, and doesn't scale. Incorrect provider information leads to:

  • ❌ Patient care disruptions
  • ❌ Revenue loss from denied claims
  • ❌ Regulatory compliance issues
  • ❌ Poor member experience

💡 The Solution

Health Atlas leverages a multi-agent AI system that autonomously validates healthcare provider data at scale, transforming weeks of manual work into minutes of intelligent automation.


✨ Key Features

🚀 Real-Time Bulk Validation

  • Upload CSV files containing provider data and watch the system validate each record in parallel
  • Stream results back to the UI in real-time with live progress tracking
  • Process hundreds of records simultaneously using async architecture
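The parallel, streamed validation described above can be sketched with `asyncio` alone. This is a minimal illustration, not the project's implementation: `validate_record`, `validate_csv`, and the `max_workers` parameter are hypothetical names, and the per-record check is a stand-in for the real network-bound work.

```python
import asyncio
import csv
import io


async def validate_record(record: dict) -> dict:
    """Placeholder for the real per-record validation (hypothetical)."""
    await asyncio.sleep(0)  # stands in for network-bound work
    return {"npi": record["npi"], "valid": len(record["npi"]) == 10}


async def validate_csv(csv_text: str, max_workers: int = 10):
    """Yield results as they finish, with at most max_workers running at once."""
    semaphore = asyncio.Semaphore(max_workers)

    async def bounded(record: dict) -> dict:
        async with semaphore:
            return await validate_record(record)

    records = list(csv.DictReader(io.StringIO(csv_text)))
    tasks = [asyncio.create_task(bounded(r)) for r in records]
    # as_completed hands back each task as soon as it is done, which is
    # what lets a UI show live progress instead of waiting for the whole file.
    for finished in asyncio.as_completed(tasks):
        yield await finished


async def main() -> list:
    sample = "npi,name\n1234567893,Dr. A\n123,Dr. B\n"
    return [result async for result in validate_csv(sample, max_workers=2)]


if __name__ == "__main__":
    print(asyncio.run(main()))
```

The semaphore is the knob that trades speed against API rate limits: a small `max_workers` keeps a free-tier key safe, a larger one raises throughput.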

👁️ Vision Language Model (VLM) Ready

  • Architected specifically for VLM integration to extract structured data from unstructured documents
  • Handle scanned PDFs, image-based documents, and handwritten forms
  • Process documents that traditional text parsers cannot read
  • Ready to integrate: Gemini Vision API, GPT-4 Vision, or Claude 3 Vision

🎯 Intelligent Prioritization System

  • Priority Score Algorithm: Combines data accuracy (Confidence Score) with business impact (Member Impact)
  • Automatically flag high-risk records for manual review
  • Focus your team's efforts on the most critical data quality issues first
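As a rough illustration of how such a score could order a review queue, the sketch below assumes both inputs are normalised to the range 0 to 1 and that low confidence combined with high member impact should rank first. The function name `priority_score` and the exact formula are assumptions; the README only states that the two values are combined.

```python
def priority_score(confidence: float, member_impact: float) -> float:
    """Return a 0-1 review priority; higher means 'review sooner'.

    Assumption: confidence and member_impact are both normalised to [0, 1].
    """
    if not (0.0 <= confidence <= 1.0 and 0.0 <= member_impact <= 1.0):
        raise ValueError("confidence and member_impact must be in [0, 1]")
    # Low confidence and high impact both push the record up the queue.
    return (1.0 - confidence) * member_impact


records = [
    {"npi": "A", "confidence": 0.95, "member_impact": 0.9},
    {"npi": "B", "confidence": 0.40, "member_impact": 0.8},
    {"npi": "C", "confidence": 0.40, "member_impact": 0.1},
]

# Highest priority first: uncertain records that affect many members.
review_queue = sorted(
    records,
    key=lambda r: priority_score(r["confidence"], r["member_impact"]),
    reverse=True,
)
```

With this formula a well-validated record stays near the bottom of the queue even when its member impact is high, because there is little left to review.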

📊 Actionable Reporting & Dashboards

  • Run Summary Dashboard: At-a-glance metrics for every validation job
    • Total records processed
    • Auto-validated vs. flagged records
    • Breakdown of common error types
    • Confidence score distribution
  • Professional PDF Reports: Export clean, shareable reports for stakeholders
  • Email Generation: Auto-generate follow-up emails for flagged providers
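The dashboard metrics listed above could be derived from per-record results roughly as follows. This is a sketch under assumptions: the result shape (`confidence`, `errors`), the function name `summarize_run`, and the 0.7 flagging threshold are all hypothetical.

```python
from collections import Counter


def summarize_run(results: list[dict], flag_threshold: float = 0.7) -> dict:
    """Aggregate per-record results into run-level dashboard metrics."""
    flagged = [r for r in results if r["confidence"] < flag_threshold]
    error_types = Counter(err for r in results for err in r.get("errors", []))

    # Bucket confidence scores into tenths, e.g. 0.83 -> "0.8-0.9".
    distribution = Counter()
    for r in results:
        low = min(int(r["confidence"] * 10), 9) / 10
        distribution[f"{low:.1f}-{low + 0.1:.1f}"] += 1

    return {
        "total": len(results),
        "auto_validated": len(results) - len(flagged),
        "flagged": len(flagged),
        "error_types": dict(error_types),
        "confidence_distribution": dict(distribution),
    }


sample_results = [
    {"npi": "A", "confidence": 0.95, "errors": []},
    {"npi": "B", "confidence": 0.83, "errors": ["phone_missing"]},
    {"npi": "C", "confidence": 0.42, "errors": ["address_mismatch"]},
]
summary = summarize_run(sample_results)
```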

🤖 Multi-Agent AI Engine

A deterministic AI pipeline where specialized agents collaborate:

| Agent | Role | Capabilities |
| --- | --- | --- |
| 🧠 Data Validation Agent | Baseline Verification | Cross-checks provider info against official NPI registry, validates physical addresses, verifies credentials |
| 🌐 Information Enrichment Agent | Data Enhancement | Web scraping for missing data, contact information discovery, specialty validation |
| 🔍 Quality Assurance Agent | Integrity Checks | Flags inconsistencies, detects mock/fake licenses, calculates reliability scores |
| 🗂️ Directory Management Agent | Data Synthesis | Standardizes formats, resolves conflicts, generates final validated profiles |
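To make the idea of a deterministic agent pipeline concrete, here is a plain-Python sketch in which each agent is a function that takes a provider profile and returns an updated copy. It deliberately does not use LangGraph, and the agent bodies are stubs invented for illustration; only the four agent roles come from the table above.

```python
from typing import Callable

Profile = dict
Agent = Callable[[Profile], Profile]


def data_validation_agent(profile: Profile) -> Profile:
    # Stub check: a real agent would query the NPI registry.
    npi = profile.get("npi", "")
    return {**profile, "npi_format_ok": npi.isdigit() and len(npi) == 10}


def information_enrichment_agent(profile: Profile) -> Profile:
    # Stub enrichment: a real agent would look up missing contact data.
    return {**profile, "phone": profile.get("phone") or "unknown"}


def quality_assurance_agent(profile: Profile) -> Profile:
    issues = []
    if not profile["npi_format_ok"]:
        issues.append("invalid_npi")
    if profile["phone"] == "unknown":
        issues.append("phone_missing")
    return {**profile, "issues": issues}


def directory_management_agent(profile: Profile) -> Profile:
    status = "flagged" if profile["issues"] else "validated"
    return {**profile, "name": profile["name"].strip().title(), "status": status}


# Fixed order makes the run deterministic: same input, same sequence of steps.
PIPELINE: list[Agent] = [
    data_validation_agent,
    information_enrichment_agent,
    quality_assurance_agent,
    directory_management_agent,
]


def run_pipeline(profile: Profile) -> Profile:
    for agent in PIPELINE:
        profile = agent(profile)
    return profile


result = run_pipeline({"name": "  jane doe ", "npi": "1234567893", "phone": ""})
```

Each agent returns a new dict rather than mutating its input, so the state after any step can be logged or inspected without side effects.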

🚀 Tech Stack

| Category | Technologies |
| --- | --- |
| AI Backend | Python 3.10+, FastAPI, LangGraph, Groq API |
| Frontend | React 18, Vite, Tailwind CSS, jsPDF, React Query |
| AI/ML | LangChain, LangGraph, Vision API Integration Layer |
| Data Processing | Pandas, AsyncIO, PyPDF2 |
| Web Automation | Selenium WebDriver |
| APIs & Services | Geoapify (Geocoding), NPI Registry API |
| Development Tools | Faker (test data generation), ESLint, Prettier |

🧩 Getting Started

Prerequisites

Before you begin, ensure you have the following installed: Python 3.10+ (per the tech stack above), Node.js with npm (for the frontend), and Git.

⚙️ 1. Clone the Repository

git clone https://github.com/muskan-khushi/Health_Atlas
cd Health_Atlas

🖥️ 2. Backend Setup

# Navigate to backend directory
cd backend

# Create and activate virtual environment
python -m venv .venv

# On Windows
.\.venv\Scripts\activate

# On macOS/Linux
source .venv/bin/activate

# Install dependencies
pip install -r requirements.txt

Configure Environment Variables:

Create a .env file in the backend directory:

# Required API Keys
GROQ_API_KEY="your-groq-api-key-here"
GEOAPIFY_API_KEY="your-geoapify-api-key-here"

# Optional: VLM Integration (Uncomment when ready)
# GOOGLE_API_KEY="your-google-api-key"
# OPENAI_API_KEY="your-openai-api-key"

# Server Configuration
HOST="127.0.0.1"
PORT=8000
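A backend might read these values at startup along the following lines. This is a sketch, not the project's code: `load_settings` is a hypothetical helper, and it assumes the variables are already present in the process environment (for example, loaded from `.env` by a tool such as python-dotenv).

```python
import os

# The two keys marked as required in the .env example.
REQUIRED_KEYS = ("GROQ_API_KEY", "GEOAPIFY_API_KEY")


def load_settings(env=None) -> dict:
    """Read settings from a mapping (defaults to os.environ) and fail fast."""
    env = os.environ if env is None else env
    missing = [key for key in REQUIRED_KEYS if not env.get(key)]
    if missing:
        raise RuntimeError(
            f"Missing required environment variables: {', '.join(missing)}"
        )
    return {
        "groq_api_key": env["GROQ_API_KEY"],
        "geoapify_api_key": env["GEOAPIFY_API_KEY"],
        # Fall back to the defaults shown in the .env example.
        "host": env.get("HOST", "127.0.0.1"),
        "port": int(env.get("PORT", "8000")),
    }
```

Failing at startup with the names of the missing keys is usually easier to debug than an authentication error from the first API call.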

Start the Backend Server:

uvicorn main:app --reload

✅ Backend running at: http://127.0.0.1:8000
📚 API Documentation: http://127.0.0.1:8000/docs

💻 3. Frontend Setup

Open a new terminal window:

# Navigate to frontend directory
cd frontend

# Install dependencies
npm install

# Start development server
npm run dev

✅ Frontend running at: http://localhost:5173

🎉 4. Access the Application

Open your browser and navigate to http://localhost:5173 (the frontend address from step 3).


🔬 Backend Deep Dive: Dual-Flow AI Architecture

Health Atlas has two processing flows: a core validation pipeline for bulk CSV uploads, and a document-processing flow for PDFs that is designed for VLM integration.

Flow 1: AI Validation Pipeline (Core)

CSV Upload → Parallel Processing → Multi-Agent Analysis → Real-Time Streaming → Summary Report

Key Components:

  1. High-Throughput Async Processing

    • FastAPI backend uses asyncio for concurrent record processing
    • Configurable batch sizes for optimal performance
    • Handles large datasets (10,000+ records) efficiently
  2. Live Streaming Architecture

    • Server-Sent Events (SSE) push results to frontend
    • Real-time progress tracking and log visualization
    • No polling required - true push-based updates
  3. Comprehensive Analysis Pipeline

    • NPI registry cross-validation
    • Address geocoding and verification
    • Website scraping for data enrichment
    • Confidence scoring and flagging logic
  4. Actionable Outputs

    • Downloadable PDF summary reports
    • Prioritized review queue
    • Auto-generated follow-up emails for flagged providers
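The Server-Sent Events step in this flow comes down to emitting text frames in a fixed wire format. The generator below is a minimal sketch of that format; the event names (`record`, `done`) and the payload shape are assumptions, not the project's actual schema.

```python
import json
from typing import Iterable, Iterator


def sse_events(results: Iterable[dict], total: int) -> Iterator[str]:
    """Format validation results as Server-Sent Events frames."""
    for done, result in enumerate(results, start=1):
        payload = json.dumps({"progress": done / total, "result": result})
        # An SSE message is a set of "field: value" lines ended by a blank line.
        yield f"event: record\ndata: {payload}\n\n"
    # A final event lets the client close the stream cleanly.
    yield "event: done\ndata: {}\n\n"


events = list(sse_events([{"npi": "1"}, {"npi": "2"}], total=2))
```

In a FastAPI app, a generator like this would typically be wrapped in a `StreamingResponse` with `media_type="text/event-stream"`, and the browser would consume it with `EventSource`.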

Flow 2: VLM Document Processing (Future-Ready)

PDF Upload → VLM Analysis → Structured Extraction → Data Validation → Profile Creation

Currently Implemented:

  • PDF text extraction using PyPDF2
  • Structured data parsing
  • Ready-to-integrate VLM API layer

VLM Integration (Ready to Enable):

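# Example: Gemini Vision Integration
import os

import google.generativeai as genai

# Assumes GOOGLE_API_KEY is set (see the .env example); the model name is illustrative.
genai.configure(api_key=os.environ["GOOGLE_API_KEY"])
model = genai.GenerativeModel("gemini-1.5-flash")


def analyze_provider_document_vlm(file_path: str) -> dict:
    """
    Extract structured provider data from any document type using VLM.
    Handles: scanned PDFs, images, handwritten forms, etc.
    """
    # Upload the document so the model can read it alongside the prompt
    file = genai.upload_file(path=file_path)

    prompt = """
    Extract the following provider information:
    - Full Name
    - NPI Number
    - Specialties
    - Address (Street, City, State, ZIP)
    - Phone and Fax
    - License Numbers
    - Accepting New Patients status
    """

    response = model.generate_content([file, prompt])
    # parse_structured_response is a project helper that is not shown in this README
    return parse_structured_response(response.text)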
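The example above calls `parse_structured_response`, which is not defined in this README. One possible shape for such a helper is sketched below, assuming the prompt is extended to ask the model for a single JSON object; that assumption and the function body are illustrative only.

```python
import json


def parse_structured_response(text: str) -> dict:
    """Pull the first-to-last brace span out of a model reply and parse it.

    Assumption: the model was asked to answer with one JSON object, possibly
    surrounded by extra prose or formatting.
    """
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object found in model response")
    # json.loads raises json.JSONDecodeError (a ValueError) on malformed input.
    return json.loads(text[start:end + 1])


reply = 'Here is the data:\n{"npi": "1234567893", "specialties": ["Cardiology"]}\nDone.'
parsed = parse_structured_response(reply)
```

Raising on unparseable replies, rather than returning an empty dict, keeps a failed extraction from being mistaken for a provider with no data.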

🧰 The Agent's Toolkit

Health Atlas agents are powered by specialized tools that handle distinct validation tasks.

| Function | Description | Technology |
| --- | --- | --- |
| search_npi_registry() | 🔎 Connects to official NPI database for baseline verification | NPI Registry API |
| parse_provider_pdf() | 📄 Extracts text from provider documents with broad PDF compatibility | PyPDF2 |
| parse_provider_pdf_vlm() | 👁️ VLM-powered extraction from scanned/image-based documents | Gemini Vision API |
| scrape_provider_website() | 🌐 Dynamically scrapes provider websites for enrichment | Selenium WebDriver |
| validate_address() | 🗺️ Confirms address accuracy with geographic confidence scoring | Geoapify API |
| calculate_priority_score() | 📊 Computes priority based on confidence × member impact | Custom Algorithm |
| generate_follow_up_email() | ✉️ Creates professional email templates for flagged records | LangChain + Groq |
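Before calling the NPI registry, a tool like `search_npi_registry()` could cheaply reject malformed numbers. The sketch below checks the NPI check digit (the Luhn algorithm applied to the number prefixed with 80840) and builds a lookup URL for the public NPPES API. Whether the project performs this pre-check is not stated in this README; `is_valid_npi` and `npi_lookup_url` are hypothetical names.

```python
from urllib.parse import urlencode

NPI_API_URL = "https://npiregistry.cms.hhs.gov/api/"


def is_valid_npi(npi: str) -> bool:
    """Check the 10-digit format and the Luhn check digit of an NPI."""
    if len(npi) != 10 or not (npi.isascii() and npi.isdigit()):
        return False
    # The check digit is computed over the NPI prefixed with "80840".
    digits = [int(d) for d in "80840" + npi]
    total = 0
    # Walk right to left, doubling every second digit (Luhn algorithm).
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


def npi_lookup_url(npi: str) -> str:
    """Build the registry query URL, refusing obviously invalid numbers."""
    if not is_valid_npi(npi):
        raise ValueError(f"not a valid NPI: {npi!r}")
    return f"{NPI_API_URL}?{urlencode({'version': '2.1', 'number': npi})}"
```

A local checksum only proves the number is well formed; whether it belongs to the provider in question still requires the registry lookup.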

📊 System Architecture

┌─────────────────┐
│   React UI      │  ← User uploads CSV
│   (Frontend)    │  ← Real-time results streaming
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│  FastAPI Server │  ← Async job orchestration
│   (Backend)     │  ← Multi-agent coordination
└────────┬────────┘
         │
    ┌────┴────┐
    ▼         ▼
┌─────────┐ ┌──────────┐
│ LangGraph│ │  Tools   │
│  Agents  │ │  Layer   │
└─────────┘ └──────────┘
    │            │
    └─────┬──────┘
          ▼
    ┌──────────────┐
    │ External APIs│
    │ - NPI Reg    │
    │ - Geoapify   │
    │ - Web Scraper│
    └──────────────┘

📈 Performance & Scalability

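  • Processing Speed: 100+ records/minute with parallel execution (design target; the single-worker demo run reported under Final KPI Summary reached 517 providers/hour)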
  • 📦 Batch Processing: Configurable batch sizes for memory optimization
  • 🔄 Async Architecture: Non-blocking I/O for maximum throughput
  • 📊 Scalability: Horizontal scaling ready with minimal configuration

🛣️ Roadmap

Phase 1: Core Validation (✅ Complete)

  • Multi-agent AI pipeline
  • NPI registry integration
  • Address validation
  • Real-time streaming UI
  • PDF reporting

Phase 2: VLM Integration (🚧 In Progress)

  • Gemini Vision API integration
  • Scanned document processing
  • Handwriting recognition
  • Image-based PDF parsing

Phase 3: Advanced Features (📋 Planned)

  • Historical data tracking
  • Automated re-validation scheduling
  • Machine learning-based anomaly detection
  • Multi-tenant architecture
  • API rate limiting and caching
  • Advanced analytics dashboard

Phase 4: Enterprise Ready (🔮 Future)

  • SSO/SAML authentication
  • Role-based access control
  • Audit logging
  • SOC 2 compliance
  • HIPAA compliance features
  • Microservices architecture

📝 License

This project is licensed under the MIT License - see the LICENSE file for details.


Final KPI Summary

The system successfully processed all valid records and produced the following final summary:

Analysis of Results vs. Goals

| KPI | Goal | Result | Status |
| --- | --- | --- | --- |
| Validation Accuracy | 80%+ | 88.89% | ✅ GOAL ACHIEVED |
| Processing Speed | < 300 sec | ~732 sec | ⚠️ TARGET MISSED* |
| Processing Throughput | 500+/hr | 517 providers/hr | ✅ GOAL ACHIEVED |

*Note on Processing Speed: The 5-minute (300-second) target was missed as a deliberate engineering trade-off for the demo. To keep the run stable and avoid hitting API rate limits on the free tier, the number of parallel workers was set to 1. Even with a single worker the run reached 517 providers/hour, which suggests the architecture could meet the speed target with more workers and a production-level API key.
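The reported figures can be cross-checked with a little arithmetic. The sketch below derives the approximate record count from the throughput and run time, then estimates how much parallelism the speed target would need. It assumes throughput scales linearly with the number of workers, which is optimistic once API rate limits come into play.

```python
import math

throughput_per_hour = 517   # reported throughput with 1 worker
elapsed_seconds = 732       # reported run time
target_seconds = 300        # speed goal (5 minutes)

# Records in the run, implied by throughput x time.
records = round(throughput_per_hour * elapsed_seconds / 3600)

# How much faster the run would have to be to meet the target.
required_speedup = elapsed_seconds / target_seconds

# Assumption: throughput scales linearly with worker count.
workers_needed = math.ceil(required_speedup)

# Throughput that corresponds to finishing the same records in 300 s.
required_throughput = records / target_seconds * 3600

print(records, round(required_speedup, 2), workers_needed, round(required_throughput))
```

Under these assumptions the demo processed about 105 records, and three parallel workers (roughly 1,260 providers/hour) would be enough to finish the same run inside the 300-second target.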

🙏 Acknowledgments

  • LangChain & LangGraph for the agent orchestration framework
  • Groq for high-speed LLM inference
  • FastAPI for the excellent async web framework
  • React Community for the robust frontend ecosystem

🧭 Vision

Health Atlas represents a step toward self-healing data ecosystems — systems that not only detect but autonomously repair data drift in critical infrastructures like healthcare.

This foundation can scale toward enterprise-grade deployments where data reliability becomes an autonomous service, reducing operational overhead and improving patient outcomes across the healthcare industry.



Languages

  • JavaScript 57.9%
  • Python 26.6%
  • Java 14.9%
  • Other 0.6%