shroominic/open-blood-analysis

🩸 Open Blood Analysis

An AI-powered blood test analysis tool that extracts biomarkers from lab reports (PDF/images), matches them against a knowledge base, and provides normalized results with reference range assessments.

✨ Features

  • Multi-format Support - Analyze PDFs and images (PNG, JPG, etc.)
  • AI-Powered Extraction - Uses vision models to extract biomarker data from scanned reports
  • Smart Matching - Exact alias matching + AI disambiguation for fuzzy matches
  • Auto-Research - Automatically researches unknown biomarkers via web search
  • Unit Conversion - Converts units to canonical formats using safe expression evaluation
  • Demographic Ranges - Adjusts reference ranges based on age and sex
  • Granular Status - Distinguishes low/high, optimal, moderate, and elevated
  • Growing Knowledge Base - Learns new biomarkers and saves them for future analyses

🔄 Processing Pipeline

flowchart TD
    A[📄 Input: PDF/Image] --> B[🖼️ Convert to Images]
    B --> C[🤖 AI Vision Extraction]
    C --> D{Exact Alias Match?}
    
    D -->|Yes| G[✅ Match Found]
    D -->|No| E[🔍 Get Fuzzy Candidates]
    
    E --> F{AI Disambiguation}
    F -->|Match| G
    F -->|Research| H[🌐 Web Research Agent]
    F -->|Unknown| I[⚠️ Skip/Mark Unknown]
    
    H -->|Success| J[💾 Save to DB]
    J --> G
    H -->|Fail| I
    
    G --> K[🔄 Unit Conversion]
    K --> L[📊 Range Assessment]
    L --> M[📋 Final Report]
    I --> M
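
The matching branch of this flowchart can be sketched in Python as follows. This is a minimal illustration, not the project's code: `match_biomarker`, the `ALIASES` table, and the `disambiguate` and `research` callables are hypothetical names, and `difflib` stands in for whatever fuzzy search the tool uses. The three-way outcome of the disambiguation step is simplified to "match, or fall through to research".

```python
from difflib import get_close_matches
from typing import Callable, Optional

# Hypothetical in-memory knowledge base: lowercased alias -> biomarker id.
ALIASES = {
    "colesterol total": "total_cholesterol",
    "cholesterol, total": "total_cholesterol",
    "ldl": "ldl_cholesterol",
}


def match_biomarker(
    raw_name: str,
    disambiguate: Callable[[str, list[str]], Optional[str]],
    research: Optional[Callable[[str], Optional[str]]] = None,
) -> Optional[str]:
    """Return a biomarker id, or None when the name stays unknown."""
    key = raw_name.strip().lower()

    # 1. Exact alias match.
    if key in ALIASES:
        return ALIASES[key]

    # 2. Fuzzy candidates, then disambiguation (an AI call in the real tool).
    candidates = get_close_matches(key, list(ALIASES), n=3, cutoff=0.6)
    chosen = disambiguate(key, candidates) if candidates else None
    if chosen is not None:
        return ALIASES[chosen]

    # 3. Optional research step; a successful result is saved for next time.
    if research is not None:
        new_id = research(key)
        if new_id is not None:
            ALIASES[key] = new_id
            return new_id

    # 4. Skip / mark unknown.
    return None
```

Passing `research=None` mirrors the `--no-research` mode described under Usage: only the existing database is consulted.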

📦 Installation

Prerequisites

  • Python 3.13+
  • uv (recommended) or pip
  • Gemini API key
  • Poppler (for PDF processing)

macOS Setup

# Install poppler for PDF support
brew install poppler

# Clone the repository
git clone https://github.com/shroominic/open-blood-analysis.git
cd open-blood-analysis

# Install with uv (recommended)
uv sync

# Or with pip
pip install -e .

Configuration

Create a .env file in the project root:

GEMINI_API_KEY=your-gemini-api-key
AI_MODEL=gemini-3.1-flash-lite-preview
# Optional task-specific overrides:
# AI_OCR_MODEL=gemini-3.1-flash-lite-preview
# AI_RESEARCH_MODEL=gemini-3.1-flash-lite-preview
# AI_THINKING_MODEL=gemini-3.1-flash-lite-preview
# BIOMARKERS_PATH=biomarkers.json

Or export variables directly:

export GEMINI_API_KEY="your-gemini-api-key"
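
These variables could be read from Python roughly as follows. The sketch rests on assumptions: that each task-specific model falls back to `AI_MODEL` (inferred from the word "overrides"), that `AI_MODEL` itself defaults to the value shown in the example, and that the biomarkers path defaults to `biomarkers.json`. `Settings` and `load_settings` are hypothetical names, not the contents of the project's `config.py`, and loading the `.env` file itself (for example with python-dotenv) is left out.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    api_key: str
    ocr_model: str
    research_model: str
    thinking_model: str
    biomarkers_path: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build settings from environment variables (assumed fallback rules)."""
    api_key = env.get("GEMINI_API_KEY", "")
    if not api_key:
        raise RuntimeError("GEMINI_API_KEY is not set")

    # Assumed default, taken from the example .env values.
    default_model = env.get("AI_MODEL", "gemini-3.1-flash-lite-preview")
    return Settings(
        api_key=api_key,
        ocr_model=env.get("AI_OCR_MODEL", default_model),
        research_model=env.get("AI_RESEARCH_MODEL", default_model),
        thinking_model=env.get("AI_THINKING_MODEL", default_model),
        biomarkers_path=env.get("BIOMARKERS_PATH", "biomarkers.json"),
    )
```

Accepting the environment as a parameter keeps the function easy to test with a plain dictionary.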

🚀 Usage

Basic Analysis

# Analyze a PDF report
uv run blood-analysis report.pdf

# Analyze an image
uv run blood-analysis scan.png

# With debug output
uv run blood-analysis report.pdf --debug

With Demographics (for accurate reference ranges)

uv run blood-analysis report.pdf --sex female --age 35

Output Options

# Save as JSON
uv run blood-analysis report.pdf --output results.json

# Save as CSV
uv run blood-analysis report.pdf --output results.csv

# Use an alternate biomarkers database file for this run
uv run blood-analysis report.pdf --biomarkers-path data/biomarkers.v2.json

Disable Auto-Research

# Only match against existing database
uv run blood-analysis report.pdf --no-research

Manually Re-Research One Biomarker

# Refresh an existing entry (or add if not found)
uv run blood-analysis --reresearch-biomarker apolipoprotein_b

# Provide explicit unit context for the research prompt
uv run blood-analysis --reresearch-biomarker thyroid_stimulating_hormone --reresearch-unit "uIU/mL"

# Preview researched JSON without writing the configured biomarkers DB file
uv run blood-analysis --reresearch-biomarker rdw --dry-run-reresearch

📊 Example Output

                                                Analysis Results
┏━━━━━━━━━━━━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━┳━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┓
┃ Biomarker        ┃ Value ┃ Unit  ┃ Reference ┃ Optimal ┃ Peak ┃ Status                   ┃ ID                ┃
┡━━━━━━━━━━━━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━╇━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━┩
│ COLESTEROL TOTAL │ 145.0 │ mg/dL │ 0 - 199   │ 130-160 │ -    │ optimal                  │ total_cholesterol │
│ COLESTEROL HDL   │ 40.0  │ mg/dL │ 40 - 80   │ 55-70   │ -    │ moderate                 │ hdl_cholesterol   │
│ LDL              │ 101.0 │ mg/dL │ 0 - 100   │ 55-85   │ -    │ high                     │ ldl_cholesterol   │
└──────────────────┴───────┴───────┴───────────┴─────────┴──────┴──────────────────────────┴───────────────────┘

🗃️ Biomarkers Database

The tool maintains a biomarkers.json file that grows as you analyze more reports. Each entry includes:

{
  "id": "total_cholesterol",
  "aliases": ["COLESTEROL TOTAL", "cholesterol, total", "TC"],
  "canonical_unit": "mmol/L",
  "description": "Total cholesterol in blood",
  "value_type": "quantitative",
  "enum_values": null,
  "min_normal": null,
  "max_normal": 5.18,
  "min_optimal": 3.6,
  "max_optimal": 4.8,
  "peak_value": null,
  "molar_mass_g_per_mol": 386.65,
  "conversions": {},
  "reference_rules": [
    {"condition": "age > 60", "max_normal": 6.2, "priority": 1}
  ],
  "source": "research-agent-gemini"
}
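
Entries of this shape can be indexed by alias for the exact-match step. The sketch below assumes `biomarkers.json` holds a JSON array of such entries; that file layout and the `build_alias_index` helper are assumptions made for illustration, and only a few of the fields are reproduced.

```python
import json


def build_alias_index(entries: list[dict]) -> dict[str, dict]:
    """Map every lowercased alias (and the id itself) to its entry."""
    index: dict[str, dict] = {}
    for entry in entries:
        for name in [entry["id"], *entry.get("aliases", [])]:
            index[name.strip().lower()] = entry
    return index


# Inline stand-in for the file contents; with a real file, json.load on an
# open file handle would replace json.loads.
entries = json.loads("""
[
  {
    "id": "total_cholesterol",
    "aliases": ["COLESTEROL TOTAL", "cholesterol, total", "TC"],
    "max_normal": 5.18
  }
]
""")

index = build_alias_index(entries)
print(index["tc"]["id"])  # total_cholesterol
```

Lowercasing both the stored aliases and the extracted name makes the lookup insensitive to the capitalization used by a given lab.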

Unit Conversions

The conversion engine handles generic concentration scaling automatically:

  • Mass scaling: mg/dL <-> g/L, ng/mL <-> ug/L, etc.
  • Molar scaling: mmol/L <-> umol/L, etc.
  • Mass↔molar conversion when molar_mass_g_per_mol is provided.
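
The three generic cases can be illustrated with a small converter. This is a sketch under stated assumptions, not the engine in `logic.py`: the prefix tables, the `convert` function, and the restriction to units of the form amount/volume are all illustrative.

```python
# Mass units expressed in grams, amount units in moles, volumes in litres.
MASS = {"g": 1.0, "mg": 1e-3, "ug": 1e-6, "ng": 1e-9}
MOLAR = {"mol": 1.0, "mmol": 1e-3, "umol": 1e-6, "nmol": 1e-9}
VOLUME = {"L": 1.0, "dL": 1e-1, "mL": 1e-3}


def _to_base(unit: str) -> tuple[float, str]:
    """Return (factor to g/L or mol/L, kind) for a unit such as 'mg/dL'."""
    amount, volume = unit.split("/")
    if amount in MASS:
        return MASS[amount] / VOLUME[volume], "mass"
    if amount in MOLAR:
        return MOLAR[amount] / VOLUME[volume], "molar"
    raise ValueError(f"unsupported unit: {unit}")


def convert(value, src, dst, molar_mass_g_per_mol=None):
    """Convert a concentration between mass-based and molar units."""
    src_factor, src_kind = _to_base(src)
    dst_factor, dst_kind = _to_base(dst)
    base = value * src_factor  # now in g/L or mol/L
    if src_kind != dst_kind:
        if molar_mass_g_per_mol is None:
            raise ValueError("mass<->molar conversion needs a molar mass")
        if src_kind == "mass":
            base /= molar_mass_g_per_mol  # g/L -> mol/L
        else:
            base *= molar_mass_g_per_mol  # mol/L -> g/L
    return base / dst_factor
```

For example, `convert(200, "mg/dL", "mmol/L", molar_mass_g_per_mol=386.65)` gives roughly 5.17, while `convert(200, "mg/dL", "g/L")` needs no molar mass and gives 2.0 up to floating-point rounding.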

The per-biomarker conversions field is only for special transforms that the generic scaling does not cover.

Special formulas use safe expression evaluation with simpleeval:

  • x represents the input value
  • Example: "mg/dL": "x / 38.67" converts mg/dL to mmol/L
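
To show what "safe" evaluation means here, the sketch below evaluates such a formula while rejecting anything other than plain arithmetic on `x`. It uses a tiny `ast`-based evaluator as a stand-in for simpleeval so that it runs with the standard library alone; `eval_formula` is a hypothetical helper, not the project's function.

```python
import ast
import operator

# Only these arithmetic operators are allowed in a formula.
_BINARY = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}
_UNARY = {ast.USub: operator.neg, ast.UAdd: operator.pos}


def eval_formula(expr: str, x: float) -> float:
    """Evaluate an arithmetic formula in which `x` is the input value."""

    def walk(node: ast.AST) -> float:
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.Name) and node.id == "x":
            return x
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY:
            return _BINARY[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY:
            return _UNARY[type(node.op)](walk(node.operand))
        # Calls, attribute access, other names, etc. are all refused.
        raise ValueError(f"disallowed expression: {ast.dump(node)}")

    return walk(ast.parse(expr, mode="eval"))


conversions = {"mg/dL": "x / 38.67"}  # formula from the example above
print(round(eval_formula(conversions["mg/dL"], 200.0), 2))  # 5.17
```

With simpleeval itself, the equivalent call would be `simple_eval(expr, names={"x": value})`.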

Demographic Rules

Reference ranges can be customized by demographics:

  • Conditions: sex == male, sex == female, age > 50, age < 18
  • Higher priority rules override lower ones
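
A possible way to apply such rules is sketched below. It assumes that conditions always have the form `<field> <operator> <value>`, that a rule is skipped when the relevant demographic was not supplied, and that a larger `priority` number means higher priority; `condition_holds` and `resolve_range` are hypothetical helpers rather than the project's implementation.

```python
import operator
from typing import Optional

_OPS = {"==": operator.eq, ">": operator.gt, "<": operator.lt,
        ">=": operator.ge, "<=": operator.le}


def condition_holds(condition: str, sex: Optional[str], age: Optional[float]) -> bool:
    """Check a condition such as 'age > 60' or 'sex == female'."""
    field, op, raw = condition.split()
    actual = {"sex": sex, "age": age}[field]
    if actual is None:  # demographic not supplied: the rule cannot apply
        return False
    expected = raw if field == "sex" else float(raw)
    return _OPS[op](actual, expected)


def resolve_range(entry: dict, sex: Optional[str] = None,
                  age: Optional[float] = None) -> dict:
    """Start from the entry's default bounds, then apply matching rules."""
    bounds = {"min_normal": entry.get("min_normal"),
              "max_normal": entry.get("max_normal")}
    # Apply low priority first so that higher-priority rules overwrite them.
    rules = sorted(entry.get("reference_rules", []),
                   key=lambda rule: rule.get("priority", 0))
    for rule in rules:
        if condition_holds(rule["condition"], sex, age):
            for key in bounds:
                if key in rule:
                    bounds[key] = rule[key]
    return bounds
```

With the total cholesterol entry shown earlier, `resolve_range(entry, age=65)` would raise `max_normal` from 5.18 to 6.2, while a call without `--age` information leaves the default bounds untouched.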

🏗️ Architecture

openblood/
├── main.py      # CLI entry point
├── config.py    # Configuration management
├── loader.py    # PDF/image ingestion
├── llm.py       # Vision model extraction
├── database.py  # Biomarker DB operations
├── agent.py     # AI disambiguation & research
├── logic.py     # Unit conversion & analysis
└── types.py     # Pydantic models

🀝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

📄 License

MIT License - see LICENSE for details.
