Skip to content

giautm/vision-ocr

Repository files navigation

VisionOCR

A production-grade, LiteParse-compatible OCR microservice written in Swift 6 for macOS. Uses Apple's Vision framework for on-device text recognition with no external API dependencies.

Features

  • 100% LiteParse API compatiblePOST /ocr, GET /health, GET /metrics
  • Vision framework — native macOS OCR, Apple Silicon optimized
  • Actor-based concurrency — bounded worker pool with backpressure (HTTP 429)
  • Correct coordinates — Vision bottom-left origin converted to spec-required top-left pixel coords
  • Structured logging — OSLog with request IDs and duration tracking
  • Graceful shutdown — SIGTERM/SIGINT handled by the service group; HTTP server stops accepting, then in-flight OCR work drains before exit
  • Helpful errors — port-in-use and permission errors shown clearly

Requirements

  • macOS 15+ (Sequoia)
  • Swift 6.0+
  • Apple Silicon or Intel Mac

Quick Start

# Build (release)
make release

# Run
./.build/release/vision-ocr
# Server starts on http://0.0.0.0:8000

# Test
curl -X POST http://localhost:8000/ocr \
  -F "file=@image.png" \
  -F "language=en"

Build & Install

make build          # Debug build → .build/debug/vision-ocr
make release        # Release build → .build/release/vision-ocr
make test           # Run test suite
make install        # Install to /usr/local/bin
make install-local  # Install to ~/bin
make run            # Build debug and run

API

POST /ocr

Request: multipart/form-data

Field Type Required Description
file binary Yes Image file (PNG, JPG, TIFF, WebP, BMP, GIF)
language string No ISO 639-1 code, default en
recognition_level string No accurate (default) or fast. May also be passed as a URL query parameter (?recognition_level=fast or ?level=fast). The form field takes precedence over the query parameter.

fast trades recognition quality for lower latency and a smaller memory footprint; accurate is the default.

# Faster, lighter recognition via form field…
curl -X POST http://localhost:8000/ocr -F "file=@image.png" -F "recognition_level=fast"
# …or via query parameter
curl -X POST "http://localhost:8000/ocr?recognition_level=fast" -F "file=@image.png"

Response 200 OK:

{
  "results": [
    {
      "text": "Hello World",
      "bbox": [718.6, 749.0, 2396.5, 885.4],
      "confidence": 1.0
    }
  ]
}

bbox is [x1, y1, x2, y2] in pixels, top-left origin, x2 > x1, y2 > y1.

Error responses:

Status Trigger
400 Missing file field, invalid language code, invalid recognition level, unsupported image format
429 Worker queue full (retry with backoff)
500 OCR processing or internal failure
504 OCR timed out (> 60 s)

Error body: {"error": "description"}

GET /health

curl http://localhost:8000/health
{
  "status": "healthy",
  "timestamp": 1780871211.0,
  "poolStats": {
    "workerCount": 4,
    "activeWorkers": 0,
    "queueDepth": 0,
    "maxQueueSize": 100,
    "totalProcessed": 42
  }
}

GET /metrics

curl http://localhost:8000/metrics
{
  "timestamp": 1780871211.0,
  "metrics": {
    "totalRequests": 42,
    "successRequests": 41,
    "errorRequests": 1,
    "clientErrorRequests": 1,
    "serverErrorRequests": 0,
    "successRate": 0.976,
    "errorRate": 0.024,
    "serverErrorRate": 0.0,
    "averageLatency": 0.24,
    "p50Latency": 0.21,
    "p95Latency": 0.44,
    "p99Latency": 0.58,
    "throughput": 3.2
  },
  "pool": { ... }
}

timestamp is Unix epoch seconds. throughput is requests/second since startup. clientErrorRequests (4xx) are separated from serverErrorRequests (5xx/504); serverErrorRate reflects only server-side failures.

Configuration

# CLI flags (all optional)
vision-ocr --host 0.0.0.0       # default: 0.0.0.0 (all interfaces); use 127.0.0.1 for loopback only
vision-ocr --port 8000          # default: 8000
vision-ocr --workers 2          # default: 2 (Vision is ANE/GPU-bound; more workers mostly raise peak RAM, not throughput)
vision-ocr --max-queue-size 100 # default: 100

Project Structure

Sources/
├── CLI/           Entry point, argument parsing, signal handling
├── OCRServer/     HTTP server actor, route handlers, multipart parsing
├── HTTP/          RequestValidator, ResponseHandler, ErrorHandler
├── OCR/           VisionOCREngine, OCRPipeline, LanguageMapper
├── Image/         ImageProcessor, ImageValidator, format detection
├── Coordinates/   BoundingBox, VisionCoordinateConverter (Y-flip)
├── Results/       TextOrderingEngine, ConfidenceNormalizer
├── Concurrency/   WorkerPool, AsyncQueue, ResultStore, OCRWorker
├── Metrics/       MetricsCollector (P50/P95/P99)
├── Health/        HealthStatus, MetricsResponse
└── Utilities/     ServerError, ServerConfiguration, logging

Dependencies

Package Source Purpose
hummingbird 2.0+ hummingbird-project HTTP server
multipart-kit 4.0+ vapor Multipart form-data parser
swift-service-lifecycle 2.0+ swift-server Graceful shutdown (SIGTERM/SIGINT)
swift-log 1.0+ Apple Structured logging

Deployment

See DEPLOYMENT.md for launchd setup, log management, and production tuning.

Architecture

See ARCHITECTURE.md for design decisions, concurrency model, and component details.

LiteParse Compatibility

See API_COMPATIBILITY.md for the full spec compliance matrix.


Status: Production-ready Platform: macOS 15+ (Apple Silicon & Intel) Updated: June 5, 2026

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Packages

 
 
 

Contributors