VisionOCR

A production-grade, LiteParse-compatible OCR microservice written in Swift 6 for macOS. Uses Apple's Vision framework for on-device text recognition with no external API dependencies.

Features

100% LiteParse API compatible — POST /ocr, GET /health, GET /metrics
Vision framework — native macOS OCR, Apple Silicon optimized
Actor-based concurrency — bounded worker pool with backpressure (HTTP 429)
Correct coordinates — Vision bottom-left origin converted to spec-required top-left pixel coords
Structured logging — OSLog with request IDs and duration tracking
Graceful shutdown — SIGTERM/SIGINT handled by the service group; HTTP server stops accepting, then in-flight OCR work drains before exit
Helpful errors — port-in-use and permission errors shown clearly

Requirements

macOS 15+ (Sequoia)
Swift 6.0+
Apple Silicon or Intel Mac

Quick Start

# Build (release)
make release

# Run
./.build/release/vision-ocr
# Server starts on http://0.0.0.0:8000

# Test
curl -X POST http://localhost:8000/ocr \
  -F "file=@image.png" \
  -F "language=en"

Build & Install

make build          # Debug build → .build/debug/vision-ocr
make release        # Release build → .build/release/vision-ocr
make test           # Run test suite
make install        # Install to /usr/local/bin
make install-local  # Install to ~/bin
make run            # Build debug and run

API

POST /ocr

Request: multipart/form-data

Field	Type	Required	Description
`file`	binary	Yes	Image file (PNG, JPG, TIFF, WebP, BMP, GIF)
`language`	string	No	ISO 639-1 code, default `en`
`recognition_level`	string	No	`accurate` (default) or `fast`. May also be passed as a URL query parameter (`?recognition_level=fast` or `?level=fast`). The form field takes precedence over the query parameter.

fast trades recognition quality for lower latency and a smaller memory footprint; accurate is the default.

# Faster, lighter recognition via form field…
curl -X POST http://localhost:8000/ocr -F "file=@image.png" -F "recognition_level=fast"
# …or via query parameter
curl -X POST "http://localhost:8000/ocr?recognition_level=fast" -F "file=@image.png"

Response 200 OK:

{
  "results": [
    {
      "text": "Hello World",
      "bbox": [718.6, 749.0, 2396.5, 885.4],
      "confidence": 1.0
    }
  ]
}

bbox is [x1, y1, x2, y2] in pixels, top-left origin, x2 > x1, y2 > y1.

Error responses:

Status	Trigger
`400`	Missing `file` field, invalid language code, invalid recognition level, unsupported image format
`429`	Worker queue full (retry with backoff)
`500`	OCR processing or internal failure
`504`	OCR timed out (> 60 s)

Error body: {"error": "description"}

GET /health

curl http://localhost:8000/health

{
  "status": "healthy",
  "timestamp": 1780871211.0,
  "poolStats": {
    "workerCount": 4,
    "activeWorkers": 0,
    "queueDepth": 0,
    "maxQueueSize": 100,
    "totalProcessed": 42
  }
}

GET /metrics

curl http://localhost:8000/metrics

{
  "timestamp": 1780871211.0,
  "metrics": {
    "totalRequests": 42,
    "successRequests": 41,
    "errorRequests": 1,
    "clientErrorRequests": 1,
    "serverErrorRequests": 0,
    "successRate": 0.976,
    "errorRate": 0.024,
    "serverErrorRate": 0.0,
    "averageLatency": 0.24,
    "p50Latency": 0.21,
    "p95Latency": 0.44,
    "p99Latency": 0.58,
    "throughput": 3.2
  },
  "pool": { ... }
}

timestamp is Unix epoch seconds. throughput is requests/second since startup. clientErrorRequests (4xx) are separated from serverErrorRequests (5xx/504); serverErrorRate reflects only server-side failures.

Configuration

# CLI flags (all optional)
vision-ocr --host 0.0.0.0       # default: 0.0.0.0 (all interfaces); use 127.0.0.1 for loopback only
vision-ocr --port 8000          # default: 8000
vision-ocr --workers 2          # default: 2 (Vision is ANE/GPU-bound; more workers mostly raise peak RAM, not throughput)
vision-ocr --max-queue-size 100 # default: 100

Project Structure

Sources/
├── CLI/           Entry point, argument parsing, signal handling
├── OCRServer/     HTTP server actor, route handlers, multipart parsing
├── HTTP/          RequestValidator, ResponseHandler, ErrorHandler
├── OCR/           VisionOCREngine, OCRPipeline, LanguageMapper
├── Image/         ImageProcessor, ImageValidator, format detection
├── Coordinates/   BoundingBox, VisionCoordinateConverter (Y-flip)
├── Results/       TextOrderingEngine, ConfidenceNormalizer
├── Concurrency/   WorkerPool, AsyncQueue, ResultStore, OCRWorker
├── Metrics/       MetricsCollector (P50/P95/P99)
├── Health/        HealthStatus, MetricsResponse
└── Utilities/     ServerError, ServerConfiguration, logging

Dependencies

Package	Source	Purpose
`hummingbird` 2.0+	hummingbird-project	HTTP server
`multipart-kit` 4.0+	vapor	Multipart form-data parser
`swift-service-lifecycle` 2.0+	swift-server	Graceful shutdown (SIGTERM/SIGINT)
`swift-log` 1.0+	Apple	Structured logging

Deployment

See DEPLOYMENT.md for launchd setup, log management, and production tuning.

Architecture

See ARCHITECTURE.md for design decisions, concurrency model, and component details.

LiteParse Compatibility

See API_COMPATIBILITY.md for the full spec compliance matrix.

Status: Production-ready Platform: macOS 15+ (Apple Silicon & Intel) Updated: June 5, 2026

Name		Name	Last commit message	Last commit date
Latest commit History 5 Commits
.github/workflows		.github/workflows
Sources		Sources
Tests		Tests
.gitignore		.gitignore
.swift-version		.swift-version
.swiftformat		.swiftformat
.swiftlint.yml		.swiftlint.yml
API_COMPATIBILITY.md		API_COMPATIBILITY.md
ARCHITECTURE.md		ARCHITECTURE.md
DEPLOYMENT.md		DEPLOYMENT.md
LICENSE		LICENSE
Makefile		Makefile
Package.resolved		Package.resolved
Package.swift		Package.swift
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

VisionOCR

Features

Requirements

Quick Start

Build & Install

API

POST /ocr

GET /health

GET /metrics

Configuration

Project Structure

Dependencies

Deployment

Architecture

LiteParse Compatibility

About

Uh oh!

Releases 2

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

VisionOCR

Features

Requirements

Quick Start

Build & Install

API

POST /ocr

GET /health

GET /metrics

Configuration

Project Structure

Dependencies

Deployment

Architecture

LiteParse Compatibility

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 2

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages