A production-grade, LiteParse-compatible OCR microservice written in Swift 6 for macOS. Uses Apple's Vision framework for on-device text recognition with no external API dependencies.
- 100% LiteParse API compatible —
POST /ocr,GET /health,GET /metrics - Vision framework — native macOS OCR, Apple Silicon optimized
- Actor-based concurrency — bounded worker pool with backpressure (HTTP 429)
- Correct coordinates — Vision bottom-left origin converted to spec-required top-left pixel coords
- Structured logging — OSLog with request IDs and duration tracking
- Graceful shutdown — SIGTERM/SIGINT handled by the service group; HTTP server stops accepting, then in-flight OCR work drains before exit
- Helpful errors — port-in-use and permission errors shown clearly
- macOS 15+ (Sequoia)
- Swift 6.0+
- Apple Silicon or Intel Mac
# Build (release)
make release
# Run
./.build/release/vision-ocr
# Server starts on http://0.0.0.0:8000
# Test
curl -X POST http://localhost:8000/ocr \
-F "file=@image.png" \
-F "language=en"make build # Debug build → .build/debug/vision-ocr
make release # Release build → .build/release/vision-ocr
make test # Run test suite
make install # Install to /usr/local/bin
make install-local # Install to ~/bin
make run # Build debug and runRequest: multipart/form-data
| Field | Type | Required | Description |
|---|---|---|---|
file |
binary | Yes | Image file (PNG, JPG, TIFF, WebP, BMP, GIF) |
language |
string | No | ISO 639-1 code, default en |
recognition_level |
string | No | accurate (default) or fast. May also be passed as a URL query parameter (?recognition_level=fast or ?level=fast). The form field takes precedence over the query parameter. |
fast trades recognition quality for lower latency and a smaller memory footprint; accurate is the default.
# Faster, lighter recognition via form field…
curl -X POST http://localhost:8000/ocr -F "file=@image.png" -F "recognition_level=fast"
# …or via query parameter
curl -X POST "http://localhost:8000/ocr?recognition_level=fast" -F "file=@image.png"Response 200 OK:
{
"results": [
{
"text": "Hello World",
"bbox": [718.6, 749.0, 2396.5, 885.4],
"confidence": 1.0
}
]
}bbox is [x1, y1, x2, y2] in pixels, top-left origin, x2 > x1, y2 > y1.
Error responses:
| Status | Trigger |
|---|---|
400 |
Missing file field, invalid language code, invalid recognition level, unsupported image format |
429 |
Worker queue full (retry with backoff) |
500 |
OCR processing or internal failure |
504 |
OCR timed out (> 60 s) |
Error body: {"error": "description"}
curl http://localhost:8000/health{
"status": "healthy",
"timestamp": 1780871211.0,
"poolStats": {
"workerCount": 4,
"activeWorkers": 0,
"queueDepth": 0,
"maxQueueSize": 100,
"totalProcessed": 42
}
}curl http://localhost:8000/metrics{
"timestamp": 1780871211.0,
"metrics": {
"totalRequests": 42,
"successRequests": 41,
"errorRequests": 1,
"clientErrorRequests": 1,
"serverErrorRequests": 0,
"successRate": 0.976,
"errorRate": 0.024,
"serverErrorRate": 0.0,
"averageLatency": 0.24,
"p50Latency": 0.21,
"p95Latency": 0.44,
"p99Latency": 0.58,
"throughput": 3.2
},
"pool": { ... }
}timestamp is Unix epoch seconds. throughput is requests/second since startup.
clientErrorRequests (4xx) are separated from serverErrorRequests (5xx/504);
serverErrorRate reflects only server-side failures.
# CLI flags (all optional)
vision-ocr --host 0.0.0.0 # default: 0.0.0.0 (all interfaces); use 127.0.0.1 for loopback only
vision-ocr --port 8000 # default: 8000
vision-ocr --workers 2 # default: 2 (Vision is ANE/GPU-bound; more workers mostly raise peak RAM, not throughput)
vision-ocr --max-queue-size 100 # default: 100Sources/
├── CLI/ Entry point, argument parsing, signal handling
├── OCRServer/ HTTP server actor, route handlers, multipart parsing
├── HTTP/ RequestValidator, ResponseHandler, ErrorHandler
├── OCR/ VisionOCREngine, OCRPipeline, LanguageMapper
├── Image/ ImageProcessor, ImageValidator, format detection
├── Coordinates/ BoundingBox, VisionCoordinateConverter (Y-flip)
├── Results/ TextOrderingEngine, ConfidenceNormalizer
├── Concurrency/ WorkerPool, AsyncQueue, ResultStore, OCRWorker
├── Metrics/ MetricsCollector (P50/P95/P99)
├── Health/ HealthStatus, MetricsResponse
└── Utilities/ ServerError, ServerConfiguration, logging
| Package | Source | Purpose |
|---|---|---|
hummingbird 2.0+ |
hummingbird-project | HTTP server |
multipart-kit 4.0+ |
vapor | Multipart form-data parser |
swift-service-lifecycle 2.0+ |
swift-server | Graceful shutdown (SIGTERM/SIGINT) |
swift-log 1.0+ |
Apple | Structured logging |
See DEPLOYMENT.md for launchd setup, log management, and production tuning.
See ARCHITECTURE.md for design decisions, concurrency model, and component details.
See API_COMPATIBILITY.md for the full spec compliance matrix.
Status: Production-ready Platform: macOS 15+ (Apple Silicon & Intel) Updated: June 5, 2026