Olla v0.0.28
The latest stable build of our open-source LLM proxy and load balancer.
We build the foundational tools and platforms that empower organisations to deploy, manage and scale AI applications with confidence on their own terms.
Fresh releases, announcements and writing from the workshop.
Olla v0.0.28 is a big release: native oMLX support for fast multi-model inference on Apple Silicon with Anthropic passthrough, per-endpoint authentication for local backends, opt-in CORS for browser clients, and a round of Anthropic translation and reliability hardening.
Put oMLX and your other Mac inference backends behind one OpenAI-compatible endpoint with Olla - model unification across MLX and GGUF naming, Anthropic passthrough and failover.
TensorFoundry covers the full LLM stack. Pick the layers you need.
From open-source tools to enterprise platforms
From ML pipelines to enterprise orchestration - a snapshot of our team's journey.
Deploy AI where it matters most - at the edge or closer to you, on your own terms.
Keep sensitive data on-premises. Fine-tune models and maintain complete sovereignty without the cloud.
Sub-millisecond response times with edge deployment. Eliminate network overhead for real-time AI.
Reduce cloud API costs by 90%. Pay once for hardware, then run inference and training indefinitely.
Hardware-accelerated inference with support for CUDA, Metal and custom accelerators.
Open-source AI inference proxy perfect for small businesses and development teams. Unified interface for Ollama, LM Studio, vLLM and others with load balancing and failover.
Join the waitlist for priority access to Alloy, FoundryOS, Pivotal and Forge as they launch through 2026.
Get exclusive updates, beta access and founding member pricing.
Let TensorFoundry build your AI Inference & Training Lab for you.