Blog

Gradium TTS, upgraded: more accurate Text-To-Speech

Gradium TTS now runs on a new model: more natural prosody and substantially more accurate pronunciation on the cases that break voice agents in production, including spelling, acronyms, emails, phone numbers, and codes. It wins head-to-head against our previous model in all five languages, and leads real-time competitors (Cartesia Sonic 3.5, Inworld TTS 1.5 Max, ElevenLabs Flash v2.5 and Multilingual v2) on the hardest pronunciation cases. Available now as the new default, with custom voices carried over.

7 min read · Product

Semantic VAD: turn detection that uses meaning, not silence

Acoustic VAD answers "is there a voice right now?" Semantic VAD answers "is the user done talking?" Here's why the distinction decides whether a voice agent cuts users off, how Gradium STT emits multi-horizon turn-completion predictions every 80 ms, and how to tune delay_in_frames, horizon, and flushing for your use case.

7 min read · Engineering

Phonon update: 1.00% WER on Seed-TTS, smaller than every model we beat

Phonon, our 100M-parameter on-device Text-To-Speech model, reaches 1.00% WER on the Seed-TTS English benchmark, outperforming NeuTTS Air, KaniTTS2, and NeuTTS Nano. With a fixed voice, it drops to 0.83% WER, ahead of Kokoro and Magpie.

5 min read · Research

Gradium #1 on Coval TTS Benchmarks

Independent Coval TTS benchmark (May 2026): Gradium ranks first on P50 TTFA (158 ms) and latency IQR (2 ms), and delivers state-of-the-art WER (3.7%) against ElevenLabs Turbo v2.5, Flash v2.5, Multilingual v2, Cartesia Sonic-3, Deepgram Aura-2, Rime Mist-v3, Arcana, and OpenAI TTS-1-HD.

7 min read · Research

Gradium Voice Launches on AWS as a SaaS Subscription and a SageMaker Model Image

Gradium is now available on AWS through two paths: a fully managed SaaS subscription via AWS Marketplace, and a deployable model image via Amazon SageMaker for teams that need in-VPC inference.

3 min read · Announcement

The most accurate multilingual text-to-speech, by the numbers

How we measure WER for TTS at Gradium: text normalization, jiwer alignment, and results on the MiniMax Multilingual benchmark across English, French, Spanish, Portuguese, and German, plus why the standard metric is starting to saturate.

7 min read · Research
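
For readers unfamiliar with the metric, here is a minimal sketch of a word error rate computation: normalize both strings, align them at the word level, and divide the edit count by the number of reference words. It is illustrative only and not Gradium's evaluation code. The `normalize` rule (lowercase, strip punctuation) and the hand-written alignment are assumptions standing in for the post's actual normalization and jiwer setup. For TTS, the hypothesis is typically an ASR transcript of the synthesized audio and the reference is the input text.

```python
import re


def normalize(text: str) -> list[str]:
    """Simplified normalizer (an assumption): lowercase, drop punctuation, split on whitespace."""
    text = text.lower()
    text = re.sub(r"[^\w\s']", " ", text)  # keep word characters, whitespace, apostrophes
    return text.split()


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: (substitutions + deletions + insertions) / reference word count."""
    ref = normalize(reference)
    hyp = normalize(hypothesis)
    if not ref:
        raise ValueError("reference must contain at least one word")

    # Word-level Levenshtein distance, keeping only the previous row.
    # prev[j] is the edit distance between the reference words seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substituted word out of five reference words.
    print(wer("Call me at five, please.", "call me at nine please"))
```

Normalization matters as much as alignment here: without it, differences in casing or punctuation between the input text and the transcript would be counted as word errors.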

Gradbot: Vibe code voice agents in 50 lines of code

Gradbot is our open-source framework for prototyping voice agents in minutes. Built on a Rust orchestration core, it handles turn-taking, interruptions, silence, and async tool calls so you can ship a working voice experience in around 50 lines of code.

4 min read · Engineering

Evaluating Phonon: how we made the best TTS model for edge devices

An evaluation of Gradium Phonon, our on-device text-to-speech model. Despite its small size, it significantly outperforms larger models.

5 min read · Research

Gradium Phonon: On-Device TTS for Consumer Apps, NPCs, and Offline Products

Announcing Gradium Phonon, our new on-device text-to-speech model designed for consumer apps, NPCs, and offline products.

6 min read · Announcement

Time to First Audio: Measuring and Reducing TTS Latency in Voice Agents

In natural conversation, the gap between one person finishing a sentence and the other starting to respond averages around 200 milliseconds. For voice agents this is the target to match.

4 min read · Engineering

InteractionLabs (Ongo) and Gradium Partner to Redefine Human-Robot Interaction

InteractionLabs, the company behind the Ongo living lamp robot, and Gradium announce a partnership to bring expressive, real-time voice AI to robotics.

1 min read · Product

Optimizing Quality vs. Latency in Real-Time Text-to-Speech AI Models

Explore strategies for balancing quality and latency in real-time TTS AI models. Learn how Gradium achieves low-latency, high-quality speech synthesis for voice applications.

8 min read · Engineering

Building Voice Agents From the Ground Up: The Gradium Startup Program

Get 6 months of free access to Gradium's voice AI platform: 9M monthly credits, voice cloning, and STT/TTS APIs for seed-funded startups building voice-first products.

3 min read · Announcement

Acolad and Gradium Partner to Advance Enterprise-Ready AI Interpreting

Acolad, the global leader in language and content solutions, and Gradium just announced a strategic partnership. The partnership reflects Acolad’s commitment to delivering secure, scalable, and governed AI-powered interpreting solutions, designed for enterprise and public-sector environments.

3 min read · Product

Invincible Voice: How Gradium's Real-Time Voice AI Helps ALS Patients Speak Again

Gradium's voice AI technology powers Invincible Voice, an open-source assistive system helping people with ALS and speech loss communicate in real time.

3 min read · Product

Why Your Voice Cloning Sounds Fake (And How to Fix It)

Discover how Gradium's instant voice cloning achieves higher speaker similarity than ElevenLabs. Benchmark results across 4 languages with 3,220 human evaluations.

10 min read · Research

Powering Wonderful's Voice Agents

We're proud to power real-time voice agents on Wonderful's platform, bringing cutting-edge voice AI from experimental to deployable.

2 min read · Product

Gradium: Solving voice

Today we're excited to launch Gradium, the core engine powering the next generation of voice products and interactions.

5 min read · Announcement