Speech-to-text benchmarks

June 2026

The Pipecat STT benchmark is an open-source evaluation of leading speech-to-text models for real-time voice agents. It measures what matters most in production voice AI: transcription accuracy and latency. The benchmark is public, reproducible, and the results table is the single source of truth for every number published on this page.

Accuracy: Soniox stt-rt-v4 reaches 1.25% semantic WER and 84.1% perfect transcripts, placing it among the most accurate models in the benchmark.
Latency: 249ms median time to final segment (TTFS), with 281ms at P95 and 310ms at P99, among the lowest latencies in the benchmark.
Dataset: 1,000 real-world samples from the pipecat-ai/smart-turn-data-v3.1-train dataset, with ground truth generated by Gemini and human-reviewed.
Mode: Real-time streaming transcription, the setting that defines voice agent performance.

Results

Sorted by semantic WER (lower is better). Latency is reported as time to final segment in milliseconds. Price is the public pay-as-you-go real-time rate per hour of audio, shown where the benchmarked model maps to a listed price.

Provider	Model	Price / hr (USD)	Semantic WER (mean)	Pooled WER	Perfect transcripts	TTFS median	TTFS P95	TTFS P99
Azure	—	$1.00	1.21%	1.18%	82.9%	1016ms	1345ms	1791ms
Soniox	`stt-rt-v4`	$0.12	1.25%	1.29%	84.1%	249ms	281ms	310ms
Speechmatics	—	$0.56	1.40%	1.07%	83.2%	495ms	676ms	736ms
Cartesia	`ink-2`	$0.43	1.47%	1.25%	84.2%	299ms	328ms	1584ms
AWS	—	—	1.68%	1.75%	77.4%	1136ms	1527ms	1897ms
Deepgram	`nova-3-general`	$0.55	1.71%	1.62%	76.5%	247ms	298ms	326ms
AssemblyAI	`u3-rt-pro`	$0.57	1.74%	1.34%	83.9%	335ms	534ms	613ms
NVIDIA	`Nemotron 3.0 ASR (en)`	—	1.90%	1.95%	76.1%	221ms	238ms	252ms
Smallest AI	`pulse`	—	2.30%	2.37%	72.4%	398ms	533ms	1593ms
Google	`latest-long`	$0.96	2.84%	2.85%	69.0%	878ms	1155ms	1570ms
ElevenLabs	`scribe_v2_realtime`	$0.39	3.16%	3.12%	81.3%	281ms	348ms	407ms
OpenAI	`gpt-4o-transcribe`	—	3.24%	3.06%	75.9%	637ms	965ms	1655ms
AssemblyAI	`universal-streaming-english`	—	3.49%	3.02%	66.8%	256ms	362ms	417ms
Gradium	`default`	—	3.72%	3.96%	65.3%	570ms	595ms	614ms
Cartesia	`ink-whisper`	—	3.92%	4.36%	60.5%	266ms	364ms	898ms
Mistral	`voxtral-mini-transcribe-realtime-2602`	—	4.44%	4.97%	68.8%	525ms	973ms	1913ms
NVIDIA	`Nemotron 3.5 ASR (multilingual)`	—	4.54%	4.58%	62.0%	236ms	253ms	266ms

Pricing reflects public pay-as-you-go rates and may not match every benchmarked configuration. See Soniox pricing for details.
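
To make the table columns concrete, here is a minimal sketch of how such summary figures could be derived from per-sample results. It assumes that "WER mean" is the average of per-sample error rates, that "Pooled WER" is total errors divided by total reference words, and that percentiles use the nearest-rank method. The benchmark's own aggregation code may differ, and the field names `errors`, `ref_words`, and `ttfs_ms` are hypothetical.

```python
import math
from statistics import mean, median


def percentile(values, pct):
    """Nearest-rank percentile: smallest value with at least pct% of samples at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def summarize(samples):
    """Aggregate per-sample results into table-style summary columns.

    Each sample is a dict with hypothetical keys:
      errors    - word errors counted for the sample
      ref_words - number of words in the reference transcript
      ttfs_ms   - time to final segment in milliseconds
    """
    per_sample_wer = [s["errors"] / s["ref_words"] for s in samples]
    ttfs = [s["ttfs_ms"] for s in samples]
    return {
        # Average of per-sample rates: every sample counts equally.
        "wer_mean": mean(per_sample_wer),
        # Total errors over total words: longer samples weigh more.
        "pooled_wer": sum(s["errors"] for s in samples)
        / sum(s["ref_words"] for s in samples),
        # Share of samples transcribed with zero counted errors.
        "perfect": sum(1 for s in samples if s["errors"] == 0) / len(samples),
        "ttfs_median": median(ttfs),
        "ttfs_p95": percentile(ttfs, 95),
        "ttfs_p99": percentile(ttfs, 99),
    }


# Made-up illustration data, not benchmark results.
example = [
    {"errors": 0, "ref_words": 10, "ttfs_ms": 240},
    {"errors": 1, "ref_words": 5, "ttfs_ms": 250},
    {"errors": 0, "ref_words": 20, "ttfs_ms": 260},
    {"errors": 2, "ref_words": 40, "ttfs_ms": 900},
]
print(summarize(example))
```

Under these assumptions, mean and pooled WER can diverge when errors cluster in short or long samples, which is one reason the table reports both.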

Figure: Real-time transcription accuracy (semantic WER). Source: Pipecat STT benchmark, 1,000 samples.

How the benchmark works

The Pipecat benchmark scores every provider on the same audio with two purpose-built metrics for streaming voice applications.

Semantic WER measures only transcription errors that change meaning for a downstream LLM agent. Punctuation, capitalization, contractions, filler words, and number formats are ignored, so the score reflects real-world impact rather than surface differences.
TTFS (time to final segment) measures latency from the moment the user stops speaking to when the final transcription segment arrives. For streaming voice agents, lower TTFS means faster responses, and P95 latency matters more than the median because occasional spikes break conversational flow.
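
As a rough illustration of the idea behind semantic WER, the sketch below normalizes both transcripts before counting word errors, so that punctuation, capitalization, a few contractions, and filler words do not count against the score. This is a simplified rule-based approximation written for this page, not the benchmark's actual scoring logic. The filler and contraction lists are small hypothetical examples, and number formats are not handled.

```python
import re

# Hypothetical, deliberately small lists for illustration only.
FILLERS = {"um", "uh", "erm", "hmm"}
CONTRACTIONS = {"i'm": ["i", "am"], "don't": ["do", "not"], "it's": ["it", "is"]}


def normalize(text):
    """Lowercase, strip punctuation, expand known contractions, drop fillers."""
    text = text.lower().replace("\u2019", "'")  # treat curly apostrophes as straight
    words = []
    for token in re.findall(r"[a-z0-9']+", text):
        for word in CONTRACTIONS.get(token, [token]):
            if word not in FILLERS:
                words.append(word)
    return words


def word_errors(ref, hyp):
    """Word-level Levenshtein distance (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def normalized_wer(reference, hypothesis):
    """Word error rate computed after normalization of both transcripts."""
    ref, hyp = normalize(reference), normalize(hypothesis)
    if not ref:
        raise ValueError("reference transcript has no words after normalization")
    return word_errors(ref, hyp) / len(ref)


# Surface differences only: no counted errors.
print(normalized_wer("I'm going to the store.", "Um, I am going to the store"))
# A meaning-changing substitution: one error in five words.
print(normalized_wer("Book a table for two", "book a cable for two"))
```

The first pair differs only in surface form and scores zero, while the second changes what a downstream agent would act on and is counted as an error.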

The benchmark dataset is published on Hugging Face as pipecat-ai/stt-benchmark-data, and anyone can rerun the evaluation to reproduce these results.

View the benchmark on GitHub

Compare speech-to-text pricing

Top accuracy does not have to cost more. Pick a provider and your monthly volume to compare pay-as-you-go speech-to-text pricing.

Pricing assumptions

Based on public pay-as-you-go pricing. Enterprise discounts and committed-use contracts may differ. Some providers charge separately for certain features. The calculator uses the public price for the provider configuration that most closely matches Soniox.
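
The underlying arithmetic is simple enough to sketch. The example below multiplies the per-hour prices listed in the results table by a monthly volume. It assumes flat pay-as-you-go pricing with no volume tiers, discounts, or feature surcharges, and the function names are illustrative rather than part of any provider's API.

```python
# Per-hour real-time prices in USD, copied from the results table above.
PRICE_PER_HOUR_USD = {
    "Soniox": 0.12,
    "ElevenLabs": 0.39,
    "Cartesia": 0.43,
    "Deepgram": 0.55,
    "Speechmatics": 0.56,
    "AssemblyAI": 0.57,
    "Google": 0.96,
    "Azure": 1.00,
}


def monthly_cost(provider, hours_per_month):
    """Flat-rate monthly cost in USD: price per hour times hours of audio."""
    return round(PRICE_PER_HOUR_USD[provider] * hours_per_month, 2)


def compare(hours_per_month):
    """Monthly cost for every listed provider, cheapest first."""
    costs = {p: monthly_cost(p, hours_per_month) for p in PRICE_PER_HOUR_USD}
    return dict(sorted(costs.items(), key=lambda item: item[1]))


for provider, cost in compare(1000).items():
    print(f"{provider}: ${cost:,.2f} per month")
```

At 1,000 hours of audio per month, these listed rates work out to $120 for Soniox and $1,000 for Azure.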
