This post explores British and American pronunciation differences through the lens of AI speech recognition, offering new insight into how we speak and how machines listen.
As someone who teaches English phonetics in Singapore, I’m constantly navigating the subtle differences in pronunciation across accents. But nothing has opened my eyes to those differences quite like trying to get AI to understand them. Training speech models has become a strange and fascinating new window into how speech works and where it breaks down. And although this post takes a deep dive into the research, it’s important to appreciate how difficult this challenge really is; I know firsthand as a teacher.
You: “Set a timer for my next shed-jool.”
Siri: “Searching the web for… ‘shuttle?’”
From shed-jools to vitamins, even the smartest AIs struggle to keep up with how English is spoken around the world. The reason? Speech models aren’t trained on all English equally.
Speech AI now powers your Zoom meetings, call centers, captions, classrooms, and health apps. But these tools are overwhelmingly trained on standard American English, and it shows. A Stanford-led study found that African-American speech had nearly double the word error rate (35%) compared to white American speech (19%). A separate global benchmark showed that accents from India, Britain, and France had up to 49% more errors than American-accented input.
Even OpenAI’s Whisper, one of the best available today, still performs better on U.S. English than on British or Australian varieties. Meanwhile, platforms like Siri and Alexa have long struggled with Irish and Scottish accents, prompting users to change how they speak just to be understood.
This isn’t about funny mishearings; it’s about fairness, inclusion, education, and opportunity. When your accent isn’t recognized, your message often isn’t either.
And the systems aren’t being rude. They’re just undertrained.
Until models are exposed to a wider range of global speech, millions of users will keep getting subtly sidelined by systems that just don’t “hear” them.
Why AI Still Trips Over Accents
Even today’s most advanced speech systems struggle to understand all accents equally well. Despite impressive improvements, multiple studies show consistent accent bias, particularly against non-American and non-standard dialects.
The Data: Bias in the Benchmarks
As mentioned above, a Stanford-led evaluation (Koenecke et al., 2020) of five major ASR systems (Google, Apple, IBM, Microsoft, and Amazon) found that transcripts of African American Vernacular English had a 35% word error rate (WER), nearly double the 19% for white American speakers.
In a larger audit spanning 2,700 speakers across five continents, DiChristofano et al. (2023) found WER gaps of 2–12% for non-American accents, which works out to up to 49% more errors in relative terms. Indian, French, and Southeast Asian-accented English were among the hardest hit.
Even among native speakers, bias persists. OpenAI’s Whisper performs far better on American English than on British or Australian varieties (Graham & Roll, 2024). UK studies show that “prestige” accents like Received Pronunciation are recognized more accurately than regional dialects (Markl, 2022).
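Word error rate, the metric behind all of these numbers, is nothing exotic: it counts the word-level substitutions, deletions, and insertions needed to turn the system’s transcript back into the reference, divided by the number of reference words. Here is a minimal, self-contained sketch; the sentences and the “misheard” output are invented for illustration, not data from the studies above.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref, hyp = reference.lower().split(), hypothesis.lower().split()
    # Standard dynamic-programming edit distance, computed over words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(ref)][len(hyp)] / len(ref)

# Invented example: one misrecognized word out of seven is already ~14% WER.
reference = "set a timer for my next schedule"
print(word_error_rate(reference, "set a timer for my next schedule"))  # 0.0
print(word_error_rate(reference, "set a timer for my next shuttle"))   # ~0.14
```

Seen this way, a 35% WER means roughly one word in three comes back wrong: the difference between a transcript you can skim and one you have to re-listen to.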
Why It Happens
Most speech models are trained predominantly on American data. As a result, the burden of intelligibility often falls on the speaker, not the system.
Speech recognition models work by breaking down audio into phonemes and matching them against statistical patterns. But those patterns reflect the training data. And when that data skews heavily toward a single accent, everyone else is left misheard.
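To see how that skew plays out, here is a deliberately simplified sketch, not any vendor’s real pipeline: a toy lexicon that stores only an American-style pronunciation for each word, and a scorer that measures how closely the phonemes it “heard” match the stored form. The respellings are rough stand-ins for real phoneme inventories.

```python
from difflib import SequenceMatcher

# Illustrative one-pronunciation-per-word lexicon: the way a US-skewed model
# effectively "hears" the world (respellings are rough, not real phoneme sets).
us_only_lexicon = {
    "schedule": "s k e d j u l",
    "vitamin":  "v ai t uh m i n",
    "mobile":   "m ou b uh l",
}

def score(word: str, heard: str) -> float:
    """Similarity between the phonemes heard and the stored US pronunciation."""
    stored = us_only_lexicon[word].split()
    return SequenceMatcher(None, stored, heard.split()).ratio()

# The same word, rendered in two accents, scores very differently against
# a lexicon that has only ever seen the American form.
print(score("schedule", "s k e d j u l"))   # 1.00  - US "sked-jool"
print(score("schedule", "sh e d j u l"))    # ~0.77 - UK "shed-jool"
print(score("vitamin",  "v i t uh m i n"))  # ~0.86 - UK "vit-uh-min"
```

In a real recognizer the correct word competes against thousands of alternatives plus a language model, so a lower acoustic score for the right word turns directly into substitutions, which is exactly what the WER gaps above are counting.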
Sound Patterns That Commonly Confuse AI
Certain phonetic shifts are especially tricky for ASR systems:
/æ/ vs. /ɑː/ in words like pasta, dance, and stance, where British and American English often pick opposite vowels. Vowel shifts like these are unevenly represented across training corpora.
/aɪ/ vs. /iː/ or /ɪ/ in words like either, neither, and vitamin. Models often guess based on spelling alone, lacking contextual accent knowledge.
Non-rhotic /r/ drops in British English words like parliament, version, and vendor, leading to alignment errors.
/ʃ/ vs. /sk/ in schedule (shed-jool vs. sked-jool). This onset mismatch can throw off the decoder’s word predictions.
/iː/ vs. /ɛ/ in leisure and /ɪ/ vs. /iː/ in niche. Decoders trained mostly on American speech often pick the wrong vowel, and with it the wrong word.
T-flapping vs. T-holding in words like tomato and water. American English flaps the intervocalic /t/ toward a /d/-like sound; British English keeps a crisp /t/, confusing models tuned to one pattern.
These sound mismatches often trip up models in real-world use. For instance, schedule, vitamin, and mobile are common stumbling blocks for AI depending on whether it hears British or American pronunciation.
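One common mitigation, sketched below under the same toy assumptions as before, is to store multiple accent variants per word and score the audio against whichever variant fits best. This is the spirit of multi-accent pronunciation dictionaries, not a description of how any particular product works.

```python
from difflib import SequenceMatcher

accent_aware_lexicon = {
    # word: (rough US respelling, rough UK respelling) - illustrative only
    "schedule": ("s k e d j u l", "sh e d j u l"),
    "vitamin":  ("v ai t uh m i n", "v i t uh m i n"),
    "mobile":   ("m ou b uh l", "m ou b ai l"),
}

def best_score(word: str, heard: str) -> float:
    """Score the heard phonemes against the closest stored accent variant."""
    return max(
        SequenceMatcher(None, variant.split(), heard.split()).ratio()
        for variant in accent_aware_lexicon[word]
    )

print(best_score("schedule", "sh e d j u l"))  # 1.0 - the British form now matches
```

The real-world equivalents are broader training data and pronunciation lexicons with variants; the principle is the same: give the model something it has actually seen.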
Anecdotal Proof from the Real World
This isn’t just theory. Users have long reported needing to “code-switch” or Americanize their speech just to be understood:
Siri and Alexa had well-documented trouble with Scottish and Irish accents.
Google Assistant often mishears British time phrases due to differences in /t/ pronunciation.
Contact center pilots have shown customer satisfaction increases when using accent-neutralizing AI layers.
Reddit users frequently report toggling between British and American spellings or pronunciations to get accurate results.
Until broader accent coverage and fixes like these become standard, speech recognition remains biased. For teachers, students, and users with regional or global accents, the experience can still feel exclusionary. The irony is sharp: the AI that talks back still doesn’t always listen properly.
The Ultimate British vs. American Pronunciation Table
What confuses humans also confuses machines. Below is a table of over 50 words with distinct British and American pronunciations, many of which AI often misunderstands, especially when the user’s accent doesn’t match its assumptions.
| Word | 🇺🇸 American | 🇬🇧 British |
| --- | --- | --- |
| Advertisement | AD-ver-tize-ment | ad-VER-tiss-ment |
| Adult | uh-DULT | AD-ult |
| Aluminum / Aluminium | a-LOO-min-um | al-yuh-MIN-ee-um |
| Amen | AY-men | AH-men |
| Asia | AY-zhuh | AY-shuh |
| Bald | bold | bawld |
| Basil | BAY-zuhl | BAZ-uhl |
| Buddha | BOOD-uh | BUD-uh |
| Clique | clik | cleek |
| Crescent | CRES-uhnt | CREZ-uhnt |
| Data | DAY-tuh | DAH-tuh |
| Dynasty | DIE-nuh-stee | DIN-uh-stee |
| Either | ee-thur | eye-thur |
| Envelope | EN-vuh-lope | ON-vuh-lope |
| Esplanade | ES-pluh-nahd | ES-pluh-nayd |
| Evolution | EH-vuh-loo-shun | EE-vuh-loo-shun |
| Expatriate | ex-PAY-tree-ut | ex-PAT-ree-ut |
| Falcon | FAL-kun | FOL-kun |
| Garage | guh-RAHZH | GA-ridge |
| Herb | ERB | HERB |
| Laboratory | LAB-ruh-tor-ee | luh-BOR-uh-tree |
| Leisure | LEE-zher | LEZH-uh |
| Medicine | MED-i-sin | MED-sin |
| Meter | MEE-ter | MEE-tuh |
| Mobile | MOH-buhl | MOH-bile |
| Missile | MISS-uhl | MISS-eye-ul |
| Neither | nee-thur | nigh-thur |
| Niche | nitch | neesh |
| Oregano | uh-REG-uh-no | or-uh-GAH-no |
| Often | OFF-en / OFF-tuhn | OFF-en / OFF-tuhn |
| Parliament | PAR-luh-ment | PAR-li-ment |
| Pasta | PAH-stuh | PAS-tuh |
| Patent | PAT-uhnt | PAY-tuhnt |
| Patronise | PAY-truh-nize | PAT-ruh-nize |
| Privacy | PRAI-vuh-see | PRIV-uh-see |
| Produce (noun) | PROH-duce | PROD-juice |
| Progress (noun) | PROG-ress | PROH-gress |
| Project (noun) | PROJ-ect | PROH-ject |
| Route | ROWT | ROOT |
| Schedule | SKED-jool | SHED-jool |
| Scone | skohn | skon |
| Semi | SEM-eye | SEM-ee |
| Stance | stans | stahns |
| Tomato | tuh-MAY-to | tuh-MAH-to |
| Vase | vays | vahz |
| Vendor | VEN-door | VEN-duh |
| Version | VER-zhun | VER-shun |
| Vitamin | VAI-tuh-min | VIT-uh-min |
| Wrath | rath | roth |
| Yogurt | YOH-gurt | YOG-urt |
| Zebra | ZEE-bruh | ZEB-ruh |
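If you want to probe a speech system yourself, the table above doubles as a ready-made test list: record each word in a British and an American rendering and compare what comes back. A minimal harness might look like the sketch below; transcribe() is a placeholder to wire up to whatever speech-to-text API you use, and the recordings are your own audio files.

```python
# A small slice of the table above, as (word, US respelling, UK respelling).
TEST_WORDS = [
    ("schedule", "SKED-jool", "SHED-jool"),
    ("vitamin",  "VAI-tuh-min", "VIT-uh-min"),
    ("tomato",   "tuh-MAY-to", "tuh-MAH-to"),
    ("garage",   "guh-RAHZH", "GA-ridge"),
]

def transcribe(audio_path: str) -> str:
    """Placeholder: call your ASR system of choice here and return its text."""
    raise NotImplementedError("wire this up to a real speech-to-text API")

def accent_report(recordings: dict[str, dict[str, str]]) -> None:
    """recordings maps each word to {"us": path, "uk": path} for your own audio."""
    for word, _us, _uk in TEST_WORDS:
        us_text = transcribe(recordings[word]["us"]).lower()
        uk_text = transcribe(recordings[word]["uk"]).lower()
        marker = "" if us_text == uk_text == word else "  <-- mismatch"
        print(f"{word:10s} US heard as: {us_text:15s} UK heard as: {uk_text}{marker}")
```

If the British column consistently comes back wrong while the American column sails through, you have reproduced, on a small scale, exactly the bias the benchmarks describe.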