A "back of the envelope" evaluation intended to answer two questions I have about ASR/STT performance.
Dataset: danielrosehill/ASR-WPM-And-Background-Noise-Eval
- Results Table - Full tabulated results sorted by Word Error Rate
- AI Analysis - In-depth analysis of WPM correlation, background noise impact, and language contamination
- PDF Report - Complete report with visualizations
- CSV Export - Raw data for your own analysis
- Visualizations - Individual charts and graphs
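The results table is sorted by Word Error Rate. As a reference point, a minimal word-level WER computation (standard Levenshtein edit distance over word tokens — an illustrative sketch, not necessarily the exact scoring code used for the table) looks like:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level edit distance divided by reference length."""
    ref, hyp = reference.lower().split(), hypothesis.lower().split()
    # DP table: d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(ref)][len(hyp)] / max(len(ref), 1)
```

One substitution in a three-word reference gives a WER of 1/3.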
Create annotated audio recordings to evaluate how different factors affect Whisper transcription accuracy:
- Speaking pace (fast, normal, slow, mumbled, whispered)
- Background noise (cafe, office, market, conversations, traffic, wind)
- Microphone distance
- Different microphones
```shell
# Create virtual environment and install dependencies
uv venv .venv
source .venv/bin/activate
uv pip install -r requirements.txt
```

Launch the recorder:

```shell
./run.sh
# or
source .venv/bin/activate && python recorder.py
```

- Select sample text and microphone
- Choose annotations (pace, distance, background noise)
- Add optional notes (noise source URL, volume level, etc.)
- Record, review, then Save or Discard
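Recordings are written as 16 kHz mono WAV. A minimal sketch of that write path using only the standard library (the actual recorder.py may capture audio through a third-party library; `save_wav` is a hypothetical helper, not taken from the source):

```python
import struct
import wave

def save_wav(path: str, samples: list[int], sample_rate: int = 16000) -> None:
    """Write 16-bit PCM mono WAV, matching the dataset's 16 kHz mono format."""
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)           # mono
        wf.setsampwidth(2)           # 16-bit samples
        wf.setframerate(sample_rate)
        wf.writeframes(struct.pack(f"<{len(samples)}h", *samples))

# One second of silence at 16 kHz
save_wav("silence.wav", [0] * 16000)
```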
Speaking Pace:
- As fast as possible
- Quicker than normal
- Normal/conversational
- Careful enunciation
- Deliberately slow
- Mumbled/unclear
- Whispered
- As loud as possible
- Weird/altered voices
Mic Distance:
- Close (< 6 inches)
- Normal (6-12 inches)
- Far (> 12 inches)
Background Noise:
- Silence
- Coffee shop/cafe
- Busy office
- Busy market
- Background music
- Conversation (same/other/mixed language)
- Traffic/street
- Wind (outdoor)
- Other (see notes)
Each recording saves:
- `{id}.wav` - 16kHz mono audio (4-char hex ID)
- `{id}.json` - metadata with all annotations
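Pairing the audio and metadata under a shared 4-char hex ID can be sketched as follows (`save_recording` is a hypothetical helper for illustration, not code from recorder.py):

```python
import json
import secrets
from pathlib import Path

def save_recording(dataset_dir: str, metadata: dict) -> str:
    """Assign a random 4-char hex ID and write {id}.json alongside {id}.wav."""
    rec_id = secrets.token_hex(2)  # 2 random bytes -> 4 hex chars, e.g. "a3f2"
    metadata = {"id": rec_id, **metadata}
    out = Path(dataset_dir)
    out.mkdir(parents=True, exist_ok=True)
    (out / f"{rec_id}.json").write_text(json.dumps(metadata, indent=2))
    return rec_id
```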
Example metadata:
```json
{
  "id": "a3f2",
  "sample": "sample_01_technology",
  "word_count": 138,
  "duration_seconds": 62.5,
  "annotations": {
    "pace": "normal",
    "mic_distance": "close",
    "background_noise": "cafe",
    "notes": "imissmycafe.com at 50% volume"
  },
  "equipment": {
    "microphone": "Samson Q2U Microphone",
    "sample_rate": 16000,
    "channels": 1
  }
}
```
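The metadata is enough to derive words-per-minute for the pace analysis. For the record above:

```python
# WPM = word_count / duration in minutes
word_count = 138
duration_seconds = 62.5
wpm = word_count / (duration_seconds / 60)
print(round(wpm, 1))  # 132.5
```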
```
├── recorder.py        # Recording GUI
├── run.sh             # Launcher script
├── requirements.txt   # Python dependencies
├── samples/           # Text samples to read
└── dataset/           # Saved audio + metadata
```