Whisper WPM & Background Noise Evaluation

A "back of the envelope" evaluation intended to answer two questions I have about ASR/STT performance: how speaking pace (words per minute) and background noise affect Whisper's transcription accuracy.

Dataset: danielrosehill/ASR-WPM-And-Background-Noise-Eval

Results & Analysis

Results Table - Full tabulated results sorted by Word Error Rate
AI Analysis - In-depth analysis of WPM correlation, background noise impact, and language contamination
PDF Report - Complete report with visualizations
CSV Export - Raw data for your own analysis
Visualizations - Individual charts and graphs
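
For readers unfamiliar with the metric, Word Error Rate is the word-level edit distance between the reference text and the transcript, divided by the number of reference words. The sketch below is a minimal illustration under simple assumptions (lowercasing and whitespace tokenisation only); it is not necessarily how evaluate.py computes its scores, and the function name is hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference word count."""
    # Assumption: lowercase + whitespace split is the only normalisation.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from transcript
            insertion = curr[j - 1] + 1  # extra word in transcript
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One dropped word out of six reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

Because insertions count as errors, WER can exceed 1.0 when the transcript is much longer than the reference.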

Purpose

Create annotated audio recordings to evaluate how different factors affect Whisper transcription accuracy:

Speaking pace (fast, normal, slow, mumbled, whispered)
Background noise (cafe, office, market, conversations, traffic, wind)
Microphone distance
Different microphones

Setup

# Create virtual environment and install dependencies
uv venv .venv
source .venv/bin/activate
uv pip install -r requirements.txt

Usage

./run.sh
# or
source .venv/bin/activate && python recorder.py

Select sample text and microphone
Choose annotations (pace, distance, background noise)
Add optional notes (noise source URL, volume level, etc.)
Record, review, then Save or Discard

Annotations

Speaking Pace:

As fast as possible
Quicker than normal
Normal/conversational
Careful enunciation
Deliberately slow
Mumbled/unclear
Whispered
As loud as possible
Weird/altered voices

Mic Distance:

Close (< 6 inches)
Normal (6-12 inches)
Far (> 12 inches)

Background Noise:

Silence
Coffee shop/cafe
Busy office
Busy market
Background music
Conversation (same/other/mixed language)
Traffic/street
Wind (outdoor)
Other (see notes)

Output

Each recording saves:

{id}.wav - 16 kHz mono audio ({id} is a 4-character hex ID)
{id}.json - metadata with all annotations
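
A minimal sketch of how such a pair of files could be written with the Python standard library. This is an illustration, not the code in recorder.py: the function name, the 16-bit sample width, and the use of secrets.token_hex for the ID are all assumptions.

```python
import json
import secrets
import wave
from pathlib import Path

SAMPLE_RATE = 16000  # 16 kHz, as stated above
CHANNELS = 1         # mono
SAMPLE_WIDTH = 2     # assumption: 16-bit PCM samples


def save_recording(pcm_bytes: bytes, metadata: dict, out_dir: Path) -> str:
    """Write {id}.wav and {id}.json into out_dir and return the new ID."""
    out_dir.mkdir(parents=True, exist_ok=True)

    # token_hex(2) yields 2 random bytes, i.e. 4 hex characters.
    rec_id = secrets.token_hex(2)
    while (out_dir / f"{rec_id}.wav").exists():
        rec_id = secrets.token_hex(2)  # retry on the rare collision

    with wave.open(str(out_dir / f"{rec_id}.wav"), "wb") as wav:
        wav.setnchannels(CHANNELS)
        wav.setsampwidth(SAMPLE_WIDTH)
        wav.setframerate(SAMPLE_RATE)
        wav.writeframes(pcm_bytes)

    # Duration follows from the byte count and the audio format.
    duration = len(pcm_bytes) / (SAMPLE_WIDTH * CHANNELS * SAMPLE_RATE)
    record = {"id": rec_id, **metadata, "duration_seconds": round(duration, 2)}
    (out_dir / f"{rec_id}.json").write_text(json.dumps(record, indent=2))
    return rec_id
```

A 4-character hex ID allows 65,536 distinct values, which is ample for a small hand-recorded dataset; the collision check keeps an existing recording from being overwritten.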

Example metadata:

{
  "id": "a3f2",
  "sample": "sample_01_technology",
  "word_count": 138,
  "duration_seconds": 62.5,
  "annotations": {
    "pace": "normal",
    "mic_distance": "close",
    "background_noise": "cafe",
    "notes": "imissmycafe.com at 50% volume"
  },
  "equipment": {
    "microphone": "Samson Q2U Microphone",
    "sample_rate": 16000,
    "channels": 1
  }
}
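
The word_count and duration_seconds fields are enough to derive the speaking rate of a recording. A short sketch, using the values from the example above (the helper name is hypothetical):

```python
import json


def words_per_minute(metadata: dict) -> float:
    """Speaking rate derived from the word_count and duration_seconds fields."""
    duration = metadata["duration_seconds"]
    if duration <= 0:
        raise ValueError("duration_seconds must be positive")
    return metadata["word_count"] / duration * 60


if __name__ == "__main__":
    meta = json.loads('{"word_count": 138, "duration_seconds": 62.5}')
    print(round(words_per_minute(meta), 1))  # 132.5
```

This assumes the whole sample text was read within the recording, so any leading or trailing silence slightly lowers the computed rate.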

Directory Structure

.
├── recorder.py         # Recording GUI
├── run.sh              # Launcher script
├── requirements.txt    # Python dependencies
├── samples/            # Text samples to read
└── dataset/            # Saved audio + metadata
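
Given this layout, the saved recordings can be loaded for analysis by pairing each metadata file in dataset/ with its audio file. A minimal sketch, assuming every recording is stored as a matching {id}.wav and {id}.json pair; the function name is hypothetical and this is not taken from evaluate.py.

```python
import json
from pathlib import Path


def load_dataset(dataset_dir: str) -> list:
    """Return (wav_path, metadata) tuples for every complete recording pair."""
    records = []
    for meta_path in sorted(Path(dataset_dir).glob("*.json")):
        wav_path = meta_path.with_suffix(".wav")
        if not wav_path.exists():
            continue  # skip metadata whose audio file is missing
        metadata = json.loads(meta_path.read_text())
        records.append((wav_path, metadata))
    return records


if __name__ == "__main__":
    for wav_path, metadata in load_dataset("dataset"):
        noise = metadata.get("annotations", {}).get("background_noise")
        print(wav_path.name, noise)
```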
