Implements the LIR-ASR correction paradigm proposed in: 📄 “Listening, Imagining & Refining: A Heuristic Optimized ASR Correction Framework with LLMs” (arXiv:2509.15095)
- 📂 Datasets
- 📊 Evaluation Metrics
- ⚙️ Supported ASR Engines
- 🧩 LIR-ASR Correction Framework
- 🚀 Usage
- 📚 References
The framework supports the following datasets (and can be extended easily):
- LibriSpeech (test-clean / test-other)
- TED-LIUM
- CommonVoice
- Multilingual LibriSpeech (MLS)
- VoxPopuli
- Fleurs (multi-language, with helper script `scripts/download_fleurs.sh`)
We evaluate ASR and correction performance using:
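- 📝 Word Error Rate (WER): word-level edit distance (substitutions + deletions + insertions) divided by the number of words in the reference
- ✒️ Punctuation Error Rate (PER): errors in punctuation marks such as `.`, `,` and `?`
- ⏱️ Core-Hour: CPU hours needed to process 1 hour of audio (local models only)
- 💾 Model Size: size (MB) of the acoustic and language models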
⚠️ For cloud ASR services, Core-Hour and Model Size are not reported.
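The metrics above can be sketched in plain Python. This is a minimal illustration, not the repository's evaluation code. `edit_distance`, `wer`, `cer`, `per`, `corpus_wer` and `core_hour` are hypothetical helpers, and the PER definition used here (edit distance between the punctuation-only sequences, divided by the number of reference punctuation marks) is a simplifying assumption; implementations differ in how they align punctuation.

```python
import re

def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences (sub = del = ins = 1)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution (0 if tokens match)
            )
        prev = curr
    return prev[-1]

def wer(ref: str, hyp: str) -> float:
    ref_words = ref.split()
    return edit_distance(ref_words, hyp.split()) / len(ref_words)

def cer(ref: str, hyp: str) -> float:
    # Character-level variant (common for Chinese); spaces are dropped here by assumption.
    ref_chars = list(ref.replace(" ", ""))
    return edit_distance(ref_chars, list(hyp.replace(" ", ""))) / len(ref_chars)

# Assumed punctuation set for this sketch; adjust per language.
PUNCT = re.compile(r"[.,?!]")

def per(ref: str, hyp: str) -> float:
    # Simplified PER: compare only the sequences of punctuation marks.
    ref_p = PUNCT.findall(ref)
    return edit_distance(ref_p, PUNCT.findall(hyp)) / len(ref_p)

def corpus_wer(refs, hyps) -> float:
    # Pool edits and reference words over all utterances
    # instead of averaging per-utterance WERs.
    errors = sum(edit_distance(r.split(), h.split()) for r, h in zip(refs, hyps))
    return errors / sum(len(r.split()) for r in refs)

def core_hour(cpu_seconds: float, audio_seconds: float) -> float:
    # CPU hours per hour of audio; the 3600 factors cancel.
    # cpu_seconds is total CPU time summed over all cores.
    return cpu_seconds / audio_seconds

print(wer("the cat sat", "the cat sit"))  # one substitution out of three words
```

The functions assume a non-empty reference (and, for `per`, at least one reference punctuation mark). Pooling in `corpus_wer` means longer utterances weigh proportionally more than in a plain average of per-utterance scores.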
| Cloud Services | Local / Open Models |
|---|---|
| Amazon Transcribe | OpenAI Whisper (tiny → large) |
| Azure Speech-to-Text | whisper.cpp |
| Google Speech-to-Text | Coqui STT |
| IBM Watson | Custom ASR models |
| Picovoice Cheetah | |
| Picovoice Leopard | |
LIR-ASR (Listening → Imagining → Refining) is an iterative correction framework inspired by how humans “rehear” ambiguous speech.
Pipeline:
- 👂 Listening: detect uncertain or likely misrecognized words
- 💭 Imagining: generate candidate variants (e.g., phonetic substitutions, G2P-based alternatives)
- ✨ Refining: use LLMs or scoring models to pick the candidate most consistent with the context
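The three stages can be illustrated with a toy sketch under loudly stated assumptions: Listening is approximated by a per-word confidence threshold, Imagining by a hand-written phonetic confusion table (`CONFUSIONS`, hypothetical; a real system might derive variants via G2P), and Refining by a bigram counter (`toy_score`) standing in for an LLM or scoring model. None of these names come from the repository.

```python
from itertools import product

# Hypothetical phonetic confusion table (stand-in for G2P-derived variants).
CONFUSIONS = {"see": ["sea"], "their": ["there"]}
# Hypothetical set of "plausible" bigrams (stand-in for an LLM / LM scorer).
GOOD_BIGRAMS = {("the", "sea"), ("over", "there")}

def listen(confidences, threshold=0.6):
    """Listening: indices of words whose ASR confidence is below the threshold."""
    return [i for i, c in enumerate(confidences) if c < threshold]

def imagine(words, suspects):
    """Imagining: every sentence where each suspect word is kept or swapped for a variant."""
    suspects = set(suspects)
    options = [
        ([w] + CONFUSIONS.get(w, [])) if i in suspects else [w]
        for i, w in enumerate(words)
    ]
    return [list(c) for c in product(*options)]

def toy_score(words):
    """Count plausible bigrams; a real system would query an LLM or LM here."""
    return sum((a, b) in GOOD_BIGRAMS for a, b in zip(words, words[1:]))

def refine(candidates, score=toy_score):
    """Refining: keep the highest-scoring candidate (the first one wins ties)."""
    return max(candidates, key=score)

words = "we walked to the see".split()
confidences = [0.95, 0.9, 0.92, 0.97, 0.41]
best = refine(imagine(words, listen(confidences)))
print(" ".join(best))  # → we walked to the sea
```

Because the original word is listed first among the options and `max` returns the first maximum, ties leave the ASR output unchanged, a conservative default for correction.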
Key Features:
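- 📌 FSM Controller: 3 states (NoSearch → Search → Search++) that keep the correction process from getting stuck in local optima
- 🧭 Heuristic Optimization: rule-based constraints that preserve semantic fidelity
- 🔄 Iterative Refinement: repeats until convergence, accepting only changes that improve the score (monotonic improvement)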
✅ Achieves average CER/WER reductions of up to ~1.5 percentage points over uncorrected baselines.
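Below is one hypothetical reading of how an FSM controller and monotonic iterative refinement could fit together; the paper's actual state semantics may differ. Here `Search` tries single-word substitutions, `Search++` widens to two-word substitutions when no single change helps (a way to step out of a local optimum), and `NoSearch` ends the loop. A candidate is accepted only if it strictly raises the score, so the score never decreases. `CONFUSIONS`, `GOOD_BIGRAMS`, `score`, `neighbours` and `correct` are illustrative stand-ins, not repository code.

```python
from enum import Enum, auto
from itertools import combinations, product

class State(Enum):
    NO_SEARCH = auto()    # stop: nothing left worth searching
    SEARCH = auto()       # try single-word substitutions
    SEARCH_PLUS = auto()  # try two-word substitutions to escape a local optimum

# Toy phonetic confusions and toy scorer (hypothetical stand-ins for G2P and an LLM).
CONFUSIONS = {"read": ["red"], "see": ["sea"]}
GOOD_BIGRAMS = {("red", "sea")}

def score(words):
    return sum((a, b) in GOOD_BIGRAMS for a, b in zip(words, words[1:]))

def neighbours(words, k):
    """Yield every hypothesis with exactly k confusable positions replaced."""
    slots = [i for i, w in enumerate(words) if w in CONFUSIONS]
    for positions in combinations(slots, k):
        for variants in product(*(CONFUSIONS[words[i]] for i in positions)):
            cand = list(words)
            for i, v in zip(positions, variants):
                cand[i] = v
            yield cand

def correct(words, max_iters=10):
    best, best_score = list(words), score(words)
    state = State.SEARCH if any(w in CONFUSIONS for w in best) else State.NO_SEARCH
    trace = [state]
    for _ in range(max_iters):
        if state is State.NO_SEARCH:
            break
        k = 1 if state is State.SEARCH else 2
        cand = max(neighbours(best, k), key=score, default=None)
        if cand is not None and score(cand) > best_score:
            # Monotonic: accept only strict improvements, then resume cheap search.
            best, best_score = cand, score(cand)
            state = State.SEARCH
        else:
            # Escalate Search -> Search++; if Search++ also fails, stop.
            state = State.SEARCH_PLUS if state is State.SEARCH else State.NO_SEARCH
        trace.append(state)
    return best, trace

best, trace = correct("the read see".split())
print(" ".join(best))  # → the red sea
```

In the example, neither `read → red` nor `see → sea` raises the score on its own, so `Search` stalls; `Search++` finds the joint change, after which the controller drops back to `Search` and then terminates.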
```bash
# Install dependencies
pip3 install -r requirements.txt

# Prepare datasets (example for Fleurs)
sh scripts/download_fleurs.sh
```

```python
from optim import prompt_optimization, evolutionary_prompt_optimization, nbest_optimization, RIF_ASR
from normalizer import Normalizer
from languages import Languages

# Init a Chinese (ZH) normalizer that keeps punctuation from the set ".?"
norm = Normalizer.create(language=Languages.ZH, keep_punctuation=True, punctuation_set=".?")

# Chinese ASR hypothesis; "便宜" ("cheap") is likely a misrecognition
# of the similar-sounding "变异" ("variation")
asr_text = "由于分离和重组便宜在每一代的两个库之间来回变动"

# 1. Prompt optimization
corrected = prompt_optimization(asr_text, llm="Qwen3-235B")

# 2. Evolutionary prompt optimization
refined = evolutionary_prompt_optimization(asr_text, llm="Qwen3-235B")

# 3. RIF-ASR with multiple candidates
result = RIF_ASR(asr_text, llm="Qwen3-235B", language="ZH", normalizer=norm)
```
```bash
# Run evaluation
sh scripts/evaluation_whisper_qwen.sh
```
```bibtex
@misc{liu2025listeningimaginingrefining,
  title={Listening, Imagining \& Refining: A Heuristic Optimized {ASR} Correction Framework with {LLMs}},
  author={Yutong Liu and Ziyue Zhang and Cheng Huang and Yongbin Yu and Xiangxiang Wang and Yuqing Cai and Nyima Tashi},
  year={2025},
  eprint={2509.15095},
  archivePrefix={arXiv},
  primaryClass={eess.AS},
  url={https://arxiv.org/abs/2509.15095},
}
```