[INTERSPEECH 2026] Domain-Aware Mispronunciation Detection and Diagnosis Using Language-Specific Statistical Graphs
To set up the environment, run the following from the repository root:
conda create -n mdd python==3.10.12
conda activate mdd
pip install torch==2.4.1 torchvision==0.19.1 torchaudio==2.4.1 --index-url https://download.pytorch.org/whl/cu121
pip install -r requirements.txt
The statistics script is inside the data/ folder. Run it from there so it can read train.csv and write the JSON outputs:
cd data
python stat.py
This will generate language-specific statistics files such as:
- data_arabic.json
- data_mandarin.json
- data_hindi.json
- data_korean.json
- data_spanish.json
- data_vietnamese.json
These files contain the computed confusion statistics for each language.
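After running stat.py, a quick check like the one below can confirm that the six files listed above were written and parse as valid JSON. This is a minimal sketch: the helper name check_stat_outputs is hypothetical, it is meant to be run from the repository root, and it only checks existence and JSON validity. It assumes nothing about the internal structure of the statistics, which this README does not describe.

```python
import json
from pathlib import Path

# File names as listed in this README; stat.py may write additional files too.
EXPECTED_FILES = [
    "data_arabic.json",
    "data_mandarin.json",
    "data_hindi.json",
    "data_korean.json",
    "data_spanish.json",
    "data_vietnamese.json",
]


def check_stat_outputs(data_dir="."):
    """Return (missing, invalid): expected files that are absent or not valid JSON."""
    missing, invalid = [], []
    for name in EXPECTED_FILES:
        path = Path(data_dir) / name
        if not path.is_file():
            missing.append(name)
            continue
        try:
            with path.open(encoding="utf-8") as f:
                json.load(f)  # parse only; the content is not inspected
        except (json.JSONDecodeError, UnicodeDecodeError):
            invalid.append(name)
    return missing, invalid


if __name__ == "__main__":
    missing, invalid = check_stat_outputs("data")
    print("missing:", missing or "none")
    print("invalid JSON:", invalid or "none")
```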
The code assumes all audio files referenced by train.csv, dev.csv, and test.csv exist under EN_MDD/WAV/.
- Copy or move all .wav files into EN_MDD/WAV/.
- Make sure the Path column values in data/train.csv, data/dev.csv, and data/test.csv match the file names under EN_MDD/WAV/.
- If Path includes subdirectories, preserve the same relative structure under EN_MDD/WAV/.
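Before training, a short script along these lines can report which CSV entries have no matching audio file. It is a sketch under stated assumptions: the CSVs are comma-delimited with a header row containing a Path column, and each Path value is relative to EN_MDD/WAV/, as the checklist above describes. The helper name find_missing_audio is hypothetical; run the script from the repository root.

```python
import csv
from pathlib import Path

AUDIO_ROOT = Path("EN_MDD/WAV")  # base directory described in this README
SPLITS = ["data/train.csv", "data/dev.csv", "data/test.csv"]


def find_missing_audio(csv_path, audio_root=AUDIO_ROOT):
    """Return Path values from csv_path that do not exist as files under audio_root."""
    missing = []
    # utf-8-sig strips a byte-order mark so the header still reads "Path".
    with open(csv_path, newline="", encoding="utf-8-sig") as f:
        reader = csv.DictReader(f)
        if reader.fieldnames is None or "Path" not in reader.fieldnames:
            raise ValueError(f"{csv_path} has no 'Path' column")
        for row in reader:
            rel = row["Path"].strip()
            if not (Path(audio_root) / rel).is_file():
                missing.append(rel)
    return missing


if __name__ == "__main__":
    for split in SPLITS:
        if not Path(split).is_file():
            print(f"{split}: not found")
            continue
        missing = find_missing_audio(split)
        print(f"{split}: {len(missing)} missing file(s)")
        for rel in missing[:10]:  # show only the first few
            print("  ", rel)
```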
The project currently uses a hardcoded audio base path in these files:
- dataloader.py, line 32: ./EN_MDD/WAV/
- train.py, line 168: ./EN_MDD/WAV/
If your audio files are stored elsewhere, update those paths accordingly.
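If you move the audio often, one option is to replace both hardcoded strings with a single helper that reads an override from an environment variable. The sketch below is hypothetical: the variable name MDD_AUDIO_ROOT and the functions audio_root and resolve_audio are not part of the repository, and it assumes the CSV Path values are joined onto the base path.

```python
import os
from pathlib import Path

# Hypothetical environment variable; the repository does not read it today.
AUDIO_ROOT_ENV = "MDD_AUDIO_ROOT"
DEFAULT_AUDIO_ROOT = "./EN_MDD/WAV/"  # the value currently hardcoded in both files


def audio_root():
    """Return the audio base directory, preferring the environment override."""
    return Path(os.environ.get(AUDIO_ROOT_ENV, DEFAULT_AUDIO_ROOT))


def resolve_audio(rel_path):
    """Join a Path value from the CSVs onto the audio base directory."""
    return audio_root() / rel_path
```

With such a helper in place, a run against audio stored elsewhere would look like `MDD_AUDIO_ROOT=/path/to/WAV python train.py`, and the default behavior stays unchanged when the variable is unset.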
To train the model, run the following from the repository root:
python train.py
Optional arguments can be passed on the command line, for example:
python train.py --num_epoch 100 --batch_size 4 --lr 2e-5
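To launch several runs with different settings from a script, the documented flags can be assembled into a command list. This is a minimal sketch: build_train_cmd and run_sweep are hypothetical helpers, they use only the three flags shown above, and the default values 100 and 4 are taken from the example invocation rather than from train.py's own defaults. Any argument left as None is omitted so that train.py falls back to its built-in value.

```python
import subprocess
import sys


def build_train_cmd(num_epoch=None, batch_size=None, lr=None):
    """Build a train.py command line using only the documented flags."""
    cmd = [sys.executable, "train.py"]  # same interpreter, so the active conda env
    if num_epoch is not None:
        cmd += ["--num_epoch", str(num_epoch)]
    if batch_size is not None:
        cmd += ["--batch_size", str(batch_size)]
    if lr is not None:
        cmd += ["--lr", str(lr)]
    return cmd


def run_sweep(learning_rates, num_epoch=100, batch_size=4):
    """Train once per learning rate, sequentially; stops at the first failing run."""
    for lr in learning_rates:
        cmd = build_train_cmd(num_epoch=num_epoch, batch_size=batch_size, lr=lr)
        print("running:", " ".join(cmd))
        subprocess.run(cmd, check=True)
```

For example, calling run_sweep([1e-5, 2e-5, 5e-5]) from the repository root trains three models one after another. This README does not say where train.py writes checkpoints, so check that consecutive runs do not overwrite each other's outputs.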