urgent2025_challenge

Official data preparation scripts for the URGENT 2025 Challenge.

The metadata files generated by this repo are compatible with the baseline code. See the instructions for more details on how to run the baseline code.

Updates

❗️❗️[2024-11-18] We have added some missing files that are necessary for data preparation in Track 2, including commonvoice_19.0_es_train_track2.json.gz. If you cloned the repository before Nov. 18, please pull the latest commit.

❗️❗️[2024-11-16] We have modified some data preparation and evaluation scripts. If you cloned the repository before Nov. 16, please pull the latest commit.

Notes

The default generated data/speech_train subset is only intended for dynamic mixing (on-the-fly simulation) in the ESPnet framework. It has the same content in spk1.scp (clean reference speech) and wav.scp (noisy speech) files to facilitate on-the-fly simulation of different distortions.
The validation set made by this script is different from the official validation set used in the leaderboard, although the data sources and the types of distortions are the same. The official one will be provided when the leaderboard opens (Nov. 25). Note that we provide only the noisy data, not the ground truth, of the official validation set until the leaderboard switches to the test phase (Dec. 23) to prevent cheating on the leaderboard.
The unofficial validation set made by this script can be used to select the best checkpoint. Participants can freely change the configuration to generate the unofficial validation set.
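As a quick sanity check of the first note, the sketch below compares the two files of the training subset. It assumes a Kaldi-style layout where each line starts with an utterance ID followed by the rest of the entry, and it assumes the files live at data/speech_train/spk1.scp and data/speech_train/wav.scp; the helper names `read_scp` and `scp_files_match` are hypothetical and not part of this repository.

```python
from pathlib import Path


def read_scp(path):
    """Parse an scp file into {utt_id: rest_of_line}.

    Assumption: one entry per line, utterance ID first, whitespace-separated.
    """
    entries = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.strip().split(maxsplit=1)
            if not parts:
                continue  # skip blank lines
            entries[parts[0]] = parts[1] if len(parts) > 1 else ""
    return entries


def scp_files_match(clean_scp, noisy_scp):
    """Return True when both files map the same IDs to the same values."""
    return read_scp(clean_scp) == read_scp(noisy_scp)


if __name__ == "__main__":
    subset = Path("data/speech_train")  # assumed output location
    spk1, wav = subset / "spk1.scp", subset / "wav.scp"
    if spk1.exists() and wav.exists():
        print("identical content:", scp_files_match(spk1, wav))
    else:
        print("scp files not found; run the preparation script first")
```

If the check reports a mismatch for the default training subset, the data directory was probably produced with a modified configuration.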

Requirements

>8 Cores
At least 1.3 TB of free disk space for Track 1 and ??? TB for Track 2
Note that we only counted audio files and did not include the size of archived files (e.g., .zip or .tar.gz files)
- Speech
  - DNS5 speech (original 131 GB + resampled 187 GB): 318 GB
  - LibriTTS (original 44 GB + resampled 7 GB): 51 GB
  - VCTK: 12 GB
  - WSJ (original sph 24 GB + converted 31 GB): 55 GB
  - EARS: 61 GB
  - CommonVoice 19.0 speech
    - Track 1 (original mp3 221 GB + resampled 200 GB): 421 GB
    - Track 2 (original mp3 221 GB + resampled fr102+ g23 GB): ??? GB
  - MLS (less compressed version downloaded from LibriVox)
    - Track 1 (original 60 GB + resampled 60 GB): 120 GB
    - Track 2 (original 6 TB + resampled ??? TB): ??? TB
- Noise
  - DNS5 noise (original 58 GB + resampled 35 GB): 93 GB
  - WHAM! noise (48 kHz): 76 GB
  - FSD50K (original 24 GB + resampled 6 GB): 30 GB
  - FMA (original 24 GB + resampled 36 GB): 60 GB
- RIR
  - DNS5 RIRs (48 kHz): 6 GB
- Others
  - default simulated validation data: 2 GB
  - simulated wind noise for training (with default config): 1 GB
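The Track 1 figures in the list above add up to 1306 GB, which is where the 1.3 TB requirement comes from. The sketch below repeats that arithmetic and compares the total with the free space of the current directory. The dictionary keys and the helper names `required_gb` and `has_enough_space` are hypothetical, and treating GB as 10^9 bytes is an assumption, since the list does not say whether GB or GiB is meant.

```python
import shutil

# Track 1 sizes in GB, copied from the list above (audio files only).
TRACK1_SIZES_GB = {
    "DNS5 speech": 318,
    "LibriTTS": 51,
    "VCTK": 12,
    "WSJ": 55,
    "EARS": 61,
    "CommonVoice 19.0 (Track 1)": 421,
    "MLS (Track 1)": 120,
    "DNS5 noise": 93,
    "WHAM! noise": 76,
    "FSD50K": 30,
    "FMA": 60,
    "DNS5 RIRs": 6,
    "Simulated validation data": 2,
    "Simulated wind noise": 1,
}

GB = 10**9  # assumption: decimal gigabytes


def required_gb(sizes=TRACK1_SIZES_GB):
    """Total space needed for the listed datasets, in GB."""
    return sum(sizes.values())


def has_enough_space(free_bytes, sizes=TRACK1_SIZES_GB):
    """True when free_bytes covers the summed dataset sizes."""
    return free_bytes >= required_gb(sizes) * GB


free = shutil.disk_usage(".").free
print(f"required: {required_gb()} GB, free: {free / GB:.0f} GB")
print("enough space" if has_enough_space(free) else "not enough space")
```

Archived files (.zip, .tar.gz) are not counted in these numbers, so some additional headroom is needed during download.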

Instructions

After cloning this repository, run the following command to initialize the submodules:
```
git submodule update --init --recursive
```

Install the environment. Python 3.10 and Torch 2.0.1+ are recommended. With Conda, just run:

conda env create -f environment.yaml
conda activate urgent2025

If you encounter the following error:
ERROR: Failed building wheel for pypesq
ERROR: Could not build wheels for pypesq, which is required to install pyproject.toml-based projects
you can install pypesq manually in advance (make sure numpy is installed first to avoid compilation errors):
python -m pip install https://github.com/vBaiCai/python-pesq/archive/master.zip
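Before retrying the manual pypesq installation, it can help to confirm that numpy is importable in the active environment. The following is a minimal sketch using only the standard library; the helper name `is_installed` is hypothetical.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be found in this environment."""
    return importlib.util.find_spec(module_name) is not None


if is_installed("numpy"):
    print("numpy found; the manual pypesq install can proceed")
else:
    print("numpy missing; install it first (python -m pip install numpy)")
```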

Get the download link of the CommonVoice dataset v19.0 from https://commonvoice.mozilla.org/en/datasets

For German, English, Spanish, French, and Chinese (China), please do the following.

a. Select Common Voice Corpus 19.0

b. Enter your email and check the two mandatory boxes

c. Right-click the Download Dataset Bundle button and select "Copy link"

d. Paste the link into utils/prepare_CommonVoice19_speech.sh

Make a symbolic link to wsj0 and wsj1 data

a. Make a directory ./wsj

b. Make a symbolic link to wsj0 and wsj1 under ./wsj (./wsj/wsj0/ and ./wsj/wsj1/)
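The two WSJ steps above can be sketched in Python as follows. The function name `link_wsj` and the source paths in the usage comment are hypothetical; only the ./wsj/wsj0 and ./wsj/wsj1 layout comes from the steps above.

```python
from pathlib import Path


def link_wsj(wsj0_src, wsj1_src, root="./wsj"):
    """Create <root>/wsj0 and <root>/wsj1 as symlinks to existing WSJ copies."""
    root = Path(root)
    root.mkdir(parents=True, exist_ok=True)  # step a: make ./wsj
    for name, src in (("wsj0", wsj0_src), ("wsj1", wsj1_src)):
        link = root / name
        if link.is_symlink() or link.exists():
            continue  # leave an existing entry untouched
        # step b: link to the absolute path so the link survives a chdir
        link.symlink_to(Path(src).resolve(), target_is_directory=True)
    return root


# Usage (placeholder paths; replace with the location of your WSJ data):
# link_wsj("/path/to/wsj0", "/path/to/wsj1")
```

Linking to absolute paths is a design choice of this sketch; relative links also work as long as they resolve from inside ./wsj.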

FFmpeg-related

To simulate wind noise and codec artifacts, our scripts utilize FFmpeg.

a. Activate your python environment

b. Get the path to FFmpeg by running `which ffmpeg`

c. Change /path/to/ffmpeg in simulation/simulate_data_from_param.py to the path to your ffmpeg.
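Steps b and c can also be checked from Python. The sketch below looks up ffmpeg on PATH with `shutil.which` and shows what replacing the placeholder would look like on a piece of text. The helper names `find_ffmpeg` and `fill_placeholder` are hypothetical, and the sketch deliberately does not modify simulation/simulate_data_from_param.py, whose exact contents are not assumed here.

```python
import shutil

PLACEHOLDER = "/path/to/ffmpeg"  # placeholder string named in step c


def find_ffmpeg(name="ffmpeg"):
    """Return the path of the executable found on PATH, or None."""
    return shutil.which(name)


def fill_placeholder(source_text, ffmpeg_path):
    """Return source_text with the placeholder replaced by ffmpeg_path."""
    return source_text.replace(PLACEHOLDER, ffmpeg_path)


path = find_ffmpeg()
if path is None:
    print("ffmpeg not found on PATH; activate your environment first")
else:
    print("use this path in place of", PLACEHOLDER + ":", path)
```

Run this inside the activated environment so that the reported path matches the ffmpeg the scripts will use.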

Run the script

./prepare_espnet_data.sh

NOTE: Please do not change output_dir in each shell script called in prepare_{dataset}.sh. If you want to download datasets to somewhere else, make a symbolic link to that directory.

# example when you want to download FSD50K noise to /path/to/somewhere
# prepare_fsd50k_noise.sh specifies ./fsd50k as output_dir, so make a symbolic link from /path/to/somewhere to ./fsd50k
mkdir -p /path/to/somewhere
ln -s /path/to/somewhere ./fsd50k

Install eSpeak-NG (used for the phoneme similarity metric computation)
- Follow the instructions in https://github.com/espeak-ng/espeak-ng/blob/master/docs/guide.md#linux
