vvwangvv/urgent2025_challenge

urgent2025_challenge

Official data preparation scripts for the URGENT 2025 Challenge.

The metadata files generated by this repo are compatible with the baseline code. See the baseline instructions for details on how to run the baseline code.

Updates

❗️❗️[2024-11-18] We have added some missing files that are required for Track 2 data preparation, including commonvoice_19.0_es_train_track2.json.gz. If you cloned the repository before Nov. 18, please pull the latest commit.

❗️❗️[2024-11-16] We have modified some data preparation and evaluation scripts. If you cloned the repository before Nov. 16, please pull the latest commit.

Notes

  • The default generated data/speech_train subset is only intended for dynamic mixing (on-the-fly simulation) in the ESPnet framework. Its spk1.scp (clean reference speech) and wav.scp (noisy speech) files have the same content, so that different distortions can be simulated on the fly.

  • The validation set made by this script is different from the official validation set used in the leaderboard, although the data sources and distortion types are the same. The official one will be provided when the leaderboard opens (Nov. 25). To prevent cheating on the leaderboard, we provide only the noisy data of the official validation set; its ground truth will not be released until the leaderboard switches to the test phase (Dec. 23).

  • The unofficial validation set made by this script can be used to select the best checkpoint. Participants can freely change the configuration to generate the unofficial validation set.
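If you want to confirm the first note on your own copy, the two files can be compared after sorting. This is a minimal sketch rather than part of the official scripts: check_same_scp is a hypothetical helper, and it assumes data/speech_train has already been generated.

```shell
#!/bin/sh
# Hypothetical helper (not part of this repository): check that spk1.scp and
# wav.scp in a dynamic-mixing subset list the same entries, ignoring line order.
check_same_scp() {
    dir=${1:-data/speech_train}
    for f in spk1.scp wav.scp; do
        if [ ! -f "$dir/$f" ]; then
            echo "missing: $dir/$f" >&2
            return 2
        fi
    done
    tmp=$(mktemp -d)
    # Sort both files so that a different line order is not reported as a mismatch.
    sort "$dir/spk1.scp" > "$tmp/spk1"
    sort "$dir/wav.scp" > "$tmp/wav"
    if cmp -s "$tmp/spk1" "$tmp/wav"; then
        rc=0
        echo "OK: $dir/spk1.scp and $dir/wav.scp match"
    else
        rc=1
        echo "MISMATCH: $dir/spk1.scp and $dir/wav.scp differ" >&2
    fi
    rm -rf "$tmp"
    return "$rc"
}
```

Run it from the repository root, e.g. check_same_scp data/speech_train. It returns 0 on a match, 1 on a mismatch, and 2 if either file is missing.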

Requirements

  • More than 8 CPU cores
  • At least 1.3 TB of free disk space for Track 1 and ??? TB for Track 2
  • Note that these figures count only audio files and do not include the size of archived files (e.g., .zip or .tar.gz files)
    • Speech
      • DNS5 speech (original 131 GB + resampled 187 GB): 318 GB
      • LibriTTS (original 44 GB + resampled 7 GB): 51 GB
      • VCTK: 12 GB
      • WSJ (original sph 24 GB + converted 31 GB): 55 GB
      • EARS: 61 GB
      • CommonVoice 19.0 speech
        • Track 1 (original mp3 221 GB + resampled 200 GB): 421 GB
        • Track 2 (original mp3 221 GB + resampled fr102+ g23 GB): ??? GB
      • MLS (less compressed version downloaded from LibriVox)
        • Track 1 (original 60 GB + resampled 60 GB): 120 GB
        • Track 2 (original 6 TB + resampled ??? TB): ??? TB
    • Noise
      • DNS5 noise (original 58 GB + resampled 35 GB): 93 GB
      • WHAM! noise (48 kHz): 76 GB
      • FSD50K (original 24 GB + resampled 6 GB): 30 GB
      • FMA (original 24 GB + resampled 36 GB): 60 GB
    • RIR
      • DNS5 RIRs (48 kHz): 6 GB
    • Others
      • default simulated validation data: 2 GB
      • simulated wind noise for training (with default config): 1 GB
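Before downloading, you can roughly check a machine against the Track 1 numbers above. In this sketch (the function names are hypothetical, not part of this repository), the per-dataset sizes are copied from the list and summed; they add up to 1306 GB, which matches the 1.3 TB figure. Track 2 is left out because its sizes are still marked ???. As noted above, downloaded archives need extra space on top of this.

```shell
#!/bin/sh
# Rough pre-flight check against the Track 1 requirements (a sketch only).
# Sizes are in GB, copied from the requirements list; archives are not included.

track1_required_gb() {
    # DNS5 speech, LibriTTS, VCTK, WSJ, EARS, CommonVoice (T1), MLS (T1),
    # DNS5 noise, WHAM!, FSD50K, FMA, DNS5 RIRs, validation data, wind noise
    echo "318 51 12 55 61 421 120 93 76 30 60 6 2 1" |
        awk '{ s = 0; for (i = 1; i <= NF; i++) s += $i; print s }'
}

check_track1_requirements() {
    target_dir=${1:-.}
    need_gb=$(track1_required_gb)
    # df -P gives one line per filesystem; with -k, column 4 is available KiB.
    avail_kb=$(df -Pk "$target_dir" | awk 'NR == 2 { print $4 }')
    avail_gb=$((avail_kb / 1024 / 1024))
    cores=$(getconf _NPROCESSORS_ONLN)
    echo "CPU cores: $cores (more than 8 recommended)"
    echo "Free space: ${avail_gb} GB, Track 1 needs about ${need_gb} GB"
    # Exit status 0 only if there is enough free space.
    [ "$avail_gb" -ge "$need_gb" ]
}
```

For example, run check_track1_requirements . from the repository root, or pass the directory that your symbolic links point to.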

Instructions

  1. After cloning this repository, run the following command to initialize the submodules:

    git submodule update --init --recursive
  2. Install the environment. Python 3.10 and PyTorch 2.0.1+ are recommended. With Conda, run:

    conda env create -f environment.yaml
    conda activate urgent2025

    If you encounter the following error

        ERROR: Failed building wheel for pypesq
        ERROR: Could not build wheels for pypesq, which is required to install pyproject.toml-based projects

    you can install pypesq manually beforehand (make sure numpy is already installed to avoid compilation errors):

    python -m pip install https://github.com/vBaiCai/python-pesq/archive/master.zip
  3. Get the download link for the Common Voice v19.0 dataset from https://commonvoice.mozilla.org/en/datasets

    For German, English, Spanish, French, and Chinese (China), please do the following.

    a. Select Common Voice Corpus 19.0

    b. Enter your email and check the two mandatory boxes

    c. Right-click the Download Dataset Bundle button and select "Copy link"

    d. Paste the link into utils/prepare_CommonVoice19_speech.sh

  4. Create symbolic links to the wsj0 and wsj1 data

    a. Make a directory ./wsj

    b. Make a symbolic link to wsj0 and wsj1 under ./wsj (./wsj/wsj0/ and ./wsj/wsj1/)
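The WSJ step can be scripted as below. This is a sketch: WSJ0_DIR and WSJ1_DIR are hypothetical variables that you set to the locations of your own WSJ0 and WSJ1 copies; the ./wsj/wsj0 and ./wsj/wsj1 link names come from the step above.

```shell
#!/bin/sh
# Sketch of the WSJ symbolic-link step. WSJ0_DIR and WSJ1_DIR are hypothetical
# variables; point them at wherever your licensed WSJ0/WSJ1 copies live.
WSJ0_DIR=${WSJ0_DIR:-/path/to/wsj0}
WSJ1_DIR=${WSJ1_DIR:-/path/to/wsj1}

mkdir -p ./wsj
# -s: symbolic link; -f and -n: replace an existing link instead of creating
# a new link inside the directory it points to, so the snippet can be re-run.
ln -sfn "$WSJ0_DIR" ./wsj/wsj0
ln -sfn "$WSJ1_DIR" ./wsj/wsj1
ls -l ./wsj
```

Run it from the repository root with your own paths set, e.g. by exporting WSJ0_DIR and WSJ1_DIR before sourcing the snippet.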

  5. FFmpeg-related

    To simulate wind noise and codec artifacts, our scripts use FFmpeg.

    a. Activate your Python environment

    b. Get the path to FFmpeg by running which ffmpeg

    c. Change /path/to/ffmpeg in simulation/simulate_data_from_param.py to the path of your ffmpeg binary.

  6. Run the script

    ./prepare_espnet_data.sh

    NOTE: Please do not change output_dir in the shell scripts called by prepare_{dataset}.sh. If you want to download datasets to a different location, make a symbolic link to that directory instead.

    # example when you want to download FSD50K noise to /path/to/somewhere
    # prepare_fsd50k_noise.sh specifies ./fsd50k as output_dir, so make ./fsd50k a symbolic link pointing to /path/to/somewhere
    mkdir -p /path/to/somewhere
    ln -s /path/to/somewhere ./fsd50k

  7. Install eSpeak-NG (used to compute the phoneme similarity metric)
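The FFmpeg edit and a check for the external tools can be scripted as follows. This is a minimal sketch under assumptions: set_ffmpeg_path and check_tools are hypothetical helpers, and set_ffmpeg_path assumes the placeholder appears literally as /path/to/ffmpeg in simulation/simulate_data_from_param.py. Run it with your Python environment activated so that command -v finds the same ffmpeg as which ffmpeg.

```shell
#!/bin/sh
# Hypothetical helpers (not part of this repository).

# Replace the /path/to/ffmpeg placeholder with a real ffmpeg path.
# Usage: set_ffmpeg_path [script] [ffmpeg_path]
set_ffmpeg_path() {
    script=${1:-simulation/simulate_data_from_param.py}
    ffmpeg_bin=${2:-$(command -v ffmpeg || true)}
    if [ ! -f "$script" ]; then
        echo "not found: $script" >&2
        return 1
    fi
    if [ -z "$ffmpeg_bin" ]; then
        echo "ffmpeg not found on PATH; activate your environment first" >&2
        return 1
    fi
    # '|' is the sed delimiter because paths contain '/' (a path containing
    # '|' or '&' would need escaping). -i.bak works with GNU and BSD sed and
    # keeps the original file as "$script.bak".
    sed -i.bak "s|/path/to/ffmpeg|$ffmpeg_bin|g" "$script"
}

# Report whether ffmpeg and espeak-ng are on PATH; non-zero if any is missing.
check_tools() {
    missing=0
    for tool in ffmpeg espeak-ng; do
        if command -v "$tool" > /dev/null 2>&1; then
            echo "found: $tool -> $(command -v "$tool")"
        else
            echo "missing: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

From the repository root, check_tools reports what is installed and set_ffmpeg_path edits the simulation script in place, keeping the original as simulation/simulate_data_from_param.py.bak.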
