Requirements:

```
librosa
soundfile
accelerate
ffmpeg
torchaudio
transformers==4.45.1
```

2.1.1 Download the LibriSpeech dataset from LibriSpeech.
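LibriSpeech extracts into a fixed speaker/chapter directory tree that any data-loading code has to navigate. As a minimal illustration of that layout (the helper below is hypothetical, not part of this repository):

```python
from pathlib import Path

def librispeech_flac_path(root: str, split: str, utterance_id: str) -> Path:
    # LibriSpeech utterance IDs have the form "speaker-chapter-index",
    # and each .flac lives under <root>/<split>/<speaker>/<chapter>/.
    speaker, chapter, _ = utterance_id.split("-")
    return Path(root) / split / speaker / chapter / f"{utterance_id}.flac"
```

For example, `librispeech_flac_path("LibriSpeech", "train-clean-100", "1089-134686-0001")` resolves to `LibriSpeech/train-clean-100/1089/134686/1089-134686-0001.flac`.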
By default, the pre-trained model laion/clap-htsat-unfused is used. To train, run:

```
python train.py
```

Note: remember to modify the data path and other parameters in train.py before running. The trained model will be saved in checkpoints/model.
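CLAP is trained with a symmetric contrastive objective over paired audio and text embeddings, and train.py presumably optimizes something similar when fine-tuning. A dependency-free sketch of that objective (illustrative only; the actual loss in train.py may differ):

```python
import math

def clap_contrastive_loss(sim):
    """Symmetric cross-entropy over an audio-text similarity matrix.

    sim[i][j] is the (scaled) similarity between audio i and text j;
    matched pairs sit on the diagonal. This is a sketch of the
    CLAP-style objective, not the exact loss used by train.py.
    """
    n = len(sim)

    def mean_cross_entropy(matrix):
        total = 0.0
        for i, row in enumerate(matrix):
            m = max(row)  # stabilize the log-sum-exp
            log_z = m + math.log(sum(math.exp(x - m) for x in row))
            total += log_z - row[i]  # -log softmax at the matched index
        return total / n

    text_to_audio = [list(col) for col in zip(*sim)]  # transpose
    return 0.5 * (mean_cross_entropy(sim) + mean_cross_entropy(text_to_audio))
```

A well-trained model drives matched similarities far above mismatched ones, pushing this loss toward zero.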
```
cd local
pip install -r requirements.txt
python generate.py
python clap_opt_1_minut.py
```

The sample code is in local/vote.py.
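clap_opt_1_minut.py is not documented here; assuming it re-ranks candidate outputs by their CLAP audio-text similarity, the core scoring step reduces to cosine similarity between embeddings (function names below are hypothetical):

```python
import math

def cosine_similarity(a, b):
    # Cosine similarity between two embedding vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def best_candidate(audio_emb, text_embs):
    # Return the index of the candidate text embedding closest to the
    # audio embedding, plus all scores. Embeddings are assumed to be
    # precomputed, e.g. by a CLAP model.
    scores = [cosine_similarity(audio_emb, t) for t in text_embs]
    return max(range(len(scores)), key=scores.__getitem__), scores
```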
```
cd local
python vote.py
```
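vote.py's exact scheme is not spelled out here; a common baseline for combining multiple candidate outputs is simple majority voting, which can be sketched as (illustrative only):

```python
from collections import Counter

def majority_vote(candidates):
    # Return the candidate that occurs most often; ties are broken by
    # first appearance in the input list. Sketch only — local/vote.py
    # may combine candidates differently.
    counts = Counter(candidates)
    top = max(counts.values())
    for candidate in candidates:
        if counts[candidate] == top:
            return candidate
```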