
ESPnet version 202511



@Fhrozen released this 17 Nov 12:46
2d139b9

Summary

Release date: 2025‑11‑17
Milestone: 202511 – A version that brings a large number of new features, stability improvements, and a refreshed CI & Docker workflow.

ESPnet 202511 introduces Dask-based parallel processing primitives, modular inference and evaluation pipelines for espnet3, extensive SpeechLM support (including DeepSpeed training), and a modernized Docker/CI stack. The release also fixes lingering bugs in the codec EMA logic, MPS device handling, and category-balanced batching, while tightening dependency management and documentation quality.
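
The parallel statistics collection idea can be illustrated without a cluster. The sketch below is a stand-in under stated assumptions, not the espnet3.parallel API: it uses the standard library's `concurrent.futures` in place of Dask, and the names `shard_stats`, `merge_stats`, and `collect_stats_parallel` are hypothetical. Each worker reduces one data shard to sufficient statistics (count, sum, sum of squares), and a single merge step turns them into a global mean and variance.

```python
# Hypothetical sketch: shard-parallel statistics in the spirit of collect_stats.
# Assumption: concurrent.futures stands in for Dask; this is not espnet3.parallel.
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence, Tuple

Stats = Tuple[int, float, float]  # (count, sum, sum of squares)


def shard_stats(values: Sequence[float]) -> Stats:
    """Reduce one shard of scalar features to sufficient statistics."""
    return len(values), float(sum(values)), float(sum(v * v for v in values))


def merge_stats(parts: Sequence[Stats]) -> Tuple[float, float]:
    """Combine per-shard statistics into a global mean and population variance."""
    n = sum(p[0] for p in parts)
    if n == 0:
        raise ValueError("no samples to summarize")
    total = sum(p[1] for p in parts)
    total_sq = sum(p[2] for p in parts)
    mean = total / n
    return mean, total_sq / n - mean * mean


def collect_stats_parallel(shards: List[Sequence[float]], workers: int = 4) -> Tuple[float, float]:
    # Shards are independent, so the map step parallelizes freely; threads keep
    # the sketch portable, while ProcessPoolExecutor suits CPU-bound extraction.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(shard_stats, shards))
    return merge_stats(parts)


if __name__ == "__main__":
    mean, var = collect_stats_parallel([[1.0, 2.0], [3.0, 4.0], [5.0]])
    print(mean, var)  # prints: 3.0 2.0
```

Because each shard contributes only three numbers, the merge is cheap regardless of how many workers run the map step. For large or badly scaled features, a Welford-style merge would be numerically safer than the sum-of-squares shortcut used here.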


Highlighted Pull Requests

| # | Title | Category | Key Impact |
|---|---|---|---|
| 6300 | Bump js-yaml from 4.1.0 to 4.1.1 in /doc/vuepress | Dep-Update | Secures the documentation build against a prototype-pollution CVE in YAML merge handling |
| 6284 | codec fix: DDP logic and dead code revival logic | Bugfix | Restores EMA state for dead-code recovery and synchronizes codec updates across all DDP workers |
| 6286 | [SpeechLM] Deepspeed trainer | New Feature | Adds full DeepSpeed support (train.py + deepspeed_trainer.py) for large-scale SpeechLM training |
| 6279 | [SpeechLM] model, preprocessor and collect_stats | New Feature | Core SpeechLM components: job templates, preprocessing, multimodal IO, and stats collection |
| 6278 | [SpeechLM] Deepspeed trainer | New Feature | See 6286 (DeepSpeed integration for SpeechLM workflows) |
| 6276 | Docker Updates | Refactor | Upgrades to Ubuntu 24.04, CUDA 12.6, and PyTorch 2.8.0, transitions to Miniforge, and modernizes Dockerfile syntax |
| 6275 | CI Installation fix | Bugfix | Adds --no-build-isolation for editable installs, improving reproducibility across CI environments |
| 6273 | [ESPnet-Codec] Bug fix on codec activation function | Bugfix | Enables BF16 inference by registering torch.ones for autocast |
| 6272 | Add Pytorch version 2.9 | Dep-Update | Extends supported PyTorch releases (2.5.1, 2.7.1, 2.8.0, 2.9.0) in CI and docs |
| 6263 | [ESPnet-3] Merge master into espnet3 branch | Merge | Syncs espnet3 with master, fixing CI and dependency mismatches |
| 6260 | SpeechLM Data Infra: dataset management | New Feature | Implements a data registry, dataset loaders, and configuration templates for SpeechLM |
| 6259 | pre-commit.ci autoupdate | Tooling | Updates black and isort to their latest stable versions |
| 6255 | Fix default batch sampler fallback for category iterator | Bugfix | Restores the legacy folded → catbel mapping, improving backward compatibility |
| 6253 | Restrict Docker GitHub Actions to Original Repo | Security | Prevents accidental image publishing from forks or non-master branches |
| 6249 | [espnet3-7] Add Callbacks | New Feature | Adds AverageCheckpointsCallback and a standard callback factory for Lightning trainers |
| 6248 | Get forced alignments from CTC model | Feature | Enables forced-alignment extraction for any CTC-based S2T model |
| 6246 | MPS Support for loading float64 models | Bugfix | Converts float64 weights to float32 for the MPS device, which does not support float64, avoiding dtype errors |
| 6244 | LID-7: VoxLingua107 recipe | Recipe | Adds a new spoken-language-identification recipe for VoxLingua107 |
| 6243 | [espnet-3] Merge master into espnet3 and fixed CI | Merge | Syncs espnet3 with master, removing the underthesea dependency |
| 6239 | Upgrade pyopenjtalk to 0.4.1 | Dep-Update | Updates the pyopenjtalk installer to the latest version |
| 6238 | Add Pytorch version 2.9 | Dep-Update | See 6272 |
| 6238 | Package Build Patch | Build | Moves g2p_en & ctc-segmentation installation to the Makefile, fixing the pip package build |
| 6238 | Docker Updates | Refactor | See 6276 |
| 6238 | CI Installation fix | Bugfix | See 6275 |
| 6238 | [ESPnet-Codec] Bug fix on codec activation function | Bugfix | See 6273 |
| 6227 | Terry/parallelize spk emb extraction | Feature | Parallel speaker-embedding extraction for TTS recipes |
| 6210 | LID-8: CI and unit tests | Test | Adds comprehensive unit tests for LID functionality |
| 6178 | [espnet3-6] Add evaluation scripts | Feature | Modularizes inference & evaluation pipelines in espnet3 |
| 6179 | [espnet3] ESPnet1 Support Sunset | Refactor | Removes legacy ESPnet1 support and consolidates it under espnet2.legacy |
| 6177 | Merge master into espnet3 | Merge | Syncs espnet3 with master, fixing CI issues |
| 6175 | [espnet3-5] Add parallel module and collect_stats | Feature | Adds Dask-based parallel processing and collect_stats for dataset statistics collection |
| 6174 | LID-7: VoxLingua107 recipe | Recipe | See 6244 |
| 6173 | LID-8: CI and unit tests | Test | See 6210 |
| 6172 | [espnet3-5] Add parallel module and collect_stats | Feature | See 6175 |
| 6171 | [espnet3-5] Add parallel module and collect_stats | Feature | See 6175 |
| 6170 | LID-8: CI and unit tests | Test | See 6210 |
| 6168 | [espnet3-5] Add parallel module and collect_stats | Feature | See 6175 |
| 6165 | LID-8: CI and unit tests | Test | See 6210 |
| 6164 | LID-8: CI and unit tests | Test | See 6210 |
| 6163 | LID-8: CI and unit tests | Test | See 6210 |
| 6162 | LID-8: CI and unit tests | Test | See 6210 |
| 6161 | LID-8: CI and unit tests | Test | See 6210 |
| 6160 | LID-8: CI and unit tests | Test | See 6210 |
| 6159 | LID-7: VoxLingua107 recipe | Recipe | See 6244 |
| 6158 | LID-7: VoxLingua107 recipe | Recipe | See 6244 |
| 6157 | LID-7: VoxLingua107 recipe | Recipe | See 6244 |
| 6156 | LID-7: VoxLingua107 recipe | Recipe | See 6244 |
| 6155 | LID-7: VoxLingua107 recipe | Recipe | See 6244 |
| 6154 | LID-7: VoxLingua107 recipe | Recipe | See 6244 |
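
For the CTC forced-alignment item (#6248), the underlying technique is a Viterbi search over the target sequence with blanks interleaved. The sketch below shows that technique in plain Python; it is not ESPnet's code, and the function name `ctc_forced_align`, the nested-list input of per-frame log-probabilities, and blank id 0 are assumptions made for illustration.

```python
# Hypothetical sketch of CTC forced alignment (Viterbi over the blank-extended
# target sequence). Not ESPnet's implementation; inputs are plain nested lists.
import math
from typing import List, Sequence

NEG_INF = float("-inf")


def ctc_forced_align(log_probs: Sequence[Sequence[float]],
                     targets: Sequence[int], blank: int = 0) -> List[int]:
    """Return the most likely per-frame label path (blanks included) that
    collapses to `targets`. log_probs[t][k] is log P(label k at frame t)."""
    T = len(log_probs)
    if T == 0:
        raise ValueError("need at least one frame")
    ext = [blank]  # blank, y1, blank, y2, ..., yL, blank
    for tok in targets:
        ext += [tok, blank]
    S = len(ext)

    score = [[NEG_INF] * S for _ in range(T)]
    back = [[0] * S for _ in range(T)]
    score[0][0] = log_probs[0][ext[0]]
    if S > 1:
        score[0][1] = log_probs[0][ext[1]]

    for t in range(1, T):
        prev = score[t - 1]
        for s in range(S):
            # Allowed moves: stay, advance one state, or skip the blank
            # between two different non-blank tokens.
            cands = [s]
            if s >= 1:
                cands.append(s - 1)
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                cands.append(s - 2)
            best = max(cands, key=lambda p: prev[p])
            if prev[best] == NEG_INF:
                continue  # state unreachable at this frame
            score[t][s] = prev[best] + log_probs[t][ext[s]]
            back[t][s] = best

    # A valid path ends on the final token or the trailing blank.
    ends = [S - 1] + ([S - 2] if S > 1 else [])
    s = max(ends, key=lambda e: score[T - 1][e])
    if score[T - 1][s] == NEG_INF:
        raise ValueError("targets do not fit in the available frames")

    path = [0] * T
    for t in range(T - 1, -1, -1):
        path[t] = ext[s]
        s = back[t][s]
    return path


if __name__ == "__main__":
    probs = [[0.1, 0.8, 0.1], [0.6, 0.2, 0.2], [0.1, 0.1, 0.8]]
    log_probs = [[math.log(p) for p in row] for row in probs]
    print(ctc_forced_align(log_probs, targets=[1, 2]))  # prints: [1, 0, 2]
```

Forbidding the blank skip between repeated tokens is what forces a blank frame between them, so a target such as `[1, 1]` needs at least three frames. Practical aligners typically run the same recurrence on tensors rather than Python loops.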

Note: The table above summarizes the most impactful PRs for this release; several PRs are grouped by shared functionality (e.g., SpeechLM, Docker, and LID). Contributors to these changes include dependabot[bot], whr-a, chinjouli, jctian98, Fhrozen, Masao-Someki, KanTakahiro, akreal, pre-commit-ci[bot], Qingzheng-Wang, Shikhar-S, SanderGi, sw005320, and ZhuoyanTao.


Key Takeaways

  • Parallelism & Scalability – The Dask-based espnet3.parallel module, collect_stats, and new Lightning callbacks enable efficient distributed processing, training, inference, and checkpoint averaging.
  • SpeechLM Maturity – Core modules, DeepSpeed integration, multimodal IO, and data infrastructure create a solid foundation for large‑scale speech‑language models.
  • Stability & Security – Updated dependencies (js-yaml, PyTorch, CUDA 12.6), Miniforge-based Docker images, and bugfixes for codec EMA, MPS device handling, and category-balanced sampling.
  • CI & Packaging – Modernized GitHub Actions, --no-build-isolation for editable installs, and new Docker images based on Ubuntu 24.04.
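
The checkpoint averaging behind AverageCheckpointsCallback (#6249) amounts to an element-wise mean over the parameters of several saved models, typically the last or best N by a validation metric. A framework-free sketch follows, assuming checkpoints are plain dicts mapping parameter names to flat lists of floats (real checkpoints hold tensors); the name `average_checkpoints` is hypothetical and this is not the callback's code.

```python
# Hypothetical sketch of checkpoint averaging; plain dicts of float lists stand
# in for PyTorch state_dicts. Not the AverageCheckpointsCallback implementation.
from typing import Dict, List, Sequence

StateDict = Dict[str, List[float]]


def average_checkpoints(states: Sequence[StateDict]) -> StateDict:
    """Element-wise mean of each parameter across checkpoints with identical layout."""
    if not states:
        raise ValueError("need at least one checkpoint")
    names = set(states[0])
    for st in states[1:]:
        if set(st) != names:
            raise ValueError("checkpoints have different parameter names")
    n = len(states)
    averaged: StateDict = {}
    for name in states[0]:
        tensors = [st[name] for st in states]
        if len({len(x) for x in tensors}) != 1:
            raise ValueError(f"shape mismatch for parameter {name!r}")
        # zip groups the i-th element of this parameter across all checkpoints.
        averaged[name] = [sum(vals) / n for vals in zip(*tensors)]
    return averaged


if __name__ == "__main__":
    ckpts = [{"w": [1.0, 2.0], "b": [0.0]}, {"w": [3.0, 4.0], "b": [1.0]}]
    print(average_checkpoints(ckpts))  # prints: {'w': [2.0, 3.0], 'b': [0.5]}
```

Averaging is generally only meaningful for checkpoints from the same training run; the sketch validates names and lengths but leaves the choice of which checkpoints to include to the caller.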

What's Changed (Full changelog)

Refactoring

  • [espnet3] ESPnet1 Support Sunset and Migration to espnet2.legacy (See #6179, by @Masao-Someki)

Acknowledgements

@Fhrozen, @KanTakahiro, @Masao-Someki, @Qingzheng-Wang, @SanderGi, @Shikhar-S, @ZhuoyanTao, @akreal, @chinjouli, @dependabot[bot], @jctian98, @sw005320, @whr-a.