
[espnet3] Add LibriTTS recipe (for VITS model)#6451

Draft
NewGamezzz wants to merge 61 commits into espnet:master from NewGamezzz:libritts_vits

NewGamezzz commented May 28, 2026

What did you change?

Add an ESPnet3 LibriTTS recipe for the VITS model (multi-speaker TTS).


Why did you make this change?

This PR adds a new recipe for LibriTTS + VITS to egs3/. It introduces three new stages:

  • compute_xvectors: extracts speaker embeddings with SpeechBrain's ECAPA-TDNN model.
  • remove_long_short: filters out utterances that are too short or too long (matching the duration filtering in espnet2's tts.sh).
  • create_token_list: builds the phoneme, character, or BPE token list from a manifest (mirroring espnet2/bin/tokenize_text.py).
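For reviewers, the duration filter behind remove_long_short can be sketched as below. This is a minimal illustration under stated assumptions, not the recipe's code: the function name filter_by_duration, the Kaldi-style utt2num_samples mapping, and the 0.1 s / 20 s default bounds are hypothetical choices for the example, and keeping utterances strictly between the bounds is assumed to follow the espnet2 template's filtering.

```python
from typing import Dict


def filter_by_duration(
    utt2num_samples: Dict[str, int],
    fs: int,
    min_wav_duration: float = 0.1,  # hypothetical default, in seconds
    max_wav_duration: float = 20.0,  # hypothetical default, in seconds
) -> Dict[str, int]:
    """Keep utterances whose length lies strictly between the two bounds."""
    # Convert the second-based thresholds into sample counts (truncated like int()).
    min_len = int(min_wav_duration * fs)
    max_len = int(max_wav_duration * fs)
    # Drop anything at or below the minimum, or at or above the maximum.
    return {
        utt: num_samples
        for utt, num_samples in utt2num_samples.items()
        if min_len < num_samples < max_len
    }
```

With LibriTTS audio at 24 kHz, the illustrative bounds correspond to 2,400 and 480,000 samples, so a 1-second utterance (24,000 samples) is kept while a 2,400-sample one is dropped.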
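The token-list step can be pictured with the character-level case. The sketch below is hypothetical (build_char_token_list is not a function in the recipe) and assumes the layout commonly produced by the espnet2 templates: <blank> first, <unk> second, remaining tokens ordered by descending frequency, and <sos/eos> last, with spaces mapped to a <space> symbol. A phoneme token list would swap the per-character split for a grapheme-to-phoneme converter.

```python
from collections import Counter
from typing import Iterable, List


def build_char_token_list(
    texts: Iterable[str],
    blank: str = "<blank>",
    oov: str = "<unk>",
    sos_eos: str = "<sos/eos>",
    space_symbol: str = "<space>",
) -> List[str]:
    """Build a frequency-ordered character token list with special symbols."""
    counter: Counter = Counter()
    for text in texts:
        # Character tokenization; spaces become an explicit symbol.
        counter.update(space_symbol if ch == " " else ch for ch in text)
    # Most frequent first; sorted() is stable, so ties keep first-seen order.
    tokens = [tok for tok, _ in sorted(counter.items(), key=lambda kv: -kv[1])]
    # Special symbols: blank at index 0, OOV at index 1, sos/eos at the end.
    return [blank, oov] + tokens + [sos_eos]
```

For example, build_char_token_list(["ab a"]) yields <blank>, <unk>, a, b, <space>, <sos/eos>, since "a" occurs twice and "b" is seen before the space.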

Is your PR small enough?

19 files, ~2000 lines added.


Additional Context

This PR is currently a draft. I will mark it as ready for review once VITS training finishes and inference is verified end-to-end.

Plan / checklist:

  • Implement provider and runner for parallel x-vector computation
  • Add the new stages to src/system.py
  • Add a stage for metric computation using Versa
  • Verify all stages end-to-end:
    • compute_xvectors
    • remove_long_short
    • create_token_list
    • collect_stats
    • train (in progress)
    • infer
    • metrics
  • Recipe README.md

NewGamezzz and others added 30 commits May 29, 2026 02:11
remove chapter name from speaker key

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
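Dropping the chapter from the speaker key means every chapter read by the same LibriTTS speaker maps to one speaker. A minimal sketch, assuming the usual LibriTTS utterance ID layout <speaker>_<chapter>_<paragraph>_<sentence>; the helpers speaker_key and build_spk2utt are hypothetical and only illustrate the grouping, not the recipe's implementation.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def speaker_key(utt_id: str) -> str:
    # Assumed layout: <speaker>_<chapter>_<paragraph>_<sentence>.
    # Keep only the speaker field so all chapters share one key.
    return utt_id.split("_", 1)[0]


def build_spk2utt(utt_ids: Iterable[str]) -> Dict[str, List[str]]:
    """Group utterance IDs by speaker, Kaldi spk2utt style."""
    spk2utt: Dict[str, List[str]] = defaultdict(list)
    for utt in utt_ids:
        spk2utt[speaker_key(utt)].append(utt)
    return dict(spk2utt)
```

With a speaker_chapter key, 84_121123_000007_000001 and 84_121550_000000_000000 would land under two different speakers; with the speaker-only key, both are grouped under "84".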