What's Changed
- Add LongCat-AudioDiT 1B TTS model by @Blaizzy in #627
- feat: add WebM audio format support by @regcs in #635
- Add MkDocs docs site and docs guardrails by @shreyaskarnik in #626
- Update branch for GitHub Actions workflow by @Blaizzy in #639
- feat: add MeloTTS-English MLX port by @shreyaskarnik in #629
- feat: add OmniVoice zero-shot multilingual TTS (646+ languages) by @beshkenadze in #630
- Register client disconnects while streaming TTS audio by @orbitalquark in #634
- fix(kokoro): support quantized checkpoint layout and guard NaN durations by @beshkenadze in #624
- Remove docs check for user-facing changes by @Blaizzy in #658
- fix(stt): correct granite_speech Conv1d weight sanitization and add parakeet model_type by @ryancee in #657
- fix(cohere): restore quantized inference for 8-bit and 4-bit checkpoints by @beshkenadze in #650
- feat(irodori-tts): add v2 model support with VoiceDesign and chunked DACVAE decode by @yoshphys in #660
- feat: add Higgs Audio v2 — 3B Llama-backed TTS with voice cloning by @Kairos-a in #656
- Remove librosa dependency by @lucasnewman in #662
- Replace all soundfile calls with core equivalents by @lucasnewman in #663
- Move misaki to an optional install to reduce dependency graph by @lucasnewman in #664
- Improve performance of Parakeet TDT on longform content by @lucasnewman in #665
- Fix Voxtral Realtime streaming and speed up the 4-bit path by ~3x by @iris-sfg in #661
- feat(higgs_audio): add ReferenceContext for reusable encoded-reference state by @Kairos-a in #666
- Fix Voxtral TTS tokenizer dependency contract by @lyonsno in #633
- Remove pyloudnorm dependency by @lucasnewman in #667
- Support concurrent requests to the server by @lucasnewman in #668
- Add a standard model loading path for STS models by @lucasnewman in #670
- Remove pydub dependency by @lucasnewman in #671
- Clean up bare scipy usage by @lucasnewman in #672
- Remove explicit tiktoken dependency by @lucasnewman in #673
- docs: add Svara TTS (multilingual Indic) entry by @shreyaskarnik in #678
- Fix Voxtral STT crash on eos_token_ids initialization by @contrapuntal in #677
- feat: add Mel-Band-RoFormer architecture for vocal source separation by @xocialize in #654
- Improve dependency handling for mlx-lm by @lucasnewman in #683
- Add MOSS-TTS-Nano by @lucasnewman in #676
- docs: add shields.io badges and table of contents to README by @Gingiris in #680
- Adjust Trendshift badge in README by @Blaizzy in #684
- Add batching support for Fish Speech S2 Pro by @lucasnewman in #675
- Add continuous batching support for Qwen3 TTS to the server by @lucasnewman in #674
New Contributors
- @regcs made their first contribution in #635
- @ryancee made their first contribution in #657
- @Kairos-a made their first contribution in #656
- @iris-sfg made their first contribution in #661
- @lyonsno made their first contribution in #633
- @contrapuntal made their first contribution in #677
- @xocialize made their first contribution in #654
- @Gingiris made their first contribution in #680
Full Changelog: v0.4.2...v0.4.3