Stars
1 min voice data can also be used to train a good TTS model! (few-shot voice cloning)
A local markdown preview server. npx mdts — and you're done.
JGLUE: Japanese General Language Understanding Evaluation
Long-form streaming TTS system for multi-speaker dialogue generation
OneShot Learning-based hotword detection.
Codename's rvc fork version 3, based on Applio.
litagin02 / Style-Bert-VITS2
Forked from fishaudio/Bert-VITS2
Style-Bert-VITS2: Bert-VITS2 with more controllable voice styles.
Silero VAD: pre-trained enterprise-grade Voice Activity Detector
zero-shot voice conversion & singing voice conversion, with real-time support
StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models
Multilingual speech understanding: ASR + emotion recognition + audio event detection. 50+ languages, 15x faster than Whisper, non-autoregressive.
Python interface to the WebRTC Voice Activity Detector
The lean application framework for Python. Build sophisticated user interfaces with a simple Python API. Run your apps in the terminal and a web browser.
Library for building powerful interactive command line applications in Python
Using joint training speaker encoder with consistency loss to achieve cross-lingual voice conversion and expressive voice conversion
Faster Whisper transcription with CTranslate2