Stars
Code for the paper Hybrid Spectrogram and Waveform Source Separation
so-vits-svc fork with realtime support, improved interface and more features.
VITS2 backbone with multilingual BERT
Simultaneous speech-to-text model
Official PyTorch Implementation of "Scalable Diffusion Models with Transformers"
VITS: Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech
🔥 2D and 3D face alignment library built using PyTorch
Silero VAD: pre-trained enterprise-grade Voice Activity Detector
[ICLR 2024] Efficient Streaming Language Models with Attention Sinks
Official repo for consistency models.
StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models
DiffSinger: Singing Voice Synthesis via Shallow Diffusion Mechanism (SVS & TTS); AAAI 2022; Official code
State-of-the-art deep learning based audio codec supporting both mono 24 kHz audio and stereo 48 kHz audio.
A PyTorch library for implementing flow matching algorithms, featuring continuous and discrete flow matching implementations. It includes practical examples for both text and image modalities.
Vector (and Scalar) Quantization, in PyTorch
An AI-Powered Speech Processing Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Enhancement, Separation, and Target Speaker Extraction, etc.
Zero-shot voice conversion & singing voice conversion, with real-time support
Have a natural, spoken conversation with AI!
Code for reproducing results in "Glow: Generative Flow with Invertible 1x1 Convolutions"
A Python package to analyze and compare voices with deep learning
AudioLDM: Generate speech, sound effects, music and beyond, with text.
A simple, high-quality voice conversion tool focused on ease of use and performance.
Implementation of AudioLM, a SOTA Language Modeling Approach to Audio Generation out of Google Research, in PyTorch
Offline Text To Speech synthesis for Python
Real-time end-to-end singing voice conversion system based on DDSP (Differentiable Digital Signal Processing)
Audio generation using diffusion models, in PyTorch.