Stars
Code for the paper Hybrid Spectrogram and Waveform Source Separation
so-vits-svc fork with real-time support, an improved interface, and more features.
Official PyTorch Implementation of "Scalable Diffusion Models with Transformers"
VITS: Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech
🔥 2D and 3D face alignment library built using PyTorch
[ICLR 2024] Efficient Streaming Language Models with Attention Sinks
Official repo for consistency models.
StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models
DiffSinger: Singing Voice Synthesis via Shallow Diffusion Mechanism (SVS & TTS); AAAI 2022; Official code
State-of-the-art deep learning based audio codec supporting both mono 24 kHz audio and stereo 48 kHz audio.
Vector (and Scalar) Quantization, in PyTorch
A PyTorch library for implementing flow matching algorithms, featuring continuous and discrete flow matching implementations. It includes practical examples for both text and image modalities.
An AI-Powered Speech Processing Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Enhancement, Separation, and Target Speaker Extraction, etc.
Zero-shot voice conversion and singing voice conversion, with real-time support
Have a natural, spoken conversation with AI!
Code for reproducing results in "Glow: Generative Flow with Invertible 1x1 Convolutions"
A Python package to analyze and compare voices with deep learning
AudioLDM: Generate speech, sound effects, music and beyond, with text.
A simple, high-quality voice conversion tool focused on ease of use and performance.
Implementation of AudioLM, a SOTA Language Modeling Approach to Audio Generation out of Google Research, in PyTorch
Offline text-to-speech synthesis for Python
Real-time end-to-end singing voice conversion system based on DDSP (Differentiable Digital Signal Processing)
Audio generation using diffusion models, in PyTorch.
VoxCPM: Tokenizer-Free TTS for Context-Aware Speech Generation and True-to-Life Voice Cloning