Stars
Official implementation of AnimateDiff.
The most powerful and modular diffusion model GUI, API and backend with a graph/nodes interface.
Machine Learning Engineering Open Book
Turn expensive prompts into cheap fine-tuned models
An unofficial PyTorch implementation of the audio LM VALL-E
LLM papers I'm reading, mostly on inference and model compression
Implementation of SoundStorm, Efficient Parallel Audio Generation from Google DeepMind, in PyTorch
Implementation of AudioLM, a SOTA Language Modeling Approach to Audio Generation out of Google Research, in PyTorch
A concise but complete full-attention transformer with a set of promising experimental features from various papers
Implementation of Spear-TTS - multi-speaker text-to-speech attention network, in PyTorch
Code and Pretrained Models for Interspeech 2023 Paper "Whisper-AT: Noise-Robust Automatic Speech Recognizers are Also Strong Audio Event Taggers"
Code, Dataset, and Pretrained Models for Audio and Speech Large Language Model "Listen, Think, and Understand".
WhisperTalk is an audio-to-text model based on the transformer architecture which takes audio input and generates predictions for the next utterance.
Real-time end-to-end singing voice conversion system based on DDSP (Differentiable Digital Signal Processing)
SoftVC VITS Singing Voice Conversion
Easily train a good VC model with voice data <= 10 mins!
serp-ai / bark-with-voice-clone
Forked from suno-ai/bark
🔊 Text-prompted Generative Audio Model - With the ability to clone voices
The code for the bark-voicecloning model. Training and inference.
Run, manage, and scale AI workloads on any AI infrastructure. Use one system to access & manage all AI compute (Kubernetes, 20+ clouds, or on-prem).
Code for "Learning to summarize from human feedback"
PyTorch deep learning projects made easy.
Running large language models on a single GPU for throughput-oriented scenarios.
LAVIS - A One-stop Library for Language-Vision Intelligence
Multilingual Automatic Speech Recognition with word-level timestamps and confidence
Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
DeepSpeech is an open source embedded (offline, on-device) speech-to-text engine which can run in real time on devices ranging from a Raspberry Pi 4 to high power GPU servers.
A modular RL library to fine-tune language models to human preferences