Stars
Real-time streaming voice anonymization & voice conversion
A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
A lightweight psychoacoustic bass enhancement plugin - in stereo where available!
A lightweight, local-first, and 🆓 experiment tracking library from Hugging Face 🤗
PyTorch implementation of MaskGIT: Masked Generative Image Transformer (https://arxiv.org/pdf/2202.04200.pdf)
Implementation of SoundStorm, Efficient Parallel Audio Generation from Google DeepMind, in PyTorch
A unified tokenizer that is capable of both extracting semantic information and enabling high-fidelity audio reconstruction.
An opinionated docker container for a web-interface around the music organizer beets
Beets plugin to manage external files
🎛 Stemgen is a Stem file generator. Convert any track into a Stem and have fun with Traktor.
Download Tidal tracks, videos, albums, playlists & artists! Tidal downloader that supports master quality.
Audio Dataset for training CLAP and other models
Official implementation of the paper MGE-LDM: Joint Latent Diffusion for Simultaneous Music Generation and Source Extraction
Generative Model Evaluation Lab - An evaluation suite for your generative models.
An open-source, extensible AI agent that goes beyond code suggestions - install, execute, edit, and test with any LLM
TorchCFM: a Conditional Flow Matching library
🎥 Python and OpenCV-based scene cut/transition detection program & library.
Livecoding networked visuals in the browser
Awesome list for vjing/visuals-related resources
ACAV100M: Automatic Curation of Large-Scale Datasets for Audio-Visual Video Representation Learning. In ICCV, 2021.
The official repo of Qwen2-Audio chat & pretrained large audio language model proposed by Alibaba Cloud.
Frontier Multimodal Foundation Models for Image and Video Understanding
Code, slides, and examples from my generative AI video course... taking you all the way from VAEs to near real-time Stable Diffusion with PyTorch and Hugging Face!
Fine-tune Stable Audio Open with DiT ControlNet.
This repo contains the source code of the first deep learning-based singing voice beat tracking system. It leverages WavLM and DistilHuBERT pre-trained speech models to create vocal embeddings and t…
Flexible LoRA Implementation to use with stable-audio-tools