Stars
Silero VAD: pre-trained enterprise-grade Voice Activity Detector
ASCEND Chinese-English code-switching dataset
Google Chromium, sans integration with Google
Skills for Real Engineers. Straight from my .claude directory.
Command line utility for forced alignment using Kaldi
Robust Speech Recognition via Large-Scale Weak Supervision
[ICASSP'26] Real-time streaming voice anonymization & voice conversion
A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
A lightweight psychoacoustic bass enhancement plugin - in stereo where available!
A lightweight, local-first experiment tracking library from Hugging Face
PyTorch implementation of MaskGIT: Masked Generative Image Transformer (https://arxiv.org/pdf/2202.04200.pdf)
Implementation of SoundStorm, Efficient Parallel Audio Generation from Google DeepMind, in PyTorch
A unified tokenizer that is capable of both extracting semantic information and enabling high-fidelity audio reconstruction.
An opinionated docker container for a web-interface around the music organizer beets
Beets plugin to manage external files
Stemgen is a Stem file generator. Convert any track into a Stem and have fun with Traktor.
Download Tidal tracks, videos, albums, playlists & artists! Tidal downloader that supports master quality.
Audio Dataset for training CLAP and other models
Official implementation of the paper MGE-LDM: Joint Latent Diffusion for Simultaneous Music Generation and Source Extraction
Generative Model Evaluation Lab - An evaluation suite for your generative models.
An open-source, extensible AI agent that goes beyond code suggestions: install, execute, edit, and test with any LLM
TorchCFM: a Conditional Flow Matching library
Python and OpenCV-based scene cut/transition detection program & library.
Livecoding networked visuals in the browser
Awesome list for vjing/visuals-related resources
ACAV100M: Automatic Curation of Large-Scale Datasets for Audio-Visual Video Representation Learning. In ICCV, 2021.