Stars
Music repair method to convert lossy MP3 compressed music to lossless music.
Main reference implementation for NLWeb, implemented in Python.
ACE-Step: A Step Towards Music Generation Foundation Model
Robust Speech Recognition via Large-Scale Weak Supervision
🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
🤗 LeRobot: Making AI for Robotics more accessible with end-to-end learning
Audio Plugin for Audio to MIDI transcription using deep learning.
⏩ Ship faster with Continuous AI. Open-source CLI that can be used in TUI mode as a coding agent or Headless mode to run background agents
A zero-config VS Code database extension with affordances to aid development and debugging.
This repository contains the code for "A Lip Sync Expert Is All You Need for Speech to Lip Generation In the Wild", published at ACM Multimedia 2020. For an HD commercial model, try Sync Labs.
[CVPR 2025] MMAudio: Taming Multimodal Joint Training for High-Quality Video-to-Audio Synthesis
Code for FLAVR: A fast and efficient frame interpolation technique.
FILM: Frame Interpolation for Large Motion, In ECCV 2022.
Build and share delightful machine learning apps, all in Python. 🌟 Star to support our work!
Nodes related to video workflows
A custom node set for Video Frame Interpolation in ComfyUI.
One minute of voice data is enough to train a good TTS model (few-shot voice cloning).
The official GitHub page for the survey paper "Foundation Models for Music: A Survey".
Di♪♪Rhythm: Blazingly Fast and Embarrassingly Simple End-to-End Full-Length Song Generation with Latent Diffusion
A single Gradio + React WebUI with extensions for ACE-Step, Kimi Audio, Piper TTS, GPT-SoVITS, CosyVoice, XTTSv2, DIA, Kokoro, OpenVoice, ParlerTTS, Stable Audio, MMS, StyleTTS2, MAGNet, AudioGen, …