Stars
Teravus / TranscriberModelDemo
Forked from QwenLM/Qwen2.5-Omni
A transformers-based song transcriber using Ace-Step's Qwen2.5-Omni-based song transcription model, acestep-transcriber
The most powerful local music generation model that outperforms most commercial alternatives
VoxCPM: Tokenizer-Free TTS for Context-Aware Speech Generation and True-to-Life Voice Cloning
Music repair method to convert lossy MP3 compressed music to lossless music.
Main reference implementation for NLWeb, implemented in Python.
ACE-Step: A Step Towards Music Generation Foundation Model
Robust Speech Recognition via Large-Scale Weak Supervision
🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
🤗 LeRobot: Making AI for Robotics more accessible with end-to-end learning
Audio Plugin for Audio to MIDI transcription using deep learning.
⏩ Ship faster with Continuous AI. Open-source CLI that can be used in Headless mode to run async cloud agents or TUI mode as an in sync coding agent
A zero-config VS Code database extension with affordances to aid development and debugging.
This repository contains the code for "A Lip Sync Expert Is All You Need for Speech to Lip Generation In the Wild", published at ACM Multimedia 2020. For an HD commercial model, try Sync Labs.
[CVPR 2025] MMAudio: Taming Multimodal Joint Training for High-Quality Video-to-Audio Synthesis
Code for FLAVR: A fast and efficient frame interpolation technique.
FILM: Frame Interpolation for Large Motion, In ECCV 2022.
Build and share delightful machine learning apps, all in Python. 🌟 Star to support our work!
Nodes related to video workflows
A custom node set for Video Frame Interpolation in ComfyUI.
1 min voice data can also be used to train a good TTS model! (few shot voice cloning)
The official GitHub page for the survey paper "Foundation Models for Music: A Survey".