Stars
Flutter makes it easy and fast to build beautiful apps for mobile and beyond
Robust Speech Recognition via Large-Scale Weak Supervision
Implement a ChatGPT-like LLM in PyTorch from scratch, step by step
Real-time face swap and one-click video deepfake with only a single image
1 minute of voice data can be used to train a good TTS model! (few-shot voice cloning)
🔊 Text-Prompted Generative Audio Model
State-of-the-art 2D and 3D Face Analysis Project
Wan: Open and Advanced Large-Scale Video Generative Models
Official repo for paper "Structured 3D Latents for Scalable and Versatile 3D Generation" (CVPR'25 Spotlight).
A concise but complete full-attention transformer with a set of promising experimental features from various papers
SUPIR aims at developing Practical Algorithms for Photo-Realistic Image Restoration In the Wild. Our new online demo is also released at suppixel.ai.
Unlimited-length talking video generation that supports image-to-video and video-to-video generation
Examples of ComfyUI workflows
[ICCV 2025] Official implementations for paper: VACE: All-in-One Video Creation and Editing
Official implementation of "Sonic: Shifting Focus to Global Audio Perception in Portrait Animation"
[CVPR2025 Highlight] Video Generation Foundation Models: https://saiyan-world.github.io/goku/
Kyutai's Speech-To-Text and Text-To-Speech models based on the Delayed Streams Modeling framework.
[NeurIPS 2025] Let Them Talk: Audio-Driven Multi-Person Conversational Video Generation
[CVPR 2026] PersonaLive!: Expressive Portrait Image Animation for Live Streaming
[CVPR 2025] MMAudio: Taming Multimodal Joint Training for High-Quality Video-to-Audio Synthesis
MLX native implementations of state-of-the-art generative image models
Phantom: Subject-Consistent Video Generation via Cross-Modal Alignment
[TMLR] Memory-Guided Diffusion for Expressive Talking Video Generation