Stars
[ICCV 2025] ACTalker: an end-to-end video diffusion framework for talking head synthesis that supports both single and multi-signal control (e.g., audio, expression).
Have a natural, spoken conversation with AI!
[ICCV 2025] LHM: Large Animatable Human Reconstruction Model from a Single Image in Seconds
Real-time voice interactive digital human, supporting both an end-to-end voice solution (GLM-4-Voice - THG) and a cascaded solution (ASR-LLM-TTS-THG). Appearance and voice are customizable with no training required, voice cloning is supported, and first-packet latency is as low as 3s.
[CVPR 2025] Efficient Video Face Enhancement with Enhanced Spatial-Temporal Consistency
[NeurIPS 2022] Towards Robust Blind Face Restoration with Codebook Lookup Transformer
[CVPR 2024] Upscale-A-Video: Temporal-Consistent Diffusion Model for Real-World Video Super-Resolution
A machine learning-based video super resolution and frame interpolation framework. Est. Hack the Valley II, 2018.
Real-ESRGAN aims at developing Practical Algorithms for General Image/Video Restoration.
[ECCV 2024] Enhancing Perceptual Quality in Video Super-Resolution through Temporally-Consistent Detail Synthesis using Diffusion Models
[CVPR 2024 Highlight] Enhancing Video Super-Resolution via Implicit Resampling-based Alignment.
GFPGAN aims at developing Practical Algorithms for Real-world Face Restoration.
[ACM MM 2025] FantasyTalking: Realistic Talking Portrait Generation via Coherent Motion Synthesis
MNN is a blazing fast, lightweight deep learning framework, battle-tested by business-critical use cases in Alibaba. Full multimodal LLM Android App:[MNN-LLM-Android](./apps/Android/MnnLlmChat/READ…
[CVPR 2025] EchoMimicV2: Towards Striking, Simplified, and Semi-Body Human Animation
StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models
🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
Zero-shot voice conversion & singing voice conversion, with real-time support
DICE-Talk is a diffusion-based emotional talking head generation method that can generate vivid and diverse emotions for speaking portraits.
Official implementation of "Sonic: Shifting Focus to Global Audio Perception in Portrait Animation"
Powerful & Easy-to-Use Video Face Swapping and Editing Software
Open-Sora: Democratizing Efficient Video Production for All
Unofficial implementation of Magic Clothing for ComfyUI
Installer & Activated Microsoft Office for macOS
An Industrial-Level Controllable and Efficient Zero-Shot Text-To-Speech System
DiffSinger: Singing Voice Synthesis via Shallow Diffusion Mechanism (SVS & TTS); AAAI 2022; Official code