Starred repositories
TalkVid: A Large-Scale Diversified Dataset for Audio-Driven Talking Head Synthesis
Repo collection for NVIDIA Audio2Face-3D models and tools
Implementation of "Live Avatar: Streaming Real-time Audio-Driven Avatar Generation with Infinite Length"
This repository provides a comprehensive sample project showcasing the integration of Meta's Avatars with the Meta XR Interaction SDK in Unity. It serves as a practical guide for developers, demons…
A service to convert audio to facial blendshapes for lipsyncing and facial performances.
Fay is an agent framework that connects digital humans (2.5D, 3D, mobile, PC, web) or large language models (OpenAI-compatible, DeepSeek) to business systems.
Towards Real-Time Diffusion-Based Streaming Video Super-Resolution — An efficient one-step diffusion framework for streaming VSR with locality-constrained sparse attention and a tiny conditional de…
Real-ESRGAN aims at developing Practical Algorithms for General Image/Video Restoration.
The repository provides code for running inference and finetuning with the Meta Segment Anything Model 3 (SAM 3), links for downloading the trained model checkpoints, and example notebooks that sho…
GMTalker: a 3D digital human built by the Media Intelligence team at Guangming Laboratory. The system integrates speech recognition, speech synthesis, natural language understanding, and lip-sync animation, and supports rapid deployment on Windows, Linux, and Android.
deepbeepmeep / Wan2GP
Forked from Wan-Video/Wan2.1. A fast AI video generator for the GPU-poor. Supports Wan 2.1/2.2, Qwen Image, Hunyuan Video, LTX Video, and Flux.
We present StableAvatar, the first end-to-end video diffusion transformer, which synthesizes infinite-length high-quality audio-driven avatar videos without any post-processing, conditioned on a re…
Unlimited-length talking video generation that supports image-to-video and video-to-video generation
HunyuanCustom: A Multimodal-Driven Architecture for Customized Video Generation
MNN is a blazing fast, lightweight deep learning framework, battle-tested by business-critical use cases in Alibaba. Full multimodal LLM Android App:[MNN-LLM-Android](./apps/Android/MnnLlmChat/READ…
[AAAI 2026] EchoMimicV3: 1.3B Parameters are All You Need for Unified Multi-Modal and Multi-Task Human Animation
HuMo: Human-Centric Video Generation via Collaborative Multi-Modal Conditioning
Kaleido: Open-sourced multi-subject reference video generation model, enabling controllable, high-fidelity video synthesis from multiple image references.
rtmp streaming from opencv with ffmpeg / avcodec using C++ or Python
Real-time voice-interactive digital human with customizable appearance and voice; supports voice cloning, with first-packet latency as low as 3 s.
Digital Human Resource: 2D/3D/4D Human Modeling, Avatar Generation & Animation, Clothed People Digitalization, Virtual Try-On, and Others.
A curated list of awesome research papers, projects, code, dataset, workshops etc. related to virtual try-on.