Stars
FloodDiffusion: Tailored Diffusion Forcing for Streaming Motion Generation
Official repository for the paper "MICo-150K: A Comprehensive Dataset for Multi-Image Composition".
ModelTC / Wan2.2-Lightning
Forked from Wan-Video/Wan2.2
Wan2.2-Lightning: Speed up the Wan2.2 model with distillation
We present FlashPortrait, an end-to-end video diffusion transformer capable of synthesizing ID-preserving, infinite-length videos while achieving up to 6× acceleration in inference speed.
Real3D-Portrait: One-shot Realistic 3D Talking Portrait Synthesis; ICLR 2024 Spotlight; Official code
GeneFace++: Generalized and Stable Real-Time 3D Talking Face Generation; Official Code
StreamDiffusion, Live Stream APP
The repository provides code for running inference with the Meta Segment Anything Audio Model (SAM-Audio), links for downloading the trained model checkpoints, and example notebooks that show how t…
One-to-All Animation: Alignment-Free Character Animation and Image Pose Transfer
Pose Extraction & Rendering for SCAIL: Towards Studio-Grade Character Animation via In-Context Learning of 3D-Consistent Pose Representations
Official Implementation of SCAIL: Towards Studio-Grade Character Animation via In-Context Learning of 3D-Consistent Pose Representations
PersonaLive! : Expressive Portrait Image Animation for Live Streaming
Automate your mobile devices with natural language commands - an LLM-agnostic mobile Agent 🤖
Source code of the paper "V-Droid: Advancing Mobile GUI Agent Through Generative Verifiers"
GLM-TTS: Controllable & Emotion-Expressive Zero-shot TTS with Multi-Reward Reinforcement Learning
GLM-ASR-Nano: A robust, open-source speech recognition model with 1.5B parameters
RIFE, Real-Time Intermediate Flow Estimation for Video Frame Interpolation implemented with ncnn library
An Open Phone Agent Model & Framework. Unlocking the AI Phone for Everyone
Implementation of "Live Avatar: Streaming Real-time Audio-Driven Avatar Generation with Infinite Length"
Open Source DeepWiki: AI-Powered Wiki Generator for GitHub/Gitlab/Bitbucket Repositories. Join the Discord: https://discord.gg/gMwThUMeme
A simple screen-parsing tool toward a purely vision-based GUI agent
MineContext is your proactive context-aware AI partner (Context-Engineering + ChatGPT Pulse)
The absolute trainer to light up AI agents.
An Open Source implementation of Notebook LM with more flexibility and features
iMontage: Unified, Versatile, Highly Dynamic Many-to-many Image Generation
AnyTalker: Scaling Multi-person Talking Video Generation with Interactivity Refinement