Stars
🚀 Self-hosted open-source WebRTC video conferencing platform built on peer-to-peer (P2P) architecture for fast, secure real-time communication with end-to-end privacy.
[SIGGRAPH 2025] LAM: Large Avatar Model for One-shot Animatable Gaussian Head
Official codebase for "Causal Forcing: Autoregressive Diffusion Distillation Done Right for High-Quality Real-Time Interactive Video Generation"
【Accepted by TPAMI】Human Motion Video Generation: A Survey (https://ieeexplore.ieee.org/document/11106267)
Unified Codebase for Advanced World Models.
A Gemini 2.5 Flash Level MLLM for Vision, Speech, and Full-Duplex Multimodal Live Streaming on Your Phone
Open source impl of **MV-DUSt3R+ Single-Stage Scene Reconstruction from Sparse Views In 2 Seconds** from Meta Reality Labs. Project page https://mv-dust3rp.github.io/
✨ Self-hosted open-source WebRTC cam-to-cam peer-to-peer video calling platform for immersive 1-to-1 real-time communication with end-to-end privacy. Each room is limited to two participants for ma…
📡 Self-hosted open-source WebRTC live broadcasting platform for real-time video, audio, and screen streaming to unlimited connected viewers.
A curated list of awesome human-human interaction resources.
Official implementation of "OmniForcing: Unleashing Real-time Joint Audio-Visual Generation" [arXiv:2603.11647]. OmniForcing is the first framework to distill bidirectional audio-visual diffusion mo…
Official Python inference and LoRA trainer package for the LTX-2 audio–video generative model.
CastleHill: Separable Causal Diffusion / Variation Flow Maps for LTX-2 long-form video generation
[SIGGRAPH'2026] PEAR: Pixel-aligned Expressive humAn mesh Recovery
Code Implementation of "WorldCam: Interactive Autoregressive 3D Gaming Worlds with Camera Pose as a Unifying Geometric Representation"
Official inference code for SoulX-LiveAct: Towards Hour-Scale Real-Time Human Animation with Neighbor Forcing and ConvKV Memory
SoulX-FlashHead: A unified 1.3B-parameter framework designed for high-fidelity, infinite-length, and real-time streaming portrait video generation.
SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformer
Official repository of paper "CoMoVi: Co-Generation of 3D Human Motions and Realistic Videos"
Implementation of "Live Avatar: Streaming Real-time Audio-Driven Avatar Generation with Infinite Length"
🧂 Salt: Self-Consistent Distribution Matching with Cache-Aware Training for Fast Video Generation
[ICLR 2026] LumiTex: Towards High-Fidelity PBR Texture Generation with Illumination Context
[ICLR'26] Code for the paper "Token-level Data Selection for Safe LLM Fine-tuning"
SoulX-FlashTalk is the first 14B model to achieve sub-second start-up latency (0.87s) while maintaining a real-time throughput of 32 FPS on an 8xH800 node.
[Tech Report] Alive: A Unified Audio-Video Generation Model
[ACM MM 2024] GS3LAM: Gaussian Semantic Splatting SLAM