Stars
SoulX-FlashTalk is the first 14B model to achieve sub-second start-up latency (0.87s) while maintaining a real-time throughput of 32 FPS on an 8xH800 node.
Your own personal AI assistant. Any OS. Any Platform. The lobster way. 🦞
This repository collects papers on Human-Interaction-Motion-Generation applications. New papers are added irregularly.
Official Implementation of SCAIL: Towards Studio-Grade Character Animation via In-Context Learning of 3D-Consistent Pose Representations
[ICCV2025] LHM: Large Animatable Human Reconstruction Model from a Single Image in Seconds
Turn any PDF or image document into structured data for your AI. A powerful, lightweight OCR toolkit that bridges the gap between images/PDFs and LLMs. Supports 100+ languages.
CLIP+MLP Aesthetic Score Predictor
Enjoy the magic of Diffusion models!
[CVPR2025] We present StableAnimator, the first end-to-end ID-preserving video diffusion framework, which synthesizes high-quality videos without any post-processing, conditioned on a reference ima…
[ICCV 2023] Official code for the paper "Exploring Video Quality Assessment on User Generated Contents from Aesthetic and Technical Perspectives". Official weights and demos provided.
🎥 Python and OpenCV-based scene cut/transition detection program & library.
A comprehensive list of papers for the definition of World Models and using World Models for General Video Generation, Embodied AI, and Autonomous Driving, including papers, codes, and related webs…
[ICCV'25 Best Paper Finalist] ReCamMaster: Camera-Controlled Generative Rendering from A Single Video
[CVPR'25 Oral] Official implementation for "DiffusionRenderer: Neural Inverse and Forward Rendering with Video Diffusion Models"
Official repository of "SAMURAI: Adapting Segment Anything Model for Zero-Shot Visual Tracking with Motion-Aware Memory"
RTMPose series (RTMPose, DWPose, RTMO, RTMW) without mmcv, mmpose, mmdet etc.
📹 A more flexible framework that can generate videos at any resolution and create videos from images.
[ECCV 2024] Official implementation of the paper "GS2Mesh: Surface Reconstruction from Gaussian Splatting via Novel Stereo Views"
DanceTogether! Identity-Preserving Multi-Person Interactive Video Generation
[ICCV 2025] Official implementations for paper: VACE: All-in-One Video Creation and Editing
Official project page of MTVCrafter, a new paradigm for animating arbitrary characters with 4D motion tokens.
Official implementation for "Exploring Vision Transformers for 3D Human Motion-Language Models with Motion Patches" (CVPR 2024)
Dynamic human image animation with strong identity preservation, heterogeneous character driving, and controllable backgrounds.
HunyuanCustom: A Multimodal-Driven Architecture for Customized Video Generation
Step1X-3D: Towards High-Fidelity and Controllable Generation of Textured 3D Assets