CrowdGaussian tackles multi-person 3D reconstruction from a single image with a self-supervised adaptation pipeline and Self-Calibrated Learning, producing photorealistic, geometrically complete 3D Gaussian Splatting representations despite heavy occlusion and low image clarity.
CVPR 2026
We present a unified framework for reconstructing animatable 3D human avatars from a single portrait, introducing a Dual-UV representation to handle pose and framing sensitivity and a factorized synthetic data manifold to achieve state-of-the-art generalization across head, half-body, and full-body inputs.
CVPR 2026 Project Page Code
UIKA introduces a feed-forward, animatable Gaussian head model that achieves state-of-the-art reconstruction from arbitrary unposed inputs by utilizing a UV-guided remapping strategy and learnable UV tokens to aggregate view-independent features into canonical Gaussian attributes.
CVPR 2026 Project Page
DeX-Portrait introduces a diffusion-based portrait animation framework that achieves high-fidelity, disentangled control over head pose and facial expression through a dual-branch conditioning mechanism and a progressive hybrid classifier-free guidance strategy.
CVPR 2026 Project Page
Pressure2Motion establishes a new state of the art in privacy-preserving motion capture, introducing a hierarchical diffusion model that resolves the ambiguities of ground pressure data by integrating dual-level pressure features with high-level linguistic priors.
CVPR 2026
TEXTRIX introduces a native 3D attribute generation framework that bypasses the inconsistencies of multi-view fusion by utilizing a Diffusion Transformer on a latent 3D grid, enabling both high-fidelity, seamless texture synthesis and precise 3D part segmentation.
CVPR 2026 Project Page
SpatialVID provides a massive-scale dataset of over 21,000 hours of in-the-wild videos with dense 3D annotations—including camera poses, depth maps, and motion instructions—to overcome the data scarcity and scalability limitations currently hindering spatial intelligence and 3D vision research.
CVPR 2026 Project Page Code
We address the challenges of speed and generalization in sketch-based 3D pose estimation with a learn-from-synthesis strategy: trained on our synthetic sketch-pose dataset, Sketch2PoseNet efficiently and accurately predicts 3D human poses that generalize across diverse sketch styles.
SIGGRAPH Asia 2025 Project Page Code
TeRA introduces a highly efficient two-stage 3D human generative framework that outperforms SDS-based models by training a text-controlled latent diffusion model within a structured latent space, enabling fast, photorealistic avatar generation and text-based partial customization.
ICCV 2025 Project Page Code
We introduce HuGe100K, a large-scale HUman-centric GEnerated dataset. Leveraging its diversity in views, poses, and appearances, we propose a scalable feed-forward transformer that predicts a 3D human Gaussian representation in a uniform space from a single human image.
CVPR 2025 Project Page Code