University of Maryland, College Park
http://xiyichen.github.io
Stars
Latent Spatial Memory for Video World Models
A collection of examples for the MediaPipe Task APIs that can run fully inside your browser.
PaGeR — Unified Panoramic Geometry Estimation via Multi-View Foundation Models
Official Code for the CVPR 2026 Paper "MATCH: Feed-forward Gaussian Registration for Head Avatar Creation and Editing"
[CVPR2026] Official Implementation of Voxify3D
[CVPR 2026 Oral] Official implementation for ChordEdit: One-Step Low-Energy Transport for Image Editing
A novel multi-view feedforward network that enables direct and robust object pose estimation from a query image.
Official implementation for the CVPR'23 paper: Visibility Aware Human-Object Interaction Tracking from Single RGB Camera
[CVPR 2026 Oral] 4D Primitive-Mâché: Glueing Primitives for Persistent 4D Scene Reconstruction
Awesome Unified Multimodal Models
TripoSplat converts a single 2D image into a variable number of high-quality 3D Gaussians, developed by TripoAI.
Finetune HunyuanImage 3.0, an 80B unified understanding and generation model
[CVPR 2026] Official code for BulletTime: Decoupled Control of Time and Camera Pose for Video Generation
TriSplat: Simulation-Ready Feed-Forward 3D Scene Reconstruction
Official code for MAMMA: Markerless Accurate Multi-person Motion Acquisition.
[SIGGRAPH 2026 Conference] FreeOrbit4D: Training-free Arbitrary Camera Redirection for Monocular Videos via Foreground-Complete 4D Reconstruction
Re-implementation Code for "Archon: A Unified Multimodal Model for Holistic Digital Human Generation", CVPR 2026
ProlificDreamer: High-Fidelity and Diverse Text-to-3D Generation with Variational Score Distillation (NeurIPS 2023 Spotlight)
A Comprehensive Survey of Interactive Video World Models
Official implementation of No Pose, No Problem in 4D: Feed-Forward Dynamic Gaussians from Unposed Multi-View Videos
HunyuanImage-3.0: A Powerful Native Multimodal Model for Image Generation
Claude Code skill: read & write Overleaf projects via the git bridge. Works on Mac/Linux/WSL.
Implementation of Open-World Visual Odometry with Temporal Dynamics Awareness (CVPR'26)
[SIGGRAPH 2026] Pixal3D: Pixel-Aligned 3D Generation from Images
HumanNet: Scaling Human-centric Video Learning to One Million Hours