- Nanjing University
- Hong Kong SAR
- https://yoyo000.github.io
Stars
CLI-Anything: Making ALL Software Agent-Native
Causal video-action world model for generalist robot control
Masked Depth Modeling for Spatial Perception
Sharp Monocular View Synthesis in Less Than a Second
MapAnything: Universal Feed-Forward Metric 3D Reconstruction
[CVPR 2026] SpatialVID: A Large-Scale Video Dataset with Spatial Annotations
[NeurIPS 2025] Direct3D‑S2: Gigascale 3D Generation Made Easy with Spatial Sparse Attention
Official Code Release for [SIGGRAPH 2025] RenderFormer: Transformer-based Neural Rendering of Triangle Meshes with Global Illumination
A SOTA open-source image editing model that aims to match the performance of closed-source models such as GPT-4o and Gemini 2 Flash.
[CVPR 2025 Highlight] Matrix3D: Large Photogrammetry Model All-in-One
[ICLR 2025] Flow Distillation Sampling: Regularizing 3D Gaussians with Pre-trained Matching Priors
[NeurIPS 2024] Direct3D: Scalable Image-to-3D Generation via 3D Latent Diffusion Transformer
[CVPR 2025] Relative camera pose estimation and visual localization with Reloc3r
[CVPR 2025 Highlight] Real-time dense scene reconstruction with SLAM3R
A generative world for general-purpose robotics & embodied AI learning.
[CVPR 2024 Highlight] Gaussian-Flow: 4D Reconstruction with Dynamic 3D Gaussian Particle
The best OSS video generation models, created by Genmo
[ICLR 2025] OpenVid-1M: A Large-Scale High-Quality Dataset for Text-to-video Generation
[ICLR 2025] Hallo2: Long-Duration and High-Resolution Audio-driven Portrait Image Animation
Depth Pro: Sharp Monocular Metric Depth in Less Than a Second.
Vulkan-based Gaussian Splatting viewer with Python bindings
A differentiable point-based rendering framework.
Hallo: Hierarchical Audio-Driven Visual Synthesis for Portrait Image Animation