- Huazhong University of Science & Technology
- Wuhan, Hubei Province, China
- https://orcid.org/0009-0009-4752-6118
- @THELMDOFZHOUXIN
- https://lmd0311.github.io/
Stars
A Unified Driving World Model for Future Generation and Perception
Perceive Anything: Recognize, Explain, Caption, and Segment Anything in Images and Videos
Pusa: Thousands Timesteps Video Diffusion Model
[ICCV 2025] Aether: Geometric-Aware Unified World Modeling
Source code for the paper "MindJourney: Test-Time Scaling with World Models for Spatial Reasoning"
Code of π^3: Scalable Permutation-Equivariant Visual Geometry Learning
[ICCV 2025] HERMES: A Unified Self-Driving World Model for Simultaneous 3D Scene Understanding and Generation
[ICML 2025] Official PyTorch Implementation of "History-Guided Video Diffusion"
RoboBrain 2.0: Advanced version of RoboBrain. See Better. Think Harder. Do Smarter. 🎉🎉🎉
A modular high-level library to train embodied AI agents across a variety of tasks and environments.
(CVPR 2025) From Slow Bidirectional to Fast Autoregressive Video Diffusion Models
Stable Virtual Camera: Generative View Synthesis with Diffusion Models
Towards a Generative 3D World Engine for Embodied Intelligence
[ICCV 2025] LangScene-X: Reconstruct Generalizable 3D Language-Embedded Scenes with TriMap Video Diffusion
Less is Enough: Training-Free Video Diffusion Acceleration via Runtime-Adaptive Caching
The official repository for ERNIE 4.5 and ERNIEKit – its industrial-grade development toolkit based on PaddlePaddle.
[ICCV 2025] Implementation of VMem: Consistent Interactive Video Scene Generation with Surfel-Indexed View Memory
UniWorld: High-Resolution Semantic Encoders for Unified Visual Understanding and Generation
MiniMax-M1, the world's first open-weight, large-scale hybrid-attention reasoning model.
A lightweight LMM-based Document Parsing Model
A Native Multimodal LLM for 3D Generation and Understanding
Official repo for paper "Structured 3D Latents for Scalable and Versatile 3D Generation" (CVPR'25 Spotlight).
Multi-SpatialMLLM: Multi-Frame Spatial Understanding with Multi-Modal Large Language Models
Official Implementation: Training-Free Efficient Video Generation via Dynamic Token Carving
Interactive visualizations of the geometric intuition behind diffusion models.
Open source repo for Locate 3D Model, 3D-JEPA and Locate 3D Dataset