Stars
Replication of an EgoScale-style data collection tool based on the Unitree G1 robot.
Implementation of a single layer of the MMDiT, proposed in Stable Diffusion 3, in PyTorch
A curated awesome list for dexterous robot manipulation, tactile sensing, dexterous hands, robot learning, datasets, benchmarks, and simulators.
HOT3D: Hand and Object Tracking in 3D from Egocentric Multi-View Videos, CVPR 2025
[ICLR 2026] Sat3DGen: Comprehensive Street-Level 3D Scene Generation from Single Satellite Image
Deep Learning for Visual-Inertial Odometry
A Curated List of Vision-Language-Action (VLA) and World Action Models (WAM) Research and Beyond
[CVPR 2026] UniDex: A Robot Foundation Suite for Universal Dexterous Hand Control from Egocentric Human Videos
[CVPR 2026] ZipMap: Linear-Time Stateful 3D Reconstruction via Test-Time Training
This repository holds the code that wraps habitat-sim. The main purpose of this code is data collection. Datasets like [mvl-dataset](https://huggingface.co/datasets/EnriqueSolarte/mvl_datasets) wer…
ViPE: Video Pose Engine for Geometric 3D Perception
An agentic skills framework & software development methodology that works.
A Curated List of Awesome Video World Models with AR Diffusion: Covering Algorithms, Applications, and Infrastructure, Aimed at Serving as a Comprehensive Resource for Researchers, Practitioners, a…
A comprehensive list of papers for the definition of World Models and using World Models for General Video Generation, Embodied AI, and Autonomous Driving, including papers, codes, and related webs…
A Curated List of Awesome Works in World Modeling, Aiming to Serve as a One-stop Resource for Researchers, Practitioners, and Enthusiasts Interested in World Modeling.
Paper list for robot learning from human videos (LfHV)
Official Implementation of SAGE-GRPO: Manifold-Aware Exploration for Reinforcement Learning in Video Generation
[CVPR 2026] Official code for "Monet: Reasoning in Latent Visual Space Beyond Image and Language"
A curated list of awesome Claude Skills, resources, and tools for customizing Claude AI workflows
[NeurIPS 2024 Datasets and Benchmarks Track] Closed-Loop E2E-AD Benchmark Enhanced by World Model RL Expert
Simulate and correct images for dichromatic color blindness
RLLaVA is a user-friendly framework for multi-modal RL research, optimized for resource-constrained teams.
CoReVLA: A Dual-Stage End-to-End Autonomous Driving Framework for Long-Tail Scenarios via Collect-and-Refine
Official repo for "GeoVista: Web-Augmented Agentic Visual Reasoning for Geolocalization"
A Survey on Reinforcement Learning of Vision-Language-Action Models for Robotic Manipulation