- Hong Kong SAR
Starred repositories
Escaping the Big Data Paradigm in Self-Supervised Representation Learning
[ICRA 2026] VITRA: Scalable Vision-Language-Action Model Pretraining for Robotic Manipulation with Real-Life Human Activity Videos
[CVPR 2026] ForeAct: Steering Your VLA with Efficient Visual Foresight Planning
Causal video-action world model for generalist robot control
[ICLR 2026] Plan-R1: Safe and Feasible Trajectory Planning as Language Modeling
AgentFlow: In-the-Flow Agentic System Optimization
Running VLA at 30Hz frame rate and 480Hz trajectory frequency
A general-purpose robotic agent framework based on LLMs. The LLM can independently reason, plan, and execute actions to operate diverse robot types across various scenarios to complete unpredictabl…
Official code release for DynoSAM: Dynamic Object Smoothing And Mapping. Accepted to Transactions on Robotics (Visual SLAM SI). A visual SLAM framework and pipeline for dynamic environments, estimatin…
1st place solution of 2025 BEHAVIOR Challenge
[CVPR 2026] Official code for "Monet: Reasoning in Latent Visual Space Beyond Image and Language"
This repository contains code for the paper "Why Diffusion Models Don't Memorize: The Role of Implicit Dynamical Regularization in Training" by T. Bonnaire, R. Urfin, G. Biroli and M. Mézard.
(ICRA 2025) Inverse Mixed Strategy Games with Generative Trajectory Models
A highly robust and accurate LiDAR-only, LiDAR-inertial odometry
Official PyTorch Implementation of "Latent Denoising Makes Good Visual Tokenizers"
Visual Imitation Enables Contextual Humanoid Control. CoRL 2025, Best Student Paper Award.
Official Repo of "Disentangled Reinforcement Learning for Robust Visual Quality Assessment"
[NeurIPS'25 Spotlight] (DANCE) Disentangled Concepts Speak Louder Than Words: Explainable Video Action Recognition official code repository
[NeurIPS 2025] NAUTILUS: A Large Multimodal Model for Underwater Scene Understanding
Your personal AI trading assistant. Any market. Any model. Pay with USDC, not API keys.
Official implementation of paper: Characterizing Dataset Bias via Disentangled Visual Concepts
StarVLA: A Lego-like Codebase for Vision-Language-Action Model Developing
[NeurIPS 2025] Flow x RL. "ReinFlow: Fine-tuning Flow Policy with Online Reinforcement Learning". Supports VLAs, e.g., pi0, pi0.5. Fully open-sourced.
This repository contains a regularly updated paper list for LLMs-reasoning-in-latent-space.
[ICCV 2025] SuperDec: 3D Scene Decomposition with Superquadric Primitives.
GPU-Powered Sequential Manipulation in Milliseconds
The official repo for SpaceVista: All-Scale Visual Spatial Reasoning from mm to km.