Stars
[CVPR 2026] HiF-VLA: An efficient Vision-Language-Action model with bidirectional spatiotemporal expansion
[CVPR 2026] AtomicVLA: Unlocking the Potential of Atomic Skill Learning in Robots
[CVPR 2026] Chain of World: World Model Thinking in Latent Motion
Prior-Guided Vision-Language-Action Models via World Knowledge Variation
Official repository of "UniEmo: Unifying Emotional Understanding and Generation with Learnable Expert Queries"
[ACM MM 2025] Official repository of "EmoSym: A Symbiotic Framework for Unified Emotional Understanding and Generation via Latent Reasoning"
[AAAI 2026] H-GAR: A Hierarchical Interaction Framework via Goal-Driven Observation-Action Refinement for Robotic Manipulation
The official implementation of Mantis: A Versatile Vision-Language-Action Model with Disentangled Visual Foresight
The official code for the paper Multi-granularity Facial Emotional Representation with Unlabeled Data and Textual Supervision.
[NeurIPS 2025] DreamVLA: A Vision-Language-Action Model Dreamed with Comprehensive World Knowledge
AlignedReID++: Dynamically Matching Local Information for Person Re-Identification.
PyTorch implemented C3D, R3D, R2Plus1D models for video activity recognition.
This repo contains a curated list of robot learning (mainly manipulation) resources.