robomimic v0.5 Release Notes
Highlights
- 🧠 New Algorithm (Diffusion Policy)
- 🕹️ Action dicts and normalization support
- 🗂️ Multi-dataset training
- 📖 Language-conditioned policy learning
- ⏯️ Resume functionality
- Other quality of life improvements
🧠 New Algorithm - Diffusion Policy
Diffusion Policy (see paper) has become increasingly prevalent in the research community. This release adds an implementation of UNet-based Diffusion Policy, along with support for action normalization and per-batch learning rate updates. It is a strong baseline on the robomimic datasets, often outperforming BC-RNN. We hope this gives our users an easy way to start using diffusion-based policies.
We provide a template configuration file for Diffusion Policy that contains suggested parameters for the model. To learn how to train a Diffusion Policy, either with our default configuration file or with custom parameters, please refer to our tutorial.
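As an alternative to the template file, a minimal programmatic launch following robomimic's documented config-then-train workflow might look like the sketch below. The algo name ("diffusion_policy") and the list-of-dicts dataset format are assumptions; consult the tutorial for the exact values.

```python
# Minimal sketch of training Diffusion Policy programmatically, following
# robomimic's usual config-then-train workflow. The algo name
# ("diffusion_policy") and the list-of-dicts dataset format are assumptions.
import robomimic.utils.torch_utils as TorchUtils
from robomimic.config import config_factory
from robomimic.scripts.train import train

# build the default Diffusion Policy config
config = config_factory(algo_name="diffusion_policy")

# point the config at a dataset and an output directory
config.train.data = [dict(path="/path/to/demo.hdf5")]
config.train.output_dir = "/path/to/output"

device = TorchUtils.get_torch_device(try_to_use_cuda=True)
train(config, device=device)
```

Equivalently, you can pass a config JSON to robomimic's train script via its --config argument.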
🕹️ Action dictionaries and normalization
You can now configure multiple action keys with per-key normalization in robomimic. This is particularly useful for:
- Robot manipulation tasks with different action components (e.g., end-effector position and rotation)
- Actions that require different normalization schemes
The action configuration consists of two main components:
- `action_keys`: list of action components to use
- `action_config`: dictionary specifying how each action component should be processed
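As a rough illustration, the two components might be set as in the hedged sketch below. The specific key names (`action_dict/abs_pos`, etc.) and the `min_max` normalization option are assumptions drawn from a typical manipulation setup; check the documentation for the exact names.

```python
# Hedged sketch: a composite action space with per-key normalization.
# The key names and the "min_max" option are illustrative assumptions.
from robomimic.config import config_factory

config = config_factory(algo_name="bc")  # any algo config works here

config.train.action_keys = [
    "action_dict/abs_pos",     # end-effector position
    "action_dict/abs_rot_6d",  # end-effector rotation (6D representation)
    "action_dict/gripper",     # gripper open/close
]
config.train.action_config = {
    "action_dict/abs_pos":    {"normalization": "min_max"},  # rescale to [-1, 1]
    "action_dict/abs_rot_6d": {"normalization": None},       # leave as-is
    "action_dict/gripper":    {"normalization": None},
}
```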
For more details on how to use action configs, please refer to the documentation.
🗂️ Multi-dataset Training
Robomimic supports training on multiple datasets simultaneously. This is useful when you want to:
- Train a single model on multiple tasks
- Combine datasets with different qualities (e.g., expert and suboptimal demonstrations)
- Balance data from different sources
Each dataset can have its own sampling weight, and you can control whether these weights are normalized by dataset size. For more details on how to set up multi-dataset training, please refer to the documentation.
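As an illustration, a co-training setup might look like the sketch below. The `weight` field and the `normalize_weights_by_ds_size` attribute are assumptions about the config schema; see the documentation for the exact names.

```python
# Hedged sketch: sample from two datasets with custom weights.
# The "weight" field and the normalization flag name are assumptions.
from robomimic.config import config_factory

config = config_factory(algo_name="bc")

config.train.data = [
    dict(path="/path/to/expert.hdf5", weight=1.0),      # expert demos
    dict(path="/path/to/suboptimal.hdf5", weight=0.5),  # suboptimal demos
]
# whether sampling weights are rescaled by dataset size (attribute name assumed)
config.train.normalize_weights_by_ds_size = False
```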
📖 Language-conditioned policy learning
You can now train language-conditioned policies in robomimic. We support CLIP embeddings for encoding language, and two ways of conditioning policies on these embeddings:
- As a feature input to the action head
- FiLM conditioning applied to the vision encoder
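To make the conditioning signal concrete, the sketch below computes a CLIP text embedding with the Hugging Face `transformers` library. This is purely illustrative of the kind of embedding involved, not robomimic's internal language-encoding pipeline.

```python
# Illustration only: a CLIP text embedding of the kind used for conditioning.
# This uses Hugging Face's CLIP, not robomimic's internal language encoder.
import torch
from transformers import CLIPTextModel, CLIPTokenizer

tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-base-patch32")
text_model = CLIPTextModel.from_pretrained("openai/clip-vit-base-patch32")

with torch.no_grad():
    tokens = tokenizer(["put the mug on the shelf"], return_tensors="pt", padding=True)
    lang_emb = text_model(**tokens).pooler_output  # shape: (1, 512)
```

The resulting embedding is either concatenated with the other features feeding the action head, or used to produce the per-channel scale and shift that FiLM applies inside the vision encoder.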
For more details on how to train language-conditioned policies, please refer to the documentation.
⏯️ Resume functionality
If your training job fails for any reason, you can re-launch it with the additional `--resume` flag to resume training from the last saved epoch. Training will resume from the `last.pth` checkpoint in your output directory. For more details, please refer to the documentation.
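As a quick sanity check before re-launching, you can confirm the checkpoint is present. In this sketch, the `models` subdirectory is an assumption about the output layout; see the docs for details.

```python
# Hedged sketch: verify a resumable checkpoint exists, then re-launch with
# --resume. The "models" subdirectory is an assumption about the layout.
import os

ckpt = os.path.join("/path/to/output_dir", "models", "last.pth")
assert os.path.exists(ckpt), "no last.pth checkpoint found to resume from"
# re-launch the original training command with the extra flag, e.g.:
#   python robomimic/scripts/train.py --config /path/to/config.json --resume
```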
Other Improvements
We outline other improvements here.
- Better data augmentation support (chaining multiple observation randomizers)
- Cosine learning rate scheduler with per-batch stepping
- Updates to BC-Transformer to predict future actions
- In addition to using env metadata extracted from a dataset for policy rollout (see docs), added the ability to update env metadata using `env_meta_update_dict` in a training config (e.g. for evaluating with absolute actions; see the sketch after this list)
- Observation extraction with multi-processing
- Deprecated API: `EnvGibsonMOMART` no longer supported; `postprocess_visual_obs` moved from env wrappers to `RolloutPolicy`
- Updated docs and tutorials
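As referenced in the env metadata bullet above, here is a hedged sketch of overriding env metadata in a training config. The attribute path (`config.experiment.env_meta_update_dict`) and the robosuite-style nested keys are assumptions; adapt them to your environment.

```python
# Hedged sketch: override env metadata for rollouts via the training config,
# e.g. to evaluate a policy trained on absolute actions. The attribute path
# and the nested keys are assumptions based on a robosuite-style controller.
from robomimic.config import config_factory

config = config_factory(algo_name="bc")

config.experiment.env_meta_update_dict = {
    "env_kwargs": {
        "controller_configs": {
            "control_delta": False,  # treat policy actions as absolute poses
        }
    }
}
```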
Breaking Changes
This release includes breaking changes that may affect some users:
- `EnvGibsonMOMART` is no longer supported
- `postprocess_visual_obs` has been moved from env wrappers to `RolloutPolicy`
- Observation normalization stats from old checkpoints cannot be loaded
Contributor Spotlight
This release was a major team effort. Here is a breakdown of contributed features.
- Diffusion Policy (@cheng-chi)
- Cosine learning rate scheduler with per-batch stepping (@vaibhavsaxena11)
- Support for multi-dataset training (e.g. co-training) (@snasiriany)
- Action dictionaries and action normalization (@snasiriany)
- Language-conditioned policy training (CLIP language encoder and FiLM language conditioning support for ResNet18 and VisualCore) (@snasiriany, @amandlek, @NKrypt26, @vaibhavsaxena11)
- Resume training functionality (@amandlek)
- Support for chaining multiple observation randomizers (@vaibhavsaxena11)
- Support `postprocess_visual_obs` in `RolloutPolicy` (previously in env wrappers) (@vaibhavsaxena11)
- Support for MimicLabs environments in `EnvRobosuite` (@vaibhavsaxena11)
- Update BC-Transformer with functionality to predict future action chunks (@snasiriany)
- Dockerfile for robomimic (@SurafelAnshebo)