git clone https://github.com/GuHuangAI/LaDiWM.git
cd LaDiWM
conda env create -f environment.yml
conda activate ladiwm
Note: We use the SigLIP model from the transformers package. However, the default get_image_features function of the SigLIP model returns only the pooled tokens, not the patch tokens.
We therefore modify the source code (~/anaconda3/envs/ladiwm/lib/python3.8/site-packages/transformers/models/siglip/modeling_siglip.py) so that the get_image_features function directly returns vision_outputs instead of pooled_output. A similar modification is applied to the get_text_features function.
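The path in the note above depends on where conda is installed and on the Python version, so it may differ on your machine. The following is a minimal sketch for locating the file in the currently active environment. The helper name `locate_module_file` is hypothetical and not part of this repository; it only uses the standard library and does not edit anything.

```python
import importlib.util
from typing import Optional


def locate_module_file(module_name: str) -> Optional[str]:
    """Return the source file of an importable module, or None if it cannot be found."""
    try:
        spec = importlib.util.find_spec(module_name)
    except (ImportError, ValueError):
        # Raised when a parent package (for example transformers) is not installed.
        return None
    if spec is None or spec.origin is None:
        return None
    return spec.origin


if __name__ == "__main__":
    # Run this inside the activated ladiwm environment.
    path = locate_module_file("transformers.models.siglip.modeling_siglip")
    print(path or "modeling_siglip.py not found in this environment")
```

The printed path is the file to open when applying the get_image_features and get_text_features changes described in the note.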
- Download the LIBERO dataset. Note that we train the world model on LIBERO-90 and the policy model on LIBERO-LONG (LIBERO-10).
- Process the dataset following ATM.
- Download the DINO pretrained weights from here, then search for 'dinov2_vitb14_pretrain.pth' in the wm config file and the policy config file and replace the original path with your local path.
- To train the world model, you need to modify line-20 of the training script to your local data path. You should also modify line-8 and line-10 of the wm config file to change the save path.
PYTHONPATH=$(pwd) python ./scripts/train_libero_diffusion_transformer_action_base.py
- To train the policy model, you need to modify line-30 of the training script to your local data path. You should also modify line-8 and line-10 of the policy config file to change the save path.
PYTHONPATH=$(pwd) python ./scripts/train_libero_policy_diff_action.py -tt $Your local path for saving world model
- To test the policy model, modify line-21 of the testing script to your local data path.
PYTHONPATH=$(pwd) python ./scripts/eval_libero_policy_action.py --exp-dir $Your local path for saving policy model
The weights below include DINO, SigLIP, the world model, and the policy model.
BaiduDisk: https://pan.baidu.com/s/1jXTi52U_GODJp9euAr8RDA?pwd=fu2j password: fu2j
GoogleDrive: https://drive.google.com/file/d/1E3X_RfdZISwOW2l5SWuMUSiJmRnX-0Dg/view?usp=drive_link
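Once the DINO weights are on disk, the wm and policy config files still need to point at the local copy, as described in the setup steps. The sketch below shows one way to do that replacement programmatically instead of by hand. It assumes the config files are plain text (for example YAML) and that the path contains no spaces. The function name `replace_weight_path` is hypothetical and not part of this repository.

```python
import re

DINO_FILENAME = "dinov2_vitb14_pretrain.pth"


def replace_weight_path(config_text: str, new_path: str,
                        filename: str = DINO_FILENAME) -> str:
    """Replace every path token that ends in `filename` with `new_path`."""
    # A path token is any run of characters without whitespace or quotes,
    # so surrounding quotes in the config are left untouched.
    pattern = re.compile(r"[^\s'\"]*" + re.escape(filename))
    # A function replacement keeps backslashes in new_path literal.
    return pattern.sub(lambda _match: new_path, config_text)
```

To apply it, read a config file into a string, pass it through `replace_weight_path` together with your local path to 'dinov2_vitb14_pretrain.pth', and write the result back. Text that does not mention the filename is returned unchanged, so running it on the wrong file has no effect.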
Thanks to the public repos ADM and ATM for providing the base code. If you have any questions, please contact huangai@nudt.edu.cn.
@inproceedings{huang2025ladi,
title={LaDi-WM: A Latent Diffusion-based World Model for Predictive Manipulation},
author={Huang, Yuhang and Zhang, Jiazhao and Zou, Shilong and Liu, Xinwang and Hu, Ruizhen and Xu, Kai},
booktitle={CoRL},
year={2025}
}