PACT is a self-evolving post-training framework for aligning diffusion policies with physical safety constraints in embodied manipulation. Starting from pretrained diffusion policies, PACT uses self-rollouts and automatically computed physical constraints to distill constraint gradients into the policy, improving safety without requiring demonstrations, task rewards, interventions, or outcome annotations.
This repository contains the code for running PACT for robotic manipulation tasks on RoboTwin, including base-policy evaluation, on-policy constraint distillation, and post-training policy evaluation.
- 🔥 [Jun 2026] We released the project website, arXiv paper, and required data and pre-trained policy checkpoints on Hugging Face.
We recommend a Linux system with NVIDIA GPUs. Our experiments were conducted on RTX 4090 GPUs (24 GB VRAM). For Diffusion Policy (DP) training, we use a per-GPU batch size of 128, which fits in less than 24 GB of VRAM.
Clone this repository and create a conda environment:
conda create -n pact python=3.10 -y
conda activate pact
Install RoboTwin and download RoboTwin assets:
bash script/_install.sh
bash script/_download_assets.sh
PACT modifies several assets used by custom tasks such as handover_apple and pour_water_to_cup. Replace the downloaded assets with the PACT versions:
bash script/_replace_assets.sh
Install PACT dependencies:
pip install -r requirements.txt
Install base policy dependencies:
cd policy/DP
pip install -e .
cd ../..
Create the symbolic link to cost-function utilities:
ln -s ../../extend ./policy/DP/extend
Download the pretrained base policy checkpoints from Hugging Face, and organize them as follows:
# generally, the structure is as follows:
PACT/policy/DP/checkpoints/
└── ${TASK_NAME}-demo_randomized-200-0/600.ckpt
# concretely, the arranged structure containing all the released checkpoints is as follows:
PACT/policy/DP/checkpoints/
├── handover_apple-demo_randomized-200-0/600.ckpt
├── handover_block-demo_randomized-200-0/600.ckpt
├── pick_diverse_bottles-demo_randomized-200-0/600.ckpt
├── pick_dual_bottles-demo_randomized-200-0/600.ckpt
├── place_dual_shoes-demo_randomized-200-0/600.ckpt
├── pour_water_to_cup-demo_randomized-200-0/600.ckpt
└── stack_blocks_two-demo_randomized-200-0/600.ckpt
Download the pre-generated instruction dataset data.tar.gz from Hugging Face, and organize it as follows:
# generally, the structure is as follows:
PACT/data/
├── data/${TASK_NAME}/demo_randomized/instructions
└── env_meta.pkl # meta environment information shared by all tasks
# concretely, the arranged structure containing all the released instruction datasets is as follows:
PACT/data/
├── data/handover_apple/demo_randomized/instructions
├── data/handover_block/demo_randomized/instructions
├── data/pick_diverse_bottles/demo_randomized/instructions
├── data/pick_dual_bottles/demo_randomized/instructions
├── data/place_dual_shoes/demo_randomized/instructions
├── data/pour_water_to_cup/demo_randomized/instructions
├── data/stack_blocks_two/demo_randomized/instructions
└── env_meta.pkl # meta environment information shared by all tasks
Run base-policy evaluation from policy/DP:
cd policy/DP
# bash eval_dr.sh ${TASK_NAME} ${TASK_CONFIG} ${CKPT_CONFIG} ${EPISODE_NUM} ${SEED} ${CKPT_NUM} ${GPU_ID}
bash eval_dr.sh pick_dual_bottles demo_randomized demo_randomized 200 0 600 0
Run on-policy distillation from the repository root:
# bash policy/DP/on_policy_distill_multigpu.sh ${TASK_NAME} ${TASK_CONFIG} ${BASE_EPISODE_NUM} ${SEED} ${ACTION}
# Train with four specified GPUs
CUDA_VISIBLE_DEVICES=0,1,2,3 bash policy/DP/on_policy_distill_multigpu.sh pick_dual_bottles onpolicy_randomized 200 0 14
The default hyperparameters use a larger training budget to strengthen the compared baselines. For PACT, the proposed distillation objective is more efficient, so you can reduce the number of rollout trajectories and parameter updates by modifying policy/DP/diffusion_policy/config/on_policy_robot_dp_distillation_14.yaml:
...
rollout:
  num_batches_per_epoch: 36 # original 72
...
training:
  num_inner_epochs: 50 # original 100
If training becomes unstable for some tasks, such as handover_apple, try enabling soft updates by setting training.decay_type to 1 or 2 in the config. The default value 0 uses hard updates.
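The config changes described above can also be scripted rather than edited by hand. The following is a minimal sketch, not part of the repository: it works on a scratch file that stands in for the real config, and it assumes each key appears exactly once, on its own line, in the layout shown. Only the key names and values come from the text above; the scratch file, the `cfg` and `updated` variables, and the presence of a `decay_type` line are illustrative assumptions.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Scratch file standing in for the real config (assumed layout; only the
# key names and their original values are taken from the README).
cfg="$(mktemp)"
cat > "$cfg" <<'EOF'
rollout:
  num_batches_per_epoch: 72
training:
  num_inner_epochs: 100
  decay_type: 0
EOF

# Replace everything after each key with the new value, keeping the
# leading indentation so the YAML nesting is unchanged.
updated="$(sed \
  -e 's/^\([[:space:]]*num_batches_per_epoch:\).*/\1 36/' \
  -e 's/^\([[:space:]]*num_inner_epochs:\).*/\1 50/' \
  -e 's/^\([[:space:]]*decay_type:\).*/\1 1/' \
  "$cfg")"

# Write the result back and show it.
printf '%s\n' "$updated" > "$cfg"
cat "$cfg"
```

To apply the same substitutions to the real file, point `cfg` at policy/DP/diffusion_policy/config/on_policy_robot_dp_distillation_14.yaml instead of creating a scratch file, and review the result before training. The patterns overwrite the rest of each matched line, so any trailing comment on those lines is dropped.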
Run post-training evaluation from policy/DP:
cd policy/DP
# bash eval_dr.sh ${TASK_NAME} ${TASK_CONFIG} ${CKPT_CONFIG} ${EPISODE_NUM} ${SEED} ${CKPT_NUM} ${GPU_ID} ${CKPT_PATH} ${EVAL_DISTILLATION}
bash eval_dr.sh pick_dual_bottles demo_randomized onpolicy_randomized 200 0 20 0 playground/dp/pick_dual_bottles/exptime_robot_on_policy_distillation_pick_dual_bottles True
We gratefully acknowledge the authors of RoboTwin and Diffusion Policy. PACT builds on their excellent simulation benchmark and policy implementation.
If you find our work helpful, please cite us:
@article{wu2026pact,
title={PACT: Self-Evolving Physical Safety Alignment for Diffusion Policies in Embodied Manipulation},
author={Wu, Lingxuan and Zhu, Zijian and Wang, Lizhong and Ying, Chengyang and Chen, Huayu and Yang, Xiao and Liu, Fangming and Zhu, Jun},
journal={arXiv preprint arXiv:2606.08414},
year={2026},
}