thu-ml/PACT

PACT: Self-Evolving Physical Safety Alignment for Diffusion Policies in Embodied Manipulation

Overview

PACT is a self-evolving post-training framework for aligning diffusion policies with physical safety constraints in embodied manipulation. Starting from pretrained diffusion policies, PACT uses self-rollouts and automatically computed physical constraints to distill constraint gradients into the policy, improving safety without requiring demonstrations, task rewards, interventions, or outcome annotations.

This repository contains the code for running PACT on RoboTwin robotic manipulation tasks, covering base-policy evaluation, on-policy constraint distillation, and post-training policy evaluation.

Installation

We recommend a Linux system with NVIDIA GPUs. Our experiments were conducted on RTX 4090 GPUs (24 GB VRAM). For Diffusion Policy (DP) training we use a per-GPU batch size of 128, which fits comfortably in under 24 GB of VRAM per GPU.

Clone this repository and create a conda environment:

conda create -n pact python=3.10 -y
conda activate pact

Install RoboTwin and download RoboTwin assets:

bash script/_install.sh
bash script/_download_assets.sh

PACT modifies several assets used by custom tasks such as handover_apple and pour_water_to_cup. Replace the downloaded assets with the PACT versions:

bash script/_replace_assets.sh

Install PACT dependencies:

pip install -r requirements.txt

Install base policy dependencies:

cd policy/DP
pip install -e .
cd ../..

From the repository root, create a symbolic link to the cost-function utilities:

ln -s ../../extend ./policy/DP/extend
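A relative symlink target is resolved relative to the directory containing the link, not the directory where `ln` was run, which is why the target above is `../../extend`. The following minimal Python sketch reproduces that layout in a temporary directory to illustrate the resolution rule. The helper names `resolves_to` and `demo_relative_link` are hypothetical and not part of the repository.

```python
import os
import tempfile
from pathlib import Path


def resolves_to(link, expected):
    """Return True if `link` is a symlink that resolves to `expected`."""
    link, expected = Path(link), Path(expected)
    return link.is_symlink() and link.resolve() == expected.resolve()


def demo_relative_link():
    """Recreate the README layout in a scratch directory and check the link."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "extend").mkdir()
        (root / "policy" / "DP").mkdir(parents=True)
        link = root / "policy" / "DP" / "extend"
        # Same relative target as the `ln -s` command above.
        os.symlink("../../extend", link, target_is_directory=True)
        return resolves_to(link, root / "extend")


if __name__ == "__main__":
    print("link resolves to <root>/extend:", demo_relative_link())
```

In a real checkout, calling `resolves_to("policy/DP/extend", "extend")` from the repository root performs the same check on the actual link.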

Model Checkpoints and Data

Download the pretrained base policy checkpoints from Hugging Face, and organize them as follows:

# generally, the structure is as follows:
PACT/policy/DP/checkpoints/
└── ${TASK_NAME}-demo_randomized-200-0/600.ckpt

# concretely, the arranged structure containing all the released checkpoints is as follows:
PACT/policy/DP/checkpoints/
├── handover_apple-demo_randomized-200-0/600.ckpt
├── handover_block-demo_randomized-200-0/600.ckpt
├── pick_diverse_bottles-demo_randomized-200-0/600.ckpt
├── pick_dual_bottles-demo_randomized-200-0/600.ckpt
├── place_dual_shoes-demo_randomized-200-0/600.ckpt
├── pour_water_to_cup-demo_randomized-200-0/600.ckpt
└── stack_blocks_two-demo_randomized-200-0/600.ckpt
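Before running evaluation, it can help to confirm that every checkpoint sits where the layout above expects it. The sketch below is an illustrative helper, not part of the repository: the function names are hypothetical, and the task list and the `demo_randomized-200-0/600.ckpt` naming are copied from the tree above.

```python
from pathlib import Path

# Task names copied from the released checkpoint list above.
TASKS = [
    "handover_apple",
    "handover_block",
    "pick_diverse_bottles",
    "pick_dual_bottles",
    "place_dual_shoes",
    "pour_water_to_cup",
    "stack_blocks_two",
]


def expected_checkpoints(repo_root, tasks=TASKS, ckpt_num=600):
    """List the checkpoint paths implied by the documented layout."""
    ckpt_dir = Path(repo_root) / "policy" / "DP" / "checkpoints"
    return [
        ckpt_dir / f"{task}-demo_randomized-200-0" / f"{ckpt_num}.ckpt"
        for task in tasks
    ]


def missing_checkpoints(repo_root, tasks=TASKS):
    """Return the expected checkpoint paths that are not present on disk."""
    return [p for p in expected_checkpoints(repo_root, tasks) if not p.is_file()]


if __name__ == "__main__":
    # Run from the repository root (the PACT/ directory in the tree above).
    for path in missing_checkpoints("."):
        print(f"missing: {path}")
```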

Download the pre-generated instruction dataset data.tar.gz from Hugging Face, and organize it as follows:

# generally, the structure is as follows:
PACT/data/
├── data/${TASK_NAME}/demo_randomized/instructions
└── env_meta.pkl    # meta environment information shared by all tasks

# concretely, the arranged structure containing all the released instruction datasets is as follows:
PACT/data/
├── data/handover_apple/demo_randomized/instructions
├── data/handover_block/demo_randomized/instructions
├── data/pick_diverse_bottles/demo_randomized/instructions
├── data/pick_dual_bottles/demo_randomized/instructions
├── data/place_dual_shoes/demo_randomized/instructions
├── data/pour_water_to_cup/demo_randomized/instructions
├── data/stack_blocks_two/demo_randomized/instructions
└── env_meta.pkl    # meta environment information shared by all tasks
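Note that the tree above nests a second `data/` directory inside `PACT/data/`, which is easy to get wrong when unpacking the archive. As a rough sketch, the following hypothetical helper (not part of the repository) reports which of the documented entries are absent. It uses `exists()` for the `instructions` entries because the tree does not say whether they are files or directories.

```python
from pathlib import Path


def missing_data_entries(repo_root, tasks):
    """Return documented data paths (relative to the repo root) that are absent."""
    root = Path(repo_root)
    missing = []

    # Metadata file shared by all tasks, located directly under PACT/data/.
    meta = Path("data") / "env_meta.pkl"
    if not (root / meta).is_file():
        missing.append(meta.as_posix())

    # Per-task instruction data lives one level deeper, under PACT/data/data/.
    for task in tasks:
        rel = Path("data") / "data" / task / "demo_randomized" / "instructions"
        if not (root / rel).exists():
            missing.append(rel.as_posix())
    return missing


if __name__ == "__main__":
    # Run from the repository root; the task list here is only an example.
    for entry in missing_data_entries(".", ["pick_dual_bottles"]):
        print(f"missing: {entry}")
```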

Evaluation and Post-Training

Evaluate a Base Policy

Run evaluation from policy/DP:

cd policy/DP

# bash eval_dr.sh ${TASK_NAME} ${TASK_CONFIG} ${CKPT_CONFIG} ${EPISODE_NUM} ${SEED} ${CKPT_NUM} ${GPU_ID}
bash eval_dr.sh pick_dual_bottles demo_randomized demo_randomized 200 0 600 0

Run PACT Post-Training

Run on-policy distillation from the repository root:

# bash policy/DP/on_policy_distill_multigpu.sh ${TASK_NAME} ${TASK_CONFIG} ${BASE_EPISODE_NUM} ${SEED} ${ACTION}
# Train with four specified GPUs
CUDA_VISIBLE_DEVICES=0,1,2,3 bash policy/DP/on_policy_distill_multigpu.sh pick_dual_bottles onpolicy_randomized 200 0 14

The default hyperparameters use a larger training budget to strengthen the compared baselines. For PACT, the proposed distillation objective is more efficient, so you can reduce the number of rollout trajectories and parameter updates by modifying policy/DP/diffusion_policy/config/on_policy_robot_dp_distillation_14.yaml:

...

rollout:
  num_batches_per_epoch: 36  # original 72

...

training:
  num_inner_epochs: 50  # original 100
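If you prefer to keep the shipped YAML untouched and derive a reduced-budget variant programmatically, the two changes above amount to overriding two nested keys. The sketch below shows this on a plain nested dictionary. `apply_overrides` is a hypothetical helper, and loading and saving the YAML file itself is deliberately left out. The default values are the ones given in the `# original` comments above.

```python
import copy


def apply_overrides(config, overrides):
    """Return a copy of a nested dict with dotted-key overrides applied.

    Unknown keys raise KeyError so that a typo does not silently create
    a new, unused setting.
    """
    result = copy.deepcopy(config)
    for dotted_key, value in overrides.items():
        *parents, leaf = dotted_key.split(".")
        node = result
        for key in parents:
            node = node[key]
        if leaf not in node:
            raise KeyError(dotted_key)
        node[leaf] = value
    return result


# Only the two settings mentioned above; the real config has more fields.
defaults = {
    "rollout": {"num_batches_per_epoch": 72},
    "training": {"num_inner_epochs": 100},
}

reduced = apply_overrides(
    defaults,
    {
        "rollout.num_batches_per_epoch": 36,
        "training.num_inner_epochs": 50,
    },
)

if __name__ == "__main__":
    print(reduced)
```

Working on a deep copy keeps the default budget available for baseline runs.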

If training becomes unstable for some tasks, such as handover_apple, try enabling soft updates by setting training.decay_type to 1 or 2 in the config. The default value 0 uses hard updates.
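For readers unfamiliar with the terminology, the sketch below contrasts a hard update (copy the new weights outright) with a generic exponential-moving-average soft update. This is only an illustration of the general idea: which weights `training.decay_type` applies to, and how modes 1 and 2 differ, is not described here, so the function names and the `decay` parameter are assumptions rather than the repository's implementation.

```python
def hard_update(target, online):
    """Replace the target weights with a copy of the online weights."""
    return list(online)


def soft_update(target, online, decay):
    """Blend the weights: keep `decay` of the target, take the rest from online."""
    if not 0.0 <= decay <= 1.0:
        raise ValueError("decay must be in [0, 1]")
    return [decay * t + (1.0 - decay) * o for t, o in zip(target, online)]


if __name__ == "__main__":
    target = [0.0, 1.0]
    online = [1.0, 3.0]
    print("hard:", hard_update(target, online))
    print("soft:", soft_update(target, online, decay=0.5))
```

A soft update moves the target only part of the way toward the new weights at each step, which is the usual reason it dampens oscillations compared with a hard copy.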

Evaluate a PACT Post-Trained Policy

Run evaluation from policy/DP:

cd policy/DP

# bash eval_dr.sh ${TASK_NAME} ${TASK_CONFIG} ${CKPT_CONFIG} ${EPISODE_NUM} ${SEED} ${CKPT_NUM} ${GPU_ID} ${CKPT_PATH} ${EVAL_DISTILLATION}
bash eval_dr.sh pick_dual_bottles demo_randomized onpolicy_randomized 200 0 20 0 playground/dp/pick_dual_bottles/exptime_robot_on_policy_distillation_pick_dual_bottles True
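Because `eval_dr.sh` takes up to nine positional arguments, it is easy to misplace one. As a convenience, the hypothetical wrapper below (not part of the repository) assembles the argument list from named parameters, following the order in the usage comments of this README. It assumes that the two extra arguments for post-trained policies are always passed together, with `True` as the final value, as in the example above.

```python
import shlex


def build_eval_command(task_name, task_config, ckpt_config, episode_num,
                       seed, ckpt_num, gpu_id, ckpt_path=None):
    """Build the eval_dr.sh command line as a list of strings.

    Leave `ckpt_path` as None for a base policy. When it is set, the path and
    the literal "True" (EVAL_DISTILLATION) are appended, mirroring the README.
    """
    args = [task_name, task_config, ckpt_config, episode_num, seed, ckpt_num, gpu_id]
    if ckpt_path is not None:
        args += [ckpt_path, "True"]
    return ["bash", "eval_dr.sh"] + [str(a) for a in args]


if __name__ == "__main__":
    base = build_eval_command(
        "pick_dual_bottles", "demo_randomized", "demo_randomized", 200, 0, 600, 0
    )
    post = build_eval_command(
        "pick_dual_bottles", "demo_randomized", "onpolicy_randomized", 200, 0, 20, 0,
        ckpt_path="playground/dp/pick_dual_bottles/"
                  "exptime_robot_on_policy_distillation_pick_dual_bottles",
    )
    print(shlex.join(base))
    print(shlex.join(post))
```

The resulting list can be passed to `subprocess.run(cmd, cwd="policy/DP", check=True)` from the repository root, since both evaluation commands are run from `policy/DP`.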

Acknowledgments

We gratefully acknowledge the authors of RoboTwin and Diffusion Policy. PACT builds on their excellent simulation benchmark and policy implementation.

Citation

If you find our work helpful, please cite us:

@article{wu2026pact,
  title={PACT: Self-Evolving Physical Safety Alignment for Diffusion Policies in Embodied Manipulation}, 
  author={Wu, Lingxuan and Zhu, Zijian and Wang, Lizhong and Ying, Chengyang and Chen, Huayu and Yang, Xiao and Liu, Fangming and Zhu, Jun},
  journal={arXiv preprint arXiv:2606.08414},
  year={2026},
}

About

[ICML 2026 Spotlight] Official implementation of "PACT: Self-Evolving Physical Safety Alignment for Diffusion Policies in Embodied Manipulation".
