PACT: Self-Evolving Physical Safety Alignment for Diffusion Policies in Embodied Manipulation

Overview

PACT is a self-evolving post-training framework for aligning diffusion policies with physical safety constraints in embodied manipulation. Starting from pretrained diffusion policies, PACT uses self-rollouts and automatically computed physical constraints to distill constraint gradients into the policy, improving safety without requiring demonstrations, task rewards, interventions, or outcome annotations.

This repository contains the code for running PACT on robotic manipulation tasks in RoboTwin, including base-policy evaluation, on-policy constraint distillation, and evaluation of post-trained policies.

Updates

🔥 [Jun 2026] We released the project website, the arXiv paper, and the required data and pretrained policy checkpoints on Hugging Face.

Installation

We recommend a Linux system with NVIDIA GPUs. Our experiments were run on RTX 4090 GPUs (24 GB VRAM). For Diffusion Policy (DP) training, we use a per-GPU batch size of 128, which fits comfortably within 24 GB of VRAM; the method is therefore relatively memory-efficient.

Clone this repository and create a conda environment:

conda create -n pact python=3.10 -y
conda activate pact

Install RoboTwin and download RoboTwin assets:

bash script/_install.sh
bash script/_download_assets.sh

PACT modifies several assets used by custom tasks such as handover_apple and pour_water_to_cup. Replace the downloaded assets with the PACT versions:

bash script/_replace_assets.sh

Install PACT dependencies:

pip install -r requirements.txt

Install the base-policy (Diffusion Policy) dependencies:

cd policy/DP
pip install -e .
cd ../..

From the repository root, create a symbolic link to the cost-function utilities:

ln -s ../../extend ./policy/DP/extend
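
Because ln -s stores the target ../../extend exactly as written, it is resolved relative to the directory that holds the link (policy/DP), so the link points to extend/ at the repository root. The following Python sketch checks this; the helper names extend_link_target and check_extend_link are hypothetical and not part of the repository.

```python
import os
from pathlib import Path


def extend_link_target(repo_root):
    """Return the absolute path that policy/DP/extend resolves to.

    A relative symlink target such as "../../extend" is interpreted relative
    to the directory containing the link (policy/DP), not the shell's cwd.
    """
    link = Path(repo_root) / "policy" / "DP" / "extend"
    if not link.is_symlink():
        raise FileNotFoundError(f"{link} is not a symbolic link")
    target = os.readlink(link)  # the stored target string, e.g. "../../extend"
    return (link.parent / target).resolve()


def check_extend_link(repo_root):
    """True if policy/DP/extend points at <repo_root>/extend."""
    return extend_link_target(repo_root) == (Path(repo_root) / "extend").resolve()
```

Run check_extend_link(".") from the repository root. A False result or an exception means the link is missing or was created with a different target.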

Model Checkpoints and Data

Download the pretrained base-policy checkpoints from Hugging Face and organize them as follows:

# generally, the structure is as follows:
PACT/policy/DP/checkpoints/
└── ${TASK_NAME}-demo_randomized-200-0/600.ckpt

# concretely, the arranged structure containing all the released checkpoints is as follows:
PACT/policy/DP/checkpoints/
├── handover_apple-demo_randomized-200-0/600.ckpt
├── handover_block-demo_randomized-200-0/600.ckpt
├── pick_diverse_bottles-demo_randomized-200-0/600.ckpt
├── pick_dual_bottles-demo_randomized-200-0/600.ckpt
├── place_dual_shoes-demo_randomized-200-0/600.ckpt
├── pour_water_to_cup-demo_randomized-200-0/600.ckpt
└── stack_blocks_two-demo_randomized-200-0/600.ckpt
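
Before evaluating, it can help to compute every expected checkpoint path and report the missing ones. The Python helper below is hypothetical. Reading the directory name as <task>-<CKPT_CONFIG>-<EPISODE_NUM>-<SEED> and the file name as <CKPT_NUM>.ckpt is an assumption, based on the values passed to eval_dr.sh in the base-policy evaluation example.

```python
from pathlib import Path

# The seven tasks with released base-policy checkpoints (from the tree above).
RELEASED_TASKS = (
    "handover_apple", "handover_block", "pick_diverse_bottles",
    "pick_dual_bottles", "place_dual_shoes", "pour_water_to_cup",
    "stack_blocks_two",
)


def checkpoint_path(repo_root, task, ckpt_config="demo_randomized",
                    episode_num=200, seed=0, ckpt_num=600):
    """Expected path: policy/DP/checkpoints/<task>-<config>-<episodes>-<seed>/<num>.ckpt."""
    run_dir = f"{task}-{ckpt_config}-{episode_num}-{seed}"
    return Path(repo_root) / "policy" / "DP" / "checkpoints" / run_dir / f"{ckpt_num}.ckpt"


def missing_checkpoints(repo_root, tasks=RELEASED_TASKS):
    """List the expected checkpoint files that are not present yet."""
    paths = (checkpoint_path(repo_root, task) for task in tasks)
    return [p for p in paths if not p.is_file()]
```

An empty list from missing_checkpoints(".") at the repository root means all seven checkpoints are in place.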

Download the pre-generated instruction dataset data.tar.gz from Hugging Face, and organize it as follows:

# generally, the structure is as follows:
PACT/data/
├── data/${TASK_NAME}/demo_randomized/instructions
└── env_meta.pkl    # environment metadata shared by all tasks

# concretely, the arranged structure containing all the released instruction datasets is as follows:
PACT/data/
├── data/handover_apple/demo_randomized/instructions
├── data/handover_block/demo_randomized/instructions
├── data/pick_diverse_bottles/demo_randomized/instructions
├── data/pick_dual_bottles/demo_randomized/instructions
├── data/place_dual_shoes/demo_randomized/instructions
├── data/pour_water_to_cup/demo_randomized/instructions
├── data/stack_blocks_two/demo_randomized/instructions
└── env_meta.pkl    # environment metadata shared by all tasks
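
The same kind of check works for the instruction data. Note the nested data/data/ level in the tree, which is easy to lose when data.tar.gz is extracted into the wrong directory. The helper below is hypothetical; it uses exists() instead of assuming whether instructions is a file or a directory.

```python
from pathlib import Path

# Tasks with released instruction datasets (from the tree above).
RELEASED_TASKS = (
    "handover_apple", "handover_block", "pick_diverse_bottles",
    "pick_dual_bottles", "place_dual_shoes", "pour_water_to_cup",
    "stack_blocks_two",
)


def expected_data_paths(repo_root, tasks=RELEASED_TASKS, task_config="demo_randomized"):
    """Paths implied by the tree above: per-task instructions plus env_meta.pkl."""
    data_root = Path(repo_root) / "data"
    paths = [data_root / "data" / task / task_config / "instructions" for task in tasks]
    paths.append(data_root / "env_meta.pkl")  # shared by all tasks
    return paths


def missing_data(repo_root, tasks=RELEASED_TASKS):
    """Return expected paths that do not exist (file or directory)."""
    return [p for p in expected_data_paths(repo_root, tasks) if not p.exists()]
```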

Evaluation and Post-Training

Evaluate a Base Policy

Run evaluation from policy/DP:

cd policy/DP

# bash eval_dr.sh ${TASK_NAME} ${TASK_CONFIG} ${CKPT_CONFIG} ${EPISODE_NUM} ${SEED} ${CKPT_NUM} ${GPU_ID}
bash eval_dr.sh pick_dual_bottles demo_randomized demo_randomized 200 0 600 0
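
To evaluate several base policies in sequence, a hypothetical Python wrapper can assemble the same positional arguments and call the script with subprocess. Only the argument order and the example values come from this README; the defaults assume the released demo_randomized checkpoints at step 600.

```python
import shlex
import subprocess


def base_eval_argv(task, task_config="demo_randomized", ckpt_config="demo_randomized",
                   episode_num=200, seed=0, ckpt_num=600, gpu_id=0):
    """eval_dr.sh arguments in the documented order (defaults match the example)."""
    return ["bash", "eval_dr.sh", task, task_config, ckpt_config,
            str(episode_num), str(seed), str(ckpt_num), str(gpu_id)]


def run_base_evals(tasks, gpu_id=0, cwd="policy/DP", dry_run=True):
    """Evaluate base policies one after another on a single GPU.

    With dry_run=True the commands are only returned, not executed.
    """
    commands = []
    for task in tasks:
        argv = base_eval_argv(task, gpu_id=gpu_id)
        commands.append(shlex.join(argv))
        if not dry_run:
            subprocess.run(argv, cwd=cwd, check=True)  # stop at the first failure
    return commands
```

For example, run_base_evals(["pick_dual_bottles", "handover_block"], dry_run=False), called from the repository root, evaluates two tasks on GPU 0. Running sequentially with check=True keeps a single GPU busy and stops at the first failing task.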

Run PACT Post-Training

Run on-policy distillation from the repository root:

# bash policy/DP/on_policy_distill_multigpu.sh ${TASK_NAME} ${TASK_CONFIG} ${BASE_EPISODE_NUM} ${SEED} ${ACTION}
# Train with four specified GPUs
CUDA_VISIBLE_DEVICES=0,1,2,3 bash policy/DP/on_policy_distill_multigpu.sh pick_dual_bottles onpolicy_randomized 200 0 14
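
The same launch can be scripted from Python, for instance to pick a GPU subset per task. This is a hypothetical sketch that reproduces only the documented positional order and example values. The meaning of ${ACTION} is not documented here; 14 matches the suffix of on_policy_robot_dp_distillation_14.yaml, so it is kept as a pass-through parameter.

```python
import os
import shlex
import subprocess


def distill_command(task, task_config="onpolicy_randomized", base_episode_num=200,
                    seed=0, action=14, gpus=(0, 1, 2, 3)):
    """Return (env, argv) for on_policy_distill_multigpu.sh, run from the repo root."""
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=",".join(str(g) for g in gpus))
    argv = ["bash", "policy/DP/on_policy_distill_multigpu.sh", task, task_config,
            str(base_episode_num), str(seed), str(action)]
    return env, argv


def launch_distillation(task, dry_run=True, **kwargs):
    """Return the equivalent shell line; execute it only when dry_run is False."""
    env, argv = distill_command(task, **kwargs)
    line = f"CUDA_VISIBLE_DEVICES={env['CUDA_VISIBLE_DEVICES']} {shlex.join(argv)}"
    if not dry_run:
        subprocess.run(argv, env=env, check=True)
    return line
```

Pass dry_run=False only from the repository root, since the script path is relative.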

The default hyperparameters use a large training budget so that the compared baselines are evaluated under a strong setting. PACT's distillation objective is more efficient, so for PACT you can reduce the number of rollout trajectories and parameter updates by editing policy/DP/diffusion_policy/config/on_policy_robot_dp_distillation_14.yaml:
...

rollout:
  num_batches_per_epoch: 36  # original 72

...

training:
  num_inner_epochs: 50  # original 100
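
If you apply the reduced budget often, a small script can patch the config instead of editing it by hand. The following hypothetical Python helper rewrites the two values shown above (and, optionally, training.decay_type) with a line-based regex. It assumes each key appears only once in on_policy_robot_dp_distillation_14.yaml, which has not been checked against the file, and raises an error otherwise.

```python
import re
from pathlib import Path


def set_yaml_scalar(text, key, value):
    """Set `key: value` on the single line defining `key`, keeping indent and comment.

    A line-based edit avoids a YAML dependency and preserves comments, but it
    ignores nesting, so it refuses to act unless the key occurs exactly once.
    """
    pattern = re.compile(
        rf"^([ \t]*{re.escape(key)}:[ \t]*)[^#\n]*?([ \t]*#.*)?$", re.MULTILINE
    )
    new_text, count = pattern.subn(
        lambda m: f"{m.group(1)}{value}{m.group(2) or ''}", text
    )
    if count != 1:
        raise ValueError(f"expected exactly one {key!r} line, found {count}")
    return new_text


def reduce_budget(config_path, batches=36, inner_epochs=50, decay_type=None):
    """Apply the reduced PACT budget (and optionally a decay_type) in place."""
    path = Path(config_path)
    text = set_yaml_scalar(path.read_text(), "num_batches_per_epoch", batches)
    text = set_yaml_scalar(text, "num_inner_epochs", inner_epochs)
    if decay_type is not None:
        text = set_yaml_scalar(text, "decay_type", decay_type)
    path.write_text(text)
```

A YAML library would handle nesting properly, but a round trip through one typically drops the "# original ..." comments; the regex approach keeps the file otherwise byte-identical.
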

If training becomes unstable for some tasks, such as handover_apple, try enabling soft updates by setting training.decay_type to 1 or 2 in the config. The default value, 0, uses hard updates.
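
The README does not specify what decay_type 1 and 2 compute or which network they update. For orientation only, the following Python sketch shows the generic distinction between a hard update (copy the parameters) and a soft, Polyak-style update (move a fraction tau toward them), using plain dicts of lists in place of model tensors. It is a textbook illustration, not PACT's implementation.

```python
def hard_update(target, source):
    """Hard update: overwrite every target parameter with the source values."""
    for name, values in source.items():
        target[name] = list(values)


def soft_update(target, source, tau):
    """Soft (Polyak / EMA) update: target <- (1 - tau) * target + tau * source."""
    if not 0.0 < tau <= 1.0:
        raise ValueError("tau must lie in (0, 1]")
    for name, values in source.items():
        target[name] = [(1.0 - tau) * t + tau * s
                        for t, s in zip(target[name], values)]
```

With tau = 1 the soft update reduces to the hard one. Smaller tau makes the target track the source more slowly: after k updates toward a fixed source, the remaining gap shrinks by a factor of (1 - tau)^k, which damps abrupt parameter jumps.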

Evaluate a PACT Post-Trained Policy

Run evaluation from policy/DP:

cd policy/DP

# bash eval_dr.sh ${TASK_NAME} ${TASK_CONFIG} ${CKPT_CONFIG} ${EPISODE_NUM} ${SEED} ${CKPT_NUM} ${GPU_ID} ${CKPT_PATH} ${EVAL_DISTILLATION}
bash eval_dr.sh pick_dual_bottles demo_randomized onpolicy_randomized 200 0 20 0 playground/dp/pick_dual_bottles/exptime_robot_on_policy_distillation_pick_dual_bottles True
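
Compared with the base-policy evaluation, this call changes CKPT_CONFIG to onpolicy_randomized, uses CKPT_NUM 20 instead of 600, and appends CKPT_PATH and EVAL_DISTILLATION. A hypothetical Python helper makes those differences explicit. Its default CKPT_PATH assumes that other tasks follow the same playground/dp/<task>/exptime_robot_on_policy_distillation_<task> pattern as this example; pass ckpt_path explicitly if your run directory differs.

```python
def post_eval_argv(task, ckpt_num=20, gpu_id=0, episode_num=200, seed=0,
                   ckpt_path=None):
    """eval_dr.sh arguments for a PACT post-trained policy, in documented order."""
    if ckpt_path is None:
        # Hypothetical default: mirrors the run directory used in the example.
        ckpt_path = f"playground/dp/{task}/exptime_robot_on_policy_distillation_{task}"
    return ["bash", "eval_dr.sh", task, "demo_randomized", "onpolicy_randomized",
            str(episode_num), str(seed), str(ckpt_num), str(gpu_id),
            ckpt_path, "True"]  # final argument: EVAL_DISTILLATION
```

The resulting list can be passed to subprocess.run(argv, cwd="policy/DP", check=True).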

Acknowledgments

We gratefully acknowledge the authors of RoboTwin and Diffusion Policy. PACT builds on their excellent simulation benchmark and policy implementation.

Citation

If you find our work helpful, please cite us:

@article{wu2026pact,
  title={PACT: Self-Evolving Physical Safety Alignment for Diffusion Policies in Embodied Manipulation}, 
  author={Wu, Lingxuan and Zhu, Zijian and Wang, Lizhong and Ying, Chengyang and Chen, Huayu and Yang, Xiao and Liu, Fangming and Zhu, Jun},
  journal={arXiv preprint arXiv:2606.08414},
  year={2026},
}
