Demo
Method
OPFA is trained in two stages. (1) We first construct a Geometry-Aware Latent Representation (GaLR): point clouds of sampled reachable states are encoded with 3D convolutions and geometric transformers, which extract local and global features. A unified latent retargeting decoder then disentangles embodiment-specific actions from the latent space, enabling end-to-end training without manual annotations. (2) The pretrained encoder–decoder pair is then integrated into any downstream policy (e.g., DP3), so that a single policy can be trained jointly on cross-embodiment data in a unified latent action space.
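The data flow of the two stages can be illustrated with a minimal pure-Python sketch. It is not the OPFA implementation: mean pooling stands in for the 3D-convolution and geometric-transformer encoder, a per-embodiment linear map stands in for the unified retargeting decoder, and every name here (`encode_point_cloud`, `UnifiedDecoder`, `latent_policy`, `LATENT_DIM`, the embodiment names and weights) is a hypothetical placeholder chosen for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence

LATENT_DIM = 3  # assumed latent size, chosen only for this toy example


def encode_point_cloud(points: Sequence[Sequence[float]]) -> List[float]:
    """Toy stand-in for the GaLR encoder: mean-pool (x, y, z) points into a latent."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(LATENT_DIM)]


@dataclass
class UnifiedDecoder:
    """Toy stand-in for the unified latent retargeting decoder.

    One object serves every embodiment; the embodiment name selects a
    linear map whose row count is that embodiment's action dimension.
    """
    weights: Dict[str, List[List[float]]]

    def decode(self, latent: Sequence[float], embodiment: str) -> List[float]:
        rows = self.weights[embodiment]
        return [sum(w * z for w, z in zip(row, latent)) for row in rows]


def latent_policy(observation: Sequence[float]) -> List[float]:
    """Hypothetical downstream policy head that predicts a latent action."""
    return [0.5 * x for x in observation[:LATENT_DIM]]


# Stage 1 (sketch): reachable-state point cloud -> latent representation.
latent_state = encode_point_cloud([[0.0, 0.0, 0.0], [2.0, 4.0, 6.0]])

# Stage 2 (sketch): the policy predicts in the shared latent space, and the
# pretrained decoder maps the same latent to each embodiment's own action.
decoder = UnifiedDecoder(weights={
    "gripper": [[1.0, 0.0, 0.0]],                    # 1-DoF action
    "hand": [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0],
             [0.0, 0.0, 1.0], [1.0, 1.0, 1.0]],      # 4-DoF action
})
latent_action = latent_policy([0.2, 0.4, 0.6])
gripper_action = decoder.decode(latent_action, "gripper")
hand_action = decoder.decode(latent_action, "hand")
```

The point of the sketch is that `latent_policy` never sees an embodiment-specific action dimension: only the decoder does, which is what lets data from different embodiments share one policy.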
BibTeX
@article{mu2026one,
title={One-Policy-Fits-All: Geometry-Aware Action Latents for Cross-Embodiment Manipulation},
author={Mu, Juncheng and Yang, Sizhe and Bae, Hojin and Jia, Feiyu and Ben, Qingwei and Li, Boyi and Xu, Huazhe and Pang, Jiangmiao},
journal={arXiv preprint arXiv:2603.14522},
year={2026}
}