XSkill: Cross Embodiment Skill Discovery

Mengda Xu, Zhenjia Xu, Cheng Chi, Manuela Veloso, Shuran Song
Proceedings of The 7th Conference on Robot Learning, PMLR 229:3536-3555, 2023.

Abstract

Human demonstration videos are a widely available data source for robot learning and an intuitive user interface for expressing desired behavior. However, directly extracting reusable robot manipulation skills from unstructured human videos is challenging due to the large embodiment difference and unobserved action parameters. To bridge this embodiment gap, this paper introduces XSkill, an imitation learning framework that 1) discovers a cross-embodiment representation called skill prototypes purely from unlabeled human and robot manipulation videos, 2) transfers the skill representation to robot actions using a conditional diffusion policy, and finally, 3) composes the learned skills to accomplish unseen tasks specified by a human prompt video. Our experiments in simulation and real-world environments show that the discovered skill prototypes facilitate both skill transfer and composition for unseen tasks, resulting in a more general and scalable imitation learning framework.
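As a rough illustration of the "skill prototype" idea (this is not the authors' implementation; the function names, the cosine-similarity choice, and the temperature value are assumptions for the sketch), a video segment's embedding can be softly assigned to a set of learnable prototype vectors via temperature-scaled similarity:

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def skill_distribution(feature, prototypes, temperature=0.1):
    """Soft assignment of one embedding to skill prototypes.

    Returns a softmax over temperature-scaled cosine similarities;
    a lower temperature yields a sharper (more confident) assignment.
    """
    sims = [cosine(feature, p) / temperature for p in prototypes]
    m = max(sims)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in sims]
    z = sum(exps)
    return [e / z for e in exps]

# An embedding aligned with prototype 0 concentrates mass there.
probs = skill_distribution([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
```

Because the assignment depends only on the embedding, human and robot clips whose embeddings land near the same prototype share a skill label without any manual annotation, which is the cross-embodiment property the abstract describes.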

Cite this Paper

BibTeX
@InProceedings{pmlr-v229-xu23a,
  title     = {XSkill: Cross Embodiment Skill Discovery},
  author    = {Xu, Mengda and Xu, Zhenjia and Chi, Cheng and Veloso, Manuela and Song, Shuran},
  booktitle = {Proceedings of The 7th Conference on Robot Learning},
  pages     = {3536--3555},
  year      = {2023},
  editor    = {Tan, Jie and Toussaint, Marc and Darvish, Kourosh},
  volume    = {229},
  series    = {Proceedings of Machine Learning Research},
  month     = {06--09 Nov},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v229/xu23a/xu23a.pdf},
  url       = {https://proceedings.mlr.press/v229/xu23a.html},
  abstract  = {Human demonstration videos are a widely available data source for robot learning and an intuitive user interface for expressing desired behavior. However, directly extracting reusable robot manipulation skills from unstructured human videos is challenging due to the big embodiment difference and unobserved action parameters. To bridge this embodiment gap, this paper introduces XSkill, an imitation learning framework that 1) discovers a cross-embodiment representation called skill prototypes purely from unlabeled human and robot manipulation videos, 2) transfers the skill representation to robot actions using conditional diffusion policy, and finally, 3) composes the learned skill to accomplish unseen tasks specified by a human prompt video. Our experiments in simulation and real-world environments show that the discovered skill prototypes facilitate both skill transfer and composition for unseen tasks, resulting in a more general and scalable imitation learning framework.}
}
EndNote
%0 Conference Paper
%T XSkill: Cross Embodiment Skill Discovery
%A Mengda Xu
%A Zhenjia Xu
%A Cheng Chi
%A Manuela Veloso
%A Shuran Song
%B Proceedings of The 7th Conference on Robot Learning
%C Proceedings of Machine Learning Research
%D 2023
%E Jie Tan
%E Marc Toussaint
%E Kourosh Darvish
%F pmlr-v229-xu23a
%I PMLR
%P 3536--3555
%U https://proceedings.mlr.press/v229/xu23a.html
%V 229
%X Human demonstration videos are a widely available data source for robot learning and an intuitive user interface for expressing desired behavior. However, directly extracting reusable robot manipulation skills from unstructured human videos is challenging due to the big embodiment difference and unobserved action parameters. To bridge this embodiment gap, this paper introduces XSkill, an imitation learning framework that 1) discovers a cross-embodiment representation called skill prototypes purely from unlabeled human and robot manipulation videos, 2) transfers the skill representation to robot actions using conditional diffusion policy, and finally, 3) composes the learned skill to accomplish unseen tasks specified by a human prompt video. Our experiments in simulation and real-world environments show that the discovered skill prototypes facilitate both skill transfer and composition for unseen tasks, resulting in a more general and scalable imitation learning framework.
APA
Xu, M., Xu, Z., Chi, C., Veloso, M. &amp; Song, S. (2023). XSkill: Cross Embodiment Skill Discovery. Proceedings of The 7th Conference on Robot Learning, in Proceedings of Machine Learning Research 229:3536-3555. Available from https://proceedings.mlr.press/v229/xu23a.html.