⚠️ Active Development Notice: This codebase is under active development. APIs and components may change, and some may be moved to separate repositories. Documentation may be incomplete or reference features still in development.
📄 Research Paper: This project was first introduced and developed for the D2E project. For more details, see D2E: Scaling Vision-Action Pretraining on Desktop Data for Transfer to Embodied AI. If you find this work useful, please cite our paper.
💡 This is a conceptual overview. See the Quick Start Guide for detailed instructions.
```shell
# 1. Record desktop interaction
$ ocap my-session.mcap

# 2. Process to training format
$ python scripts/01_raw_to_event.py --train-dir ./

# 3. Train your model (coming soon)
$ python train.py --dataset ./event-dataset
```

```shell
# For video recording, install GStreamer first. Skip if you only need data processing.
$ conda install open-world-agents::gstreamer-bundle

# Install OWA
$ pip install owa
```

| Resource | Description |
|---|---|
| 📖 Full Documentation | Complete docs with all guides and references |
| 🚀 Quick Start Guide | Complete tutorial: Record → Process → Train |
| 🤗 Community Datasets | Browse and share datasets |
- 🔌 Environment Framework: "USB-C of desktop agents" - a universal interface for native desktop automation, with pre-built plugins for desktop control, high-performance screen capture, and a zero-configuration plugin system
- 📊 Data Infrastructure: Complete desktop agent data pipeline from recording to training, built on the OWAMcap format - a universal standard powered by MCAP
- 🛠️ CLI Tools: Command-line utilities (`owl`) for recording, analyzing, and managing agent data
- 🤖 Examples: Complete implementations and training pipelines for multimodal agents
We welcome contributions! See our Contributing Guide.
MIT License. See LICENSE.
```bibtex
@article{choi2025d2e,
  title={D2E: Scaling Vision-Action Pretraining on Desktop Data for Transfer to Embodied AI},
  author={Choi, Suhwan and Jung, Jaeyoon and Seong, Haebin and Kim, Minchan and Kim, Minyeong and Cho, Yongjun and Kim, Yoonshik and Park, Yubeen and Yu, Youngjae and Lee, Yunsung},
  journal={arXiv preprint arXiv:2510.05684},
  year={2025}
}
```