🚀 Open World Agents

Everything you need to build a state-of-the-art foundation multimodal desktop agent, end-to-end.

Documentation | License: MIT | Python 3.11+

⚠️ Active Development Notice: This codebase is under active development. APIs and components may change, and some may be moved to separate repositories. Documentation may be incomplete or reference features still in development.

📄 Research Paper: This project was first introduced and developed for the D2E project. For more details, see D2E: Scaling Vision-Action Pretraining on Desktop Data for Transfer to Embodied AI. If you find this work useful, please cite our paper.

Quick Start

💡 This is a conceptual overview. See the Quick Start Guide for detailed instructions.

# 1. Record desktop interaction
$ ocap my-session.mcap

# 2. Process to training format
$ python scripts/01_raw_to_event.py --train-dir ./

# 3. Train your model (coming soon)
$ python train.py --dataset ./event-dataset
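
Step 1 writes a standard MCAP container, so a recording can be read back for a quick sanity check. The sketch below is illustrative, not official: it assumes the mcap-owa-support package exposes OWAMcapReader with an iter_messages() interface; verify the exact names against the Full Documentation.

# inspect_recording.py -- a minimal sketch for peeking at a recorded session.
# Assumption: OWAMcapReader and iter_messages() exist as named here; the
# underlying file is plain MCAP, so the generic `mcap` reader also works.
from mcap_owa.highlevel import OWAMcapReader

with OWAMcapReader("my-session.mcap") as reader:
    for msg in reader.iter_messages():
        # Each entry is one recorded event (keyboard, mouse, screen, ...).
        print(msg.topic, msg.timestamp)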

Installation

# For video recording, install GStreamer first. Skip if you only need data processing.
$ conda install open-world-agents::gstreamer-bundle

# Install OWA
$ pip install owa
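
A quick way to confirm the install is an import check. This is a minimal sketch, assuming `pip install owa` pulls in the `owa.core` module; the package layout may differ across versions.

# check_install.py -- a minimal post-install sanity check, not an official script.
import importlib

for name in ("owa", "owa.core"):  # assumed module names; adjust if the layout differs
    module = importlib.import_module(name)  # raises ImportError if missing
    print(f"{name}: import OK (version: {getattr(module, '__version__', 'unknown')})")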

Documentation

  • 🏠 Full Documentation: Complete docs with all guides and references
  • 📖 Quick Start Guide: Complete tutorial (Record → Process → Train)
  • 🤗 Community Datasets: Browse and share datasets

Core Components

  • 🌍 Environment Framework: a universal interface for native desktop automation (the "USB-C of desktop agents"), with pre-built plugins for desktop control, high-performance screen capture, and a zero-configuration plugin system; see the sketch after this list
  • 📊 Data Infrastructure: a complete desktop agent data pipeline from recording to training, built on the OWAMcap format, a universal standard powered by MCAP
  • 🛠️ CLI Tools: command-line utilities (owl) for recording, analyzing, and managing agent data
  • 🤖 Examples: complete implementations and training pipelines for multimodal agents
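
The registry pattern is the core of the Environment Framework: plugins contribute functions and event listeners under string keys, and agent code looks them up at runtime. The sketch below is a hedged illustration, not a verified API listing; the CALLABLES/LISTENERS names and the "std/..." keys are assumptions drawn from the documented pattern, so check the Full Documentation for the exact identifiers.

# registry_sketch.py -- an illustrative sketch of the plugin-registry pattern.
# Assumptions (verify against the docs): `owa.core` exposes CALLABLES and
# LISTENERS registries, and the standard plugin registers the keys used below.
from owa.core import CALLABLES, LISTENERS

# Look up and call a registered function by its string key.
now_ns = CALLABLES["std/time_ns"]()
print(f"current time (ns): {now_ns}")

# Attach a callback to a registered event source, then start it.
def on_tick(event):
    print("tick:", event)

tick = LISTENERS["std/tick"]().configure(callback=on_tick, interval=1)
tick.start()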

Contributing

We welcome contributions! See our Contributing Guide.

License

MIT License. See LICENSE.

Citation

@article{choi2025d2e,
  title={D2E: Scaling Vision-Action Pretraining on Desktop Data for Transfer to Embodied AI},
  author={Choi, Suhwan and Jung, Jaeyoon and Seong, Haebin and Kim, Minchan and Kim, Minyeong and Cho, Yongjun and Kim, Yoonshik and Park, Yubeen and Yu, Youngjae and Lee, Yunsung},
  journal={arXiv preprint arXiv:2510.05684},
  year={2025}
}
