My lab for learning AI agent tooling by building agents that play ARC-AGI-3 games.
The repo is based on the official ARC-AGI-3-Agents runner, with local adaptations for hands-on experimentation:
- local/offline game runs
- simple agent templates to modify
- recordings for manual inspection
- a small LLM smoke-test agent
- tests for the runner and agent harness
Install dependencies with uv:

    uv sync

Create local config:

    cp .env.example .env

For local development, .env should usually contain:

    OPERATION_MODE=offline

Add OPENAI_API_KEY to .env if you want to run LLM-based agents.
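As a quick sanity check of the configuration described above, the following is a minimal sketch that reads a .env file and warns about the two settings this README mentions. The helper names `parse_env` and `check_env` are hypothetical and not part of the repo, and the parser only handles plain KEY=VALUE lines; the runner itself may load .env differently.

```python
from pathlib import Path


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip("\"'")
    return values


def check_env(values: dict[str, str]) -> list[str]:
    """Return warnings for the settings named in this README (hypothetical helper)."""
    warnings = []
    if values.get("OPERATION_MODE") != "offline":
        warnings.append("OPERATION_MODE is not 'offline'")
    if not values.get("OPENAI_API_KEY"):
        warnings.append("OPENAI_API_KEY is not set; LLM-based agents need it")
    return warnings


if __name__ == "__main__":
    env_path = Path(".env")
    text = env_path.read_text() if env_path.exists() else ""
    for warning in check_env(parse_env(text)):
        print("warning:", warning)
```

An empty list from `check_env` means both settings are present in the form this README suggests for local development.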
Run the baseline random agent:

    uv run main.py --agent=random --game=ls20

Run the short LLM smoke-test agent:

    uv run main.py --agent=smokellm --game=ls20

smokellm uses gpt-4o-mini and stops after 3 actions. It exists to verify
that local execution and OpenAI credentials work before running longer
experiments.
Local mode uses downloaded game files under environment_files/. It is faster
and avoids online scorecard/replay calls.
If a game is not available locally yet, run once in normal or online mode to
download it, then switch back to offline.
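To see whether a game has already been downloaded before switching to offline mode, a small check like the one below may help. This is only a sketch: the function name `find_local_game` is hypothetical, and it assumes that downloaded files or directories under environment_files/ have names starting with the game id, which may not match the actual layout.

```python
from pathlib import Path


def find_local_game(game_id: str, root: str = "environment_files") -> list[Path]:
    """Return paths under root whose name starts with game_id.

    Assumption: downloaded entries are named after the game id. Adjust the
    match if the real directory layout differs.
    """
    base = Path(root)
    if not base.is_dir():
        return []
    return sorted(p for p in base.rglob("*") if p.name.startswith(game_id))


if __name__ == "__main__":
    matches = find_local_game("ls20")
    if matches:
        print("found locally:", ", ".join(str(p) for p in matches))
    else:
        print("ls20 not found locally; run once online to download it")
```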
Print registered agents:

    uv run python -c "from agents import AVAILABLE_AGENTS; print(sorted(AVAILABLE_AGENTS))"

Current useful starting points:

- random
- fastllm
- llm
- reasoningllm
- guidedllm
- langgraphrandom
- langgraphfunc
- langgraphtextonly
- langgraphthinking
- smokellm

Run the tests:

    uv run pytest

The tests cover the runner, agent contract, recordings, and basic template behavior. They do not evaluate agent quality.
- Recordings are written to recordings/ and ignored by git.
- Downloaded environments are written to environment_files/ and ignored by git.
- Real secrets belong in .env; .env.example is only for placeholders.
- Full frame dumps can be very token-heavy for LLM agents, so prompt compression is an important next area to explore.
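
One possible starting point for prompt compression is run-length encoding. The sketch below assumes a frame is a 2D list of small integers (an assumption about the data shape, not something taken from the runner); `rle_row` and `compress_frame` are hypothetical helpers. It encodes runs within each row and then collapses consecutive identical rows.

```python
from itertools import groupby


def rle_row(row: list[int]) -> str:
    """Run-length encode one row, e.g. [0, 0, 0, 5] -> '0x3 5'."""
    parts = []
    for value, run in groupby(row):
        count = len(list(run))
        parts.append(f"{value}x{count}" if count > 1 else str(value))
    return " ".join(parts)


def compress_frame(frame: list[list[int]]) -> str:
    """Encode each row with RLE, then collapse runs of identical rows."""
    lines = []
    for encoded, run in groupby(rle_row(row) for row in frame):
        count = len(list(run))
        lines.append(f"{encoded} (x{count} rows)" if count > 1 else encoded)
    return "\n".join(lines)


if __name__ == "__main__":
    # Toy 4x8 frame: mostly background (0) with a short bar of colour 3.
    frame = [[0] * 8, [0] * 8, [0, 0, 3, 3, 3, 0, 0, 0], [0] * 8]
    print(compress_frame(frame))
```

Whether this actually reduces token counts depends on the tokenizer and on how uniform the frames are, so it is worth measuring on real recordings before relying on it.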