AI agent that lives on your desktop.
It sees your screen, understands what you need, and gets things done — hands-free.
⚠️ Atlas is in active development (v0.2.3).
- 🤖 LLM support: Gemini (including native Computer Use API) and OpenAI. More providers on the way.
- 🖥 Screen control: Gemini 3.x models use native Computer Use API for precise actions. Older models use vision-based coordinate prediction.
- 💻 Platform: Windows only for now. macOS & Linux support is planned.
- 🐛 Found a bug? We'd love to hear about it — open an issue.
Atlas is an AI-powered desktop agent that works alongside you as a transparent overlay. Press Ctrl+Space, tell it what to do — and it figures out the rest: navigating apps, clicking buttons, typing text, searching the web, finding files, running commands.
Think of it as a copilot for your entire OS.
- 🖥 Sees your screen — captures what's on your display and understands the context
- 🧠 Thinks before it acts — plans multi-step tasks and shows progress in real time
- 🖱 Controls your computer — mouse, keyboard, and terminal — all automated
- 🎯 Shows what it's doing — you can see the agent's cursor moving on screen
- 🔍 Searches the web — finds answers and brings them back, no tab-switching needed
- 📂 Finds your files — searches local files and folders by name, right from chat
- 🗣 Speaks to you — real-time voice responses with streaming TTS
- 🎙 Listens to you — local speech-to-text with wake word activation, no cloud required
- 🔊 Sound feedback — distinct sounds for every state: activation, processing, task complete, warnings
- 🛡 Asks before doing anything risky — built-in safety system with permission prompts
A glowing AI indicator that shows you exactly what Atlas is doing — idle, thinking, acting, or waiting for your input. Always visible, never in the way.
Context-aware floating panels that appear when relevant:
- Action Island — shows the current task and progress
- Response Island — streams Atlas's thoughts and replies word by word
- Permission Island — asks for confirmation before risky operations
- Microtask Island — your task queue with real-time step progress (queue new tasks while the agent is busy)
- Search Island — web search results and local file search results
- Listening Island — live transcript display during voice input
- Warning Island — dismissible warnings for errors and quota issues
When Atlas controls your desktop, you can see its cursor moving on screen — clicking, typing, and scrolling — so you always know what's happening.
With compatible Gemini 3.x models, Atlas uses the native Computer Use API for precise screen control — clicking, typing, scrolling, navigating, and searching — all without opening extra apps. Multi-monitor setups are supported.
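At its core, a Computer Use loop maps model-emitted actions onto local input events. Here is a minimal sketch in TypeScript — the action shape and the `InputInjector` interface are illustrative assumptions, not Atlas's actual types or the Gemini SDK's:

```typescript
// Hypothetical shape of an action returned by a Computer Use-style model.
type ScreenAction =
  | { kind: "click"; x: number; y: number }
  | { kind: "type"; text: string }
  | { kind: "scroll"; dx: number; dy: number };

// Injector interface so the dispatcher stays testable without robotjs.
interface InputInjector {
  moveAndClick(x: number, y: number): void;
  typeText(text: string): void;
  scroll(dx: number, dy: number): void;
}

// Dispatch one model action to the local input layer; returns a log line
// that a UI element (e.g. the Action Island) could display.
function dispatch(action: ScreenAction, io: InputInjector): string {
  switch (action.kind) {
    case "click":
      io.moveAndClick(action.x, action.y);
      return `click @ (${action.x}, ${action.y})`;
    case "type":
      io.typeText(action.text);
      return `type "${action.text}"`;
    case "scroll":
      io.scroll(action.dx, action.dy);
      return `scroll (${action.dx}, ${action.dy})`;
  }
}
```

Keeping the injector behind an interface is what lets the same dispatcher drive robotjs on one monitor or a virtual screen in tests.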
Before executing complex commands, Atlas breaks them into 2–5 high-level steps and displays them in the Task Queue. You see the planned steps before execution begins and watch progress as each step completes.
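The planner's output can be modeled as a small data structure with a step cursor. This is a sketch under assumed names (`PlannedTask`, `progress`, `completeStep` are hypothetical, not Atlas's internals):

```typescript
// Hypothetical representation of one planned task in the queue.
interface PlannedTask {
  goal: string;
  steps: string[];   // 2–5 high-level steps from the planner
  completed: number; // how many steps have finished
}

// Fraction of steps finished — what a progress bar would render.
function progress(task: PlannedTask): number {
  return task.steps.length === 0 ? 1 : task.completed / task.steps.length;
}

// Advance after a step completes; clamps at the end of the plan.
function completeStep(task: PlannedTask): PlannedTask {
  return { ...task, completed: Math.min(task.completed + 1, task.steps.length) };
}
```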
Create multiple AI agents with unique personalities, knowledge, and voices. Each persona has its own memory and prompt settings — switch between them from the tray menu.
Atlas remembers your preferences and context across sessions. It learns facts about you from conversations and uses them to give better responses over time. Browse conversation history and view, edit, or delete learned facts in Settings.
Local offline speech-to-text via Vosk — just say the wake word (the active persona's name) and Atlas starts listening. No cloud API required.
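With a local STT engine like Vosk, wake-word activation typically reduces to scanning partial transcripts for the persona's name. A rough sketch — the matching rule here is illustrative, not Atlas's actual logic:

```typescript
// Return true once the wake word (the active persona's name) appears
// as a whole word in a partial transcript from the local STT engine.
function heardWakeWord(transcript: string, personaName: string): boolean {
  const words = transcript.toLowerCase().split(/\s+/);
  return words.includes(personaName.toLowerCase());
}
```

Matching whole words (rather than substrings) avoids false triggers from phrases that merely contain the name's letters.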
Full control over the AI's behavior — modify system, action, and safety prompts directly from the Settings UI. Reset to defaults anytime.
Choose where Atlas appears on screen (left, right, or center) and configure your preferred activation hotkey — all from Settings.
Enable per-request session logs to trace the full pipeline: intent classification → LLM calls → actions → response streaming — with precise timing for every stage.
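Per-stage timing of this kind can be captured with a small tracer that wraps each pipeline stage. A sketch with assumed names (`SessionTrace` is not Atlas's actual logger):

```typescript
// Minimal per-request tracer: records how long each pipeline stage takes.
class SessionTrace {
  private stages: { name: string; ms: number }[] = [];

  // Run a stage, recording its duration even if it throws.
  stage<T>(name: string, fn: () => T): T {
    const start = performance.now();
    try {
      return fn();
    } finally {
      this.stages.push({ name, ms: performance.now() - start });
    }
  }

  // One line per stage, e.g. "intent: 12.3ms".
  report(): string[] {
    return this.stages.map((s) => `${s.name}: ${s.ms.toFixed(1)}ms`);
  }
}
```

Wrapping stages like `trace.stage("intent", classify)` then `trace.stage("llm", callModel)` yields a timed trace of the whole pipeline in order.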
1. Go to Releases and download the latest installer for Windows.
2. Run the installer — Atlas will appear in your system tray.
3. Get a Gemini API key: go to Google AI Studio → sign in → Create API Key → copy it.
4. Click the Atlas tray icon → Settings → Intelligence tab → paste your API key.
5. Set the recommended models in the Intelligence tab:

   | Setting | Free tier | Paid tier |
   |---|---|---|
   | Text model | `gemini-3.1-flash-lite-preview` | `gemini-3.1-flash-lite-preview` |
   | Vision model | `gemini-3.1-flash-lite-preview` | `gemini-3-flash-preview` |

   The vision model handles screen control & Computer Use. The paid-tier model is more accurate but requires a billing-enabled API key.

6. (Optional) For voice output:
   - Alice (free, no API key): Voice tab → select Alice → done!
   - ElevenLabs (premium voices): get an ElevenLabs API key → Voice tab → paste key + voice ID
7. Press `Ctrl+Space` and start giving Atlas tasks 🎉
For contributors and developers who want to run Atlas from source.
```bash
git clone https://github.com/dortanes/atlas.git
cd atlas
yarn install
yarn dev
```

| Status | Feature |
|---|---|
| ✅ | Transparent glassmorphism overlay with Orb + Island UI |
| ✅ | LLM integration (Gemini + OpenAI) with multi-provider architecture |
| ✅ | Screen vision + desktop automation (robotjs) |
| ✅ | Native Gemini Computer Use API |
| ✅ | Smart task planning with step-by-step progress |
| ✅ | Agent cursor animations (click, type, scroll overlays) |
| ✅ | Streaming TTS (ElevenLabs + Alice) |
| ✅ | Persona system with isolated memory & custom voices |
| ✅ | Web search + local file search |
| ✅ | Settings UI with prompt editor + debug logging |
| ✅ | Intent classification (direct / action / chat) |
| ✅ | Context caching (Gemini prompt caching for token optimization) |
| ✅ | Voice input (wake word + local STT via Vosk) |
| 🔜 | Action whitelist/blacklist & audit log |
| 🔜 | Onboarding flow |
| 🔜 | Auto-update |
If you find Atlas useful, please consider giving the repository a star ⭐ — it helps others discover the project and motivates further development!
Contributions are welcome! Feel free to open an issue or submit a pull request.
Apache License 2.0 — use it, modify it, build on it.
Vibecoded with ❤️ by dortanes