Blog • Showcases • Features • Quick Start • Development • Roadmap
Star GACUA on GitHub to be instantly notified of updates. Your support means everything to us! ❤️
GACUA (Gemini CLI as Computer Use Agent) is the world's first out-of-the-box computer use agent powered by Gemini CLI.
Showcase videos: assist_gameplay.mp4, install_vscode.mp4, summarize_blog_gpt.mp4, show_hidden_files.mp4
GACUA extends the core capabilities of Gemini CLI to provide a robust agentic experience. It enables you to:
- Enjoy Out-of-the-Box Computer Use: Get started with a single command. GACUA provides a free and immediate way to experience computer use, such as assisting with gameplay, installing software, and more.
- Execute Tasks with High Accuracy: GACUA enhances Gemini 2.5 Pro's grounding capability through an "Image Slicing + Two-Step Grounding" method.
- Gain Step-by-Step Control & Observability: Unlike black-box agents, GACUA offers a transparent, step-by-step execution flow. You can review, accept, or reject each action the agent proposes, giving you full control over the task's completion.
- Enable Remote Operation: You can access your agent from a separate device. The agent runs in its own independent environment, so you no longer have to "fight" with it for mouse and keyboard control while it works.
For the technical journey behind GACUA, see GACUA: A Free and Open-Source Computer Use Agent for Developers.
Get up and running with GACUA in just a few steps.
- Node.js ≥ 20: GACUA is built on Node.js. The Node.js installer also installs npm.
- Gemini Authentication: GACUA needs to authenticate with the Gemini API. While Gemini CLI is not required to run GACUA, the easiest way to set up authentication is by installing and configuring the Gemini CLI first. GACUA will automatically reuse the configuration created by it.
Simply run the following command to start GACUA.
npx @gacua/backend
This command uses npx to download and run the GACUA backend package without needing to install it globally. The first time you run this, it may take a few moments to download the necessary files.
To see detailed installation progress, run the following command.
npx --verbose @gacua/backend
Alternatively, you can install GACUA globally using npm. This will install the GACUA package on your system, allowing you to run it from any directory by simply typing gacua.
npm install -g @gacua/backend && gacua
Follow the on-screen prompts to complete the setup. Once the setup is finished, you can access the GACUA server from a web browser on your controlling device.
Important
Network Configuration
GACUA operates as a local web server, allowing you to control your PC from another device, like a mobile phone. For this to work, both devices must be on the same network.
- Connect to the same Wi-Fi: The simplest method is to connect your computer and your controlling device (e.g., your phone) to the same Wi-Fi network.
- Use a mobile hotspot: If you don't have a shared Wi-Fi network, you can use your phone's hotspot and connect your computer to it.
- Check your firewall: Your computer's firewall might block incoming connections. If you can't connect, ensure that your firewall settings allow access to the port GACUA is running on. You may need to create a new inbound rule for Node.js or the specific port.
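As a rough illustration of the network setup above, the following TypeScript sketch lists the addresses that another device on the same network could try. It assumes the interface data has the shape returned by Node's os.networkInterfaces(). The function name candidateUrls, the sample address, and the port value are hypothetical; the actual port is the one GACUA prints when it starts.

```typescript
// Structural subset of the entries returned by Node's os.networkInterfaces().
interface InterfaceAddress {
  address: string;
  family: string; // "IPv4" or "IPv6"
  internal: boolean; // true for loopback interfaces
}

// Hypothetical helper: returns one URL per non-loopback IPv4 address.
function candidateUrls(
  interfaces: Record<string, InterfaceAddress[] | undefined>,
  port: number,
): string[] {
  const urls: string[] = [];
  for (const name of Object.keys(interfaces)) {
    for (const entry of interfaces[name] ?? []) {
      // Loopback addresses are unreachable from other devices, so skip them.
      if (entry.family === "IPv4" && !entry.internal) {
        urls.push(`http://${entry.address}:${port}`);
      }
    }
  }
  return urls;
}

// Example with made-up interface data (assumed values, not real output).
const sampleInterfaces: Record<string, InterfaceAddress[]> = {
  lo: [{ address: "127.0.0.1", family: "IPv4", internal: true }],
  wlan0: [{ address: "192.168.1.42", family: "IPv4", internal: false }],
};
const sampleUrls = candidateUrls(sampleInterfaces, 3000);
```

In a Node.js script, passing the result of os.networkInterfaces() from node:os as the first argument should yield the URLs to try from the controlling device. If none of them respond, the firewall rule mentioned above is the next thing to check.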
GACUA includes a specialized MCP tool for computer control and operates as a web server. This architecture creates a seamless connection between the computer you want to control and the device you're using to issue commands.
By default, GACUA runs as an all-in-one application. However, for more advanced use cases, such as controlling a computer on a different network, you can run its core components separately.
This "decoupled mode" separates GACUA's 🧠 Brain (which requires API access) from its 💪 Body (which executes commands), allowing them to operate on different machines.
Important
A stable network connection between the two machines is crucial for this mode to function correctly.
- Start the MCP computer server (the 💪 Body).
On the computer you want to control, run the following command. This machine does not need your Gemini API keys.
npx @gacua/mcp-computer --host <MCP_HOST> --port <MCP_PORT>
This command starts the MCP server, which will listen for commands to execute on the local machine.
- Launch the GACUA backend (the 🧠 Brain).
On the controlling device with authenticated access to the Gemini API, run the following command:
GACUA_MCP_COMPUTER_URL=http://<MCP_HOST>:<MCP_PORT>/mcp npx @gacua/backend
GACUA_MCP_COMPUTER_URL tells the "Brain" the endpoint of the "Body" you started in the previous step.
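To make the relationship between the two commands concrete, here is a small TypeScript sketch that composes the value of GACUA_MCP_COMPUTER_URL from the host and port given to the MCP computer server. The helper name buildMcpComputerUrl and the sample host and port are assumptions for illustration; only the http://<MCP_HOST>:<MCP_PORT>/mcp shape comes from the command above.

```typescript
// Hypothetical helper: builds the endpoint the "Brain" uses to reach the "Body".
function buildMcpComputerUrl(host: string, port: number): string {
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`Invalid port: ${port}`);
  }
  // IPv6 literals must be wrapped in brackets inside a URL.
  const urlHost = host.includes(":") ? `[${host}]` : host;
  return `http://${urlHost}:${port}/mcp`;
}

// Sample values (assumed): the controlled machine is 192.168.1.20, port 8765.
const mcpUrl = buildMcpComputerUrl("192.168.1.20", 8765);
// mcpUrl is "http://192.168.1.20:8765/mcp"
```

The resulting string is what would be assigned to GACUA_MCP_COMPUTER_URL before launching the backend. The host and port must match the values passed to --host and --port on the controlled machine.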
Interested in contributing to GACUA? Here's how you can get your development environment set up and run the project from source.
After cloning the repository, you need to install the dependencies and perform an initial build.
- Install all package dependencies.
npm install
- Build all packages.
This command compiles all the packages within the monorepo.
npm run build
For active development, run GACUA in development mode, which provides hot reloading for both the frontend and the backend.
Start the development servers.
npm run dev:gacua
This command starts the Vite frontend server (on port 5173) and the Express backend server (on port 3000). Follow the link printed in your terminal, but remember to change the backend URL's port from 3000 to 5173 in the UI. The Vite server is configured to proxy requests to the backend.
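The proxying described above can be pictured with a minimal sketch of a Vite dev-server configuration. The server.port and server.proxy options are standard Vite settings, but the /api path prefix and the variable names here are assumptions for illustration; the project's actual Vite configuration may differ.

```typescript
// Ports taken from the text above.
const BACKEND_PORT = 3000;
const FRONTEND_PORT = 5173;

// Sketch of a Vite-style config object (assumed shape, not the project's file).
const devServerConfig = {
  server: {
    port: FRONTEND_PORT,
    proxy: {
      // Hypothetical prefix: requests under /api are forwarded to Express.
      "/api": {
        target: `http://localhost:${BACKEND_PORT}`,
        changeOrigin: true,
      },
    },
  },
};
```

An object like this could be passed to Vite's defineConfig in a vite.config.ts. Because the browser only talks to port 5173 and Vite forwards matching requests to port 3000, the backend URL in the UI should point at port 5173 during development.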
Important
The dev command only watches for changes in the @gacua/backend and @gacua/frontend packages. If you modify any other package, you will need to stop the server and run npm run build again.
To run the application as it would run in production, where the backend serves the built frontend files, follow these steps.
- Build the project (if you have new changes).
npm run build
- Start the application.
npm run start:gacua
In this mode, the frontend artifacts are served by the backend, so you can access the entire application on port 3000.
To test the gacua command-line interface from your local build (simulating how a user would run it), follow these steps carefully.
- Install dependencies.
npm install
- Build all packages.
npm run build
- Install again to link the binary.
npm install
This second npm install is crucial. After the build step creates the executable files, this command links the local gacua binary into the node_modules/.bin directory, making it available to npx.
- Run GACUA.
npx gacua
- GACUA: A Free and Open-Source Computer Use Agent for Developers: The technical journey behind GACUA, GACUA's design philosophy, our thoughts on the future, and more.
- Under the Hood: GACUA's Architecture: A deep dive into GACUA's core decoupled components.
- Troubleshooting: Solutions to common issues, such as the agent capturing black screenshots when run via SSH.
GACUA is just getting started. Here are some of the key directions we can explore to make GACUA more powerful, flexible, and reliable.
- Enhanced grounding: To further improve grounding accuracy, we can adopt a "heavy mode". This mode calls the model twice consecutively (with varying temperatures). If the two bounding boxes overlap by more than 50%, the overlapping region is adopted as the result. Otherwise, the process is repeated until two consecutive results overlap by more than 50%.
- Pluggable agent architecture: GACUA's architecture decouples the Interface from the Agent, which allows you to replace various components, including models, tools, system prompts, and workflows. Additionally, you can leverage the GACUA UI for debugging, as it shows the entire "Planning" and "Grounding" process. You can also use GACUA to rapidly test and benchmark different vision models.
- Autonomous tool & skill acquisition: Repetitive sub-tasks, like "opening a specific webpage in Chrome," are inefficient and token-intensive. You can empower GACUA to recognize and summarize these recurring operational patterns, automatically creating new, persistent tools. These self-generated tools can then be called by the agent in future runs, allowing it to learn and continuously improve its capabilities over time.
- CLI mode: Once GACUA's capabilities are robust enough for users to trust it with full autonomy, we can introduce a CLI mode (similar to Gemini CLI). This will also allow GACUA to function as a standardized tool that can be used by other agents.
- Prompt management: To improve efficiency, we can optimize how complex prompts are managed. This will allow you to save long, frequently used prompts as configurations and reference them later with a simple @alias (a form of manual RAG), keeping your process streamlined.
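Returning to the "heavy mode" idea from the Enhanced grounding item, the consensus loop could be sketched roughly as follows. This is only one possible reading of the roadmap text, not GACUA's implementation: it assumes "overlap" means intersection over union, compares each result with the one before it, alternates between two temperatures, and adds a hypothetical maxCalls cap so the loop always terminates. The Box type, all function names, and the default values are illustrative, and the model call is shown as a synchronous callback for brevity (a real model call would be asynchronous).

```typescript
interface Box { x1: number; y1: number; x2: number; y2: number; }

// Returns the overlapping rectangle of two boxes, or null if they are disjoint.
function intersection(a: Box, b: Box): Box | null {
  const x1 = Math.max(a.x1, b.x1);
  const y1 = Math.max(a.y1, b.y1);
  const x2 = Math.min(a.x2, b.x2);
  const y2 = Math.min(a.y2, b.y2);
  return x2 > x1 && y2 > y1 ? { x1, y1, x2, y2 } : null;
}

function area(b: Box): number {
  return (b.x2 - b.x1) * (b.y2 - b.y1);
}

// Assumption: overlap is measured as intersection over union, in [0, 1].
function overlapRatio(a: Box, b: Box): number {
  const inter = intersection(a, b);
  if (inter === null) return 0;
  const interArea = area(inter);
  return interArea / (area(a) + area(b) - interArea);
}

// Calls `ground` repeatedly until two consecutive boxes agree, then returns
// their overlapping region. Returns null if maxCalls is reached first.
function heavyModeGround(
  ground: (temperature: number) => Box,
  lowTemperature = 0.2,
  highTemperature = 0.8,
  threshold = 0.5,
  maxCalls = 10,
): Box | null {
  let previous = ground(lowTemperature);
  for (let call = 1; call < maxCalls; call++) {
    const temperature = call % 2 === 0 ? lowTemperature : highTemperature;
    const current = ground(temperature);
    if (overlapRatio(previous, current) > threshold) {
      return intersection(previous, current);
    }
    previous = current;
  }
  return null;
}
```

Adopting the intersection rather than either box favors the region both calls agree on, at the cost of extra model calls whenever the results diverge.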
GACUA is built on top of Gemini CLI and inspired by Agent-S and nut.js. We're grateful for their contributions.
GACUA is licensed under the Apache License 2.0.