An interactive, on-device AI agent platform that runs a ReAct (Reasoning + Action) loop inside a Web Worker. It leverages Chrome's experimental built-in Gemini Prompt API for private, local, and cost-free inference, combining custom tools, modular skills, and persistent long-term memory.
- On-Device LLM Inference: Runs entirely in the browser using Chrome's built-in LanguageModel API (window.LanguageModel), eliminating the need for external API keys and avoiding network round trips.
- Asynchronous Web Worker Architecture:
  - prompt-chain-host.js runs on the main browser thread to manage the LLM session.
  - prompt-chain-worker.js runs in a background thread to orchestrate the agent loop, execute tools, and handle errors, so the user interface stays responsive.
- Dynamic Skill & Tool Retrieval (Lightweight RAG): Matches the user prompt against loaded skills and tools using a token-overlap scorer, so only relevant context is added to the prompt and the prompt stays within the model's token limit.
- Long-term Memory with Auto-Summarization:
  - Uses agent-memory.js to store conversation histories locally in the browser via IndexedDB.
  - Implements automatic conversation summarization (defined in utils.js) once the chat history exceeds 5 turns, so older turns are condensed instead of consuming the context window.
- Interactive UI Stream: A sleek interface built with HTML/CSS that displays the real-time agent reasoning steps (Thoughts, Actions, and Observations) alongside the final response.
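The token-overlap retrieval mentioned in the list above can be illustrated with a minimal sketch. This is not the project's actual scorer: the function names, the scoring formula (the fraction of description tokens that also appear in the prompt), the thresholds, and the tool descriptions are all assumptions made for illustration.

```javascript
// Hypothetical sketch: names, formula, and descriptions are assumptions,
// not the project's actual implementation.

// Lowercase the text and collect its alphanumeric tokens into a set.
function tokenize(text) {
  return new Set(text.toLowerCase().match(/[a-z0-9]+/g) ?? []);
}

// Score = fraction of the description's tokens that also occur in the prompt.
function overlapScore(prompt, description) {
  const promptTokens = tokenize(prompt);
  const descTokens = tokenize(description);
  if (descTokens.size === 0) return 0;
  let shared = 0;
  for (const token of descTokens) {
    if (promptTokens.has(token)) shared += 1;
  }
  return shared / descTokens.size;
}

// Keep at most topK candidates whose score is at or above minScore.
function selectRelevant(prompt, candidates, { topK = 2, minScore = 0.1 } = {}) {
  return candidates
    .map((c) => ({ ...c, score: overlapScore(prompt, c.description) }))
    .filter((c) => c.score >= minScore)
    .sort((a, b) => b.score - a.score)
    .slice(0, topK);
}

// Illustrative candidates; the descriptions are invented for this example.
const tools = [
  { name: "Calculator", description: "evaluate math arithmetic expression" },
  { name: "FetchData", description: "fetch data from a url" },
  { name: "WeatherExpert", description: "weather forecast for a city" },
];
const picked = selectRelevant("What is the weather in Tokyo?", tools);
console.log(picked.map((t) => t.name));
```

With these sample descriptions, only WeatherExpert shares a token ("weather") with the prompt, so it is the only candidate that passes the threshold. A real scorer would likely need stop-word handling and some form of stemming.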
- index.html & styles.css: The frontend user interface containing input fields, suggestion chips, reasoning stream log viewports, and loaded skills indicators.
- my-agent.js: The Web Worker entry point. It instantiates the worker, defines global tools (like Calculator and FetchData), and loads dynamic skills.
- prompt-chain-worker.js: The core ReAct loop manager. It parses the JSON structure of the LLM output, calls the tool execution logic, handles timeouts and retries, and updates IndexedDB.
- prompt-chain-host.js: Manages main thread events, initializes Chrome's built-in model, translates LLM requests from the worker, and dispatches log streams to the UI.
- prompt-template.js: Assembles the prompt structure including system rules, few-shot examples, relevant tools, active skill instructions, and prior history summary.
- skills/:
  - weather/: A sample modular skill containing:
    - SKILL.md: Markdown file containing attributes (YAML frontmatter) and system instructions.
    - tools.js: Local implementation of weather-fetching mock tools.
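To make the SKILL.md layout more concrete, here is a rough sketch of how a file with YAML frontmatter followed by instructions could be split. It is an assumption-laden illustration, not the project's loader: parseSkillFile is a hypothetical name, the name and description fields are guessed, and the parser only handles flat key: value pairs rather than full YAML.

```javascript
// Hypothetical sketch: handles only flat "key: value" frontmatter, not full YAML.
function parseSkillFile(source) {
  // Frontmatter is the text between a leading "---" line and the next "---" line.
  const match = /^---\r?\n([\s\S]*?)\r?\n---\r?\n?([\s\S]*)$/.exec(source);
  if (!match) {
    // No frontmatter found: treat the whole file as instructions.
    return { attributes: {}, instructions: source.trim() };
  }
  const attributes = {};
  for (const line of match[1].split(/\r?\n/)) {
    const idx = line.indexOf(":");
    if (idx === -1) continue; // skip lines that are not key/value pairs
    const key = line.slice(0, idx).trim();
    const value = line.slice(idx + 1).trim();
    if (key) attributes[key] = value;
  }
  return { attributes, instructions: match[2].trim() };
}

// Sample content; the field names are assumptions for this example.
const sample = [
  "---",
  "name: WeatherExpert",
  "description: Answers weather questions",
  "---",
  "Use the weather tool before answering.",
].join("\n");

const skill = parseSkillFile(sample);
console.log(skill.attributes.name, "|", skill.instructions);
```

Nested YAML values, lists, or quoted strings would need a real YAML parser instead of this line-by-line split.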
This project requires a Chrome version (or Chromium-based browser like Chrome Canary) with the experimental Prompt API enabled.
- Open Google Chrome.
- Navigate to chrome://flags/#optimization-guide-on-device-model and set it to Enabled BypassPrefRequirement (or Enabled).
- Navigate to chrome://flags/#prompt-api-for-gemini-nano and set it to Enabled.
- Relaunch Chrome.
- Wait for the on-device model to download in the background. You can confirm that the API is exposed by opening the DevTools console and checking that window.LanguageModel is defined.
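The check in the last step can also be done programmatically. The sketch below assumes the Prompt API shape found in recent Chrome builds, where LanguageModel.availability() resolves to a status string. The API is experimental, so this may differ in your browser version. The checkPromptApi name and its scope parameter are hypothetical; the parameter exists only so the function can be exercised outside Chrome.

```javascript
// Sketch: report whether the Prompt API is exposed and whether the model is ready.
// Assumes LanguageModel.availability() exists, as in recent experimental Chrome builds.
async function checkPromptApi(scope = globalThis) {
  const api = scope.LanguageModel;
  if (!api) {
    // The flags are not enabled, or this environment does not support the API.
    return { exposed: false, status: "unavailable" };
  }
  // Expected values: "unavailable", "downloadable", "downloading", or "available".
  const status = await api.availability();
  return { exposed: true, status };
}

checkPromptApi().then((result) => console.log(result));
```

A status other than "available" would suggest that the model still needs to be downloaded, or that the device does not meet the requirements for on-device inference.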
Because the project loads ES6 modules (import/export) and spins up Web Workers dynamically, opening index.html directly from your file system (file:// protocol) will fail due to CORS security policies. You must serve it using a local web server.
If you have Node.js installed, open a terminal in the project directory and run:
    npx serve
Or:
    npx http-server
Then navigate to the URL printed in the console (usually http://localhost:3000 or http://localhost:8080).
If you have Python installed, run the following command in your terminal:
    python -m http.server 8000
Then open your browser and navigate to http://localhost:8000.
If you are using Visual Studio Code, you can install the Live Server extension, open the project workspace, and click the Go Live button in the status bar.
- Make sure your browser has the Prompt API enabled.
- Launch the local server and open the page.
- The page will display the WeatherExpert skill loaded in the sidebar.
- Select one of the pre-defined suggestions (e.g., Weather in Tokyo) or type your own query.
- Click Run Agent.
- Follow the Agent Reasoning Stream to see the agent's thought process, how it chooses to execute the weather or math tools, and its final response.
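The Thought, Action, and Observation steps shown in the reasoning stream correspond to a loop roughly like the one sketched below. This is a simplified illustration and not the code in prompt-chain-worker.js: the JSON step shape (thought, action, input, final), the runAgent signature, and the stub model and tool are all assumptions.

```javascript
// Hypothetical ReAct loop sketch; the JSON step shape and all names are assumptions.
async function runAgent(question, { model, tools, maxSteps = 5, onStep = () => {} }) {
  const transcript = [`Question: ${question}`];
  for (let step = 0; step < maxSteps; step += 1) {
    const raw = await model(transcript.join("\n"));
    let parsed;
    try {
      parsed = JSON.parse(raw);
    } catch {
      // Feed the parse failure back so the model can correct itself next step.
      transcript.push("Observation: previous output was not valid JSON");
      continue;
    }
    onStep({ type: "thought", text: parsed.thought });
    if (parsed.final !== undefined) return parsed.final;

    // Run the requested tool, or report that it does not exist.
    const tool = tools[parsed.action];
    const observation = tool
      ? String(await tool(parsed.input))
      : `unknown tool "${parsed.action}"`;
    onStep({ type: "observation", text: observation });
    transcript.push(
      `Thought: ${parsed.thought}`,
      `Action: ${parsed.action}(${parsed.input})`,
      `Observation: ${observation}`
    );
  }
  throw new Error(`no final answer after ${maxSteps} steps`);
}

// Stub model: requests the calculator once, then answers after an observation.
const stubModel = async (prompt) =>
  prompt.includes("Observation:")
    ? JSON.stringify({ thought: "I have the result.", final: "2 + 3 = 5" })
    : JSON.stringify({ thought: "I should add the numbers.", action: "Calculator", input: "2+3" });

// Stub tool: sums the numbers in a "+"-separated expression.
const stubTools = {
  Calculator: async (expr) => expr.split("+").reduce((sum, n) => sum + Number(n), 0),
};

runAgent("What is 2 + 3?", { model: stubModel, tools: stubTools, onStep: console.log })
  .then((answer) => console.log("Final:", answer));
```

The onStep callback plays the role of the log stream that the host forwards to the UI. The step cap keeps a model that never produces a final answer from looping indefinitely.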
This project is licensed under the MIT License.