Browse Agent

A browser automation toolkit consisting of a Chrome extension and an npm SDK that communicate over a WebSocket connection with optional HMAC authentication.

Architecture

┌──────────────────┐     WebSocket (HMAC Auth)     ┌──────────────────┐
│   Your Node.js   │◄─────────────────────────────►│ Chrome Extension │
│   Application    │    Commands & Responses       │ (Service Worker) │
│  (browse-agent-  │                               │                  │
│       sdk)       │                               │  ┌───────────────┤
│                  │                               │  │ Content Script│
└──────────────────┘                               └──┴───────────────┘
                                                          │
                                                    ┌─────▼─────┐
                                                    │  Browser  │
                                                    │   Tabs    │
                                                    └───────────┘

Features

| Capability | Method |
| --- | --- |
| Open URL & get response | `agent.navigate(url)` / `agent.getContent()` |
| Inject JavaScript | `agent.injectScript(code)` / `agent.evaluate(expr)` |
| Inject CSS | `agent.injectCSS(code)` |
| Query DOM | `agent.getDOM(selector)` |
| Full-page screenshot | `agent.screenshotFullPage()` |
| Viewport screenshot | `agent.screenshotVisible()` |
| Area screenshot | `agent.screenshotArea({ x, y, width, height })` |
| List / close tabs | `agent.listTabs()` / `agent.closeTab(id)` |

Security

Communication between the SDK and extension supports two modes:

Shared-secret mode: HMAC-SHA256 mutual authentication
No-secret mode: handshake without HMAC signatures

Shared-secret mode flow:

1. Server sends a random challenge to the extension
2. Extension signs the challenge with the shared secret and sends back its own challenge
3. Server verifies the HMAC, signs the extension's challenge, and sends acknowledgement
4. Extension verifies the server's HMAC, which completes mutual authentication
5. All subsequent messages include HMAC signatures and timestamps (replay protection)
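
The handshake steps can be sketched in a single process as follows. This is only an illustration of an HMAC-SHA256 challenge-response, not the actual wire protocol: the function names (`sign`, `verify`, `newChallenge`, `simulateHandshake`), the message shapes, and the 32-byte challenge size are all assumptions.

```javascript
// Hypothetical sketch of a mutual HMAC-SHA256 challenge-response.
// None of these names come from browse-agent itself.
const { createHmac, randomBytes, timingSafeEqual } = require('node:crypto');

// Sign a challenge string with the shared secret; returns a hex digest.
function sign(secret, challenge) {
  return createHmac('sha256', secret).update(challenge).digest('hex');
}

// Compare the expected and received signatures in constant time.
function verify(secret, challenge, signature) {
  const expected = Buffer.from(sign(secret, challenge), 'hex');
  const received = Buffer.from(String(signature), 'hex');
  return expected.length === received.length && timingSafeEqual(expected, received);
}

// Random 32-byte challenge, hex-encoded (the size is an assumption).
function newChallenge() {
  return randomBytes(32).toString('hex');
}

// Walk through the handshake with both parties in one process.
function simulateHandshake(serverSecret, extensionSecret) {
  // Step 1: the server sends a random challenge.
  const serverChallenge = newChallenge();

  // Step 2: the extension signs it and replies with its own challenge.
  const extensionReply = {
    signature: sign(extensionSecret, serverChallenge),
    challenge: newChallenge(),
  };

  // Step 3: the server verifies, then signs the extension's challenge.
  if (!verify(serverSecret, serverChallenge, extensionReply.signature)) {
    return { ok: false, failedAt: 'server' };
  }
  const serverAck = { signature: sign(serverSecret, extensionReply.challenge) };

  // Step 4: the extension verifies the server's signature.
  if (!verify(extensionSecret, extensionReply.challenge, serverAck.signature)) {
    return { ok: false, failedAt: 'extension' };
  }
  return { ok: true };
}
```

With matching secrets `simulateHandshake` returns `{ ok: true }`. With different secrets the server-side check in step 3 fails first.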

When the secret is empty on both sides, the handshake still completes, but HMAC checks are skipped.
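
To illustrate how signed, timestamped messages and the empty-secret case could fit together, here is a minimal sketch. The envelope fields (`body`, `timestamp`, `hmac`), the 30-second window, and the decision to keep the timestamp check in no-secret mode are assumptions made for this example, not details taken from the SDK.

```javascript
// Hypothetical message envelope with HMAC signature and timestamp.
const { createHmac, timingSafeEqual } = require('node:crypto');

const MAX_SKEW_MS = 30_000; // assumed replay window

// Wrap a payload; the HMAC covers both the timestamp and the body.
function signMessage(secret, payload, now = Date.now()) {
  const body = JSON.stringify(payload);
  const envelope = { body, timestamp: now };
  if (secret !== '') {
    envelope.hmac = createHmac('sha256', secret)
      .update(`${now}.${body}`)
      .digest('hex');
  }
  return envelope;
}

// Reject stale messages, then check the HMAC unless the secret is empty.
function verifyMessage(secret, envelope, now = Date.now()) {
  if (Math.abs(now - envelope.timestamp) > MAX_SKEW_MS) return false;
  if (secret === '') return true; // no-secret mode: HMAC check skipped
  const expected = createHmac('sha256', secret)
    .update(`${envelope.timestamp}.${envelope.body}`)
    .digest();
  const received = Buffer.from(String(envelope.hmac ?? ''), 'hex');
  return expected.length === received.length && timingSafeEqual(expected, received);
}
```

Because the timestamp is part of the signed input, an attacker cannot refresh an old message by editing its timestamp without invalidating the signature.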

Only 127.0.0.1 connections are accepted by the WebSocket server.
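
A loopback-only rule like this could be enforced by checking the peer address of each incoming socket. The helper below is a hypothetical sketch, not the SDK's code. The real server may simply bind its listener to 127.0.0.1 instead.

```javascript
// Hypothetical helper: accept only peers whose address is 127.0.0.1.
function isLoopbackIPv4(remoteAddress) {
  if (typeof remoteAddress !== 'string') return false;
  // Dual-stack sockets report IPv4 peers as IPv4-mapped IPv6 addresses.
  const address = remoteAddress.startsWith('::ffff:')
    ? remoteAddress.slice('::ffff:'.length)
    : remoteAddress;
  return address === '127.0.0.1';
}
```

In Node.js, the value to pass in would typically be `socket.remoteAddress` from the underlying connection.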

Project Structure

browse-agent/
├── .github/workflows/
│   └── release-extension-draft.yml  # Manual action: build + draft release
├── packages/
│   ├── shared/          # Shared types, protocol, HMAC utilities
│   ├── extension/       # Chrome MV3 extension
│   │   └── build/       # Built extension (load this in Chrome)
│   └── sdk/             # npm library for Node.js
├── examples/
│   └── basic-usage.mjs  # Usage example
├── package.json         # Workspace root
└── tsconfig.base.json

Quick Start

Build

npm install
npm run build

Load the Chrome Extension

1. Open Chrome → chrome://extensions/
2. Enable "Developer mode" (top right)
3. Click "Load unpacked"
4. Select the packages/extension/build/ directory

Configure the Extension

1. Click the Browse Agent extension icon in the Chrome toolbar
2. Set WebSocket URL: ws://127.0.0.1:9315 (default)
3. Set Shared Secret: optional (leave empty to use the no-secret handshake)
4. Click Save

Use the SDK

import { BrowserAgent } from 'browse-agent-sdk';

const agent = new BrowserAgent({
  // Optional: set to '' (or omit) for no-secret handshake
  secret: 'same-secret-as-extension',
  port: 9315,
});

await agent.start();
await agent.waitForConnection();

// Navigate to a page
const result = await agent.navigate('https://example.com');
console.log(result.title); // "Example Domain"

// Get page text
const content = await agent.getContent({ format: 'text' });
console.log(content.content);

// Take a full page screenshot
const screenshot = await agent.screenshotFullPage();
// screenshot.data is base64-encoded PNG

// Inject JavaScript
const evalResult = await agent.evaluate('document.querySelectorAll("a").length');
console.log(evalResult.result); // number of links

// Inject CSS
await agent.injectCSS('body { background: yellow !important; }');

// Query DOM elements
const headings = await agent.getDOM('h1', { property: 'innerText', all: true });
console.log(headings.elements); // ["Example Domain"]

// Clean up
await agent.stop();
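
Since `screenshot.data` is a base64-encoded PNG, writing it to disk is a matter of decoding it first. The helpers below are a sketch under that assumption. `decodePngBase64` and `savePngBase64` are hypothetical names, and the code assumes the string is raw base64 with no `data:` URL prefix.

```javascript
// Hypothetical helpers for turning a base64 PNG string into a file.
const { writeFileSync } = require('node:fs');

// Every PNG file starts with this fixed 8-byte signature.
const PNG_SIGNATURE = Buffer.from([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a]);

// Decode the base64 string and sanity-check that it looks like a PNG.
function decodePngBase64(base64Data) {
  const bytes = Buffer.from(base64Data, 'base64');
  if (bytes.length < 8 || !bytes.subarray(0, 8).equals(PNG_SIGNATURE)) {
    throw new Error('Decoded data does not start with a PNG signature');
  }
  return bytes;
}

// Write the decoded image to filePath and return the byte count.
function savePngBase64(base64Data, filePath) {
  const bytes = decodePngBase64(base64Data);
  writeFileSync(filePath, bytes);
  return bytes.length;
}
```

Used with the example above, this would look like `savePngBase64(screenshot.data, 'page.png')`.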

Run the Example

node examples/basic-usage.mjs

Browse Agent Skill

You can call the built-in skill at skills/browse-agent from any AI assistant that supports loading local Skills in the current workspace.

1. Open your AI assistant in this repository, or import skills/browse-agent as a Skill module.
   - Make sure your assistant supports local Skills (for example, AgentGPT, LangSmith, or LangAgent).
   - Or keep the current workspace at this repository so the assistant can discover the local skill.

2. Trigger the skill in one of two ways (depending on your assistant's UI):
   - Slash command style: /browse-agent <your task description>
   - Natural language: describe a web browsing task directly (for example: "visit a URL and extract page text")

   Example prompts:

   /browse-agent Visit https://example.com and return title + main text
   Open https://news.ycombinator.com and list the first 10 post titles
   Take a full-page screenshot of https://example.com and save it

3. Review the returned output. The skill returns structured data for your task (for example title, url, content, screenshot metadata, or DOM query results).

Note

On first use, the skill workflow should prepare dependencies automatically. If your environment blocks that step or initialization fails, run this fallback manually:

node skills/browse-agent/scripts/setup.mjs

This command installs browse-agent-sdk and downloads the extension to .browse-agent/extension/.

API Reference

`BrowserAgent(options)`

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| `secret` | `string` | `''` | Optional shared HMAC secret |
| `port` | `number` | `9315` | WebSocket server port |
| `host` | `string` | `127.0.0.1` | WebSocket server host |
| `timeout` | `number` | `30000` | Default command timeout (ms) |
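
One way to assemble these options without hard-coding the secret is to read it from the environment and fall back to the documented defaults. This is a sketch only: the environment variable names `BROWSE_AGENT_SECRET` and `BROWSE_AGENT_PORT` and the helper `buildAgentOptions` are invented for the example and are not read by the SDK itself.

```javascript
// Documented defaults from the options table.
const DEFAULTS = Object.freeze({
  secret: '',
  port: 9315,
  host: '127.0.0.1',
  timeout: 30000,
});

// Hypothetical helper: build an options object from environment variables.
function buildAgentOptions(env = process.env) {
  const port = env.BROWSE_AGENT_PORT ? Number(env.BROWSE_AGENT_PORT) : DEFAULTS.port;
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new RangeError(`Invalid port: ${env.BROWSE_AGENT_PORT}`);
  }
  return {
    ...DEFAULTS,
    secret: env.BROWSE_AGENT_SECRET ?? DEFAULTS.secret,
    port,
  };
}
```

The result can then be passed straight to the constructor, as in `new BrowserAgent(buildAgentOptions())`.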

Navigation

navigate(url, options?) — Open a URL in a new tab
getContent(options?) — Get page HTML or text content
listTabs() — List all open tabs
closeTab(tabId) — Close a specific tab
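
As an illustration of combining `listTabs()` and `closeTab(tabId)`, the sketch below closes every tab that matches a predicate. It assumes that `listTabs()` resolves to an array of objects with at least an `id` field (and here a `url` field), which is not confirmed by this document. `closeTabsMatching` is a hypothetical helper.

```javascript
// Hypothetical helper: close all tabs for which predicate(tab) is true.
// Assumes agent.listTabs() resolves to an array of { id, url, ... } objects.
async function closeTabsMatching(agent, predicate) {
  const tabs = await agent.listTabs();
  const closed = [];
  for (const tab of tabs) {
    if (predicate(tab)) {
      await agent.closeTab(tab.id); // close sequentially to keep ordering simple
      closed.push(tab.id);
    }
  }
  return closed;
}
```

For example, `await closeTabsMatching(agent, (tab) => tab.url.startsWith('https://example.com'))` would close only the tabs opened on that site, provided the assumed tab shape holds.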

Injection

injectScript(code, tabId?) — Execute JavaScript in the page
injectCSS(code, tabId?) — Inject CSS styles
evaluate(expression, tabId?) — Evaluate a JS expression and return result
getDOM(selector, options?) — Query DOM elements by CSS selector
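
Because `evaluate` takes its expression as a string, any dynamic value spliced into that string needs escaping. A minimal sketch of one approach is to embed values with `JSON.stringify`, which produces a valid JavaScript string literal. `countExpression` is a hypothetical helper, not part of the SDK.

```javascript
// Hypothetical helper: build an expression that counts elements matching a selector.
// JSON.stringify escapes quotes and backslashes so the selector cannot break
// out of the string literal.
function countExpression(selector) {
  return `document.querySelectorAll(${JSON.stringify(selector)}).length`;
}
```

For the selector `a` this yields the same expression used in the SDK example, `document.querySelectorAll("a").length`, and it stays well-formed for selectors such as `a[href="x"]` that contain quotes.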

Screenshots

screenshotFullPage(options?) — Capture entire scrollable page
screenshotVisible(options?) — Capture visible viewport
screenshotArea(clip, options?) — Capture a specific region
screenshot(options) — Generic screenshot method

Development

# Watch mode for extension
npm run dev:extension

# Build everything
npm run build

# Build one package
npm run build:shared
npm run build:sdk
npm run build:extension

# Clean dist folders under packages/*
npm run clean

# Clean extension build output
npm run clean -w packages/extension
