lsd-so/agentflare


Agentflare

Otter as a reference to Cloudflare mascot

This project provides an agent, built on Cloudflare Workers and Containers, that can use a web browser and a computer to fulfill tasks specified by a user.

Why?

General

This project was motivated by ChatGPT Agent, whose rather simple capabilities come with tight usage limits (the quote below is from its announcement):

Pro users have 400 messages per month, while other paid users get 40 messages monthly, with additional usage available via flexible credit-based options.

Expensive

While Cloudflare does offer a headless browser product, its pricing can be a bit steep, so this project runs Chromium in a container instead (see here for the inspiration).

Usage

You can view the app (till we sunset this on August 31st) at: https://agentflare.yev-81d.workers.dev/

Note: Some of the recordings below have loading sections cut out for viewability. Each task was still completed end-to-end in one shot, and caching and other optimizations can be applied to make live usage closer to the GIFs shown below.

Search example

Here's an example of doing web search to look up news related to Cloudflare and bots:

Using search to look up news related to Cloudflare and bots

Browser example

Here's an example of using a web browser to get page content in order to summarize:

Using a browser to get content related to Cloudflare and bots

Computer example

Here's an example of using a computer to run a command inside the terminal that's opened using a NoVNC server hosted on a Cloudflare container:

Using a computer to run a command inside the terminal

Search and browser example

Here's an example of using search to find links and then a browser to fetch their content for a summary:

Using search and a browser together

Architecture

There are plenty of examples of multi-agent architectures, but the underlying premise is simple: why overwhelm the context of a single LLM when you can scope tasks into smaller, achievable steps?

flowchart TD
    A[Prompt] -->|Deconstruct| B{Top-level agent}
    B -->|Delegate to| C[Search tool]
    C -->|Return result| B
    B -->|Delegate to| D[Browser agent]
    D -->|Return result| B
    B -->|Delegate to| E[Computer agent]
    E -->|Return result| B

Otherwise you end up trying to one-shot like you're drawing an owl.

Step one: draw circles, followed by step two showing an entire owl
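The delegation flow in the Architecture diagram can be sketched roughly as follows. This is a minimal illustration, not the project's actual implementation: the `Step`, `Handler`, `runPlan`, and `stubHandlers` names are hypothetical, the plan is assumed to have already been produced by the top-level agent, and the handlers are stubs standing in for the real search tool and agents.

```typescript
// Hypothetical sketch: none of these names come from the agentflare source.
type Delegate = "search" | "browser" | "computer";

// One scoped step of the deconstructed prompt.
interface Step {
  delegate: Delegate;
  input: string;
}

type Handler = (input: string) => Promise<string>;

// The top-level agent hands each step to its delegate and collects the results.
async function runPlan(
  steps: Step[],
  handlers: Record<Delegate, Handler>,
): Promise<string[]> {
  const results: string[] = [];
  for (const step of steps) {
    // Each delegate only receives the input for its own step,
    // not the full context of the original prompt.
    const result = await handlers[step.delegate](step.input);
    results.push(result);
  }
  return results;
}

// Stub handlers standing in for the real search tool and agents (assumption).
const stubHandlers: Record<Delegate, Handler> = {
  search: async (query) => `search results for: ${query}`,
  browser: async (url) => `page content of: ${url}`,
  computer: async (command) => `ran command: ${command}`,
};
```

Calling `runPlan` with a search step followed by a browser step mirrors the "Search and browser example" above: each delegate returns its result to the top-level caller, which can then summarize.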

Using Chromium

Chromium has a security measure that prevents it from being accessed or controlled remotely by arbitrary clients, so the container image proxies requests through nginx to mask the actual traffic origin. This lets you run or debug from a machine that is not part of your Cloudflare deployment, like so:

// The below code can be run on your local machine while a browser is run in a Cloudflare container
import puppeteer from 'puppeteer';

(async () => {
  const result = await fetch("https://<your deployment identifier>.workers.dev/json/version").then(res => res.json());

  // Connect to the browser already running in the container
  const browser = await puppeteer.connect({
    browserWSEndpoint: result.webSocketDebuggerUrl.replaceAll('ws://localhost', 'wss://<your deployment identifier>.workers.dev')
  });

  const page = await browser.newPage();

  // Navigate the page to a URL
  await page.goto('https://news.ycombinator.com/');

  // Set screen size
  await page.setViewport({ width: 1080, height: 1024 });

  // Wait for the page's <title> element and read its text
  const textSelector = await page.waitForSelector(
    'title',
  );
  const fullTitle = await textSelector?.evaluate(el => el.textContent);

  // Print the full title
  console.log('The title of this page is "%s".', fullTitle);

  await browser.close();
})();
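The endpoint rewrite in the snippet above can be pulled out into a small helper. This is a sketch under assumptions: `toPublicWsEndpoint` is a hypothetical name, and unlike the `replaceAll` call above it swaps out the whole scheme and authority, so a port on the local URL (if there is one) is dropped as well.

```typescript
// Hypothetical helper, not part of the agentflare source.
// Rewrites the local DevTools WebSocket URL reported by /json/version
// so that it points at the public deployment host over wss.
function toPublicWsEndpoint(localDebuggerUrl: string, publicHost: string): string {
  // Replace "ws://host[:port]" or "wss://host[:port]" at the start of the URL,
  // keeping the path (for example /devtools/browser/<id>) untouched.
  return localDebuggerUrl.replace(/^wss?:\/\/[^/]+/, `wss://${publicHost}`);
}
```

The result can be passed as `browserWSEndpoint` to `puppeteer.connect`, with `publicHost` set to your own `<your deployment identifier>.workers.dev` host.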

Using computer

If you're unfamiliar with Anthropic's computer use, here's a simplified version of how it looks under the hood (minus the configuration that can't be edited).

flowchart TD
    A[Goal] -->|Deconstruct| B(Plan)
    B -->|Handled by| C{Computer Agent}
    C -->|Call if relevant| D[Keyboard tool]
    D -->|Return result| C
    C -->|Call if relevant| E[Mouse tool]
    E -->|Return result| C
    C -->|Call if relevant| F[Screenshot tool]
    F -->|Return result| C

So, first, there needs to be a computer that can be tinkered with, hence spinning up a VNC server that you can connect to from a web browser using NoVNC (as shown below) at https://agentflare.<your_account_id>.workers.dev/vnc.html:

Demo of NoVNC controlling a remote computer

You can also connect with a JavaScript client; see computer/app.ts for usage.
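To make the keyboard, mouse, and screenshot tools from the flowchart concrete, here is a minimal sketch of how a computer agent might route tool calls. Everything in it is an assumption for illustration: the `ComputerAction` shapes, the `Computer` interface, and the function names are hypothetical and do not match the schema of Anthropic's computer use tool or the code in computer/app.ts. The fake computer only records calls, where a real one would send input over VNC.

```typescript
// Hypothetical action shapes, one per tool in the flowchart.
type ComputerAction =
  | { tool: "keyboard"; text: string }
  | { tool: "mouse"; x: number; y: number }
  | { tool: "screenshot" };

// Hypothetical interface for the remote computer.
interface Computer {
  type(text: string): string;
  click(x: number, y: number): string;
  screenshot(): string;
}

// Route a single tool call to the matching operation and return its result.
function dispatch(action: ComputerAction, computer: Computer): string {
  switch (action.tool) {
    case "keyboard":
      return computer.type(action.text);
    case "mouse":
      return computer.click(action.x, action.y);
    case "screenshot":
      return computer.screenshot();
  }
}

// Run a sequence of tool calls, returning each result to the caller (the agent).
function runActions(actions: ComputerAction[], computer: Computer): string[] {
  return actions.map((action) => dispatch(action, computer));
}

// Fake computer that records calls instead of talking to a VNC server.
function makeFakeComputer(log: string[]): Computer {
  return {
    type: (text) => {
      log.push(`type:${text}`);
      return "typed";
    },
    click: (x, y) => {
      log.push(`click:${x},${y}`);
      return "clicked";
    },
    screenshot: () => {
      log.push("screenshot");
      return "<image data>";
    },
  };
}
```

In a real loop, the agent would look at each returned result (typically a fresh screenshot) before deciding on the next action, rather than running a fixed list.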

Running

First, install dependencies.

$ yarn install

Log in to your account if this is your first time interacting with the wrangler CLI.

$ yarn wrangler login

Then, simply deploy.

$ yarn wrangler deploy

At the end of the output, you should see the URL you can open in your browser to view the application.

...
Deployed agentflare triggers (0.34 sec)
    https://agentflare.<your_account_id>.workers.dev      <---- This
  Current Version ID: some-pretty-cool-uuid
  Cloudflare collects anonymous telemetry about your usage of Wrangler. Learn more at https://github.com/cloudflare/workers-sdk/tree/main/packages/wrangler/telemetry.md
  Done in 225.40s.
🏁 Wrangler Action completed