- UC Berkeley
This dataset contains 3,167 completed tasks of human-computer interactions captured with video, screenshots, DOM snapshots, and detailed interaction events. Created by Paradigm Shift AI for advanci…
Neurosim is a Python framework for building, running, and evaluating AI agent systems. It provides core primitives for agent evaluation, cloud storage integration, and an LLM-as-a-judge system for …
Screen recording and computer interaction capture tool that records keyboard/mouse input, screen video, DOM snapshots, and accessibility trees. Perfect for creating datasets to train and evaluate c…
Evaluation system for computer-use agents that uses LLMs to assess agent performance on web browsing and interaction tasks. This judge system reads screenshots, agent trajectories, and final result…
Agent-CE is a containerized continuous evaluation (CE) platform for web browsing agents. It provides production-ready Docker images and CI/CD pipelines for running and evaluating multiple agent fra…
Python scripts for generating and categorizing web browsing tasks for benchmark datasets
This is a simple demonstration of more advanced, agentic patterns built on top of the Realtime API.
A quick-and-dirty Progressive Neural Network written in TensorFlow.
Asynchronous Advantage Actor-Critic (A3C) algorithms and a Progressive Neural Network implemented in TensorFlow.
The Hacker Within at the University of California, Berkeley