Stars
YouTube Thumbnail Generator with AI-powered face detection and image generation
OpenViking is an open-source context database designed specifically for AI Agents (such as openclaw). OpenViking unifies the management of context (memory, resources, and skills) that Agents need th…
The design language that makes your AI harness better at design.
Fully automatic censorship removal for language models
A complete AI agency at your fingertips - From frontend wizards to Reddit community ninjas, from whimsy injectors to reality checkers. Each agent is a specialized expert with personality, processes…
A Simple and Universal Swarm Intelligence Engine, Predicting Anything.
An open-source long-horizon SuperAgent harness that researches, codes, and creates. With the help of sandboxes, memories, tools, skills, subagents, and a message gateway, it handles different levels of…
Model Context Protocol (MCP) server for AI-assisted development ("vibe coding") of MDK applications.
AI agents can now use real Android and iOS apps, just like a human.
[CVPR 2025] Open-source, End-to-end, Vision-Language-Action model for GUI Agent & Computer Use.
Every front-end GUI client for ChatGPT, Claude, and other LLMs
[NeurIPS'25] GUI-Actor: Coordinate-Free Visual Grounding for GUI Agents
GUI Grounding for Professional High-Resolution Computer Use
Agent S: an open agentic framework that uses computers like a human
This is the repo for the paper "OS Agents: A Survey on MLLM-based Agents for Computer, Phone and Browser Use" (ACL 2025 Oral).
Python script to upload videos on YouTube using Selenium
[ICML'24] SeeAct is a system for generalist web agents that autonomously carry out tasks on any given website, with a focus on large multimodal models (LMMs) such as GPT-4V(ision).
AI agent using GPT-4V(ision) capable of using a mouse/keyboard to interact with web UI
Zonos-v0.1 is a leading open-weight text-to-speech model trained on more than 200k hours of varied multilingual speech, delivering expressiveness and quality on par with—or even surpassing—top TTS …
Multimodal Chain-of-Thought Reasoning: A Comprehensive Survey
ui-screenshot-to-prompt is an AI-powered tool that analyzes UI images to generate detailed prompts for AI coders. It uses computer vision and natural language processing to break down UI components…
A curated list of awesome UI agents resources, encompassing Web, App, OS, and beyond (continually updated)
A collection of AI Agents papers (Updated biweekly)
A one-stop repository for generative AI research updates, interview resources, notebooks, and much more!
JavaScript API for Chrome and Firefox
The model, data and code for the visual GUI Agent SeeClick
verl/HybridFlow: A Flexible and Efficient RL Post-Training Framework