AlphaPav

🏠

Working from home

163 followers · 73 following

University of Illinois Urbana-Champaign
https://alphapav.github.io/

Stars

tailcallhq / forgecode

AI enabled pair programmer for Claude, GPT, O Series, Grok, Deepseek, Gemini and 300+ models

Rust 7,409 1,448 Updated Jun 15, 2026

sunblaze-ucb / cybergym

CyberGym is a large-scale, high-quality cybersecurity evaluation framework designed to rigorously assess the capabilities of AI agents on real-world vulnerability analysis tasks.

Python 413 59 Updated May 18, 2026

openai / codex

Lightweight coding agent that runs in your terminal

Rust 91,027 13,443 Updated Jun 15, 2026

ServiceNow / BrowserGym

🌎💪 BrowserGym, a Gym environment for web task automation

Python 1,250 177 Updated Mar 17, 2026

JuliusBrussee / caveman

🪨 why use many token when few token do trick — Claude Code skill that cuts 65% of tokens by talking like caveman

JavaScript 72,519 4,089 Updated Jun 12, 2026

openai / whisper

Robust Speech Recognition via Large-Scale Weak Supervision

Python 102,727 12,535 Updated Apr 15, 2026

alchaincyf / nuwa-skill

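The next employee you want to distill need not be a colleague. Distill anyone's way of thinking: mental models, decision heuristics, and expressive DNA. Distill how anyone thinks.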

Python 24,301 3,562 Updated Jun 14, 2026

google / oss-fuzz

OSS-Fuzz - continuous fuzzing for open source software.

Shell 12,345 2,788 Updated Jun 14, 2026

karpathy / autoresearch

AI agents running research on single-GPU nanochat training automatically

Python 86,744 12,565 Updated Mar 26, 2026

openai / frontier-evals

OpenAI Frontier Evals

Python 1,220 162 Updated Apr 21, 2026

anomalyco / opencode

The open source coding agent.

TypeScript 174,416 21,089 Updated Jun 15, 2026

relai-ai / relai-sdk

A platform for building reliable AI agents

Python 101 6 Updated Apr 3, 2026

trailofbits / anamorpher

image scaling attacks for multi-modal prompt injection

Python 1,060 93 Updated May 19, 2026

google-research / android_world

AndroidWorld is an environment and benchmark for autonomous agents

Python 794 155 Updated Jun 12, 2026

OSU-NLP-Group / Online-Mind2Web

An Illusion of Progress? Assessing the Current State of Web Agents

Python 180 12 Updated May 28, 2026

MinorJerry / WebVoyager

Code for "WebVoyager: Building an End-to-End Web Agent with Large Multimodal Models"

Python 1,096 119 Updated Mar 4, 2024

open-thought / reasoning-gym

[NeurIPS 2025 Spotlight] Reasoning Environments for Reinforcement Learning with Verifiable Rewards

Python 1,444 120 Updated Apr 17, 2026

facebookresearch / Meta_SecAlign

Repo for the paper "Meta SecAlign: A Secure Foundation LLM Against Prompt Injection Attacks".

Python 68 18 Updated Jun 11, 2026

uiuc-kang-lab / InjecAgent

Python 144 26 Updated Jul 2, 2024

facebookresearch / rl-injector

Official release of code for the paper "RL is a hammer and LLMs are nails: A simple RL approach to stronger prompt injection attacks"

Python 52 6 Updated May 6, 2026

AI-secure / PolyGuard

Python 22 2 Updated Jun 18, 2025

algorithmicsuperintelligence / openevolve

Open-source implementation of AlphaEvolve

Python 6,544 1,046 Updated Mar 18, 2026

docling-project / docling

Get your documents ready for gen AI

Python 61,560 4,304 Updated Jun 14, 2026

microsoft / latent-zoning-networks

[NeurIPS 2025] Latent Zoning Networks

Python 61 3 Updated Jun 5, 2026

SWE-agent / mini-swe-agent

The 100 line AI agent that solves GitHub issues or helps you in your command line. Radically simple, no huge configs, no giant monorepo—but scores >74% on SWE-bench verified!

Python 5,160 710 Updated Jun 13, 2026

google-gemini / gemini-cli

An open-source AI agent that brings the power of Gemini directly into your terminal.

TypeScript 105,277 14,067 Updated Jun 15, 2026

purpcode-uiuc / purpcode

🔮Reasoning for Safer Code Generation; 🥇Winner Solution of Amazon Nova AI Challenge 2025

Python 39 3 Updated Aug 24, 2025

QwenLM / qwen-code

An open-source AI coding agent that lives in your terminal.

TypeScript 25,213 2,505 Updated Jun 14, 2026

metauto-ai / agent-as-a-judge

👩‍⚖️ Agent-as-a-Judge: The Magic for Open-Endedness

HTML 781 105 Updated Mar 28, 2026

eval-sys / mcpmark

MCPMark is a comprehensive, stress-testing MCP benchmark designed to evaluate model and agent capabilities in real-world MCP use.

Python 428 37 Updated Jun 12, 2026
