Shangyin Tan (Shangyint)

73 followers · 41 following

UC Berkeley
shangyit.me

Achievements

x2 x2

Stars

andborth / RoboPhD

RoboPhD: Evolving Diverse Complex Agents Under Tight Evaluation Budgets

Python · 23 stars · 2 forks · Updated Jun 13, 2026

Human-Agent-Society / CORAL

CORAL is a robust, lightweight infrastructure for multi-agent autonomous self-evolution, built for autoresearch. Works with Claude Code, Codex, Cursor, OpenCode, Kiro, and more.

Python · 726 stars · 94 forks · Updated Jun 12, 2026

eth-sri / agentbench

Python · 59 stars · 7 forks · Updated Feb 24, 2026

open-thoughts / OpenThoughts-TBLite

A Difficulty-Calibrated Benchmark for Building Terminal Agents

Kotlin · 21 stars · 1 fork · Updated Feb 20, 2026

zjunlp / SkillNet

Create, Evaluate, and Connect AI Skills

Python · 1,045 stars · 119 forks · Updated May 27, 2026

alexzhang13 / rlm-minimal

Super basic implementation (gist-like) of RLMs with REPL environments.

Python · 797 stars · 135 forks · Updated Jan 7, 2026

deepseek-ai / DeepSeek-OCR

Contexts Optical Compression

Python · 23,287 stars · 2,151 forks · Updated Jan 27, 2026

UCB-ADRS / ADRS

AI-Driven Research Systems (ADRS)

Jupyter Notebook · 143 stars · 23 forks · Updated Dec 17, 2025

checkpoint-restore / criu

Checkpoint/Restore tool

C · 3,874 stars · 748 forks · Updated Jun 12, 2026

guestrin-lab / deepscholar

build and benchmark deep research

Python · 243 stars · 31 forks · Updated Mar 28, 2026

harbor-framework / harbor

Harbor is a framework for running agent evaluations and creating and using RL environments.

Python · 2,439 stars · 1,155 forks · Updated Jun 14, 2026

R2E-Gym / R2E-Gym

[COLM 2025] Official repository for R2E-Gym: Procedural Environment Generation and Hybrid Verifiers for Scaling Open-Weights SWE Agents

Python · 290 stars · 61 forks · Updated Jul 13, 2025

ltzheng / SimpleTIR

[ICLR 2026] End-to-End Reinforcement Learning for Multi-Turn Tool-Integrated Reasoning

Python · 389 stars · 24 forks · Updated Mar 30, 2026

gepa-ai / gepa

Optimize prompts, code, and more with AI-powered Reflective Text Evolution

Jupyter Notebook · 5,141 stars · 431 forks · Updated Jun 13, 2026

algorithmicsuperintelligence / openevolve

Open-source implementation of AlphaEvolve

Python · 6,544 stars · 1,045 forks · Updated Mar 18, 2026

letta-ai / recovery-bench

Recovery-Bench is a benchmark for evaluating the capability of LLM agents to recover from mistakes

Python · 25 stars · 5 forks · Updated Apr 20, 2026

anthropics / claude-code

Claude Code is an agentic coding tool that lives in your terminal, understands your codebase, and helps you code faster by executing routine tasks, explaining complex code, and handling git workflo…

Python · 132,374 stars · 21,431 forks · Updated Jun 13, 2026