
Highlights

  • Pro

Organizations

@Generative-Program-Analysis


RoboPhD: Evolving Diverse Complex Agents Under Tight Evaluation Budgets

Python 23 2 Updated Jun 13, 2026

CORAL is a robust, lightweight infrastructure for multi-agent autonomous self-evolution, built for autoresearch. Works with Claude Code, Codex, Cursor, OpenCode, Kiro, and more.

Python 726 94 Updated Jun 12, 2026
Python 59 7 Updated Feb 24, 2026

A Difficulty-Calibrated Benchmark for Building Terminal Agents

Kotlin 21 1 Updated Feb 20, 2026

Create, Evaluate, and Connect AI Skills

Python 1,045 119 Updated May 27, 2026

Super basic implementation (gist-like) of RLMs with REPL environments.

Python 797 135 Updated Jan 7, 2026

Contexts Optical Compression

Python 23,287 2,151 Updated Jan 27, 2026

AI-Driven Research Systems (ADRS)

Jupyter Notebook 143 23 Updated Dec 17, 2025

Checkpoint/Restore tool

C 3,874 748 Updated Jun 12, 2026

build and benchmark deep research

Python 243 31 Updated Mar 28, 2026

Harbor is a framework for running agent evaluations and creating and using RL environments.

Python 2,439 1,155 Updated Jun 14, 2026

[COLM 2025] Official repository for R2E-Gym: Procedural Environment Generation and Hybrid Verifiers for Scaling Open-Weights SWE Agents

Python 290 61 Updated Jul 13, 2025

[ICLR 2026] End-to-End Reinforcement Learning for Multi-Turn Tool-Integrated Reasoning

Python 389 24 Updated Mar 30, 2026

Optimize prompts, code, and more with AI-powered Reflective Text Evolution

Jupyter Notebook 5,141 431 Updated Jun 13, 2026

Open-source implementation of AlphaEvolve

Python 6,544 1,045 Updated Mar 18, 2026

Recovery-Bench is a benchmark for evaluating the capability of LLM agents to recover from mistakes

Python 25 5 Updated Apr 20, 2026

Claude Code is an agentic coding tool that lives in your terminal, understands your codebase, and helps you code faster by executing routine tasks, explaining complex code, and handling git workflo…

Python 132,374 21,431 Updated Jun 13, 2026

Repo for Rho-1: Token-level Data Selection & Selective Pretraining of LLMs.

468 13 Updated Apr 18, 2024

NPUEval is an LLM evaluation dataset written specifically to target AIE kernel code generation on RyzenAI hardware.

C++ 32 5 Updated Nov 8, 2025

MCP server integrating GEPA (Genetic-Evolutionary Prompt Architecture) for automatic prompt optimization with Claude Desktop

Python 49 5 Updated Nov 10, 2025

Test Generation for Prompts

TeX 163 19 Updated May 23, 2026

Renderer for the harmony response format to be used with gpt-oss

Rust 4,406 285 Updated Apr 8, 2026

slime is an LLM post-training framework for RL Scaling.

Python 6,118 895 Updated Jun 14, 2026

Trajectories for running OpenHands on Terminal Bench

4 Updated Jul 25, 2025

A course of learning LLM inference serving on Apple Silicon for systems engineers: build a tiny vLLM + Qwen.

Python 4,277 332 Updated Jun 13, 2026

[NeurIPS '25] GSO: Challenging Software Optimization Tasks for Evaluating SWE-Agents

Python 85 5 Updated Apr 27, 2026
HTML 1 Updated Jun 1, 2025

A benchmark for LLMs on complicated tasks in the terminal

Python 2,355 542 Updated Jan 22, 2026

Sky-T1: Train your own O1 preview model within $450

Python 3,390 345 Updated Jul 12, 2025

Agentic testing for agentic codebases

Python 897 65 Updated Jun 14, 2026