RulinShao

Organizations

@uwnlp

Agentic RL on Any Harness at Scale

Python 569 60 Updated Jun 17, 2026

Benchmarking Open-Ended Inference Optimization by AI Agents

Python 27 4 Updated May 16, 2026

Oh my tmux! My self-contained, pretty & versatile tmux configuration made with 💛🩷💙🖤❤️🤍

Shell 25,091 3,590 Updated Jun 14, 2026

Can Language Models Rebuild Programs From Scratch?

Python 767 51 Updated Jun 18, 2026
TypeScript 44 4 Updated Jun 17, 2026

A benchmark for evaluating LLMs on open-ended CS problems. Exploring the Next Frontier of Computer Science.

C++ 244 38 Updated Jun 17, 2026

AIDE: AI-Driven Exploration in the Space of Code. The machine learning engineering agent that automates AI R&D.

Python 1,323 194 Updated May 2, 2026

🪨 why use many token when few token do trick — Claude Code skill that cuts 65% of tokens by talking like caveman

JavaScript 74,115 4,174 Updated Jun 12, 2026

OpenSeeker: A search agent with open-source data and models

Python 749 56 Updated May 22, 2026

Production-grade engineering skills for AI coding agents.

Shell 62,345 6,761 Updated Jun 16, 2026

Real-time AI assistant for Meta Ray-Ban smart glasses -- voice + vision + agentic actions via Gemini Live and OpenClaw

2,386 449 Updated May 6, 2026

Train the smallest LM you can that fits in 16MB. Best model wins!

Python 5,131 3,334 Updated May 4, 2026

Meta-Harness: 76.4% on Terminal-Bench 2.0 (Claude Opus 4.6)

Python 1,101 161 Updated Mar 26, 2026
JavaScript 13,610 1,187 Updated May 31, 2026

SkillsBench evaluates how well skills work and how effective agents are at using them.

PDDL 1,370 317 Updated Jun 18, 2026

Agent framework and applications built upon Qwen>=3.0, featuring Function Calling, MCP, Code Interpreter, RAG, Chrome extension, etc.

Python 16,575 1,647 Updated Mar 4, 2026

My learning notes for ML SYS.

Python 6,537 445 Updated Jun 18, 2026

OpenResearcher: A Fully Open Pipeline for Long-Horizon Deep Research Trajectory Synthesis

Python 784 78 Updated Jun 10, 2026

CL-bench: A Benchmark for Context Learning

Python 560 29 Updated May 12, 2026

OpenTinker is an RL-as-a-Service infrastructure for foundation models

Python 675 63 Updated Mar 21, 2026

Self-Adapting Language Models

Python 1,779 308 Updated Aug 1, 2025

[AAAI26]: DS SERVE: The Largest Open Vector Store over Pretraining Data; A Framework for Efficient and Scalable Neural Retrieval

Python 51 5 Updated Jan 28, 2026

800,000 step-level correctness labels on LLM solutions to MATH problems

Python 2,143 127 Updated Jun 1, 2023

Code repo for "LifelongAgentBench: Evaluating LLM Agents as Lifelong Learners"

Python 91 6 Updated May 30, 2025

Gas Town - multi-agent workspace manager

Go 15,953 1,487 Updated Jun 17, 2026

A curated collection of 1000+ agent skills from official dev teams and the community, compatible with Claude Code, Codex, Gemini CLI, Cursor, and more.

25,677 2,731 Updated Jun 16, 2026

Comprehensive open-source library of AI research and engineering skills for any AI model. Package the skills and your claude code/codex/gemini agent will be an AI research agent with full horsepowe…

TeX 9,806 734 Updated Jun 16, 2026

Official JAX implementation of End-to-End Test-Time Training for Long Context

Python 621 47 Updated Feb 15, 2026