GnarlyMshtep
🟣
boolin!
  • Carnegie Mellon (PhD, AI Safety)
  • Pittsburgh, USA
  • 05:59 (UTC -04:00)

Highlights

  • Pro

Starred repositories

slime is an LLM post-training framework for RL Scaling.

Python 5,735 803 Updated May 20, 2026

This is a LASR Labs project supervised by Mary Phuong.

Python 12 3 Updated Feb 15, 2026

GRPO training code which scales to 32xH100s for long horizon terminal/coding tasks. Base agent is now the top Qwen3 agent on Stanford's TerminalBench leaderboard.

Python 384 26 Updated Aug 24, 2025

The open-source agent-serving project

Python 454 28 Updated May 12, 2026

Build and train systems of agents.

Python 4 2 Updated May 17, 2026

Post-training with Tinker

Python 3,327 421 Updated May 20, 2026

The RL Bridge for LLM-based Agent Applications. Made Simple & Flexible.

Python 5,199 499 Updated May 20, 2026

Incentivizing externalization via early exiting

Jupyter Notebook 41 2 Updated Feb 6, 2026

Search-R1: An Efficient, Scalable RL Training Framework for Reasoning & Search Engine Calling interleaved LLM based on veRL

Python 4,743 427 Updated Nov 13, 2025

Compositional Verification of Security Protocols

Rust 34 2 Updated May 7, 2026

Python 7 1 Updated Mar 9, 2026

Agentic RL Training at Scale

Python 1,387 292 Updated May 20, 2026

CLI interfaces & config objects, from types

Python 1,050 46 Updated May 9, 2026

Analyze AI agent trajectories: extract actions, summarize, embed, and visualize.

Python 111 10 Updated Apr 14, 2026

A light-weight and powerful meta-prompting, context engineering and spec-driven development system for Claude Code by TÂCHES.

JavaScript 63,229 5,372 Updated May 20, 2026

An SMT solver for program verification

SMT 13 2 Updated May 19, 2026

ControlArena is a collection of settings, model organisms and protocols - for running control experiments.

Python 195 117 Updated May 18, 2026

The OpenAI Model Spec

786 91 Updated Mar 25, 2026

iTerm2 is a terminal emulator for Mac OS X that does amazing things.

Objective-C 17,583 1,385 Updated May 20, 2026

A CLI tool that helps AI researchers share datasets responsibly.

Python 22 10 Updated Sep 15, 2025

A terminal spreadsheet multitool for discovering and arranging data

Python 9,096 346 Updated May 20, 2026

Practice The CodeSignal Pre-screen for the Industry Coding Framework.

Python 350 81 Updated Mar 17, 2026

Python 1 1 Updated Jan 2, 2026

Menubar countdown timer for macOS

Swift 200 26 Updated Nov 21, 2023

An agentic skills framework & software development methodology that works.

Shell 199,291 17,770 Updated May 14, 2026

Master Claude Code Hooks

Python 3,693 616 Updated Mar 4, 2026

https://scale.com/research/mrt

Jupyter Notebook 18 3 Updated Mar 16, 2026

Python 103 11 Updated May 19, 2026

Optimize prompts, code, and more with AI-powered Reflective Text Evolution

Jupyter Notebook 4,532 378 Updated May 18, 2026

Release repo for donethat

1 Updated May 15, 2026