GnarlyMshtep

🟣

boolin!

Matan Shtepel GnarlyMshtep

AI Safety PhD student at CMU, goon out in the world

18 followers · 75 following

Carnegie Mellon (PhD, AI Safety)
Pittsburgh, USA
matanshtepel.com

Starred repositories

THUDM / slime

slime is an LLM post-training framework for RL Scaling.

Python 5,735 803 Updated May 20, 2026

tolgadur / phantom-transfer

This is a LASR Labs project supervised by Mary Phuong.

Python 12 3 Updated Feb 15, 2026

Danau5tin / terminal-bench-rl

GRPO training code which scales to 32xH100s for long horizon terminal/coding tasks. Base agent is now the top Qwen3 agent on Stanford's TerminalBench leaderboard.

Python 384 26 Updated Aug 24, 2025

lithos-ai / motus

The open-source agent-serving project

Python 454 28 Updated May 12, 2026

ApGa / platoon

Build and train systems of agents.

Python 4 2 Updated May 17, 2026

thinking-machines-lab / tinker-cookbook

Post-training with Tinker

Python 3,327 421 Updated May 20, 2026

areal-project / AReaL

The RL Bridge for LLM-based Agent Applications. Made Simple & Flexible.

Python 5,199 499 Updated May 20, 2026

GeodesicResearch / externalization

Incentivizing externalization via early exiting

Jupyter Notebook 41 2 Updated Feb 6, 2026

PeterGriffinJin / Search-R1

Search-R1: An Efficient, Scalable RL Training Framework for Reasoning & Search Engine Calling interleaved LLM based on veRL

Python 4,743 427 Updated Nov 13, 2025

secure-foundations / owl

Compositional Verification of Security Protocols

Rust 34 2 Updated May 7, 2026

goombalab / mohawk

Python 7 1 Updated Mar 9, 2026

PrimeIntellect-ai / prime-rl

Agentic RL Training at Scale

Python 1,387 292 Updated May 20, 2026

brentyi / tyro

CLI interfaces & config objects, from types

Python 1,050 46 Updated May 9, 2026

AR-FORUM / hodoscope

Analyze AI agent trajectories: extract actions, summarize, embed, and visualize.

Python 111 10 Updated Apr 14, 2026

gsd-build / get-shit-done

A light-weight and powerful meta-prompting, context engineering and spec-driven development system for Claude Code by TÂCHES.

JavaScript 63,229 5,372 Updated May 20, 2026

yaspar-org / Sundance-SMT

An SMT solver for program verification

SMT 13 2 Updated May 19, 2026

UKGovernmentBEIS / control-arena

ControlArena is a collection of settings, model organisms and protocols - for running control experiments.

Python 195 117 Updated May 18, 2026

openai / model_spec

The OpenAI Model Spec

786 91 Updated Mar 25, 2026

gnachman / iTerm2

iTerm2 is a terminal emulator for Mac OS X that does amazing things.

Objective-C 17,583 1,385 Updated May 20, 2026

Responsible-Dataset-Sharing / easy-dataset-share

A CLI tool that helps AI researchers share datasets responsibly.

Python 22 10 Updated Sep 15, 2025

saulpw / visidata

A terminal spreadsheet multitool for discovering and arranging data

Python 9,096 346 Updated May 20, 2026

PaulLockett / CodeSignal_Practice_Industry_Coding_Framework

Practice The CodeSignal Pre-screen for the Industry Coding Framework.

Python 350 81 Updated Mar 17, 2026

rgreenblatt / compose_facts

Python 1 1 Updated Jan 2, 2026

kristopherjohnson / MenubarCountdown

Menubar countdown timer for macOS

Swift 200 26 Updated Nov 21, 2023

obra / superpowers

An agentic skills framework & software development methodology that works.

Shell 199,291 17,770 Updated May 14, 2026

disler / claude-code-hooks-mastery

Master Claude Code Hooks

Python 3,693 616 Updated Mar 4, 2026

scaleapi / mrt

https://scale.com/research/mrt

Jupyter Notebook 18 3 Updated Mar 16, 2026

TransluceAI / docent

Python 103 11 Updated May 19, 2026

gepa-ai / gepa

Optimize prompts, code, and more with AI-powered Reflective Text Evolution

Jupyter Notebook 4,532 378 Updated May 18, 2026

donethatai / donethat-releases

Release repo for donethat

1 Updated May 15, 2026

oram