Starred repositories
LLM Chess - evaluating Large Language Models' reasoning and instruction-following abilities by simulating chess games
A collection of LLM pruning implementations, training code for GPUs and TPUs, and evaluation scripts.
CATArena is an engineering-level tournament evaluation platform for LLM-driven code agents, based on an iterative competitive peer-learning framework.
"AI-Trader: Can AI Beat the Market?" Live Trading Bench: https://ai4trade.ai Tech Report Link: https://arxiv.org/abs/2512.10971
Synthetic data curation for post-training and structured data extraction
Benchmark LLM reasoning capability by solving chess puzzles.
Training VLM agents with multi-turn reinforcement learning
Harsh Jhamtani*, Varun Gangal*, Eduard Hovy, Graham Neubig, Taylor Berg-Kirkpatrick. Learning to Generate Move-by-Move Commentary for Chess Games from Large-Scale Social Forum Data. ACL 2018
Open source neural network chess engine with GPU acceleration and broad hardware support.
A Text-Based Environment for Interactive Debugging
This is the official GitHub repository for our survey paper "Beyond Single-Turn: A Survey on Multi-Turn Interactions with Large Language Models".
Fully open reproduction of DeepSeek-R1
[ICLR 2026] Learning to Reason without External Rewards
A library for generative social simulation
AI paper-trading project inspired by nof1 Alpha Arena, using ccxt for price quotes.
Procgen Benchmark: Procedurally-Generated Game-Like Gym-Environments
Defeating the Training-Inference Mismatch via FP16
Natural Language Reinforcement Learning
Post-training with Tinker
A library for mechanistic interpretability of GPT-style language models
MARSHAL: Incentivizing Multi-Agent Reasoning via Self-Play with Strategic LLMs
[ICLR 2026] Vision-Zero: Scalable VLM Self-Improvement via Strategic Gamified Self-Play.
An extensible benchmark for evaluating large language models on planning
Evals is a framework for evaluating LLMs and LLM systems, and an open-source registry of benchmarks.
All the source code for "Robot Learning: A Tutorial". Get involved to be featured in the next iteration!