Stars
Miles is an enterprise-facing reinforcement learning framework for LLM and VLM post-training, forked from and co-evolving with slime.
A set of examples based on verl for end-to-end RL training recipes.
Repair malformed JSON from LLMs, APIs, logs, and user input in Python.
Official implementation of GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization
slime is an LLM post-training framework for RL Scaling.
This is a Chinese translation of the CUDA programming guide
An open-source AI coding agent that lives in your terminal.
SkillsBench evaluates how well skills work and how effective agents are at using them.
🤗 ml-intern: an open-source ML engineer that reads papers, trains models, and ships ML models
Official repository for the paper "Learning beyond Teacher: Generalized On-Policy Distillation with Reward Extrapolation"
Rethinking On-Policy Distillation of Large Language Models: Phenomenology, Mechanism, and Recipe
Skills for Real Engineers. Straight from my .claude directory.
Agent Skills for Google products and technologies
A single CLAUDE.md file to improve Claude Code behavior, derived from Andrej Karpathy's observations on LLM coding pitfalls.
AI agent toolkit: unified LLM API, agent loop, TUI, coding agent CLI
Official Repository of "Learning to Reason under Off-Policy Guidance"
CL-bench: A Benchmark for Context Learning
🚀 Awesome System for Machine Learning ⚡️ AI System Papers and Industry Practice. ⚡️ System for Machine Learning, LLM (Large Language Model), GenAI (Generative AI). 🍻 OSDI, NSDI, SIGCOMM, SoCC, MLSy…
50+ tutorials and implementations for Generative AI Agent techniques, from basic conversational bots to complex multi-agent systems.
Official Implementation of "Simulating Environments with Reasoning Models for Agent Training"
Agent framework and applications built upon Qwen>=3.0, featuring Function Calling, MCP, Code Interpreter, RAG, Chrome extension, etc.
Autonomous AI development loop for Claude Code with intelligent exit detection
💻 vibe coding 2026 | Your first modern coding course, taking beginners to mastery step by step.
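[Three Years of Interviews, Five Years of Mock Exams] An interview guide for AIGC/LLM/AI Agent algorithm engineers. Covers practical interview and written-test experience and core knowledge across the AI industry, including AIGC, large language models (LLMs), AI agents, traditional deep learning, autonomous driving, machine learning, computer vision, natural language processing, reinforcement learning, big data mining, embodied AI, the metaverse, and AGI.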
Use Garry Tan's exact Claude Code setup: 23 opinionated tools that serve as CEO, Designer, Eng Manager, Release Manager, Doc Engineer, and QA
Official Implementation of ARPO: End-to-End Policy Optimization for GUI Agents with Experience Replay