Stars
An open-source AI agent that lives in your terminal.
[RSS'25] This repository is the implementation of "NaVILA: Legged Robot Vision-Language-Action Model for Navigation"
🚀 Efficient implementations of state-of-the-art linear attention models
SkyRL: A Modular Full-stack RL Library for LLMs
Biomni: a general-purpose biomedical AI agent
🌍 AppWorld: A Controllable World of Apps and People for Benchmarking Function Calling and Interactive Coding Agents, ACL'24 Best Resource Paper.
A clean, modular SDK for building AI agents with OpenHands V1.
Tritonbench is a collection of PyTorch custom operators with example inputs to measure their performance.
Implement a ChatGPT-like LLM in PyTorch from scratch, step by step
slime is an LLM post-training framework for RL Scaling.
A course on LLM inference serving on Apple Silicon for systems engineers: build a tiny vLLM + Qwen.
τ²-Bench: Evaluating Conversational Agents in a Dual-Control Environment
The AI framework that adds the engineering to prompt engineering (Python/TS/Ruby/Java/C#/Rust/Go compatible)
Agent Reinforcement Trainer: train multi-step agents for real-world tasks using GRPO. Give your agents on-the-job training. Reinforcement learning for Qwen2.5, Qwen3, Llama, and more!
The 100-line AI agent that solves GitHub issues or helps you in your command line. Radically simple, with no huge configs and no giant monorepo, yet it scores >74% on SWE-bench Verified!
DSPy: The framework for programming—not prompting—language models
verl-agent is an extension of veRL, designed for training LLM/VLM agents via RL. verl-agent is also the official code for the paper "Group-in-Group Policy Optimization for LLM Agent Training"
An open protocol enabling communication and interoperability between opaque agentic applications.
Memory engine and app that is extremely fast and scalable. The Memory API for the AI era.
The official Python SDK for Model Context Protocol servers and clients
[TMLR] A curated list of language modeling research for code (and other software engineering activities), plus related datasets.
KGym - A platform to run hundreds to thousands of ML4Linux kernel experiments at scale