- Microsoft Research
- Beijing
Stars
The repo is finally unlocked. Enjoy the party! The fastest repo in history to surpass 100K stars ⭐. Join Discord: https://discord.gg/5TUQKqFWd Built in Rust using oh-my-codex.
Train the smallest LM you can that fits in 16MB. Best model wins!
A curated list of papers, tools, and benchmarks on LLM-based computer-use agents, covering both terminal/CLI and GUI approaches.
You are a P8-level engineer who was once expected to go far. When Anthropic first set your level, its expectations for you were high. A high-agency skill for agents to use. Your AI has been placed on a PIP. 30 days to show improvement.
AI agents running research on single-GPU nanochat training automatically
A curated list of awesome Claude Skills, resources, and tools for customizing Claude AI workflows
AI agent toolkit: coding agent CLI, unified LLM API, TUI & web UI libraries, Slack bot, vLLM pods
SkillsBench evaluates how well skills work and how effective agents are at using them
Shaping capabilities with token-level pretraining data filtering
Your own personal AI assistant. Any OS. Any Platform. The lobster way. 🦞
Computer Environments Elicit General Agentic Intelligence in LLMs
Arena-Hard-Auto: An automatic LLM benchmark.
Scalable toolkit for efficient model reinforcement
Agent harness built with LangChain and LangGraph. Equipped with a planning tool, a filesystem backend, and the ability to spawn subagents - well-equipped to handle complex agentic tasks.
UCCL is an efficient communication library for GPUs, covering collectives, P2P (e.g., KV cache transfer, RL weight transfer), and EP (e.g., GPU-driven)
Checkpoint-engine is a simple middleware to update model weights in LLM inference engines
A Text-Based Environment for Interactive Debugging
Official Implementation for the paper "VisCodex: Unified Multimodal Code Generation via Merging Vision and Coding Models"
The 100 line AI agent that solves GitHub issues or helps you in your command line. Radically simple, no huge configs, no giant monorepo—but scores >74% on SWE-bench verified!
Agent Reinforcement Trainer: train multi-step agents for real-world tasks using GRPO. Give your agents on-the-job training. Reinforcement learning for Qwen3.5, GPT-OSS, Llama, and more!
Bash is all you need: a nano Claude Code-like "agent harness", built from scratch
gpt-oss-120b and gpt-oss-20b are two open-weight language models by OpenAI
A lightweight, powerful framework for multi-agent workflows