- Beijing, China
Stars
AI agents running research on single-GPU nanochat training automatically
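Latest 2026 ChatGPT top-up and subscription tutorial (117 yuan/month): this article focuses on five ways to activate a ChatGPT Plus membership: buying a standalone ChatGPT Plus account, having your own ChatGPT account topped up on your behalf, sharing a ChatGPT Plus account with other users, paying for ChatGPT membership with an Apple gift card, and subscribing to ChatGPT Plus with a foreign virtual credit card.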
🐈 nanobot: The Ultra-Lightweight OpenClaw
Reinforcement Learning via Self-Distillation (SDPO)
Your own personal AI assistant. Any OS. Any Platform. The lobster way. 🦞
A cross-platform desktop All-in-One assistant tool for Claude Code, Codex, OpenCode, OpenClaw & Gemini CLI.
Archer2.0 evolves from its predecessor by introducing ASPO, which overcomes fundamental PPO-Clip limitations to prevent premature convergence and unlock greater RL potential.
A minimal implementation of DeepMind's Genie world model
Mobile-Agent: The Powerful GUI Agent Family
Official repo for "PAPO: Perception-Aware Policy Optimization for Multimodal Reasoning"
Official Implementation of ARPO: End-to-End Policy Optimization for GUI Agents with Experience Replay
✨✨ [ICLR 2026] R1-Reward: Training Multimodal Reward Model Through Stable Reinforcement Learning
Kimi-VL: Mixture-of-Experts Vision-Language Model for Multimodal Reasoning, Long-Context Understanding, and Strong Agent Capabilities
verl: Volcano Engine Reinforcement Learning for LLMs
Fully open reproduction of DeepSeek-R1
Understanding R1-Zero-Like Training: A Critical Perspective
[ICLR 2025] COAT: Compressing Optimizer States and Activation for Memory-Efficient FP8 Training
Production-tested AI infrastructure tools for efficient AGI development and community-driven innovation
Official implementation of paper: SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training
[ICML 2024] LESS: Selecting Influential Data for Targeted Instruction Tuning
The simplest, fastest repository for training/finetuning medium-sized GPTs.