[ACL'26 Oral] AgentOCR is a token-efficient framework that compresses multi-turn agent history by rendering it into images and adopting RL-driven self-compression

Python 33 Updated Mar 1, 2026

XiaoRed5 / Agentic-RL-Most-Detailed-Intro

The most detailed introduction to Agentic RL

HTML 28 1 Updated Jun 18, 2026

sriksmachi / text2sql-slm-finetuning-grpo

A low-cost, generalized SLM fine-tuning approach that excels at Text2SQL tasks

Jupyter Notebook 2 1 Updated Apr 1, 2026

datawhalechina / Agent-Learning-Hub

AI Agent learning roadmap and resource collection

HTML 3,876 402 Updated Jun 5, 2026

khoj-ai / khoj

Your AI second brain. Self-hostable. Get answers from the web or your docs. Build custom agents, schedule automations, do deep research. Turn any online or local LLM into your personal, autonomous …

Python 35,193 2,254 Updated Mar 26, 2026

bcefghj / miniClaudeCode

miniClaudeCode - a minimal reproduction of Claude Code's core agent architecture, distilled from 500,000 lines down to ~1,000 lines | Distilled Claude Code agent framework

Python 104 22 Updated Mar 31, 2026

txl16095 / MiniClaude

A lightweight local AI coding assistant, streamlined and adapted from Claude Code

TypeScript 154 28 Updated May 18, 2026

TIGER-AI-Lab / verl-tool

A version of verl to support diverse tool use [TMLR 2026]

Python 1,001 83 Updated Jun 8, 2026

serge-honcharenko / qwen-grpo

GRPO on Qwen2.5-1.5B base and instruct with verl on GSM8K

Python 1 Updated May 7, 2026

TonyStark042 / LLM-RL

A minimal viable implementation of GRPO based on veRL and TRL.

Python 5 Updated Jun 23, 2025

guoxz22 / verl-tutorial

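A hands-on tutorial on the verl framework for practitioners. verl is ByteDance's open-source reinforcement learning training framework for large language models, supporting algorithms such as PPO and GRPO, as well as distributed training, AgentRL, and other scenarios.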

75 2 Updated Mar 8, 2026

NVlabs / GDPO

Official implementation of GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization

Python 475 32 Updated May 20, 2026

labuladong / fucking-algorithm

Crack LeetCode, not only how, but also why.

Markdown 134,307 23,599 Updated Feb 28, 2026

ABexit / ASR-LLM-TTS

This is a speech interaction system built on open-source models, integrating ASR, LLM, and TTS in sequence. The ASR model is SenseVoice, the LLM models are Qwen2.5-0.5B/1.5B, and there are three …

Python 1,235 198 Updated Jun 3, 2026

NVIDIA-AI-IOT / live-vlm-webui

Real-time Vision Language Model interaction via webcam - WebRTC-based web interface

Python 356 57 Updated Mar 10, 2026

jd-opensource / JoyAI-VL-Interaction

289 6 Updated Jun 12, 2026

lintsinghua / claude-code-book

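"御舆: Decoding the Agent Harness": a 420,000-character breakdown of the skeleton and nervous system of the AI agent harness. An in-depth analysis of the Claude Code architecture, in 15 chapters running from the conversation loop to building your own Agent Harness. Online reading site: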

3,697 772 Updated Apr 6, 2026

shanraisshan / claude-code-best-practice

from vibe coding to agentic engineering - practice makes claude perfect

HTML 58,264 5,852 Updated Jun 18, 2026

evanly-gh / RL-on-HRM-Text

Performing SFT and GRPO (DAPO) on Sapient lab's HRM-Text 1.2B model to maximize the MATH benchmark.

Python 1 Updated Jun 10, 2026

jasoncarreira / hrm-text-agent

Python 4 Updated Jun 7, 2026

sapientinc / HRM-Text

HRM-Text is a 1B text generation model based on the HRM architecture, strengthened by task completion and latent space reasoning.

Python 1,443 134 Updated Jun 17, 2026

Elessar123 / SAC-FLOW

Python 63 9 Updated Dec 2, 2025

ZhengYinan-AIR / Diffusion-Planner

[ICLR 2025 Oral] The official implementation of "Diffusion-Based Planning for Autonomous Driving with Flexible Guidance"

Python 1,005 156 Updated Mar 10, 2026

vla-safe / SAFE

This is the official repository for "SAFE: Multitask Failure Detection for Vision-Language-Action Models" (NeurIPS 2025)

Python 77 13 Updated May 21, 2026

yxq953

Starred repositories

td3

3d-perception

place-recognition