Starred repositories
Tiny Model, Big Logic: Diversity-Driven Optimization Elicits Large-Model Reasoning Ability in VibeThinker-1.5B
A curated list of awesome autonomous researcher frameworks
Autonomous experiment loop skill for Claude Code — port of pi-autoresearch
[ACL'26 Oral] AgentOCR is a token-efficient framework that compresses multi-turn agent history by rendering it into images and adopting RL-driven self-compression
A low-cost, generalized SLM fine-tuning that excels at Text2SQL tasks
Your AI second brain. Self-hostable. Get answers from the web or your docs. Build custom agents, schedule automations, do deep research. Turn any online or local LLM into your personal, autonomous …
miniClaudeCode - A minimal reproduction of the core Claude Code agent architecture, distilled from 500,000 lines down to ~1,000 lines | Distilled Claude Code agent framework
A version of verl to support diverse tool use [TMLR 2026]
GRPO on Qwen2.5-1.5B base and instruct with verl on GSM8K
A minimal viable implementation of GRPO based on veRL and TRL.
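A practitioner-oriented tutorial on using the verl framework. verl is ByteDance's open-source reinforcement learning training framework for large language models, supporting multiple algorithms such as PPO and GRPO, as well as distributed training, AgentRL, and other scenarios.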
Official implementation of GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization
Crack LeetCode, not only how, but also why.
This is a speech interaction system built on open-source models, integrating ASR, LLM, and TTS in sequence. The ASR model is SenseVoice, the LLMs are Qwen2.5-0.5B/1.5B, and there are three …
Real-time Vision Language Model interaction via webcam - WebRTC-based web interface
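"御舆: Decoding the Agent Harness": a 420,000-character breakdown of the harness skeleton and nerves of AI agents. An in-depth analysis of the Claude Code architecture, in 15 chapters that run from the conversation loop to building your own Agent Harness. Online reading site: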
from vibe coding to agentic engineering - practice makes claude perfect
Performing SFT and GRPO (DAPO) on Sapient lab's HRM-Text 1.2B model to maximize the MATH benchmark.
HRM-Text is a 1B text generation model based on the HRM architecture, strengthened by task completion and latent space reasoning.
[ICLR 2025 Oral] The official implementation of "Diffusion-Based Planning for Autonomous Driving with Flexible Guidance"
This is the official repository for "SAFE: Multitask Failure Detection for Vision-Language-Action Models" (NeurIPS 2025)