Starred repositories

Tiny Model, Big Logic: Diversity-Driven Optimization Elicits Large-Model Reasoning Ability in VibeThinker-1.5B

Python 999 74 Updated Jun 17, 2026

Awesome List for Agentic RL

HTML 1,615 62 Updated May 26, 2026

Codex perpetual-motion machine: keeps codex from automatically stopping partway through a long-running task

TypeScript 139 12 Updated Apr 3, 2026

A curated list of awesome autonomous researcher frameworks

126 17 Updated Jun 3, 2026

Autonomous experiment loop skill for Claude Code — port of pi-autoresearch

Python 315 31 Updated Mar 24, 2026

awesome autoresearch list

Python 544 39 Updated Jun 18, 2026

[ACL'26 Oral] AgentOCR is a token-efficient framework that compresses multi-turn agent history by rendering it into images and adopting RL-driven self-compression

Python 33 Updated Mar 1, 2026

The most detailed introduction to Agentic RL

HTML 28 1 Updated Jun 18, 2026

A low-cost, generalized SLM fine-tuning method that excels at Text2SQL tasks

Jupyter Notebook 2 1 Updated Apr 1, 2026

AI Agent learning roadmap and resource collection

HTML 3,876 402 Updated Jun 5, 2026

Your AI second brain. Self-hostable. Get answers from the web or your docs. Build custom agents, schedule automations, do deep research. Turn any online or local LLM into your personal, autonomous …

Python 35,193 2,254 Updated Mar 26, 2026

miniClaudeCode - a minimal reproduction of Claude Code's core agent architecture, distilled from 500,000 lines to ~1,000 lines | Distilled Claude Code agent framework

Python 104 22 Updated Mar 31, 2026

A lightweight local AI coding assistant, a slimmed-down adaptation of Claude Code

TypeScript 154 28 Updated May 18, 2026

A version of verl to support diverse tool use [TMLR 2026]

Python 1,001 83 Updated Jun 8, 2026

GRPO on Qwen2.5-1.5B base and instruct with verl on GSM8K

Python 1 Updated May 7, 2026

A minimal viable implementation to achieve GRPO based on veRL and TRL.

Python 5 Updated Jun 23, 2025

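A practitioner-oriented tutorial for the verl framework. verl is ByteDance's open-source reinforcement learning training framework for large language models, supporting algorithms such as PPO and GRPO as well as scenarios such as distributed training and AgentRL.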

75 2 Updated Mar 8, 2026

Official implementation of GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization

Python 475 32 Updated May 20, 2026

Crack LeetCode, not only how, but also why.

Markdown 134,307 23,599 Updated Feb 28, 2026

This is a speech interaction system built on open-source models, chaining ASR, LLM, and TTS in sequence. The ASR model is SenseVoice, the LLM models are Qwen2.5-0.5B/1.5B, and there are three …

Python 1,235 198 Updated Jun 3, 2026

Real-time Vision Language Model interaction via webcam - WebRTC-based web interface

Python 356 57 Updated Mar 10, 2026

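"御舆: Decoding the Agent Harness": a 420,000-character breakdown of the harness skeleton and nervous system of AI agents. An in-depth analysis of the Claude Code architecture, in 15 chapters from the conversation loop to building your own Agent Harness. Online reading site: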

3,697 772 Updated Apr 6, 2026

from vibe coding to agentic engineering - practice makes claude perfect

HTML 58,264 5,852 Updated Jun 18, 2026

Performing SFT and GRPO (DAPO) on Sapient lab's HRM-Text 1.2B model to maximize the MATH benchmark.

Python 1 Updated Jun 10, 2026
Python 4 Updated Jun 7, 2026

HRM-Text is a 1B text generation model based on the HRM architecture, strengthened by task completion and latent space reasoning.

Python 1,443 134 Updated Jun 17, 2026
Python 63 9 Updated Dec 2, 2025

[ICLR 2025 Oral] The official implementation of "Diffusion-Based Planning for Autonomous Driving with Flexible Guidance"

Python 1,005 156 Updated Mar 10, 2026

This is the official repository for "SAFE: Multitask Failure Detection for Vision-Language-Action Models" (NeurIPS 2025)

Python 77 13 Updated May 21, 2026