
🚀 Text2Grad: Converting natural language feedback into gradient signals for precise model optimization. Revolutionizing RLHF with span-level rewards and targeted improvements across code generation…

Python 37 3 Updated Feb 6, 2026

A curated list of papers and resources on Reward Hacking, Emergent Misalignment, and Proxy Exploitation in Large Models

37 3 Updated Apr 17, 2026

AI agents running research on single-GPU nanochat training automatically

Python 88,206 12,764 Updated Mar 26, 2026

SkillOpt is a text-space optimizer that trains reusable natural-language skills for frozen LLM agents through trajectory-driven edits, validation-gated updates, and deployable best_skill.md artifacts.

Python 8,859 852 Updated Jun 20, 2026

SkillsBench evaluates how well skills work and how effective agents are at using them.

PDDL 1,379 319 Updated Jun 22, 2026

A curated collection of papers and resources on On-Policy Distillation for Large Language Models.

Python 349 6 Updated Jun 21, 2026

The agent that grows with you

Python 200,219 35,661 Updated Jun 23, 2026

Train the smallest LM you can that fits in 16MB. Best model wins!

Python 5,131 3,329 Updated May 4, 2026
Python 74 11 Updated Apr 26, 2026

tLLM is a test-time training extension of vLLM

Python 42 Updated Apr 26, 2026

Optimize prompts, code, and more with AI-powered Reflective Text Evolution

Jupyter Notebook 5,305 438 Updated Jun 23, 2026

[Survey] A Comprehensive Survey of Self-Evolving AI Agents: A New Paradigm Bridging Foundation Models and Lifelong Agentic Systems

2,267 171 Updated May 16, 2026

The next employee you want to distill doesn't have to be a coworker. Distill how anyone thinks: mental models, decision heuristics, and expression DNA.

Python 25,353 3,673 Updated Jun 14, 2026

PPT template for ShanghaiTech University. Includes PowerPoint, Markdown (Marp), and LaTeX Beamer versions.

SCSS 36 5 Updated Jun 10, 2024

Provides multiple Shadowrocket rule sets with powerful ad filtering. Rules are rebuilt daily at 8:00.

27,806 1,860 Updated Jun 22, 2026

Traffic-routing rules, rewrite rules, and scripts.

JavaScript 26,890 3,997 Updated Jun 21, 2026

verl-agent is an extension of veRL, designed for training LLM/VLM agents via RL. verl-agent is also the official code for the paper "Group-in-Group Policy Optimization for LLM Agent Training"

Python 2,040 202 Updated Jun 9, 2026

Official code for the paper "TimeMaster: Training Time-Series Multimodal LLMs to Reason via Reinforcement Learning"

Python 67 7 Updated Oct 22, 2025

Search-R1: An Efficient, Scalable RL Training Framework for Reasoning & Search Engine Calling interleaved LLM based on veRL

Python 4,994 445 Updated Nov 13, 2025

SkillRL: Evolving Agents via Recursive Skill-Augmented Reinforcement Learning

Python 850 64 Updated May 17, 2026
TypeScript 42 4 Updated May 11, 2026

🎃 A fast, out-of-the-box terminal built for AI coding.

Rust 5,450 277 Updated Jun 21, 2026

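An unaltered Claude Code build that is runnable, buildable, and debuggable; production-grade engineering, enterprise-grade reliability; safe and malware-free, with memory leaks fixed.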

TypeScript 20,260 16,245 Updated Jun 23, 2026

Claude Code Snapshot for Research. All original source code is the property of Anthropic.

TypeScript 1 Updated Mar 31, 2026

OpenClaw-RL: Train any agent simply by talking

Python 5,516 598 Updated May 23, 2026

A curated list of awesome open source libraries to deploy, monitor, version and scale your machine learning

20,653 2,581 Updated Jun 21, 2026

Curated list of AutoResearch use cases with optimization traces and open source implementations

986 73 Updated Jun 2, 2026

TimeCopilot: the GenAI Forecasting Agent. Built on LLMs and Time Series Foundation Models, it lets you forecast, cross-validate, and detect anomalies using multiple foundation models through a sing…

Python 562 78 Updated Jun 18, 2026

Codebase for TIME benchmark

Python 43 5 Updated Jun 7, 2026

Official code for "ConTSG-Bench: A Unified Benchmark for Conditional Time Series Generation" (ICML 2026)

Python 15 1 Updated May 2, 2026