Stars
🚀 Text2Grad: Converting natural language feedback into gradient signals for precise model optimization. Revolutionizing RLHF with span-level rewards and targeted improvements across code generation…
A curated list of papers and resources on Reward Hacking, Emergent Misalignment, and Proxy Exploitation in Large Models
AI agents running research on single-GPU nanochat training automatically
SkillOpt is a text-space optimizer that trains reusable natural-language skills for frozen LLM agents through trajectory-driven edits, validation-gated updates, and deployable best_skill.md artifacts.
SkillsBench evaluates how well skills work and how effective agents are at using them.
A curated collection of papers and resources on On-Policy Distillation for Large Language Models.
The agent that grows with you
Train the smallest LM you can that fits in 16MB. Best model wins!
Optimize prompts, code, and more with AI-powered Reflective Text Evolution
[Survey] A Comprehensive Survey of Self-Evolving AI Agents: A New Paradigm Bridging Foundation Models and Lifelong Agentic Systems
The next employee you distill need not be a coworker. Distill anyone's way of thinking: mental models, decision heuristics, expressive DNA. Distill how anyone thinks.
PPT template for ShanghaiTech University. Includes PowerPoint, Markdown Marp, and LaTeX Beamer versions.
Provides multiple Shadowrocket rule sets with powerful ad filtering. Rules are rebuilt daily at 8:00.
verl-agent is an extension of veRL, designed for training LLM/VLM agents via RL. verl-agent is also the official code for paper "Group-in-Group Policy Optimization for LLM Agent Training"
Official code for paper "TimeMaster: Training Time-Series Multimodal LLMs to Reason via Reinforcement Learning"
Search-R1: An Efficient, Scalable RL Training Framework for Reasoning & Search Engine Calling interleaved LLM based on veRL
SkillRL: Evolving Agents via Recursive Skill-Augmented Reinforcement Learning
🎃 A fast, out-of-the-box terminal built for AI coding.
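An authentic, unaltered Claude Code in a runnable, buildable, and debuggable edition; production-grade engineering, enterprise-grade reliability; safe and malware-free, with memory leaks fixed.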
ultmaster / claude-code
Forked from ultraworkers/claw-code
Claude Code Snapshot for Research. All original source code is the property of Anthropic.
OpenClaw-RL: Train any agent simply by talking
A curated list of awesome open source libraries to deploy, monitor, version and scale your machine learning
Curated list of AutoResearch use cases with optimization traces and open source implementations
TimeCopilot: the GenAI Forecasting Agent. Built on LLMs and Time Series Foundation Models, it lets you forecast, cross-validate, and detect anomalies using multiple foundation models through a sing…
Official code for "ConTSG-Bench: A Unified Benchmark for Conditional Time Series Generation" (ICML 2026)