Peking University
https://murraytom.github.io/
Stars
Rethinking On-Policy Distillation of Large Language Models: Phenomenology, Mechanism, and Recipe
SkillOpt is a text-space optimizer that trains reusable natural-language skills for frozen LLM agents through trajectory-driven edits, validation-gated updates, and deployable best_skill.md artifacts.
MurrayTom / claude-code (forked from glwhappen/claude-code)
Claude Code is an agentic coding tool that lives in your terminal, understands your codebase, and helps you code faster by executing routine tasks, explaining complex code, and handling git workflo…
DIVE: Scaling Diversity in Agentic Task Synthesis for Generalizable Tool Use
[ICLR'26 Oral] RedTeamCUA: Realistic Adversarial Testing of Computer-Use Agents in Hybrid Web-OS Environments
Claw-Eval is an evaluation harness for LLMs as agents. All tasks are verified by humans.
PinchBench is a benchmarking system for evaluating LLMs as OpenClaw coding agents. Made with 🦀 by the humans at https://kilo.ai
A survey on security in hierarchical autonomy evolution of AI agents
A user-friendly & efficient knowledge distillation framework for LLMs, supporting off-policy, on-policy (OPD), cross-tokenizer, multimodal, and on-policy self-distillation.
Official implementation of “Contextual Image Attack: How Visual Context Exposes Multimodal Safety Vulnerabilities”
Set of tools to assess and improve LLM security.
An Easy-to-use, Scalable and High-performance Agentic RL Framework based on Ray (PPO & DAPO & REINFORCE++ & VLM & TIS & vLLM & Ray & Async RL)
Your own personal AI assistant. Any OS. Any Platform. The lobster way. 🦞
verl-agent is an extension of veRL, designed for training LLM/VLM agents via RL. verl-agent is also the official code for paper "Group-in-Group Policy Optimization for LLM Agent Training"
MCPMark is a comprehensive, stress-testing MCP benchmark designed to evaluate model and agent capabilities in real-world MCP use.
Qwen3.6 is the large language model series developed by the Qwen team, Alibaba Group.
ZeroSearch: Incentivize the Search Capability of LLMs without Searching
A version of verl to support diverse tool use [TMLR 2026]
Benchmarking Language Agents Under Controllable and Extreme Context Growth