MurrayTom


Rethinking On-Policy Distillation of Large Language Models: Phenomenology, Mechanism, and Recipe

Python 681 43 Updated May 30, 2026

SkillOpt is a text-space optimizer that trains reusable natural-language skills for frozen LLM agents through trajectory-driven edits, validation-gated updates, and deployable best_skill.md artifacts.

Python 8,200 794 Updated Jun 17, 2026

Awesome List for Agentic RL

HTML 1,613 62 Updated May 26, 2026

Claude Code is an agentic coding tool that lives in your terminal, understands your codebase, and helps you code faster by executing routine tasks, explaining complex code, and handling git workflo…

TypeScript 24 7 Updated Mar 31, 2026

The Definitive AI Agent Benchmark

Python 175 20 Updated Jun 17, 2026

DIVE: Scaling Diversity in Agentic Task Synthesis for Generalizable Tool Use

Python 21 1 Updated Mar 13, 2026

[ICLR'26 Oral] RedTeamCUA: Realistic Adversarial Testing of Computer-Use Agents in Hybrid Web-OS Environments

Python 55 11 Updated Feb 9, 2026

Claw-Eval is an evaluation harness for LLMs acting as agents. All tasks are verified by humans.

Python 673 59 Updated May 17, 2026

PinchBench is a benchmarking system for evaluating LLMs as OpenClaw coding agents. Made with 🦀 by the humans at https://kilo.ai

Python 1,240 140 Updated Jun 2, 2026

A lifecycle guard skill.

181 1 Updated Mar 27, 2026

MicroVM Runtime

Rust 43 4 Updated Jun 17, 2026

Agentic Agent Framework

Rust 137 3 Updated Jun 17, 2026

Diagnostic Framework for LLMs and MLLMs

Python 38 Updated Mar 2, 2026

All-in-One Safety Evaluation Framework

Python 50 Updated Apr 21, 2026

A survey on security in hierarchical autonomy evolution of AI agents

18 1 Updated Mar 10, 2026

A user-friendly & efficient knowledge distillation framework for LLMs, supporting off-policy, on-policy (OPD), cross-tokenizer, multimodal, and on-policy self-distillation.

Python 199 15 Updated Jun 18, 2026

Official implementation of “Contextual Image Attack: How Visual Context Exposes Multimodal Safety Vulnerabilities”

Python 14 1 Updated Apr 1, 2026

Set of tools to assess and improve LLM security.

Python 4,232 739 Updated Jun 12, 2026

An Easy-to-use, Scalable and High-performance Agentic RL Framework based on Ray (PPO & DAPO & REINFORCE++ & VLM & TIS & vLLM & Ray & Async RL)

Python 9,654 971 Updated Jun 17, 2026

Your own personal AI assistant. Any OS. Any Platform. The lobster way. 🦞

TypeScript 379,320 79,408 Updated Jun 18, 2026

Python 21 6 Updated Dec 18, 2025

verl-agent is an extension of veRL, designed for training LLM/VLM agents via RL. verl-agent is also the official code for the paper "Group-in-Group Policy Optimization for LLM Agent Training".

Python 2,021 198 Updated Jun 9, 2026

Python 503 36 Updated Oct 16, 2025

OmniGAIA: Towards Native Omni-Modal AI Agents

Python 134 5 Updated Apr 2, 2026

MCPMark is a comprehensive, stress-testing MCP benchmark designed to evaluate model and agent capabilities in real-world MCP use.

Python 429 37 Updated Jun 12, 2026

Python 63 6 Updated Jan 31, 2026

Qwen3.6 is the large language model series developed by the Qwen team, Alibaba Group.

3,582 238 Updated Jun 3, 2026

ZeroSearch: Incentivize the Search Capability of LLMs without Searching

Python 1,296 120 Updated Aug 16, 2025

A version of verl to support diverse tool use [TMLR 2026]

Python 1,001 83 Updated Jun 8, 2026

Benchmarking Language Agents Under Controllable and Extreme Context Growth

Python 48 8 Updated Apr 29, 2026