MurrayTom

Peking University PhD Student

8 followers · 2 following

Peking University
22:45 (UTC -12:00)
https://murraytom.github.io/

Stars

thunlp / OPD

Rethinking On-Policy Distillation of Large Language Models: Phenomenology, Mechanism, and Recipe

Python 681 43 Updated May 30, 2026

SkillOpt is a text-space optimizer that trains reusable natural-language skills for frozen LLM agents through trajectory-driven edits, validation-gated updates, and deployable best_skill.md artifacts.

Python 8,200 794 Updated Jun 17, 2026

thinkwee / AgentsMeetRL

Awesome List for Agentic RL

HTML 1,613 62 Updated May 26, 2026

MurrayTom / claude-code

Forked from glwhappen/claude-code

Claude Code is an agentic coding tool that lives in your terminal, understands your codebase, and helps you code faster by executing routine tasks, explaining complex code, and handling git workflo…

TypeScript 24 7 Updated Mar 31, 2026

claw-bench / claw-bench

The Definitive AI Agent Benchmark

Python 175 20 Updated Jun 17, 2026

sheep333c / DIVE

DIVE: Scaling Diversity in Agentic Task Synthesis for Generalizable Tool Use

Python 21 1 Updated Mar 13, 2026

OSU-NLP-Group / RedTeamCUA

[ICLR'26 Oral] RedTeamCUA: Realistic Adversarial Testing of Computer-Use Agents in Hybrid Web-OS Environments

Python 55 11 Updated Feb 9, 2026

claw-eval / claw-eval

Claw-Eval is an evaluation harness for evaluating LLMs as agents. All tasks are verified by humans.

Python 673 59 Updated May 17, 2026

pinchbench / skill

PinchBench is a benchmarking system for evaluating LLM models as OpenClaw coding agents. Made with 🦀 by the humans at https://kilo.ai

Python 1,240 140 Updated Jun 2, 2026

skill-sonar / Skill-Sonar

A lifecycle guard skill.

181 1 Updated Mar 27, 2026

AI45Lab / Box

MicroVM Runtime

Rust 43 4 Updated Jun 17, 2026

AI45Lab / Code

Agentic Agent Framework

Rust 137 3 Updated Jun 17, 2026

AI45Lab / DeepScan

Diagnostic Framework for LLMs and MLLMs

Python 38 Updated Mar 2, 2026

AI45Lab / DeepSafe

All-in-One Safety Evaluation Framework

Python 50 Updated Apr 21, 2026

Epiphanyi / HAE-Agent-Security

A survey on security in hierarchical autonomy evolution of AI agents

18 1 Updated Mar 10, 2026

songmzhang / KDFlow

A user-friendly & efficient knowledge distillation framework for LLMs, supporting off-policy, on-policy (OPD), cross-tokenizer, multimodal, and on-policy self-distillation.

Python 199 15 Updated Jun 18, 2026

xiongyuaay / Contextual-Image-Attack

Official implementation of “Contextual Image Attack: How Visual Context Exposes Multimodal Safety Vulnerabilities”

Python 14 1 Updated Apr 1, 2026

meta-llama / PurpleLlama

Set of tools to assess and improve LLM security.

Python 4,232 739 Updated Jun 12, 2026

OpenRLHF / OpenRLHF

An Easy-to-use, Scalable and High-performance Agentic RL Framework based on Ray (PPO & DAPO & REINFORCE++ & VLM & TIS & vLLM & Ray & Async RL)

Python 9,654 971 Updated Jun 17, 2026

openclaw / openclaw

Your own personal AI assistant. Any OS. Any Platform. The lobster way. 🦞

TypeScript 379,320 79,408 Updated Jun 18, 2026

xjzzzzzzzz / MCPSafety

Python 21 6 Updated Dec 18, 2025

langfengQ / verl-agent

verl-agent is an extension of veRL, designed for training LLM/VLM agents via RL. verl-agent is also the official code for paper "Group-in-Group Policy Optimization for LLM Agent Training"

Python 2,021 198 Updated Jun 9, 2026

qiancheng0 / ToolRL

Python 503 36 Updated Oct 16, 2025

RUC-NLPIR / OmniGAIA

OmniGAIA: Towards Native Omni-Modal AI Agents

Python 134 5 Updated Apr 2, 2026

eval-sys / mcpmark

MCPMark is a comprehensive, stress-testing MCP benchmark designed to evaluate model and agent capabilities in real-world MCP use.

Python 429 37 Updated Jun 12, 2026

Chen-GX / IterResearch

Python 63 6 Updated Jan 31, 2026

QwenLM / Qwen3.6

Qwen3.6 is the large language model series developed by Qwen team, Alibaba Group.

3,582 238 Updated Jun 3, 2026

Alibaba-NLP / ZeroSearch

ZeroSearch: Incentivize the Search Capability of LLMs without Searching

Python 1,296 120 Updated Aug 16, 2025

TIGER-AI-Lab / verl-tool

A version of verl to support diverse tool use [TMLR 2026]

Python 1,001 83 Updated Jun 8, 2026

hkust-nlp / LOCA-bench

Benchmarking Language Agents Under Controllable and Extreme Context Growth

Python 48 8 Updated Apr 29, 2026
