- The Hong Kong University of Science and Technology
- jxhe.github.io
- @junxian_he
Stars
Benchmarking multimodal agents on realistic, ultra-challenging visual scenarios requiring long-horizon hybrid tool use.
Benchmarking Language Agents Under Controllable and Extreme Context Growth
[KernelGYM & Dr. Kernel] A distributed GPU environment and a collection of RL training methods to support RL for kernel generation
The agent harness performance optimization system. Skills, instincts, memory, security, and research-first development for Claude Code, Codex, Opencode, Cursor and beyond.
Daytona is a Secure and Elastic Infrastructure for Running AI-Generated Code
Harbor is a framework for running agent evaluations and creating and using RL environments.
Collection of Apple-native tools for the Model Context Protocol.
MiniMax M2.1, a SOTA model for real-world dev & agents.
A collection of notebooks/recipes showcasing some fun and effective ways of using Claude.
Post-training with Tinker
Claude Code is an agentic coding tool that lives in your terminal, understands your codebase, and helps you code faster by executing routine tasks, explaining complex code, and handling git workflo…
A curated list of awesome Claude Skills, resources, and tools for customizing Claude AI workflows
[ICLR 2026] The Tool Decathlon: Benchmarking Language Agents for Diverse, Realistic, and Long-Horizon Task Execution
MiniMax-M2, a model built for Max coding & agentic workflows.
[ICML2025 Oral] LLM-SRBench: A New Benchmark for Scientific Equation Discovery with Large Language Models
Pushing Test-Time Scaling Limits of Deep Search with Asymmetric Verification
Building Open-Ended Embodied Agents with Internet-Scale Knowledge
Meta Agents Research Environments is a comprehensive platform designed to evaluate AI agents in dynamic, realistic scenarios. Unlike static benchmarks, this platform introduces evolving environment…
slime is an LLM post-training framework for RL Scaling.
A clean, modular SDK for building AI agents with OpenHands V1.
The official repo of "WebExplorer: Explore and Evolve for Training Long-Horizon Web Agents"
[ICLR 26] The official code repository for the paper "Mirage or Method? How Model–Task Alignment Induces Divergent RL Conclusions".