Organizations

@asyml

Python 52 2 Updated Mar 13, 2026

Benchmarking multimodal agents on realistic, ultra-challenging visual scenarios requiring long-horizon hybrid tool use.

Python 43 5 Updated Mar 10, 2026

Benchmarking Language Agents Under Controllable and Extreme Context Growth

Python 34 3 Updated Mar 30, 2026

[KernelGYM & Dr. Kernel] A distributed GPU environment and a collection of RL training methods to support RL for Kernel Generations

Python 153 12 Updated Mar 29, 2026

The agent harness performance optimization system. Skills, instincts, memory, security, and research-first development for Claude Code, Codex, Opencode, Cursor and beyond.

JavaScript 124,500 16,578 Updated Mar 31, 2026

Daytona is a Secure and Elastic Infrastructure for Running AI-Generated Code

TypeScript 70,928 5,534 Updated Mar 31, 2026

Harbor is a framework for running agent evaluations and creating and using RL environments.

Python 1,186 834 Updated Mar 31, 2026

Collection of Apple-native tools for the Model Context Protocol.

TypeScript 3,043 268 Updated Aug 11, 2025

MiniMax M2.1, a SOTA model for real-world dev & agents.

542 44 Updated Jan 28, 2026

A collection of notebooks/recipes showcasing some fun and effective ways of using Claude.

Jupyter Notebook 36,810 3,996 Updated Mar 31, 2026

Post-training with Tinker

Python 3,004 363 Updated Mar 31, 2026

Claude Code is an agentic coding tool that lives in your terminal, understands your codebase, and helps you code faster by executing routine tasks, explaining complex code, and handling git workflo…

Shell 87,922 9,410 Updated Mar 31, 2026

A curated list of awesome Claude Skills, resources, and tools for customizing Claude AI workflows

Python 49,764 5,157 Updated Feb 19, 2026

[ICLR 2026] The Tool Decathlon: Benchmarking Language Agents for Diverse, Realistic, and Long-Horizon Task Execution

Python 298 30 Updated Mar 31, 2026

MiniMax-M2, a model built for Max coding & agentic workflows.

2,529 203 Updated Nov 13, 2025

Contexts Optical Compression

Python 22,773 2,095 Updated Jan 27, 2026

[ICML2025 Oral] LLM-SRBench: A New Benchmark for Scientific Equation Discovery with Large Language Models

Python 98 13 Updated Jul 31, 2025

Pushing Test-Time Scaling Limits of Deep Search with Asymmetric Verification

Python 22 1 Updated Oct 8, 2025

Building Open-Ended Embodied Agents with Internet-Scale Knowledge

Java 2,176 195 Updated Mar 18, 2024

Meta Agents Research Environments is a comprehensive platform designed to evaluate AI agents in dynamic, realistic scenarios. Unlike static benchmarks, this platform introduces evolving environment…

Python 464 63 Updated Mar 26, 2026

A Gym for Agentic LLMs

Python 472 31 Updated Jan 21, 2026

slime is an LLM post-training framework for RL Scaling.

Python 5,051 677 Updated Mar 29, 2026

A clean, modular SDK for building AI agents with OpenHands V1.

Python 615 194 Updated Mar 31, 2026

The official repo of "WebExplorer: Explore and Evolve for Training Long-Horizon Web Agents"

Python 113 2 Updated Sep 29, 2025

[ICLR 26] The official code repository for the paper "Mirage or Method? How Model–Task Alignment Induces Divergent RL Conclusions".

Python 17 Updated Feb 9, 2026