View jxhe's full-sized avatar

Organizations

@asyml

Python 50 2 Updated Mar 13, 2026

Benchmarking multimodal agents on realistic, ultra-challenging visual scenarios requiring long-horizon hybrid tool use.

Python 41 5 Updated Mar 10, 2026

Benchmarking Language Agents Under Controllable and Extreme Context Growth

Python 34 3 Updated Mar 2, 2026

[KernelGYM & Dr. Kernel] A distributed GPU environment and a collection of RL training methods to support RL for Kernel Generations

Python 153 10 Updated Mar 26, 2026

The agent harness performance optimization system. Skills, instincts, memory, security, and research-first development for Claude Code, Codex, Opencode, Cursor and beyond.

JavaScript 113,893 14,829 Updated Mar 28, 2026

Daytona is a Secure and Elastic Infrastructure for Running AI-Generated Code

TypeScript 70,877 5,525 Updated Mar 28, 2026

Harbor is a framework for running agent evaluations and creating and using RL environments.

Python 1,145 820 Updated Mar 28, 2026

Collection of Apple-native tools for the Model Context Protocol.

TypeScript 3,039 267 Updated Aug 11, 2025

MiniMax M2.1, a SOTA model for real-world dev & agents.

542 44 Updated Jan 28, 2026

A collection of notebooks/recipes showcasing some fun and effective ways of using Claude.

Jupyter Notebook 36,482 3,958 Updated Mar 27, 2026

Post-training with Tinker

Python 2,988 363 Updated Mar 27, 2026

Claude Code is an agentic coding tool that lives in your terminal, understands your codebase, and helps you code faster by executing routine tasks, explaining complex code, and handling git workflo…

Shell 83,808 7,081 Updated Mar 27, 2026

A curated list of awesome Claude Skills, resources, and tools for customizing Claude AI workflows

Python 48,719 5,020 Updated Feb 19, 2026

[ICLR 2026] The Tool Decathlon: Benchmarking Language Agents for Diverse, Realistic, and Long-Horizon Task Execution

Python 298 30 Updated Mar 19, 2026

MiniMax-M2, a model built for Max coding & agentic workflows.

2,523 202 Updated Nov 13, 2025

Contexts Optical Compression

Python 22,761 2,093 Updated Jan 27, 2026

[ICML2025 Oral] LLM-SRBench: A New Benchmark for Scientific Equation Discovery with Large Language Models

Python 98 12 Updated Jul 31, 2025

Pushing Test-Time Scaling Limits of Deep Search with Asymmetric Verification

Python 22 1 Updated Oct 8, 2025

Building Open-Ended Embodied Agents with Internet-Scale Knowledge

Java 2,174 195 Updated Mar 18, 2024

Meta Agents Research Environments is a comprehensive platform designed to evaluate AI agents in dynamic, realistic scenarios. Unlike static benchmarks, this platform introduces evolving environment…

Python 463 62 Updated Mar 26, 2026

A Gym for Agentic LLMs

Python 472 31 Updated Jan 21, 2026

slime is an LLM post-training framework for RL Scaling.

Python 5,013 673 Updated Mar 27, 2026

A clean, modular SDK for building AI agents with OpenHands V1.

Python 606 192 Updated Mar 28, 2026

The official repo of "WebExplorer: Explore and Evolve for Training Long-Horizon Web Agents"

Python 112 2 Updated Sep 29, 2025

[ICLR 26] The official code repository for the paper "Mirage or Method? How Model–Task Alignment Induces Divergent RL Conclusions".

Python 17 Updated Feb 9, 2026