[NeurIPS 2024] OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments

Python 2,409 356 Updated Dec 23, 2025

OpenTinker is an RL-as-a-Service infrastructure for foundation models

Python 272 19 Updated Dec 23, 2025

MiniMax-M2, a model built for Max coding & agentic workflows.

2,067 157 Updated Nov 13, 2025

This was designed for interp researchers who want to do research on or with interp agents to give quality of life improvements and fix some of the annoying things you get from only using Claude cod…

Jupyter Notebook 75 4 Updated Dec 19, 2025

Accelerating MoE with IO and Tile-aware Optimizations

Python 440 26 Updated Dec 23, 2025

ArcticTraining is a framework designed to simplify and accelerate the post-training process for large language models (LLMs)

Python 261 29 Updated Dec 23, 2025

MoE training for Me and You and maybe other people

Python 290 25 Updated Dec 17, 2025

A compact implementation of SGLang, designed to demystify the complexities of modern LLM serving systems.

Python 2,203 186 Updated Dec 23, 2025
TypeScript 2 Updated Dec 16, 2025
Python 18 2 Updated Dec 23, 2025

My learning notes for ML SYS.

Python 4,766 303 Updated Dec 22, 2025

Build RL environments for LLM training

Python 522 32 Updated Dec 23, 2025

Developer Asset Hub for NVIDIA Nemotron — A one-stop resource for training recipes, usage cookbooks, and full end-to-end reference examples to build with Nemotron models

Jupyter Notebook 229 40 Updated Dec 19, 2025

Qwen3 is the large language model series developed by Qwen team, Alibaba Cloud.

Python 25,847 1,816 Updated Oct 13, 2025

Open-source release accompanying Gao et al. 2025

Python 464 47 Updated Dec 11, 2025

PaCoRe: Learning to Scale Test-Time Compute with Parallel Coordinated Reasoning

Python 219 5 Updated Dec 10, 2025

Repository for getting started with the OfficeQA Benchmark.

Python 39 4 Updated Dec 18, 2025

ThetaEvolve: Test-time Learning on Open Problems, enabling RL training on AlphaEvolve/OpenEvolve and emphasizing scaling test-time compute

Python 80 7 Updated Dec 8, 2025

Evolve your language agent with Agentic Context Engineering (ACE)

Python 435 48 Updated Nov 18, 2025

A live benchmark and evaluation framework for open-ended deep research in the wild.

Python 100 10 Updated Nov 13, 2025
Python 616 59 Updated Dec 23, 2025

Causal RL Environments Simulator

Python 18 2 Updated Dec 20, 2025

A benchmark for LLMs on complicated tasks in the terminal

Python 1,243 439 Updated Dec 20, 2025

A simple memory system for Claude Code

Shell 221 17 Updated Dec 17, 2025

Meta Agents Research Environments is a comprehensive platform designed to evaluate AI agents in dynamic, realistic scenarios. Unlike static benchmarks, this platform introduces evolving environment…

Python 404 55 Updated Nov 17, 2025

CausalPFN: Amortized Causal Effect Estimation via In-Context Learning

Jupyter Notebook 84 10 Updated Dec 8, 2025