- New York, New York
Stars
[NeurIPS 2024] OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments
OpenTinker is an RL-as-a-Service infrastructure for foundation models
MiniMax-M2, a model built for Max coding & agentic workflows.
This was designed for interp researchers who want to do research on or with interp agents to give quality of life improvements and fix some of the annoying things you get from only using Claude cod…
Accelerating MoE with IO and Tile-aware Optimizations
ArcticTraining is a framework designed to simplify and accelerate the post-training process for large language models (LLMs)
MoE training for Me and You and maybe other people
A compact implementation of SGLang, designed to demystify the complexities of modern LLM serving systems.
My learning notes for ML SYS.
Developer Asset Hub for NVIDIA Nemotron — A one-stop resource for training recipes, usage cookbooks, and full end-to-end reference examples to build with Nemotron models
Qwen3 is the large language model series developed by Qwen team, Alibaba Cloud.
Open-source release accompanying Gao et al. 2025
PaCoRe: Learning to Scale Test-Time Compute with Parallel Coordinated Reasoning
Repository for getting started with the OfficeQA Benchmark.
ThetaEvolve: Test-time Learning on Open Problems, enabling RL training on AlphaEvolve/OpenEvolve and emphasizing scaling test-time compute
Evolve your language agent with Agentic Context Engineering (ACE)
A live benchmark and evaluation framework for open-ended deep research in the wild.
A benchmark for LLMs on complicated tasks in the terminal
A simple memory system for Claude Code
Meta Agents Research Environments is a comprehensive platform designed to evaluate AI agents in dynamic, realistic scenarios. Unlike static benchmarks, this platform introduces evolving environment…
CausalPFN: Amortized Causal Effect Estimation via In-Context Learning