Stars
Repository for "Training Language Models To Explain Their Own Computations"
A benchmark that challenges language models to code solutions for scientific problems
Probing Scientific General Intelligence of LLMs with Scientist-Aligned Workflows
Measuring how well CLI agents can post-train LLMs
[NeurIPS 2024] OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments
OpenTinker is an RL-as-a-Service infrastructure for foundation models
MiniMax-M2, a model built for Max coding & agentic workflows.
Designed for interp researchers who want to do research on or with interp agents: it adds quality-of-life improvements and fixes some of the annoying things you get from only using Claude cod…
Accelerating MoE with IO and Tile-aware Optimizations
ArcticTraining is a framework designed to simplify and accelerate the post-training process for large language models (LLMs)
MoE training for Me and You and maybe other people
A compact implementation of SGLang, designed to demystify the complexities of modern LLM serving systems.
My learning notes for ML SYS.
Developer Asset Hub for NVIDIA Nemotron — A one-stop resource for training recipes, usage cookbooks, and full end-to-end reference examples to build with Nemotron models
Qwen3 is the large language model series developed by the Qwen team at Alibaba Cloud.
Open-source release accompanying Gao et al. 2025
PaCoRe: Learning to Scale Test-Time Compute with Parallel Coordinated Reasoning
Repository for getting started with the OfficeQA Benchmark.
ThetaEvolve: Test-time Learning on Open Problems, enabling RL training on AlphaEvolve/OpenEvolve and emphasizing scaling test-time compute
Evolve your language agent with Agentic Context Engineering (ACE)
A live benchmark and evaluation framework for open-ended deep research in the wild.