Blog | AI21

Jun 4, 2026

First scale, then enrich: How the right execution strategy helped us reach state-of-the-art on SWE-rebench

In brief: We present a new state-of-the-art result on the SWE-rebench benchmark, a 60.9% issue resolve rate for 123 issues…

May 13, 2026

Reproducing Variance: Caching in Agentic LLM Pipelines

Apr 28, 2026

Reaching SOTA Performance Without Breaking the Bank

Apr 14, 2026

All that glitters: When “gold-like” answers mask functional failures on coding agent benchmarks

Apr 5, 2026

Engineering the subconscious: Why Claude Code isn’t enough to build AI systems

Mar 25, 2026

Stride and prejudice: How a 32-bit overflow corrupted a CUDA kernel (and stayed hidden for weeks)

Mar 17, 2026

Mind the gap: What separates demo agents from production systems

Mar 10, 2026

Where enterprise AI deployments actually get stuck

Feb 26, 2026

Modular intelligence: a human-like model for agent orchestration

Feb 11, 2026

Reducing LLM training waste with model-agnostic padding minimization

Feb 5, 2026

Go big or go OOM: the art of scaling vLLM

Jan 29, 2026

One token to corrupt them all: a vLLM debugging tale

Jan 29, 2026

Chunk size is query-dependent: a simple multi-scale approach to RAG retrieval
