Notes, mostly on data & agents.
Notes on data engineering, distributed systems, Rust, and — lately — AI agents. I've been writing here since 2015; some posts are reference docs I keep coming back to, others are thinking out loud.
Quick Thoughts
AX
Dries Buytaert on AX (Agent Experience): in the era of AI coding agents, software adoption depends on how easily these agents can "consume" your project. AI agents take the cheapest path to completion; if your software is difficult to use, they will silently reject it. To ensure your project is chosen, you must reduce three specific costs:

- Friction: the effort required to get the system running (installation, setup, environment configuration, and access/credentials).
- Abstraction: the effort required to understand what to do next (clear recipes, templates, scaffolds, and logical defaults).
- Verification: the effort required to confirm the work was successful (reliable tests, clear error messages, and inspectable system states).

What helps AI agents (AX) also helps human developers (DX). By making your project easy for an agent to install, modify, and verify, you make it better for everyone.
Dynamic Workflows with GLM 5.1
I am using a lot more different models with Claude Code lately, and this is how that is realized. We can also ask Claude to customize the model and budget for each step in a workflow.
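To make "model and budget per step" concrete, here is a minimal sketch of how such a workflow could be described as plain data. Everything in it is an assumption for illustration: the `Step` class, the step names, the model labels, and the token budgets are hypothetical and do not reflect Claude Code's actual configuration format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One workflow step with its own model and budget (hypothetical shape)."""
    name: str
    model: str       # placeholder label, not a verified API model identifier
    max_tokens: int  # per-step budget; the unit (tokens) is an assumption


# Hypothetical workflow: a cheaper model for routine steps,
# a stronger one reserved for the final review.
WORKFLOW = [
    Step("plan", model="glm-5.1", max_tokens=4_000),
    Step("implement", model="glm-5.1", max_tokens=16_000),
    Step("review", model="claude-opus", max_tokens=8_000),
]


def total_budget(steps):
    """Sum the per-step budgets for the whole workflow."""
    return sum(step.max_tokens for step in steps)


def budget_by_model(steps):
    """Group the budget by model label to see where the spend goes."""
    totals = {}
    for step in steps:
        totals[step.model] = totals.get(step.model, 0) + step.max_tokens
    return totals


if __name__ == "__main__":
    print(total_budget(WORKFLOW))
    print(budget_by_model(WORKFLOW))
```

Keeping the steps as data rather than hard-coding them makes it easy to ask the agent to rebalance the plan, for example moving only the review step to a more expensive model.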
MAI-Thinking-1, MAI-Code-1-Flash
| Benchmark | MAI-Thinking-1 | MAI-Code-1-Flash | Claude Haiku 4.5 | Sonnet 4.6 | Opus 4.6 | Kimi K2.6 | GLM-5.1 |
| --- | ---: | ---: | ---: | ---: | ---: | ---: | ---: |
| SWE-Bench Pro | 52.8 | 51.2 | 35.2 | - | 53.4 | 58.6 | 58.4 |
| SWE-Bench Verified | 73.5 | 71.6 | 66.6 | 79.6 | 80.8 | 80.2 | - |
| SWE-Bench Multilingual | - | 65.5 | 62.7 | 75.9 | 77.8 | 76.7 | - |
| AIME 2025 | 97.0 | - | 80.7 no tools / 96.3 Python | 95.6 | 99.8 | - | - |
| AIME 2026 | 94.5 | 92.5 | 83.3 | - | - | 96.4 | 95.3 |
| HMMT Feb 2026 | 84.9 | - | - | - | - | 92.7 | 82.6 |
| GPQA Diamond | 84.2 | 84.6 | 73.2 | 89.9 | 91.3 | 90.5 | 86.2 |
| LCB v6 | 87.7 | - | - | - | - | 89.6 | - |
| AMO Bench | - | 40.0 | 16.0 | - | - | - | - |
| Frontier Math | - | 6.3 | 2.8 | - | - | - | - |
| HLE | - | 18.0 | 9.5 | 33.2 no tools / 49.0 tools | 40.0 no tools / 53.0 tools | 34.7 no tools / 54.0 tools | 31.0 no tools / 52.3 tools |
| Frontier Science | - | 58.2 | 42.3 | - | - | - | - |
| Artifacts Bench | - | 36.4 | 36.6 | - | - | - | - |
| Terminal-Bench 2.0 | 46.0 | 54.8 | 41.6 | 59.1 | 65.4 | 66.7 | 63.5 Terminus-2 / 69.0 self-reported |
| IF Bench | - | 75.0 | 46.1 | - | - | - | - |
| Advanced IF | - | 71.4 | 56.9 | - | - | - | - |
| Robust IF Bench | - | 61.2 | 45.0 | - | - | - | - |
| τ²-Bench | - | 71.7 | 54.7 | - | - | - | - |

Sources

- Microsoft AI, MAI-Thinking-1 technical report, for MAI-Thinking-1 scores and several comparison values. (Microsoft AI)
- Microsoft AI, Introducing MAI-Thinking-1, for headline AIME 2025 and AIME 2026 values. (Microsoft AI)
- Microsoft AI, Introducing MAI-Code-1-Flash, for MAI-Code-1-Flash vs Claude Haiku 4.5 benchmark values. (Microsoft AI)
- Anthropic, Claude Haiku 4.5 announcement, for official Haiku context and Anthropic-reported SWE-Bench Verified score caveat. (Anthropic)
- Anthropic, Claude Sonnet 4.6 System Card, for Sonnet 4.6 SWE-Bench Verified and SWE-Bench Multilingual values. (Anthropic)
- Anthropic, Claude Opus 4.6 System Card, for Opus 4.6 SWE-Bench Verified and SWE-Bench Multilingual values. (Anthropic)
- Moonshot AI, Kimi K2.6 Tech Blog / Model Card, for Kimi K2.6 benchmark values. (Kimi)
- Z.ai, GLM-5.1: Towards Long-Horizon Tasks, and ModelScope GLM-5.1 card, for GLM-5.1 values. (Z.ai)
- Terminal-Bench official leaderboard, for Terminal-Bench 2.0 reference leaderboard context. (tbench.ai)