Thank you

Written by me, proof-read by an LLM.
Details at end.

It’s the 25th! Whatever you celebrate this time of year, I wish you the very best and hope you are having a lovely day. For me, this is family time: I’m not at all religious but was brought up to celebrate Christmas. So, today we’ll be cooking a massive roast dinner and enjoying family time¹.

This series was an idea I had around this time last year, and it has been a substantial amount of work. I’ve really enjoyed writing it, and seeing the impact it has had on the compiled-language community. In retrospect, I realise I used C and C++² exclusively, and concentrated a bit too much on x86. If I do this again, I’ll try and widen my horizons!

Filed under: Coding AoCO2025
Posted at 06:00:00 CST on 25th December 2025.

When compilers surprise you

Written by me, proof-read by an LLM.
Details at end.

Every now and then a compiler will surprise me with a really smart trick. When I first saw this optimisation I could hardly believe it. I was looking at loop optimisation, and wrote something like this simple function that sums all the numbers up to a given value:
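The listing itself is cut off in this excerpt. As a rough sketch (my reconstruction, not necessarily the post’s exact code), the function would look something like:

    // Sums all the integers from 1 up to and including n.
    // Optimising compilers spot this pattern and replace the whole loop with
    // the closed-form n * (n + 1) / 2 - no loop at all in the generated code.
    int sum_to(int n) {
        int total = 0;
        for (int i = 1; i <= n; ++i) {
            total += i;
        }
        return total;
    }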

Filed under: Coding AoCO2025
Posted at 06:00:00 CST on 24th December 2025.

Switching it up a bit

Written by me, proof-read by an LLM.
Details at end.

The standard wisdom is that switch statements compile to jump tables. And they do - when the compiler can’t find something cleverer to do instead.

Let’s start with a really simple example:
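The example itself isn’t reproduced in this excerpt; a hypothetical stand-in with a small, dense set of cases (the name and values are mine) might be:

    // Map a small, dense range of keys to values.
    // With cases this regular, compilers often skip the jump table entirely
    // and use a data lookup table, or even plain arithmetic, instead.
    int classify(int x) {
        switch (x) {
            case 0: return 10;
            case 1: return 20;
            case 2: return 30;
            case 3: return 40;
            default: return -1;
        }
    }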

Filed under: Coding AoCO2025
Posted at 06:00:00 CST on 23rd December 2025.

Clever memory tricks

Written by me, proof-read by an LLM.
Details at end.

After exploring SIMD vectorisation over the last couple of days, let’s shift gears to look at another class of compiler cleverness: memory access patterns. String comparisons seem straightforward enough - check the length, compare the bytes, done. But watch what Clang does when comparing against compile-time constants, and you’ll see some rather clever tricks involving overlapping memory reads and bitwise operations. What looks like it should be a call to memcmp becomes a handful of inline instructions that exploit the fact that the comparison value is known at compile time¹.
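A hypothetical example of the pattern (the constant, the function name and the length check here are my own, not taken from the post):

    #include <cstddef>
    #include <cstring>

    // Compare an input buffer against a string known at compile time.
    // Rather than emitting a real call to memcmp, Clang can issue a couple of
    // overlapping wide loads, XOR them against the constant's bytes, and OR
    // the results together: the input matches only if everything is zero.
    bool is_magic(const char* s, std::size_t len) {
        return len == 12 && std::memcmp(s, "Hello, world", 12) == 0;
    }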

Filed under: Coding AoCO2025
Posted at 06:00:00 CST on 22nd December 2025.

When SIMD Fails: Floating Point Associativity

Written by me, proof-read by an LLM.
Details at end.

Yesterday we saw SIMD work beautifully with integers. But floating point has a surprise in store. Let’s try summing an array¹:
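The listing is truncated in this excerpt; a minimal sketch of the kind of loop in question, assuming a plain float accumulator:

    #include <cstddef>

    // Naive left-to-right sum of an array of floats.
    // Floating-point addition isn't associative, so the compiler can't
    // regroup these additions into SIMD lanes without changing the result;
    // without flags like -ffast-math, the loop stays scalar.
    float sum(const float* data, std::size_t count) {
        float total = 0.0f;
        for (std::size_t i = 0; i < count; ++i) {
            total += data[i];
        }
        return total;
    }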

Filed under: Coding AoCO2025
Posted at 06:00:00 CST on 21st December 2025.

SIMD City: Auto-vectorisation

Written by me, proof-read by an LLM.
Details at end.

It’s time to look at one of the most sophisticated optimisations compilers can do: auto-vectorisation. Most “big data” style problems boil down to “do this maths to huge arrays”, and the limiting factor isn’t the maths itself but feeding the CPU with instructions and the data it needs.

To help with this problem, CPU designers came up with SIMD: “Single Instruction, Multiple Data”. One instruction tells the CPU what to do with a whole chunk of data. A chunk might be 2, 4, 8, 16 or more integers or floating-point values, each treated individually. Initially¹ the only way to use this capability was to write assembly language directly, but luckily for us, compilers are now able to help.
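A small sketch of the sort of loop that benefits (my own example, assuming simple non-overlapping integer arrays):

    #include <cstddef>

    // Element-wise addition over large arrays: a textbook auto-vectorisation
    // candidate. Each element is independent, so the compiler can process
    // several of them per SIMD instruction (after checking, or being told,
    // that the arrays don't alias).
    void add_arrays(int* dest, const int* a, const int* b, std::size_t count) {
        for (std::size_t i = 0; i < count; ++i) {
            dest[i] = a[i] + b[i];
        }
    }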

Filed under: Coding AoCO2025
Posted at 06:00:00 CST on 20th December 2025.

Chasing your tail

Written by me, proof-read by an LLM.
Details at end.

Inlining is fantastic, as we’ve seen recently. There’s one place it surely can’t help, though: recursion! If we call our own function, surely we can’t inline it…

Let’s see what the compiler does with the classic recursive “greatest common divisor” routine - surely it can’t avoid calling itself?
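The routine isn’t shown in this excerpt, but the classic form is something along these lines (the post’s exact version may differ):

    // Euclid's algorithm, written recursively. The recursive call is the last
    // thing the function does (a tail call), so the compiler can reuse the
    // current stack frame; in practice it rewrites the recursion into a loop.
    unsigned gcd(unsigned a, unsigned b) {
        if (b == 0) return a;
        return gcd(b, a % b);
    }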

Filed under: Coding AoCO2025
Posted at 06:00:00 CST on 19th December 2025.

Partial inlining

Written by me, proof-read by an LLM.
Details at end.

We’ve learned how important inlining is to optimisation, but also that it might sometimes cause code bloat. Inlining doesn’t have to be all-or-nothing!

Let’s look at a simple function that has a fast path and a slow path, and then see how the compiler handles it¹.
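The function itself isn’t included in this excerpt; a hypothetical shape for it, with a cheap check guarding an expensive body (the names and the “work” are made up):

    // A cheap, common fast path guarding a rare, expensive slow path.
    // With partial inlining the compiler can split the function: the initial
    // check is inlined into callers, and only the cold, expensive part stays
    // behind as an out-of-line call.
    int get_value(int key, int cached_key, int cached_value) {
        if (key == cached_key) {
            return cached_value;  // fast path: cache hit
        }
        // slow path: stand-in for genuinely expensive work
        int result = 0;
        for (int i = 0; i < 1000; ++i) {
            result += (key + i) * i;
        }
        return result;
    }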

Filed under: Coding AoCO2025
Posted at 06:00:00 CST on 18th December 2025.

About Matt Godbolt

Matt Godbolt is a C++ developer living in Chicago. He works for Hudson River Trading on super fun but secret things. He is one half of the Two's Complement podcast. Follow him on Mastodon or Bluesky.