How much do amd64 microarchitecture levels help in Go?

Go gets a shock speed boost, but the comments are fighting over why it took this long

TLDR: A Go program got a big speed boost on some tasks simply by being compiled for newer processors, with one test improving by 43% and no code changes required. Commenters then split into familiar camps: “why isn’t this the default yet?” versus “this is messy, fragmented, and not really a Go issue at all.”

A deceptively nerdy benchmark post turned into a full-on comment-section cage match over a very simple question: if you tell Go to target newer chips instead of ancient 2003-era defaults, do programs get faster? Short answer: yes, sometimes a lot. In one standout test, a common bit-counting task got about 43% faster just by flipping a compiler setting, no code rewrite needed. That alone had readers doing the classic internet double take: wait, we’ve been leaving that much speed on the table?

But the real fireworks were in the reactions. One camp basically yelled: why is this not the default already? User kristianp was stunned that Go doesn’t assume support for newer mainstream chips by now, while others immediately dragged the conversation into the messy reality of modern processors: just because a feature exists doesn’t mean every machine has it. That kicked off a mini-drama over whether the current “levels” system is already outdated, and whether a smarter “detect what this CPU can do at runtime” approach, like the one being experimented with in Rust, would save users from playing hardware roulette.

Then came the philosophical hot takes. jeffrallen called the results one of the clearest examples of diminishing returns ever, basically turning a benchmark thread into a shower-thought about the laws of the universe. And pjmlp dropped the icy drive-by: “Nothing, because this is a compiler question, not a language one.” In other words, even when the speedups are real, the comments still found a way to argue about semantics. Classic internet.

Key Points

  • Go’s default amd64 target is the conservative v1 baseline, which prioritizes broad compatibility over newer CPU instructions.
  • The article explains the x86-64 microarchitecture ladder used by Go: v1 (SSE2), v2 (popcnt and SSE4.2), v3 (AVX2), and v4 (AVX-512 subsets).
  • The benchmark subject is Roaring Bitmaps, a compressed bitset library that stores data in array, bitmap, or run containers depending on density.
  • Benchmarks were run on an Intel Xeon Gold 6548N (Emerald Rapids) using Go 1.26.2 and Roaring v2.18.2, with eight samples for each of four microarchitecture levels.
  • The article reports that enabling v2 cuts popcount time by 43% versus v1 because the compiler can use the hardware popcnt instruction, while v3 and v4 do not further improve that specific operation.
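
To make the popcount point concrete, here is a minimal sketch of the kind of loop that benefits. It is an illustration, not code from Roaring or from the article: the function name popcountWords is hypothetical, and the assumption is that a bitmap container computes its cardinality by summing math/bits.OnesCount64 over its words. The level is picked at build time with the GOAMD64 environment variable (v1 through v4). As far as the Go toolchain behaves today, a v1 build has to guard the popcnt instruction behind a CPU feature check with a software fallback, while a v2 or later build can emit it directly, which is the likely source of the gap.

```go
// Build the same file for two targets and compare them yourself, e.g.:
//   GOAMD64=v1 go build -o popcount_v1 .
//   GOAMD64=v2 go build -o popcount_v2 .
package main

import (
	"fmt"
	"math/bits"
)

// popcountWords (hypothetical name) counts the set bits across a slice
// of 64-bit words, similar in spirit to computing the cardinality of a
// bitmap container.
func popcountWords(words []uint64) int {
	total := 0
	for _, w := range words {
		// The compiler treats OnesCount64 as an intrinsic on amd64;
		// how it is lowered depends on the GOAMD64 level.
		total += bits.OnesCount64(w)
	}
	return total
}

func main() {
	words := []uint64{0xFF, 0x1, 0}
	fmt.Println(popcountWords(words)) // prints 9
}
```

The source code is identical for every level; only the build setting changes. A binary built for v2 or higher is not expected to run on a CPU that lacks those instructions, which is the compatibility trade-off the commenters are arguing about.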

Hottest takes

"one of the clearest example of diminishing returns I've ever seen" — jeffrallen
"I'm surprised that Go doesn't default to AVX2 support by now" — kristianp
"Nothing, because this is a compiler question, not a language one" — pjmlp