July 1, 2026
Push comes to shove
A deep dive into SmallVector::push_back
Coders are dragging compilers after a tiny tweak made a hot-path operation way faster
TLDR: A small rewrite made a very common LLVM list operation faster by keeping the usual case simple and kicking the rare slow case aside. Commenters turned it into a mini-drama about whether compilers should be smarter, or whether programmers should stop being lazy and prepare better.
A seemingly tiny change to one of the most-used pieces of the LLVM toolbox has turned into prime comment-section theater. The blog post from MaskRay breaks down how a very common action (adding one more item to a list) was paying register save-and-restore overhead on every call, even though only the rare grow case needed it. The fix? Shove the rare, slower case off into its own function so the normal path stays clean and quick. In plain English: a small rewrite cut away needless baggage and made a super-common operation faster.
But the real fireworks were in the reactions. One of the loudest takes came from dzaima, who basically declared the current state of compiler optimization embarrassing, calling out GCC and especially Clang for not pushing more work into the less common paths. That kicked off the familiar “are compilers amazing, or are they secretly held together by vibes?” energy. Another camp, led by im3w1l, turned the spotlight back on programmers: if you already know you’re going to add lots of items, why not reserve space ahead of time? That sparked the classic low-level blame game — tool problem or user problem?
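For readers who want to see what the "reserve space ahead of time" camp means, here is a minimal sketch using the standard `std::vector` rather than LLVM's `SmallVector`. The helper name `makeSquares` is made up for illustration and does not come from the post or the thread.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not from the article): builds a list of n squares.
// Because the final size is known up front, capacity is reserved once
// so none of the push_back calls below has to reallocate.
std::vector<int> makeSquares(std::size_t n) {
  std::vector<int> out;
  out.reserve(n);  // one allocation up front
  for (std::size_t i = 0; i < n; ++i)
    out.push_back(static_cast<int>(i * i));  // capacity check still runs each time
  return out;
}
```

Worth noting for the blame game: reserving removes the reallocations, but each `push_back` still performs its capacity check, so the shape of the fast path matters either way.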
And then came the philosophical mic drop from someonebaggy: compilers are a leaky abstraction, meaning the magic only looks magical until you peek underneath. That line felt like the thread’s unofficial meme. The mood was equal parts impressed, annoyed, and weirdly delighted: impressed that such a tiny rewrite matters, annoyed that the tools didn’t do it themselves, and delighted because nothing gets systems programmers more animated than discovering a machine did seven steps when it could’ve done four.
Key Points
- The article analyzes inefficiencies in generated assembly for `SmallVector::push_back` when the no-grow and grow paths share a common store block.
- In the original code shape, preserving `this` and the pushed value across a `grow_pod` call forces use of callee-saved registers, adding prologue and epilogue overhead on the fast path.
- The post shows that shrink wrapping cannot fix this pattern because it can move save/restore operations but does not duplicate blocks.
- The proposed optimization moves grow-and-store logic into a separate `growAndPushBack` function marked `noinline` and tail-calls it from `push_back` on capacity exhaustion.
- After the change, the fast path assembly is reduced to a simple capacity check, store, size increment, and return, totaling seven instructions with a tail call to the slow path.
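
The split described in the key points can be sketched roughly as follows. This is a toy, `int`-only container written for illustration, not LLVM's actual `SmallVector` code: the class name `ToyVec`, the `TOY_NOINLINE` macro, and the doubling growth policy are all assumptions. Only the names `push_back` and `growAndPushBack` and the `noinline` idea come from the summary above.

```cpp
#include <cassert>
#include <cstdlib>

// Assumption: GCC/Clang attribute syntax; expands to nothing elsewhere.
#if defined(__GNUC__) || defined(__clang__)
#define TOY_NOINLINE __attribute__((noinline))
#else
#define TOY_NOINLINE
#endif

// Hypothetical toy container, int-only, for illustration.
class ToyVec {
  int *Data = nullptr;
  unsigned Size = 0;
  unsigned Capacity = 0;

  // Slow path: grow the buffer, then do the store itself.
  // Kept out of line so its register needs do not leak into push_back.
  TOY_NOINLINE void growAndPushBack(int V) {
    unsigned NewCap = Capacity ? Capacity * 2 : 4;  // assumed growth policy
    int *NewData =
        static_cast<int *>(std::realloc(Data, NewCap * sizeof(int)));
    if (!NewData)
      std::abort();
    Data = NewData;
    Capacity = NewCap;
    Data[Size++] = V;
  }

public:
  ToyVec() = default;
  ToyVec(const ToyVec &) = delete;
  ToyVec &operator=(const ToyVec &) = delete;
  ~ToyVec() { std::free(Data); }

  void push_back(int V) {
    if (Size < Capacity) {  // fast path: check, store, bump size, return
      Data[Size++] = V;
      return;
    }
    growAndPushBack(V);  // rare case, written in tail position
  }

  unsigned size() const { return Size; }
  unsigned capacity() const { return Capacity; }
  int operator[](unsigned I) const { return Data[I]; }
};
```

The point of the shape is that nothing in `push_back` has to stay alive across a call, so the compiler has no reason to save callee-saved registers on the common path. Whether the call to `growAndPushBack` actually becomes a tail call depends on the compiler and optimization level.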