April 10, 2026
One millisecond, endless memes
Sorting Performance Rabbit Hole
Beating the champ by a hair — and the comments go wild
TLDR: A custom sorter squeezed out a one-time, hair-thin win over C++’s famed sort, after a raft of tweaks and one pivotal threshold change. Commenters are split between cheering the photo-finish, calling it statistical noise, and debating why stable sort—the STL’s origin story—hasn’t been the star of optimization.
A lone coder dove into the “how fast can we sort a giant list” rabbit hole and came back claiming a photo-finish win over the C++ standard library. The test bed: 10 million shuffled 64-bit integers, fed in the same order to every algorithm. The twist? Stable sort (keeping equal items in their original order) got faster with a few tweaks, landing at 0.86 seconds, about 5% ahead of std::stable_sort, but the real drama hit with the regular “just go fast” sort. After a parade of experiments (copy tricks, memmove, shell sort, even a wild detour into radix sort), most changes made things slower or did nothing. Then came the big knob turn: raise the cutoff at which introsort hands small chunks to insertion sort from 16 to 32. Boom, big gains. Final score: Pystd clocked 0.754 seconds vs. libstdc++’s 0.755 in one run. Yes, just one run. But bragging rights are bragging rights.
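For readers who want to see what that cutoff knob looks like, here is a minimal sketch. It is not Pystd's or libstdc++'s code: the names `insertion_sort`, `cutoff_quicksort`, and `cutoff_sort` are made up for illustration, the pivot rule (median of three) is an assumption, and the heapsort fallback that makes a real introsort "intro" is left out. It only shows the idea of a quicksort that stops partitioning once a range is at most `cutoff` elements long and finishes that range with insertion sort.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: plain insertion sort on the half-open range [lo, hi).
inline void insertion_sort(std::vector<std::int64_t>& v, std::size_t lo, std::size_t hi) {
    for (std::size_t i = lo + 1; i < hi; ++i) {
        std::int64_t key = v[i];
        std::size_t j = i;
        while (j > lo && v[j - 1] > key) {
            v[j] = v[j - 1];  // shift larger elements one slot right
            --j;
        }
        v[j] = key;
    }
}

// Hypothetical quicksort with a tunable cutoff. Ranges of at most `cutoff`
// elements are not partitioned any further; insertion sort finishes them.
// No depth limit or heapsort fallback, so this is not a full introsort.
inline void cutoff_quicksort(std::vector<std::int64_t>& v, std::size_t lo,
                             std::size_t hi, std::size_t cutoff) {
    while (hi - lo > std::max<std::size_t>(cutoff, 1)) {
        // Median-of-three pivot (an assumption of this sketch).
        std::int64_t a = v[lo], b = v[lo + (hi - lo) / 2], c = v[hi - 1];
        std::int64_t pivot = std::max(std::min(a, b), std::min(std::max(a, b), c));

        // Three-way split: [less than pivot | equal to pivot | greater than pivot].
        auto first = v.begin() + static_cast<std::ptrdiff_t>(lo);
        auto last = v.begin() + static_cast<std::ptrdiff_t>(hi);
        auto lt_end = std::partition(first, last,
                                     [pivot](std::int64_t x) { return x < pivot; });
        auto eq_end = std::partition(lt_end, last,
                                     [pivot](std::int64_t x) { return x == pivot; });
        std::size_t left_hi = static_cast<std::size_t>(lt_end - v.begin());
        std::size_t right_lo = static_cast<std::size_t>(eq_end - v.begin());

        // Recurse into the smaller side and loop on the larger one,
        // which keeps the recursion depth logarithmic.
        if (left_hi - lo < hi - right_lo) {
            cutoff_quicksort(v, lo, left_hi, cutoff);
            lo = right_lo;
        } else {
            cutoff_quicksort(v, right_lo, hi, cutoff);
            hi = left_hi;
        }
    }
    insertion_sort(v, lo, hi);  // small (or empty) leftover range
}

// Convenience wrapper: sort the whole vector with the given cutoff.
inline void cutoff_sort(std::vector<std::int64_t>& v, std::size_t cutoff) {
    cutoff_quicksort(v, 0, v.size(), cutoff);
}
```

Calling `cutoff_sort(data, 16)` and `cutoff_sort(data, 32)` on copies of the same input is the kind of comparison the post describes. Which value wins depends on the machine, the compiler, and the data, so treat any single number as a starting point for measurement rather than a rule.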
The comment section? Feeding frenzy. One user dropped a history grenade, noting that the Standard Template Library (STL) basically exists because its creator obsessed over a stable sort, so why does today’s C++ library seem less tuned there? Others split into camps: the “a win is a win” crew popping confetti, the “that’s noise bro” skeptics demanding 1000-run averages, and the meme lords calling it “sorting NASCAR, won by a pixel.” Jokes about “speedrunning integers any%” flew, while pragmatists rolled their eyes at synthetic tests on perfectly clean numbers. Still, everyone agreed that this tiny victory spotlighted a bigger debate: what do we optimize for, and who decides what ‘fast’ really means?
Key Points
- Tests used 10 million shuffled 64-bit integers with identical order across algorithms.
- Pystd’s stable sort was optimized to 0.86s, about 5% faster than std::stable_sort.
- Attempts to speed up unstable sort via temp+shift moves, memmove, shell sort, and radix sort were slower or ineffective.
- Tuning introsort’s insertion sort cutoff from 16 to 32 produced the largest performance gain; 64 was faster but had trade-offs.
- Pystd’s best unstable sort time was 0.754s vs libstdc++’s 0.755s (observed once), narrowly beating it.
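
For anyone siding with the "that's noise" camp, a repeatable measurement could look something like the sketch below. It is an assumption-heavy illustration, not the harness from the original post: the function names (`make_shuffled_input`, `time_runs`, `median_seconds`), the seed, and the choice of reporting a median are all invented here. The one thing it takes from the post is the setup, where every run sorts a fresh copy of the same shuffled 64-bit integers.

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <random>
#include <vector>

// Hypothetical helper: the values 0..n-1 in a shuffled order. The same seed
// gives the same order on the same standard library, so every algorithm
// under test can be fed identical input.
inline std::vector<std::int64_t> make_shuffled_input(std::size_t n, std::uint64_t seed) {
    std::vector<std::int64_t> data(n);
    std::iota(data.begin(), data.end(), std::int64_t{0});
    std::mt19937_64 rng(seed);
    std::shuffle(data.begin(), data.end(), rng);
    return data;
}

// Hypothetical helper: time `sorter` on a fresh copy of `input`, `runs` times.
// The copy happens outside the timed region. Returns seconds per run.
template <typename Sorter>
std::vector<double> time_runs(const std::vector<std::int64_t>& input, Sorter sorter, int runs) {
    std::vector<double> seconds;
    for (int r = 0; r < runs; ++r) {
        std::vector<std::int64_t> work = input;
        auto start = std::chrono::steady_clock::now();
        sorter(work);
        auto stop = std::chrono::steady_clock::now();
        seconds.push_back(std::chrono::duration<double>(stop - start).count());
        // Read one element so the compiler cannot discard the sorted result.
        volatile std::int64_t sink = work.empty() ? 0 : work[work.size() / 2];
        (void)sink;
    }
    return seconds;
}

// Median of a non-empty list of timings; less sensitive to a single
// lucky or unlucky run than one observation.
inline double median_seconds(std::vector<double> xs) {
    std::sort(xs.begin(), xs.end());
    std::size_t n = xs.size();
    return (n % 2 == 1) ? xs[n / 2] : (xs[n / 2 - 1] + xs[n / 2]) / 2.0;
}
```

A comparison would then build the input once, for example `make_shuffled_input(10'000'000, 42)`, and pass it to `time_runs` with a lambda wrapping `std::sort`, then again with a lambda wrapping the challenger. If the two medians differ by 0.001 seconds while individual runs vary by more than that, the skeptics have a point.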