April 10, 2026

One millisecond, endless memes

Sorting Performance Rabbit Hole

Beating the champ by a hair — and the comments go wild

TLDR: A custom sorter squeezed out a one-time, hair-thin win over the C++ standard library’s sort (0.754s vs 0.755s in a single run), after a raft of tweaks and one pivotal threshold change. Commenters are split between cheering the photo finish, calling it statistical noise, and debating why stable sort, the STL’s origin story, hasn’t received more optimization attention.

A lone coder dove into the “how fast can we sort a giant list?” rabbit hole and came back claiming a photo-finish win over the C++ standard library. The twist? Stable sort (which keeps equal items in their original order) got about 5% faster than std::stable_sort with a few tweaks, but the real drama came with the regular “just go fast” unstable sort. After a parade of experiments (copy tricks, memory moves, shell sort, even a wild detour into radix sort), most changes made things slower. Then came the big knob turn: raising the cutoff at which introsort hands small chunks to a simpler method, insertion sort, from 16 to 32 elements. Boom: big gains. Final score: the coder’s Pystd library clocked 0.754 seconds vs. libstdc++’s 0.755 in one run. Yes, just one run. But bragging rights are bragging rights.
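The “cutoff” knob is easiest to see in code. Below is a minimal introsort-style sketch, not Pystd’s or libstdc++’s actual implementation: quicksort partitioning around a median-of-three pivot, a heapsort fallback when recursion gets too deep, and insertion sort for any range of at most `cutoff` elements. The names `hybrid_sort`, `introsort_loop`, `hoare_partition`, and `insertion_sort` are hypothetical, and the default of 32 simply mirrors the cutoff the key points credit with the biggest gain.

```cpp
#include <algorithm>  // std::iter_swap, std::make_heap, std::sort_heap
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Insertion sort for short ranges: hold the element in a temporary and shift
// larger neighbours one slot right instead of swapping pairwise.
template <typename T>
void insertion_sort(T* first, T* last) {
    if (last - first < 2) return;
    for (T* i = first + 1; i != last; ++i) {
        T tmp = *i;
        T* hole = i;
        while (hole != first && tmp < *(hole - 1)) {
            *hole = *(hole - 1);
            --hole;
        }
        *hole = tmp;
    }
}

// Hoare partition around a median-of-three pivot. Requires last - first >= 3.
// Returns cut such that [first, cut) <= pivot <= [cut, last), both non-empty.
template <typename T>
T* hoare_partition(T* first, T* last) {
    T* lo = first;
    T* hi = last - 1;
    T* mid = first + (last - first) / 2;
    // Median of three: afterwards *lo <= *mid <= *hi, and *mid is the pivot.
    if (*mid < *lo) std::iter_swap(mid, lo);
    if (*hi < *lo) std::iter_swap(hi, lo);
    if (*hi < *mid) std::iter_swap(hi, mid);
    const T pivot = *mid;
    T* i = lo;
    T* j = hi;
    while (true) {
        while (*i < pivot) ++i;    // stops at an element >= pivot
        while (pivot < *j) --j;    // stops at an element <= pivot
        if (i >= j) return j + 1;  // [first, j] <= pivot <= [j + 1, last)
        std::iter_swap(i, j);
        ++i;
        --j;
    }
}

// Quicksort while a range is longer than `cutoff`; heapsort if too deep.
template <typename T>
void introsort_loop(T* first, T* last, int depth_limit, std::ptrdiff_t cutoff) {
    while (last - first > cutoff) {
        if (depth_limit == 0) {
            // Too many bad pivots: fall back to heapsort (O(n log n) worst case).
            std::make_heap(first, last);
            std::sort_heap(first, last);
            return;
        }
        --depth_limit;
        T* cut = hoare_partition(first, last);
        introsort_loop(cut, last, depth_limit, cutoff);  // recurse on the right part
        last = cut;                                      // keep looping on the left part
    }
    insertion_sort(first, last);  // the cutoff knob: small ranges end here
}

// Hypothetical entry point; ranges of at most `cutoff` elements use insertion sort.
template <typename T>
void hybrid_sort(T* first, T* last, std::ptrdiff_t cutoff = 32) {
    assert(cutoff >= 2 && "partitioning needs ranges of at least 3 elements");
    int depth = 0;
    for (std::ptrdiff_t n = last - first; n > 1; n >>= 1) ++depth;  // floor(log2 n)
    introsort_loop(first, last, 2 * depth, cutoff);
}

// Convenience overload for the article's element type (64-bit integers).
inline void hybrid_sort(std::vector<std::uint64_t>& v, std::ptrdiff_t cutoff = 32) {
    hybrid_sort(v.data(), v.data() + v.size(), cutoff);
}
```

Raising `cutoff` means fewer partitioning passes and more work for insertion sort, which is cheap on short ranges; push it too far and insertion sort’s quadratic cost takes over. That is one plausible reason a sweet spot exists, though the article doesn’t spell out the author’s own explanation.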

The comment section? A feeding frenzy. One user dropped a history grenade, noting that the Standard Template Library (STL) basically exists because its creator obsessed over stable sort, so why does today’s C++ library seem less tuned there? Others split into camps: the “a win is a win” crew popping confetti, the “that’s noise bro” skeptics demanding 1000-run averages, and the meme lords calling it “sorting NASCAR—won by a pixel.” Jokes about “speedrunning integers any%” flew, while pragmatists rolled their eyes at synthetic tests with perfectly tidy numbers. Still, everyone agreed: this tiny victory spotlighted a bigger debate. What do we optimize for, and who decides what “fast” really means?
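The noise argument is testable. Here is a minimal sketch of a timing harness, assuming the input is a fixed-seed shuffle of the integers 0..n-1 (the article only says “shuffled 64-bit integers”). Every run sorts a fresh copy of the same input, and the harness reports a median rather than a single timing. `make_shuffled_input`, `time_runs`, and `median` are hypothetical helpers, not part of Pystd or libstdc++.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <random>
#include <vector>

// Values 0..n-1 shuffled with a fixed seed, so every sorter under test sees
// exactly the same input order (within one build of the program).
std::vector<std::uint64_t> make_shuffled_input(std::size_t n, std::uint64_t seed) {
    std::vector<std::uint64_t> v(n);
    std::iota(v.begin(), v.end(), std::uint64_t{0});
    std::mt19937_64 rng(seed);
    std::shuffle(v.begin(), v.end(), rng);
    return v;
}

// Times `runs` independent sorts; the copy happens outside the timed region.
template <typename Sorter>
std::vector<double> time_runs(const std::vector<std::uint64_t>& input, Sorter sorter, int runs) {
    assert(runs > 0);
    std::vector<double> seconds;
    seconds.reserve(static_cast<std::size_t>(runs));
    for (int r = 0; r < runs; ++r) {
        std::vector<std::uint64_t> work = input;
        auto start = std::chrono::steady_clock::now();
        sorter(work);
        auto stop = std::chrono::steady_clock::now();
        assert(std::is_sorted(work.begin(), work.end()));  // sanity check, untimed
        seconds.push_back(std::chrono::duration<double>(stop - start).count());
    }
    return seconds;
}

// Median of the run times; less sensitive to one-off hiccups than the mean.
double median(std::vector<double> xs) {
    assert(!xs.empty());
    std::sort(xs.begin(), xs.end());
    std::size_t mid = xs.size() / 2;
    return xs.size() % 2 ? xs[mid] : (xs[mid - 1] + xs[mid]) / 2.0;
}
```

A call such as `median(time_runs(make_shuffled_input(10'000'000, 42), [](auto& v) { std::sort(v.begin(), v.end()); }, 25))` gives one sorter’s figure. A 1 ms gap on a roughly 0.755 s sort is about 0.13%, so it only means something if the run-to-run spread is clearly smaller than that.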

Key Points

  • Tests sorted 10 million shuffled 64-bit integers, fed to every algorithm in the same shuffled order.
  • Pystd’s stable sort was optimized to 0.86s, about 5% faster than std::stable_sort.
  • Attempts to speed up unstable sort via temp+shift moves, memmove, shell sort, and radix sort were slower or ineffective.
  • Tuning introsort’s insertion sort cutoff from 16 to 32 produced the largest performance gain; 64 was faster but had trade-offs.
  • Pystd’s best unstable sort time was 0.754s vs libstdc++’s 0.755s, narrowly beating it, though this was observed only once.
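The “temp+shift” and “memmove” experiments both target insertion sort’s inner loop. The article doesn’t show the author’s code, so the following is only one plausible reading: variant A holds the new element in a temporary and shifts larger neighbours right one at a time; variant B binary-searches the insertion point and moves the whole block with `std::memmove`. Both function names are hypothetical, and neither is claimed to be faster.

```cpp
#include <algorithm>    // std::upper_bound
#include <cassert>
#include <cstddef>
#include <cstdint>      // std::uint64_t, the element type in the article's tests
#include <cstring>      // std::memmove
#include <type_traits>

// Variant A: hold the new element in a temporary, shift larger elements one
// slot right until its position is found, then drop it into the hole.
template <typename T>
void insertion_sort_shift(T* first, T* last) {
    assert(first <= last);
    if (last - first < 2) return;
    for (T* i = first + 1; i != last; ++i) {
        T tmp = *i;
        T* hole = i;
        while (hole != first && tmp < *(hole - 1)) {  // strict <, so the sort is stable
            *hole = *(hole - 1);
            --hole;
        }
        *hole = tmp;
    }
}

// Variant B: binary-search the insertion point, then shift the block of larger
// elements with a single memmove. Only valid for trivially copyable T.
template <typename T>
void insertion_sort_memmove(T* first, T* last) {
    static_assert(std::is_trivially_copyable<T>::value, "memmove needs trivially copyable T");
    assert(first <= last);
    if (last - first < 2) return;
    for (T* i = first + 1; i != last; ++i) {
        T tmp = *i;
        // upper_bound inserts after equal elements, keeping the sort stable.
        T* pos = std::upper_bound(first, i, tmp);
        std::memmove(pos + 1, pos, static_cast<std::size_t>(i - pos) * sizeof(T));
        *pos = tmp;
    }
}
```

Variant B trades per-element comparisons for a binary search plus one bulk copy, which mainly pays off on longer ranges. Below a 16 to 32 element cutoff the simple loop has little overhead to remove, which may help explain why such changes didn’t help here.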

Hottest takes

"It’s ironic that the STL basically exists for stable sort … yet it’s apparently not had much optimization focus" — mattnewport
Made with <3 by @siedrix and @shesho from CDMX. Powered by Forge&Hive.