Converting Binary Floating-Point Numbers to Shortest Decimal Strings

Tiny numbers, big fight: speed claims fly and everyone asks “Where’s zmij?”

TLDR: A new study finds modern number-printing methods far faster than old ones, though their output is not always the shortest, while standard libraries lag. The comments hijacked the spotlight with one demand: include alleged speed champ zmij. That sparked a micro-optimization vs. real-world impact showdown that matters at scale for logs, data, and dashboards.

A new benchmark paper says today’s algorithms for turning computer numbers into readable text, such as Schubfach and Dragonbox, are blazing fast (think up to 10× quicker than the older Dragon4) yet still don’t always print the shortest possible number. Translation for non-nerds: when your app prints 3.14159, there’s a race to do it faster and with fewer digits, because those extra characters and CPU cycles add up across billions of numbers.

But the community’s first reaction wasn’t applause—it was: “Where’s zmij?” The top reply insists zmij is claimed to beat every contender in the paper, and folks want it in the ring. That set off classic internet drama: one camp says benchmarks without zmij are “missing the main character,” another defends the study’s scope, and a third shrugs, calling the whole thing “micro-optimizing commas” while their logs drown in digits.

Meanwhile, the paper’s spicy tidbits—some outputs are up to 30% longer than the shortest possible, and standard libraries in languages like C++ and Swift lag the speed demons—fueled more memes. Commenters joked about Dragon4 vs Dragonbox as if it’s Pokémon evolutions, and the usual Intel vs AMD vs Apple scoreboard-watchers appeared with stopwatch emojis. The vibe? Half performance nerds chanting “fewer instructions per number!” and half pragmatists asking whether anyone’s dashboard will notice. Either way, tiny digits are causing outsized drama—and the crowd wants a rematch with zmij on the card.

Key Points

  • Benchmarks compare Dragon4 with modern algorithms (Schubfach, Dragonbox) for IEEE 754 double-precision to decimal conversion.
  • Schubfach and Dragonbox achieve up to 10× speedup over Dragon4, requiring as few as ~210 instructions vs 1500–5000.
  • No surveyed implementation consistently produced the shortest possible strings; some outputs were up to 30% longer than optimal.
  • Standard libraries in languages like C++ and Swift used significantly more instructions than the fastest techniques, with gaps varying by CPU and compiler.
  • Study spans Intel, AMD, and ARM processors and compilers (GCC, Clang), introduces real-world datasets, and provides instruction-level metrics.
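
For readers who want to see what "shortest" means in practice, here is a minimal Python sketch. It is not Dragon4, Schubfach, or Dragonbox, and it is not taken from the paper: it is a deliberately slow brute-force loop that tries 1 to 17 significant digits and stops at the first string that parses back to exactly the same double. The function name shortest_roundtrip is hypothetical, chosen only for this illustration.

```python
import math


def shortest_roundtrip(x: float) -> str:
    """Return a decimal string with the fewest significant digits that
    still parses back to exactly x (brute force, illustration only)."""
    if not math.isfinite(x):
        raise ValueError("this sketch only handles finite values")
    # 17 significant digits always suffice for an IEEE 754 double.
    for precision in range(1, 18):
        candidate = f"{x:.{precision}g}"
        if float(candidate) == x:
            return candidate
    raise AssertionError("unreachable for finite doubles")


if __name__ == "__main__":
    for value in (0.1, 1.5, 0.1 + 0.2, 1e23):
        print(shortest_roundtrip(value))
```

Note that this only minimizes the number of significant digits. The characters around those digits (exponent sign, padding, a trailing ".0") also count toward string length, so a digit-minimal result such as "1e+23" is not necessarily the shortest possible string.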

Hottest takes

"significantly faster than all of the tested implementations" — leni536