December 27, 2025
When math meets the dupe police
Faster Practical Modular Inversion
Rust dev shows a speed boost — cue dupe police and compiler drama
TLDR: A Rust implementation promises a faster way to “invert” numbers used in crypto, but benchmarks vary wildly by compiler and CPU. Commenters split between dupe-calling, excitement over the math-to-hardware deep dive, and a heated debate about compilers’ undefined behavior—why your speed win might vanish in real code.
A Rust developer just dropped a faster way to do the “undo button” of number math (modular inversion, used in cryptography), claiming 1.3–2× gains over the classic method. The code is on GitHub and it riffs on an older algorithm, promising practical speedups without wizardry. But the community? They immediately turned it into a spectacle. First on stage: the dupe police, with a brisk “dupe” link and a buzzer noise in everyone’s head. Then the compiler wars rolled in. One reader praised how the post moves from math to microchips, but the hottest spark was over undefined behavior, where compilers can basically do anything. The quote that set keyboards on fire: GCC says a certain built-in function on zero has an “undefined result,” LLVM calls it “undefined behavior.” Translation: your code might go boom, or worse, look fast by accident. Meanwhile, the author flags wild benchmark swings across CPUs and compilers, which only stirred more popcorn: is it faster or just lucky? Fans of the math loved the clear walkthrough; skeptics said, “Show me assembly or it didn’t happen.” It’s speed claims versus reality, with bonus memes about “undefined behavior” meaning “UnBelievable Behavior.” Who knew greatest common divisors could divide the room this hard?
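For readers wondering what the zero-input fuss looks like in practice, here is a minimal sketch. The summary does not name the built-in; this assumes it is a count-trailing-zeros primitive (the kind binary GCD code leans on). In safe Rust, `u64::trailing_zeros` is fully defined and returns 64 for zero, but shifting a `u64` by 64 is still an overflow, so guarding the zero case is worthwhile anyway. The helper name `odd_part` is hypothetical and not taken from the post.

```rust
use std::num::NonZeroU64;

/// Splits `x` into its odd part and the number of factors of two removed.
/// Returns `None` for zero, so the zero case is handled explicitly
/// instead of relying on whatever a count-trailing-zeros primitive does there.
/// (Hypothetical helper for illustration only.)
fn odd_part(x: u64) -> Option<(u64, u32)> {
    // `NonZeroU64::new` returns `None` for zero, so `?` exits early.
    let nz = NonZeroU64::new(x)?;
    // For a non-zero 64-bit value the count is always below 64,
    // so the shift below cannot overflow.
    let shift = nz.trailing_zeros();
    Some((x >> shift, shift))
}
```

For example, `odd_part(40)` gives `Some((5, 3))` because 40 = 5 · 2³, while `odd_part(0)` gives `None`.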
Key Points
- The author presents an optimized binary (Stein’s) extended Euclidean algorithm for faster modular inversion.
- Underlying ideas derive from a 2020 paper by Thomas Pornin focused on constant-time evaluation and long arithmetic.
- A Rust implementation is available in the mod2k library on GitHub, with focus on modular inversion; linear Diophantine equations are briefly noted.
- Measured speedups are about 1.3–2× over a textbook implementation on average, including on Cortex-M4, but results may vary.
- Benchmark outcomes are highly sensitive to compiler choice and version, optimization flags, and CPU microarchitecture; assembly inspection is recommended.
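To make the first key point concrete, here is a rough sketch of a plain binary extended Euclidean inversion for odd moduli. It is a textbook-style baseline, not the author's optimized variant and not the mod2k API; the names `mod_inverse`, `half_mod`, and `sub_mod` are hypothetical, and the code is neither constant-time nor tuned for speed.

```rust
/// Halves `x` modulo an odd `m`, assuming `x < m`.
fn half_mod(x: u64, m: u64) -> u64 {
    if x % 2 == 0 {
        x / 2
    } else {
        // (x + m) / 2 written so the sum cannot overflow (x and m are both odd).
        x / 2 + m / 2 + 1
    }
}

/// Computes `a - b` modulo `m`, assuming `a < m` and `b < m`.
fn sub_mod(a: u64, b: u64, m: u64) -> u64 {
    if a >= b { a - b } else { m - (b - a) }
}

/// Returns the inverse of `a` modulo an odd `m > 1`, or `None` if it does not exist
/// (or if `m` is even or equal to 1, which this sketch does not support).
fn mod_inverse(a: u64, m: u64) -> Option<u64> {
    if m % 2 == 0 || m == 1 {
        return None;
    }
    // Invariants: x1 * a ≡ u (mod m) and x2 * a ≡ v (mod m); v stays odd.
    let (mut u, mut v) = (a % m, m);
    let (mut x1, mut x2) = (1u64, 0u64);
    while u != 0 {
        // u is non-zero here, so the trailing-zero count is below 64.
        let tz = u.trailing_zeros();
        u >>= tz;
        for _ in 0..tz {
            x1 = half_mod(x1, m);
        }
        // Both u and v are odd now; keep the larger one in u.
        if u < v {
            std::mem::swap(&mut u, &mut v);
            std::mem::swap(&mut x1, &mut x2);
        }
        u -= v;
        x1 = sub_mod(x1, x2, m);
    }
    // v now holds gcd(a, m); an inverse exists only when it is 1.
    if v == 1 { Some(x2) } else { None }
}
```

For instance, `mod_inverse(3, 7)` yields `Some(5)` since 3 · 5 = 15 ≡ 1 (mod 7), and `mod_inverse(6, 9)` yields `None` because 6 and 9 share the factor 3. Whether any tweak to a loop like this is actually faster depends on the compiler and CPU, which is exactly why the post recommends reading the generated assembly.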