March 16, 2026
Speed freaks vs math nerds
Even Faster Asin() Was Staring Right at Me
A tiny math rewrite makes arcsine fly—Reddit cheers, GPU folks say “not so fast”
TLDR: A clever rewrite of the arcsine polynomial lets CPUs evaluate parts of it in parallel, delivering speedups of up to ~1.9x over std::asin in benchmarks. Commenters cheered the speed, squabbled over CPU vs GPU benefits, cleared up a misconception about Estrin’s method, and had a laugh when someone mixed up arcsine with arctangent, because of course they did.
The author rewrote a tiny piece of the arcsine (asin) math and, boom, benchmarks show speedups of up to ~1.9x on some setups. By restructuring the polynomial with “Estrin’s scheme” (think: splitting one long chain of dependent steps into pieces your CPU can work on at the same time), the code lets modern CPUs sprint. The tests span Intel, AMD, and Apple chips; MSVC on Windows went wild (up to 1.88x faster), while GCC on Windows barely budged, setting off a fresh round of compiler drama.
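For the curious, here is a minimal C++ sketch of the idea, under stated assumptions. It uses the classic cubic form asin(x) ≈ π/2 − √(1 − x)·p(x) on [0, 1], with widely circulated coefficients for that approximation; the summary does not give the author’s actual coefficients, so treat them (and the hypothetical names `poly_horner`, `poly_estrin`, and `asin_approx`) as illustrative. Both versions evaluate the same cubic; only the order of operations changes.

```cpp
#include <cmath>

// Assumed coefficients for asin(x) ~= pi/2 - sqrt(1 - x) * p(x) on [0, 1],
// where p(x) = c0 + c1*x + c2*x^2 + c3*x^3. Illustrative values, not from the post.
constexpr double kC0 = 1.5707288;
constexpr double kC1 = -0.2121144;
constexpr double kC2 = 0.0742610;
constexpr double kC3 = -0.0187293;
constexpr double kHalfPi = 1.57079632679489661923;

// Horner: each step needs the result of the previous one (one long dependency chain).
double poly_horner(double x) {
    return ((kC3 * x + kC2) * x + kC1) * x + kC0;
}

// Estrin: x^2 and the two linear terms do not depend on each other,
// so an out-of-order CPU can compute them at the same time.
double poly_estrin(double x) {
    const double x2 = x * x;          // independent
    const double lo = kC0 + kC1 * x;  // independent
    const double hi = kC2 + kC3 * x;  // independent
    return lo + hi * x2;              // single combine step
}

// Shared wrapper (hypothetical): work on |x|, apply the sqrt form, restore the sign,
// since asin is an odd function.
template <double (*Poly)(double)>
double asin_approx(double x) {
    const double ax = std::fabs(x);
    const double r = kHalfPi - std::sqrt(1.0 - ax) * Poly(ax);
    return std::copysign(r, x);
}

double asin_horner(double x) { return asin_approx<poly_horner>(x); }
double asin_estrin(double x) { return asin_approx<poly_estrin>(x); }
```

For inputs in [-1, 1] the two functions agree to within rounding and stay within roughly 1e-4 of std::asin; the Estrin version just exposes three independent operations before the final combine.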
But the real party is in the comments. One user confidently chimed in, “I think it is atan,” basically confusing arcsine with arctangent, and joked that “sin is almost a lookup”; the thread collectively did a spit-take. Another commenter asked about Estrin’s trick, then immediately corrected themselves, admitting that it doesn’t actually cut the number of multiplications; it just arranges them so more of them can run in parallel. Meanwhile, a GPU-savvy voice threw cold water on the hype, noting that GPUs don’t benefit from this kind of instruction-level parallelism the way CPUs do. Translation: fast on CPUs, not necessarily a win on graphics chips.
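The multiplication point is easy to check by counting. The sketch below (all names hypothetical) tags each value with the length of the dependency chain that produced it and counts multiplies and adds for a generic cubic, assuming plain multiply/add with no fused multiply-add. It shows Estrin spending one extra multiply (for x²) in exchange for a shorter critical path.

```cpp
#include <algorithm>

// A value tagged with the depth of the dependency chain that produced it.
struct Traced {
    double v;
    int depth;
};

// Hypothetical operation counters for this illustration.
struct OpCount {
    int muls = 0;
    int adds = 0;
};

Traced mul(Traced a, Traced b, OpCount& c) {
    ++c.muls;
    return {a.v * b.v, std::max(a.depth, b.depth) + 1};
}

Traced add(Traced a, Traced b, OpCount& c) {
    ++c.adds;
    return {a.v + b.v, std::max(a.depth, b.depth) + 1};
}

// p(x) = c0 + c1*x + c2*x^2 + c3*x^3 via Horner: ((c3*x + c2)*x + c1)*x + c0
Traced horner3(const double c[4], Traced x, OpCount& n) {
    Traced t{c[3], 0};
    t = add(mul(t, x, n), Traced{c[2], 0}, n);
    t = add(mul(t, x, n), Traced{c[1], 0}, n);
    t = add(mul(t, x, n), Traced{c[0], 0}, n);
    return t;
}

// Same polynomial via Estrin: (c0 + c1*x) + x^2 * (c2 + c3*x)
Traced estrin3(const double c[4], Traced x, OpCount& n) {
    Traced x2 = mul(x, x, n);
    Traced lo = add(Traced{c[0], 0}, mul(Traced{c[1], 0}, x, n), n);
    Traced hi = add(Traced{c[2], 0}, mul(Traced{c[3], 0}, x, n), n);
    return add(lo, mul(hi, x2, n), n);
}
```

With c = {1, 2, 3, 4} and x = 0.5, both functions return 3.25. Horner uses 3 multiplies with a chain depth of 6, while Estrin uses 4 multiplies with a depth of 4. Fewer steps sit on the critical path, even though the total work does not shrink.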
Between the “Gotta Go Fast” memes and the CPU vs GPU turf war, the vibe is deliciously nerdy. And yes, someone linked the previous HN thread so everyone could relive the origin story.
Key Points
- The asin polynomial was restructured using Estrin’s scheme to shorten dependency chains and increase instruction-level parallelism (ILP) compared with Horner’s method.
- The new formulation computes two independent linear terms and reuses x^2, enabling parallel execution on out-of-order CPUs.
- Microbenchmarks ran 10,000,000 asin calls per run, 250 runs per chip/OS/compiler configuration, with std::asin as the baseline.
- On an Intel i7-10750H, the Estrin-based asin achieved speedups of roughly 1.80–1.88x over std::asin, depending on compiler and OS.
- On an AMD Ryzen 9 with GCC on Linux, the Estrin variant showed a ~1.44x speedup over std::asin, slightly ahead of the prior custom version.
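The benchmark setup described above (a fixed number of calls per run, many runs, std::asin as the baseline) can be approximated with a small `std::chrono` harness. This is a sketch under assumptions: the uniform inputs in [-1, 1), the fixed seed, the volatile sink used to keep the compiler from discarding results, the cubic coefficients, and the names `asin_estrin`, `time_run`, and `speedup` are all hypothetical, and the author’s actual harness may differ.

```cpp
#include <chrono>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// Estrin-form cubic asin approximation (assumed, widely circulated coefficients).
double asin_estrin(double x) {
    const double ax = std::fabs(x);
    const double x2 = ax * ax;
    const double lo = 1.5707288 - 0.2121144 * ax;  // c0 + c1*x
    const double hi = 0.0742610 - 0.0187293 * ax;  // c2 + c3*x
    const double r = 1.57079632679489661923 - std::sqrt(1.0 - ax) * (lo + hi * x2);
    return std::copysign(r, x);
}

// Time one run: apply f to every input and return elapsed seconds.
template <typename F>
double time_run(F f, const std::vector<double>& inputs, volatile double& sink) {
    const auto start = std::chrono::steady_clock::now();
    double acc = 0.0;
    for (double x : inputs) acc += f(x);
    const auto stop = std::chrono::steady_clock::now();
    sink = acc;  // publish the result so the loop is not optimized away
    return std::chrono::duration<double>(stop - start).count();
}

// Ratio of total (equivalently, mean) baseline time to candidate time
// over `runs` runs of `calls` calls each. Above 1.0 means the candidate is faster.
template <typename Base, typename Cand>
double speedup(Base base, Cand cand, std::size_t calls, int runs) {
    std::mt19937_64 rng(42);
    std::uniform_real_distribution<double> dist(-1.0, 1.0);
    std::vector<double> inputs(calls);
    for (double& x : inputs) x = dist(rng);

    volatile double sink = 0.0;
    double base_total = 0.0;
    double cand_total = 0.0;
    for (int r = 0; r < runs; ++r) {
        base_total += time_run(base, inputs, sink);
        cand_total += time_run(cand, inputs, sink);
    }
    return base_total / cand_total;
}
```

Because std::asin is overloaded, it is easiest to pass it as a lambda, e.g. `speedup([](double x) { return std::asin(x); }, asin_estrin, 10'000'000, 250)`, which mirrors the call and run counts listed above. Expect the number to move with compiler, flags, and CPU, as the MSVC vs GCC gap in the benchmarks suggests.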