Tracking down a 25% Regression on LLVM RISC-V

Dev kills 25% slowdown on RISC‑V; cheers, side‑eye, and 'who pays?'

TLDR: A compiler tweak accidentally made RISC‑V chips use slower “big math,” causing a 24% slowdown—then a patch fixed it and closed the gap to GCC. The comments split between applause for the sleuthing, jokes about tiny charts in Firefox, and a heated debate over unpaid open‑source labor.

A sneaky compiler change made some RISC‑V chips do their math the long way, slowing a benchmark by about 24%—until one engineer tracked it down and fixed it. Translation: the code started using the “big calculator” (double‑precision math) instead of the “small calculator” (single‑precision), and that extra heft cost time. With the fix in, the gap to rival compiler GCC vanished, and the thread lit up.

The loudest take? “This really shouldn’t be free work.” One commenter fumed that mega‑companies profit while individuals donate brainpower, turning a technical win into an open‑source pay debate. On the other side, RISC‑V fans dialed up the hype, saying this is proof the software is getting fast before the hardware flood arrives: “warms my heart to see RISC‑V optimizations,” beamed one. Meanwhile, a totally off‑topic zinger stole laughs: someone complained the blog’s charts were tiny in Firefox (“SVGs on Firefox are broken”), sparking a mini‑meme about a visual regression overshadowing the performance regression.

Amid the drama, newbies got a friendly explainer: after a recent change, LLVM (one popular compiler) started choosing slower double‑math inside a loop; the patch nudged it back to faster float‑math. The Igalia perf chart shows the win. Verdict: a nerdy bug hunt turned feel‑good fix, plus a spicy argument about who should foot the bill.
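Why is the compiler allowed to use float‑math at all? A minimal Python sketch, assuming both operands of the division are already exactly representable as floats: it compares dividing in double and rounding back to float (the regressed fdiv.d followed by truncation) against a correctly rounded single‑precision division computed with exact rational arithmetic (what fdiv.s produces). The helper names are hypothetical, and struct round‑tripping stands in for a hardware single‑precision unit.

```python
import struct
from fractions import Fraction


def to_f32(x: float) -> float:
    """Round a binary64 value to the nearest binary32 value (round half to even)."""
    return struct.unpack("f", struct.pack("f", x))[0]


def exact_div_to_f32(a: float, b: float) -> float:
    """Correctly rounded binary32 quotient of two positive floats, computed exactly.

    Models a single-precision divide (fdiv.s). Subnormals and overflow are
    ignored because the sample inputs below never reach them.
    """
    q = Fraction(a) / Fraction(b)
    assert q > 0
    # Find e with 2**e <= q < 2**(e + 1).
    e = q.numerator.bit_length() - q.denominator.bit_length()
    if q < Fraction(2) ** e:
        e -= 1
    # Scale so the 24-bit significand becomes an integer, then round half to even.
    scale = Fraction(2) ** (e - 23)
    significand = round(q / scale)
    return float(significand * scale)


def narrowed_div_matches(a: int, b: int) -> bool:
    """Does double division followed by truncation equal a direct float division?"""
    fa, fb = to_f32(float(a)), to_f32(float(b))  # operands as binary32 values
    via_double = to_f32(fa / fb)  # like fdiv.d, then fptrunc to float
    direct = exact_div_to_f32(fa, fb)  # like fdiv.s
    return via_double == direct


samples = [(a, b) for a in range(1, 5000, 37) for b in range(1, 5000, 41)]
samples += [(2**24, 3), (2**24 - 1, 7), (12345, 2**24 - 3)]
print(all(narrowed_div_matches(a, b) for a, b in samples))  # True
```

The agreement is expected rather than lucky: binary64's 53‑bit significand is more than twice as wide as binary32's 24‑bit one, and for division that is a known sufficient condition for rounding twice to give the same answer as rounding once. That is what lets the compiler swap the 33‑cycle double divide for the 19‑cycle float divide without changing results.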

Key Points

  • A recent LLVM change to isKnownExactCastIntToFP folded fpext(sitofp x to float) to double into uitofp x to double, disabling a downstream narrowing in visitFPTrunc.
  • On RISC-V (SiFive P550), this led LLVM to emit fdiv.d (33-cycle latency) instead of fdiv.s (19-cycle latency), causing about a 24% regression on this benchmark.
  • The author extended getMinimumFPType with range analysis to reduce fptrunc(uitofp x to double) to uitofp x to float, restoring the float-narrowing optimization.
  • Benchmark analysis using Igalia’s dashboard and llvm-mca confirmed that prior LLVM builds used fdiv.s in the loop, while recent builds regressed to fdiv.d.
  • The SiFive P550’s out-of-order execution accounts for differences in instruction ordering between builds, but the core issue was the missed narrowing caused by the cast folding; the landed patch eliminates the gap to GCC on this benchmark.
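The range‑analysis idea behind the fix can be sketched in a few lines. This is a hypothetical Python model, not LLVM's getMinimumFPType: for an unsigned integer known to lie in [0, max_value], it asks which is the narrowest IEEE type that holds every such value exactly. Because binary32 has a 24‑bit significand, any bound up to 2^24 means uitofp x to double carries nothing that uitofp x to float would lose. The loop‑counter bound in the usage lines is an illustrative assumption.

```python
import struct

FLOAT_SIGNIFICAND_BITS = 24   # IEEE 754 binary32: 23 stored bits + implicit leading 1
DOUBLE_SIGNIFICAND_BITS = 53  # IEEE 754 binary64: 52 stored bits + implicit leading 1


def min_fp_type_for_uint_range(max_value: int) -> str:
    """Narrowest IEEE type holding every integer in [0, max_value] exactly.

    Hypothetical model of the patch's idea: once range analysis bounds x,
    uitofp x to double can be narrowed to uitofp x to float whenever the
    bound fits in binary32's 24-bit significand.
    """
    if max_value < 0:
        raise ValueError("an unsigned range cannot have a negative bound")
    if max_value <= 2 ** FLOAT_SIGNIFICAND_BITS:
        return "float"
    if max_value <= 2 ** DOUBLE_SIGNIFICAND_BITS:
        return "double"
    return "none"  # even double would round some values in the range


def roundtrips_through_float32(n: int) -> bool:
    """True if converting n to binary32 loses nothing."""
    return struct.unpack("f", struct.pack("f", float(n)))[0] == n


# Hypothetical: a loop counter known to stay below 10_000 fits easily in float.
print(min_fp_type_for_uint_range(9_999))         # float
print(min_fp_type_for_uint_range(2 ** 24 + 1))   # double
print(roundtrips_through_float32(2 ** 24))       # True
print(roundtrips_through_float32(2 ** 24 + 1))   # False
```

The check works on the range bound rather than on a single value on purpose: the narrowing is only sound if every value x can take survives the conversion, which is why the fix needed range analysis rather than a purely local pattern match.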

Hottest takes

"This really shouldn't be free work." — brcmthrowaway
"SVGs on Firefox are broken" — szmarczak
"warms my heart to see RISC-V optimizations" — LeFantome