When O3 is 2x slower than O2

Rust’s “fast mode” runs slower: blame old CPUs or weird code

TLDR: A Rust benchmark ran more than twice as slow at the supposedly faster O3 setting than at O2, in a build that targeted Intel’s older Haswell architecture. Commenters split between blaming outdated hardware choices, dunking on Rust’s syntax, and joking about ozone versus oxygen, a reminder that tuning and context can flip performance expectations.

A Rust dev ran a simple test and dropped the mic: switching from O2 to O3 (opt-level=3, the compiler’s most aggressive standard optimization level) made their code more than twice as slow. Cue chaos in the comments. One camp came in hot with the hardware angle: targeting Haswell (an Intel architecture from 2013) is the real villain, with users arguing that aggressive O3 tweaks tied to a CPU that old will backfire on modern machines. Another camp didn’t even get past the code: one commenter declared Rust’s syntax “chaotic,” turning a performance post into a style roast. And then the comic relief: a glorious misunderstanding where someone thought O3 and O2 were about ozone and oxygen, sparking a mini-meme thread about gas flow instead of code flow.

Under the hood, the author’s test inserts items into a small sorted list (a Vec) and compares floating-point numbers, tricky business that can confuse optimizers. Profiling showed the comparison function and the binary search routine eating a much larger share of the time under O3, but the author cautioned this could be a synthetic benchmark quirk. The crowd? Split between “don’t lock to ancient CPUs,” “Rust syntax hurts my eyes,” and “lol ozone.” It’s equal parts performance autopsy and comment-section circus, exactly the internet’s favorite flavor of tech drama.

Key Points

  • opt-level=3 (O3) was about 123% slower than opt-level=2 (O2) in a Rust bounded priority queue benchmark built with Haswell as the target CPU.
  • Measured insert times were ~963 ns (O2) and ~2,154 ns (O3) using Criterion and cargo bench.
  • The implementation uses a sorted Vec and binary_search_by with a float-first, id-second comparison function; uniqueness constraints made a binary heap unsuitable.
  • Flamegraph profiling showed binary_search_by’s sample share increasing from 44.15% (O2) to 79.62% (O3), and the compare function from 25.88% to 63.57%.
  • Benchmarks were run on AMD Ryzen 7 5800X (Zen 3) and Intel Core i7-4790 (Haswell), with similar regression behavior; setup used RUSTFLAGS with target-cpu=haswell.
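
For readers who want to picture the data structure, here is a minimal sketch of a bounded queue kept as a sorted Vec, with binary_search_by and a float-first, id-second comparison. It is not the author’s code: the names Entry, compare, and BoundedQueue are hypothetical, using f32::total_cmp for the float comparison is an assumption, and so is treating “unique” as “no two entries with the same (dist, id) pair.”

```rust
use std::cmp::Ordering;

/// Hypothetical entry type: a float priority plus an id used as tie-breaker.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Entry {
    dist: f32,
    id: u32,
}

/// Float-first, id-second ordering.
/// Assumption: `total_cmp` is used so that every pair of floats is ordered;
/// the original benchmark may compare floats differently.
fn compare(a: &Entry, b: &Entry) -> Ordering {
    a.dist.total_cmp(&b.dist).then_with(|| a.id.cmp(&b.id))
}

/// Bounded queue stored as a sorted Vec, smallest `dist` first.
struct BoundedQueue {
    items: Vec<Entry>,
    capacity: usize,
}

impl BoundedQueue {
    fn new(capacity: usize) -> Self {
        // One spare slot so an insert into a full queue does not reallocate.
        Self { items: Vec::with_capacity(capacity + 1), capacity }
    }

    /// Inserts `entry` in sorted position. Returns true if it was stored.
    fn insert(&mut self, entry: Entry) -> bool {
        match self.items.binary_search_by(|probe| compare(probe, &entry)) {
            // An entry with the same (dist, id) is already present.
            Ok(_) => false,
            Err(pos) => {
                // The queue is full and the new entry sorts after everything in it.
                if pos >= self.capacity {
                    return false;
                }
                self.items.insert(pos, entry);
                // Drop the worst entry if the insert pushed us over capacity.
                self.items.truncate(self.capacity);
                true
            }
        }
    }
}
```

Every insert runs the binary search, and every step of the search calls the comparison function, which is consistent with those two functions dominating the flamegraph. The sketch says nothing about why O3 would compile them worse than O2.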

Hottest takes

“I seriously dislike the syntax of Rust” — johnisgood
“Ozone would have greater friction getting through small pores” — cat_plus_plus
“tying yourself to an antique architecture gives you very bad results” — pclmulqdq