io_uring, libaio performance across Linux kernels and an unexpected IOMMU trap

io_uring races ahead, but an IOMMU setting switched on by default triggers a 30% slowdown and an outcry

TLDR: Benchmarks show io_uring is up to 2x faster than the older libaio interface and gets faster still on newer Linux kernels, but a security feature enabled by default (the IOMMU) caused a ~30% slowdown until it was disabled. Commenters split between speed seekers, security worriers, and skeptics asking for broader tests and deeper data, a reminder that defaults can make or break an upgrade.

The benchmarking bombshell: YDB tested Linux’s old asynchronous disk I/O interface (libaio) against the new hotness (io_uring) across kernels from 5.4 to 7.0-rc3, and the community immediately zeroed in on the twist. Yes, io_uring, the newer way to issue fast disk requests, delivered up to 2x the IOPS of libaio, and its best configurations got about 1.4x faster on newer kernels. But a surprise ~30% slowdown appeared for libaio and non-SQPOLL io_uring between kernels 5.4 and 5.15, traced to the Intel IOMMU, a device memory-protection feature, being enabled by default in the newer releases. Cue the drama.

Author eivanov89 dropped the mic with that finding, and the thread erupted into a speed vs safety face-off. One camp shouted “tune it, don’t doom it,” cheering the performance gains and treating the IOMMU like a sneaky new-feature tax. Another camp, led by skavi, asked the obvious question: what protection do you lose if you switch it off? Meanwhile, hcpp questioned the whole premise: why 4K random writes, and would sequential workloads tell a different story?

The performance sleuths showed up too: tanelpoder probed interrupts and latency, hinting that the IOMMU’s extra bookkeeping might be the hidden speed bump when the system relies on interrupts rather than busy-wait polling. The memes wrote themselves: “It’s fast if you turn everything off,” “IOMMU = I Owe My Mega-drops.” Bottom line: io_uring still rules, but kernel defaults matter, and a single toggle can turn victory laps into speed traps.
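For readers who want to see which side of the toggle their own machine is on, here is a minimal Python sketch, assuming a Linux host. It looks for an explicit `intel_iommu=off` on the kernel command line and reads the NVMe driver's `poll_queues` module parameter from sysfs, the two settings the benchmark's final runs used. The helper names `parse_cmdline`, `intel_iommu_disabled`, and `nvme_poll_queues` are hypothetical, and the sketch only reports what was passed explicitly; it cannot infer the kernel's built-in IOMMU default, which depends on how the kernel was configured.

```python
from __future__ import annotations

from pathlib import Path
from typing import Optional


def parse_cmdline(cmdline: str) -> dict[str, str]:
    """Split a kernel command line into {key: value}; bare flags map to ''."""
    params: dict[str, str] = {}
    for token in cmdline.split():
        key, _, value = token.partition("=")
        params[key] = value  # later duplicates override earlier ones
    return params


def intel_iommu_disabled(cmdline: str) -> bool:
    """True only if intel_iommu=off was passed explicitly.

    The value may be a comma-separated list (e.g. "on,sm_on"), so each
    element is checked. Returns False when the flag is absent, because
    the compiled-in default is not visible from the command line.
    """
    value = parse_cmdline(cmdline).get("intel_iommu", "")
    return "off" in value.split(",")


def nvme_poll_queues(sysfs_root: str = "/sys") -> Optional[int]:
    """Read the nvme driver's poll_queues parameter, or None if unavailable."""
    path = Path(sysfs_root) / "module" / "nvme" / "parameters" / "poll_queues"
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        # Missing file (driver not loaded, non-Linux host) or unparsable value.
        return None
```

On a live system, pass `Path("/proc/cmdline").read_text()` to `intel_iommu_disabled` and call `nvme_poll_queues()` with its default root. Taking the command line as a string keeps the parsing testable without touching the real machine.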

Key Points

  • YDB benchmarked libaio vs io_uring across Linux kernels 5.4 to 7.0-rc3 using fio on NVMe storage.
  • io_uring delivered up to 2x higher IOPS than libaio; best io_uring configs improved ~1.4x on newer kernels.
  • libaio and non-SQPOLL io_uring (no kernel-side submission polling thread) lost ~30% IOPS between kernels 5.4 and 5.15.
  • The drop was traced to the Intel IOMMU being enabled by default in newer kernels, rather than to a regression in the kernel's I/O code.
  • Final results were collected with intel_iommu=off and NVMe poll_queues=16, with NUMA placement, cgroups, and fio parameters tightly controlled.
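The comparison in the key points can be approximated with fio. Below is a minimal Python sketch that only builds fio command lines (it does not run them) for libaio, plain io_uring, io_uring with a kernel submission-polling thread (`sqthread_poll`), and io_uring with polled completions (`hipri`). The device path `/dev/nvme0n1`, `iodepth=32`, `numjobs=1`, and the 60-second runtime are placeholder assumptions, not YDB's published parameters, and the helpers `fio_args` and `comparison_matrix` are hypothetical. A random-write job against a raw block device destroys its contents, so point it only at a scratch disk.

```python
from __future__ import annotations

import shlex


def fio_args(engine: str, device: str, *, rw: str = "randwrite", bs: str = "4k",
             iodepth: int = 32, numjobs: int = 1, runtime_s: int = 60,
             sqpoll: bool = False, hipri: bool = False) -> list[str]:
    """Build (but do not run) one fio invocation. Defaults are placeholders."""
    if engine not in ("libaio", "io_uring"):
        raise ValueError(f"unsupported engine: {engine}")
    if (sqpoll or hipri) and engine != "io_uring":
        # sqthread_poll and hipri are io_uring engine options in this sketch.
        raise ValueError("sqpoll/hipri require the io_uring engine")
    args = [
        "fio",
        f"--name={engine}-{rw}-{bs}",
        f"--filename={device}",  # WARNING: writes destroy data on this device
        f"--ioengine={engine}",
        f"--rw={rw}",
        f"--bs={bs}",
        "--direct=1",  # bypass the page cache
        f"--iodepth={iodepth}",
        f"--numjobs={numjobs}",
        "--time_based",
        f"--runtime={runtime_s}",
        "--group_reporting",
        "--output-format=json",
    ]
    if sqpoll:
        args.append("--sqthread_poll=1")  # kernel thread polls the submission queue
    if hipri:
        args.append("--hipri")  # poll for completions instead of waiting on interrupts
    return args


def comparison_matrix(device: str) -> dict[str, list[str]]:
    """One command line per engine variant, all with the same workload."""
    return {
        "libaio": fio_args("libaio", device),
        "io_uring": fio_args("io_uring", device),
        "io_uring+sqpoll": fio_args("io_uring", device, sqpoll=True),
        "io_uring+hipri": fio_args("io_uring", device, hipri=True),
    }


if __name__ == "__main__":
    for label, argv in comparison_matrix("/dev/nvme0n1").items():
        print(f"{label}: {shlex.join(argv)}")
```

Polled completions generally need the NVMe driver to expose poll queues, which is what a setting like `poll_queues=16` provides. To probe hcpp's question about sequential I/O, the same helper can emit, for example, `fio_args("io_uring", "/dev/nvme0n1", rw="write", bs="128k")`; the 128k block size there is likewise an arbitrary choice.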

Hottest takes

"The most surprising part wasn’t io_uring gains (~2x), but a ~30% regression caused by IOMMU being enabled by default between releases." — eivanov89
"Why was 4K random write chosen as the main workload, and would the conclusion change with sequential I/O?" — hcpp
"what was the security situation of whatever is now being protected by the IOMMU before it was enabled by default?" — skavi
Made with <3 by @siedrix and @shesho from CDMX. Powered by Forge&Hive.