March 24, 2026
Fast path, then faceplant
io_uring, libaio performance across Linux kernels and an unexpected IOMMU trap
io_uring races ahead, but a default IOMMU switch triggers 30% slowdown outrage
TLDR: Benchmarks show io_uring delivering up to 2x the IOPS of the older libaio interface and getting even better on newer Linux kernels, but a security feature that newer kernels enable by default (IOMMU) caused a roughly 30% slowdown until it was disabled. Comments split between speed seekers, security worriers, and skeptics asking for broader tests and deeper data, a reminder that defaults can make or break upgrades.
The benchmarking bombshell: YDB tested Linux’s old async disk I/O (libaio) versus the new hotness (io_uring) across kernels from 5.4 to 7.0-rc3, and the community immediately zeroed in on the twist. Yes, io_uring, the newer way to do fast disk requests, delivered up to 2x the IOPS of libaio, and its best configurations got about 1.4x faster on newer kernels. But a surprise 30% slowdown popped up for libaio and non-SQPOLL io_uring between kernels 5.4 and 5.15, traced to Intel IOMMU, a device memory safety feature, being turned on by default in the newer kernels. Cue the drama.
Author eivanov89 dropped the mic with that finding, and the thread erupted into a speed vs safety face-off. One camp shouted “tune it, don’t doom it,” cheering the performance gains and treating IOMMU like a sneaky new-feature tax. Another camp, led by skavi, asked the obvious: what protection do you lose if you switch it off? Meanwhile, hcpp questioned the whole premise: why 4K random writes, and would sequential workloads tell a different story?
The performance sleuths showed up too: tanelpoder probed interrupts and latency, hinting that IOMMU’s extra bookkeeping might be the hidden speed bump when the system relies on interrupts instead of busy-wait polling. The memes wrote themselves: “It’s fast if you turn everything off,” “IOMMU = I Owe My Mega-drops.” Bottom line: io_uring still rules, but kernel defaults matter, and a single toggle can turn victory laps into speed traps.
Key Points
- YDB benchmarked libaio vs io_uring across Linux kernels 5.4 to 7.0-rc3 using fio on NVMe storage.
- io_uring delivered up to 2x higher IOPS than libaio; the best io_uring configs improved ~1.4x on newer kernels.
- IOPS dropped ~30% for libaio and non-SQPOLL io_uring between kernels 5.4 and 5.15.
- The performance drop was traced to Intel IOMMU being enabled by default in newer kernels, not a kernel regression.
- Final results used intel_iommu=off and NVMe poll_queues=16; tests were tightly controlled (NUMA/cgroups/fio params).
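For anyone wanting to check whether their own box matches the benchmark setup, here is a minimal Python sketch that looks for the two settings named above on the kernel command line. It is not from the YDB write-up. The function names are made up for illustration, and it assumes the options were passed as boot parameters (`intel_iommu=off`, and `nvme.poll_queues=16` as the module-parameter spelling) rather than set some other way, such as a modprobe config. The parsing is deliberately naive and ignores quoted values containing spaces.

```python
from pathlib import Path
from typing import Dict, Optional


def parse_cmdline(cmdline: str) -> Dict[str, str]:
    """Split a kernel command line into key/value pairs.

    Bare flags such as "quiet" map to an empty string. Quoted values
    with embedded spaces are not handled (simplifying assumption).
    """
    params: Dict[str, str] = {}
    for token in cmdline.split():
        key, _, value = token.partition("=")
        params[key] = value
    return params


def benchmark_settings(cmdline: str) -> Dict[str, Optional[str]]:
    """Return the two settings the final results relied on.

    A value of None means the option was not given on the command line,
    so the kernel or driver default applies.
    """
    params = parse_cmdline(cmdline)
    return {
        "intel_iommu": params.get("intel_iommu"),
        "nvme.poll_queues": params.get("nvme.poll_queues"),
    }


def live_settings(path: str = "/proc/cmdline") -> Dict[str, Optional[str]]:
    """Read the running kernel's command line (Linux only) and report."""
    return benchmark_settings(Path(path).read_text())
```

On a Linux machine, `print(live_settings())` shows what the running kernel was booted with. A `None` for `intel_iommu` does not by itself say whether the IOMMU is active, since that depends on how the kernel was built, which is exactly the default-flip the thread was arguing about.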