June 27, 2026
Code by AI, roasted by humans
Task Failed Successfully: Saturating NIC and Disk Bandwidth
AI maxed out the machine, but the comments roasted the human and the hype
TLDR: A programmer used AI to speed up a data-heavy system, but the machine only hit half its expected pace and the AI’s reason for the fix was wrong. In the comments, readers split between impressed and deeply skeptical, with some roasting the AI-coding flex and others saying better profiling would’ve solved it sooner.
A programmer proudly declared he now writes “not a single line” of code at work because artificial intelligence does it for him, then immediately stumbled into a glorious mess where the system reached only about half its promised speed. The setup was supposed to be simple: read data from super-fast drives and blast it across the network at full speed. Instead, the machine hit a wall early, the processor got slammed, and the real plot twist was that the AI’s explanation for the “fix” was apparently just... wrong. In other words, the task failed successfully, and the internet smelled blood.
The comments quickly turned into a mini trial about AI coding bravado. One critic basically called the whole thing suspicious, mocking the “NOT A SINGLE LINE!!” flex and questioning whether the testing was solid at all. Another commenter was much less theatrical but more cutting: stop guessing, use the profiler properly, and the answer would have shown up much sooner. That set off the classic nerd drama of “AI magic” vs “old-school performance debugging”, with one side impressed the system was pushed so far and the other eye-rolling at what they saw as hype and missing rigor.
There was also some classic comment-section energy: the author jumped in to say yes, this was a full-on rabbit hole, yes, the obvious fix only solved the small demo, and no, the bigger deployment still refused to behave. Then a helpful hardware wizard slid in with a spicy “have you tried sending data directly?” suggestion, proving that even in a pile-on, there’s always one person trying to unlock the secret boss level. It’s part cautionary tale, part AI-age comedy: the machine went fast, the explanation went off the rails, and the comments absolutely went to work.
Key Points
- The article describes a simplified benchmark that reads 1 MiB blocks from eight NVMe drives and sends them over RDMA to saturate a 400 Gb/s NIC.
- The demo is configured to minimize interference, with all devices on the same NUMA node, a pinned worker thread, and IOMMU passthrough mode.
- Benchmark results show the system bottlenecks at an I/O depth of 16, reaching only about half of NIC bandwidth while CPU usage hits 100%.
- Profiling with perf shows that io_submit_sqes consumes 81.62% of total CPU cost at the tested depth.
- The article identifies direct-I/O kernel overhead related to DMA metadata construction and page handling as the dominant source of CPU time.
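To put the key points in perspective, here is a rough back-of-the-envelope sketch of what saturating the NIC would demand. The NIC rate, drive count, block size, and the io_submit_sqes share come from the summary above; everything else is an assumption made for illustration: "about half" is treated as exactly 0.5, the 100% CPU figure is read as one fully busy pinned core, and the function names are hypothetical.

```python
# Back-of-the-envelope figures for the benchmark summarized in the key points.
# Values marked "article" come from the summary; the rest are assumptions.

NIC_GBITS_PER_S = 400      # article: 400 Gb/s NIC (decimal gigabits)
NUM_DRIVES = 8             # article: eight NVMe drives
BLOCK_BYTES = 1 << 20      # article: 1 MiB reads
SUBMIT_SHARE = 0.8162      # article: io_submit_sqes share of CPU cost
ACHIEVED_FRACTION = 0.5    # assumption: "about half" taken as exactly 0.5


def line_rate_bytes_per_s(nic_gbits: float) -> float:
    """Convert a NIC rate in decimal gigabits/s to bytes/s."""
    return nic_gbits * 1e9 / 8


def per_drive_bytes_per_s(nic_gbits: float, drives: int) -> float:
    """Read rate each drive must sustain if the load is spread evenly."""
    return line_rate_bytes_per_s(nic_gbits) / drives


def blocks_per_s(nic_gbits: float, block_bytes: int, fraction: float = 1.0) -> float:
    """Blocks moved per second at a given fraction of NIC line rate."""
    return line_rate_bytes_per_s(nic_gbits) * fraction / block_bytes


def cpu_us_per_block(nic_gbits: float, block_bytes: int, fraction: float) -> float:
    """CPU microseconds per block, assuming a single core that is 100% busy."""
    return 1e6 / blocks_per_s(nic_gbits, block_bytes, fraction)


if __name__ == "__main__":
    per_drive = per_drive_bytes_per_s(NIC_GBITS_PER_S, NUM_DRIVES)
    full = blocks_per_s(NIC_GBITS_PER_S, BLOCK_BYTES)
    budget = cpu_us_per_block(NIC_GBITS_PER_S, BLOCK_BYTES, ACHIEVED_FRACTION)
    print(f"per-drive read rate needed: {per_drive / 1e9:.2f} GB/s")
    print(f"1 MiB blocks/s at line rate: {full:,.0f}")
    print(f"CPU time per block at half rate: {budget:.1f} us")
    print(f"of which in io_submit_sqes: {budget * SUBMIT_SHARE:.1f} us")
```

Under those assumptions, line rate works out to 50 GB/s, or 6.25 GB/s per drive and roughly 47,700 one-MiB blocks per second. At half that rate on one saturated core, each block costs about 42 microseconds of CPU, of which about 34 would fall in io_submit_sqes. That gives a sense of why the commenters pointed at the profiler: to hit full speed on the same core, the per-block cost would have to drop by about half.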