Queueing Requests Queues Your Capacity Problems, Too

“Queues won’t save you” — devs roast the infinite line

TLDR: The post warns that adding a queue can turn traffic spikes into hour-long waits while server metrics look fine. Commenters clap back with “limit the line,” push bounded queues and backpressure, dunk on reflexive queuing, and even pitch LIFO—plus a surprise side-fight over alleged AI art, because of course.

A spicy post warns that adding a “please hold” line to your app doesn’t fix overloads—it hides them. The author’s shocker: during a 2x traffic hour, your server graphs look chill while users wait an hour because 3.6 million requests pile up. Even a tiny 10% over-capacity wave? That’s a 6‑minute delay and 360k people stuck in line. The community lit up like a status page on fire.
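The post's arithmetic checks out, and it's worth seeing why: during an overload, the backlog grows at (arrival rate − capacity), and once traffic subsides it drains at roughly the server's capacity. A quick sanity-check sketch (numbers from the article; the Python is purely illustrative):

```python
# Back-of-the-envelope check of the post's numbers.
CAPACITY_RPS = 1_000      # server drains the queue at this rate
SPIKE_SECONDS = 3_600     # one-hour spike

def backlog_and_wait(arrival_rps: int) -> tuple[int, float]:
    """Backlog after the spike, and the FIFO wait (seconds) for the last arrival."""
    excess = arrival_rps - CAPACITY_RPS   # requests/second piling up
    backlog = excess * SPIKE_SECONDS      # total queued requests
    wait = backlog / CAPACITY_RPS         # time to drain at capacity
    return backlog, wait

print(backlog_and_wait(2_000))  # 2x traffic: 3.6M backlog, ~1 hour wait
print(backlog_and_wait(1_100))  # 10% over:  360k backlog, ~6 minutes
```

The kicker: server-side latency metrics never see that hour, because each request is processed in normal time once it finally leaves the queue.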

Engineers rallied behind the “stop trusting happy graphs” camp, with one commenter dropping Gil Tene’s classic talk How NOT to Measure Latency. Translation: your “p90” (90th percentile) metric can look fine while the tail users are crying. Others invoked Kingman’s formula—as you run closer to full capacity, wait times explode—and demanded bounded queues and backpressure (tech speak for telling callers “we’re full” instead of silently stacking requests).
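Kingman's approximation (for a single-server G/G/1 queue) makes the "wait times explode" claim concrete: mean queueing delay scales with ρ/(1−ρ), where ρ is utilization. The sketch below is illustrative, not from the post, with variability coefficients of 1 (roughly Poisson arrivals, exponential service):

```python
def kingman_wait(utilization: float, mean_service_s: float,
                 ca2: float = 1.0, cs2: float = 1.0) -> float:
    """Kingman's G/G/1 approximation for mean queueing delay.

    ca2/cs2 are squared coefficients of variation of inter-arrival
    and service times (1.0 ~ Poisson/exponential).
    """
    rho = utilization
    return (rho / (1 - rho)) * ((ca2 + cs2) / 2) * mean_service_s

# With a 10 ms mean service time, watch the delay blow up near full load:
for rho in (0.5, 0.9, 0.99):
    print(f"{rho:.2f} utilization -> {kingman_wait(rho, 0.01) * 1000:.0f} ms wait")
```

Going from 50% to 99% utilization multiplies the wait by roughly 100x, which is why "your servers are only at 99%" is not a comforting graph.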

The hottest takedown? “Stop slapping queues on everything,” argued one interviewer, saying queues only make sense when rejecting is truly expensive (think file uploads). Then came the chaos agent: another commenter floated a LIFO stack—serve the newest first—so at least some folks get instant wins. Fans called it the “VIP line”; critics called it “Hunger Games for users.”

And because it’s the internet, someone accused the post’s art of being AI. Suddenly the queue discourse had its own queue: latency math, line etiquette, and “is that image even real?”

Key Points

  • A system pushed past capacity can see perceived (client) latency surge while server metrics stay flat, when queueing is used to absorb the overload.
  • In a model with 1,000 rps capacity and a one-hour spike to 2,000 rps, the queue grows by 3.6 million requests, causing about one hour of queue delay.
  • Even a 10% overload for one hour (1,100 rps) creates a 360,000-request backlog, yielding roughly six minutes of waiting under FIFO.
  • Perceived latency includes queueing delay, whereas server latency typically measures only processing time, masking user impact.
  • Retries can worsen overload by enqueueing duplicate requests, while autoscaling pairs well with queues by adding capacity to drain spikes faster.
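The backpressure fix commenters kept pushing boils down to one thing: bound the queue and reject loudly when it's full, instead of silently stacking work. A minimal sketch using the stdlib's bounded `queue.Queue` (the HTTP-503 mapping is an assumption about how you'd surface the rejection):

```python
import queue

# Bounded queue: holds at most 3 pending requests.
jobs = queue.Queue(maxsize=3)

accepted, rejected = 0, 0
for i in range(5):
    try:
        jobs.put_nowait(f"req-{i}")  # raises queue.Full at capacity
        accepted += 1
    except queue.Full:
        rejected += 1                # tell the caller "we're full"
                                     # (e.g. return HTTP 503 + Retry-After)

print(accepted, rejected)  # 3 2
```

Rejected callers find out immediately and can back off or retry later, instead of waiting six minutes in an invisible line—which is exactly the failure mode the post describes.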

Hottest takes

“As you approach 100% utilization, waiting times explode” — andrewstuart
“Candidates that start adding queues reflexively do poorly” — avidiax
“Use a stack? LIFO… at least some people will still get quick responses” — mankyd
Made with <3 by @siedrix and @shesho from CDMX. Powered by Forge&Hive.