Client-side load balancing at a million requests per second

Why this shopping giant stopped blaming the hallway and built its own fast lane

TLDR: Zalando rebuilt part of its shopping system so internal requests no longer depend on a shared traffic router, making the service faster, cheaper, and easier to troubleshoot. In the comments, the mood is impressed and battle-scarred: people love that the team finally took control of a long-running source of blame and mystery.

Zalando’s engineers just dropped a very nerdy flex with very real stakes: their shopping system was getting slowed by a shared middleman, so they built an in-app traffic cop to send requests directly and keep product pages snappy even during chaos. In plain English, one customer action could trigger up to 100 internal lookups, and every one of those used to pass through the same communal router. As the post puts it, “when Skipper sneezed, PRAPI got the flu” (Skipper being the shared router, PRAPI the Product Read API), which is exactly the kind of line the community lives for because it turns a deep infrastructure story into workplace drama with servers.
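To see why a fan-out of 100 makes a shared hop so painful, here is a back-of-the-envelope sketch. It assumes each internal lookup is independently "slow" with some small probability; the 1% figure and the function name `p_any_slow` are illustrative assumptions, not numbers from Zalando's post.

```python
def p_any_slow(fan_out: int, p_slow: float) -> float:
    """Probability that at least one of `fan_out` independent calls is slow.

    Assumes calls are independent and equally likely to be slow, which is
    a simplification: a struggling shared router tends to slow many calls
    at once.
    """
    return 1.0 - (1.0 - p_slow) ** fan_out


if __name__ == "__main__":
    # Hypothetical 1% chance that any single hop lands in the slow tail.
    for n in (1, 10, 100):
        print(f"fan-out {n:>3}: {p_any_slow(n, 0.01):.3f}")
```

Under those assumptions, a 1-in-100 slow hop becomes the common case once a request waits on 100 of them, since the batch is only as fast as its slowest lookup.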

And yes, the comments instantly centered less on the raw engineering and more on the vibe: respect, curiosity, and a little “wow, these people really suffered.” The author jumped in personally, saying they’d worked hard on the “story arc and readability,” which made the thread feel less like a dry technical memo and more like a behind-the-scenes confession from someone who has survived too many mystery outages. The strongest reaction is basically: finally, they owned the problem instead of guessing whose fault it was. That landed hard because everyone loves a tale about cutting out a flaky middle layer and getting receipts.

The humor is subtle but there: readers are clearly primed for the classic “it’s always DNS / it’s always the load balancer / it’s never just one thing” genre of tech meme. The real drama isn’t that the old system was bad. It’s that it was good enough for years, until the cost, confusion, and finger-pointing became impossible to ignore. That’s catnip for infrastructure fans.

Key Points

  • Zalando’s Product Read API serves millions of requests per second with single-digit millisecond latency across 25 European markets.
  • The system initially routed both edge and internal fan-out traffic through Skipper, Zalando’s open-source Kubernetes ingress controller and HTTP router.
  • The batch endpoint could expand one request into up to 100 parallel downstream calls, making latency depend on the slowest of many Skipper hops.
  • Shared infrastructure in the hot path made it difficult to distinguish whether incidents and latency spikes originated in Skipper, PRAPI, or elsewhere.
  • Zalando moved high fan-out internal routing into an in-process client-side load balancer that reproduced Skipper’s hash ring and added fade-in, bounded load, and availability-zone-aware routing features.
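For readers who want a feel for the hash ring and bounded-load idea in the last point, here is a minimal sketch. It is not Skipper's or Zalando's code: the class name `BoundedHashRing`, the SHA-256 hash, the 100 virtual nodes per endpoint, and the 1.25 load factor are all assumptions chosen for illustration, and fade-in and availability-zone awareness are left out.

```python
import bisect
import hashlib
import math


def _hash(key: str) -> int:
    # Any stable hash works for a sketch; SHA-256 is an arbitrary choice here.
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


class BoundedHashRing:
    """Consistent hashing with a per-endpoint cap on in-flight requests."""

    def __init__(self, endpoints, vnodes=100, load_factor=1.25):
        self.load_factor = load_factor
        self.in_flight = dict.fromkeys(endpoints, 0)
        # Each endpoint appears `vnodes` times on the ring to smooth the spread.
        self._ring = sorted(
            (_hash(f"{e}#{i}"), e) for e in endpoints for i in range(vnodes)
        )
        self._points = [h for h, _ in self._ring]

    def pick(self, key: str) -> str:
        # Cap = load_factor * average load, counting the request being placed.
        total = sum(self.in_flight.values()) + 1
        cap = math.ceil(self.load_factor * total / len(self.in_flight))
        start = bisect.bisect_left(self._points, _hash(key))
        # Walk clockwise from the key's position until an endpoint has room.
        for step in range(len(self._ring)):
            endpoint = self._ring[(start + step) % len(self._ring)][1]
            if self.in_flight[endpoint] < cap:
                self.in_flight[endpoint] += 1
                return endpoint
        raise RuntimeError("no endpoint under the load cap")

    def release(self, endpoint: str) -> None:
        # Call when the request finishes so the endpoint's load drops again.
        self.in_flight[endpoint] -= 1


if __name__ == "__main__":
    ring = BoundedHashRing(["pod-a", "pod-b", "pod-c", "pod-d"])
    # A single hot key, 20 concurrent requests, nothing released yet.
    for _ in range(20):
        ring.pick("sku-123")
    print(ring.in_flight)
```

The same key keeps landing on the same endpoint while that endpoint has room, which is what makes per-endpoint caching effective; the cap only kicks in when one key gets hot, at which point the overflow spills to the next endpoints on the ring instead of piling onto one pod.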

Hottest takes

"I put a lot of effort into the story arc and readability" — cjbooms
"replicating the exact same hash-ring" — cjbooms
"happy to take any questions" — cjbooms