April 10, 2026

One line, eight hours, many memes

Bluesky April 2026 Outage Post-Mortem

Half the app went dark for 8 hours after a tiny code oops — and the comments are feral

TLDR: Bluesky says a missing “speed limit” in new code unleashed thousands of simultaneous lookups, exhausting network ports and knocking out half the app for 8 hours. Commenters split among “classic dev slip-up” sympathy, cynicism that this failure mode is common, and jokes about blue-on-blue screenshots, while many applaud the candid write-up.

Bluesky’s own engineer dropped a brutally honest post-mortem admitting that half the app went wobbly for 8 hours because a new internal tool fired off monster requests (15,000 to 20,000 posts at once) without a “speed limit.” Cue a flood of connections: network ports ran out, and boom, users were left staring at error screens. The community pounced on the money line: one missing safeguard in the code. Top vibe? One tiny line took down half the party. The drive-by verdict “That’ll do it” became the meme of the day.

But the thread didn’t stop at dunking. One camp asked why anyone would batch 20,000 posts in the first place, turning “Why?” into a whole mood. Another crowd shrugged, calling the failure mode a boring industry classic: too many connections, not enough guardrails. Then a side-quest erupted as readers roasted the screenshot styling (“light blue on dark blue”), with one old-school hero bragging that it “works in lynx” (the vintage text browser). Between the roasts, some gave props for transparency and the promise to fix observability, while others side-eyed the cheeky “we’re hiring” line. The takeaway? A missing limit became a meme, a design gripe stole a scene, and Bluesky’s honesty got a cautious nod.

Key Points

  • Outage intermittently affected about half of Bluesky users for ~8 hours, with warning signs over the prior weekend.
  • Error logs indicated memcached issues and port binding failures, aligning with traffic drops; network transit was not the cause.
  • A newly deployed internal service sent occasional very large batches (15–20k URIs) to the GetPostRecord RPC.
  • GetPostRecord lacked bounded concurrency (missing errgroup.SetLimit), spawning tens of thousands of goroutines and flooding memcached connections.
  • Rapid connection churn left many sockets in TCP TIME_WAIT, exhausting ephemeral ports and degrading cache access across services, which complicated diagnosis.

Hottest takes

"That’ll do it." — threecheese
"I expect this is common." — goekjclo
"Lite Blue on a dark Blue background" — jmclnx
Made with <3 by @siedrix and @shesho from CDMX. Powered by Forge&Hive.