Fast Servers

Assembly-line servers promise speed; the crowd splits: old tricks vs. shiny new Linux magic

TLDR: A dev recommends an “assembly line” server with one thread per CPU and simple stages for speed. Commenters spar over whether it’s smart simplicity or old news, pushing newer Linux tools, while a ghostly diagram becomes a meme—why it matters: performance vs hype, in plain code.

A developer pitches a fast, simple server model that sounds like a factory line: one thread per CPU core, each stage (like “accept” and “read”) gets its own worker, and connections move down the assembly line via efficient waiting calls (epoll/kqueue) to hit 100k requests/second. The crowd instantly turns this into a spectacle. One commenter nails the vibe: “a little pipeline,” while another shouts, “this was discussed in 2016,” and a third says it’s basically the old SEDA idea with new paint.

Then the hot-take train arrives: the Linux crowd waves io_uring (a newer way to do speedy input/output) like a shiny toy, calls the design “dated but timeless,” and questions whether splitting stages across threads actually helps. Meanwhile, a visual nitpick becomes the comment-section meme: “why is the first diagram duplicated at .1 opacity?”, cueing jokes about ghost servers and transparency settings. The drama centers on whether this “pin your workers to cores and keep code simple” approach beats chasing the newest trick. Team Simple says fewer decisions mean fewer bugs. Team Shiny warns about overhead and urges modern tools. Verdict? The article is the appetizer; the comments are the main course.

Key Points

  • The article critiques the traditional event-dispatch server model, often implemented via libevent, as suboptimal for performance.
  • It proposes one thread per CPU core with CPU affinity, each maintaining its own epoll/kqueue descriptor.
  • State transitions (e.g., accept, read) are handled by separate threads; file descriptors are passed between threads’ event queues.
  • Implementation uses pthreads for a detached, system-scoped thread pool, and creates per-thread epoll/kqueue queues.
  • Platform specifics include Linux CPU affinity via pthread_setaffinity_np, OS X affinity via Mach thread policies, and raising RLIMIT_NOFILE to support many connections.

Hottest takes

"you'd probably use io_uring nowadays" — luizfelberti
"Seems similar to the SEDA architecture" — lmz
"little pipeline that all of the requests go thro..." — bee_rider
Made with <3 by @siedrix and @shesho from CDMX. Powered by Forge&Hive.