Experiments with Kafka's head-of-line blocking (2023)

Kafka making your jobs wait? Devs yell: stop using a freight train to deliver pizza

TLDR: An experiment shows Kafka can stall jobs when a slow task blocks the line, while beanstalkd keeps workers busy. Comments erupted: one side says Kafka isn’t a job queue, the other says careful tuning can fix it. The takeaway: choose tools by need—ordering vs speed—to avoid surprise bottlenecks.

A simple experiment pitted Kafka against beanstalkd using 100 tiny "jobs," four of them deliberately slow. In Kafka, messages are split across partitions, and each consumer owns some of them. If a slow job lands at the front of a partition, everything behind it stalls: that's head-of-line blocking. In beanstalkd, the queue hands the next job to the next free worker, so a slowpoke doesn't freeze the line.

The moment this dropped, the comments went nuclear. One camp shouted, "Kafka is a stream log, not a chore chart!", arguing that ordering is a feature and that using Kafka as a worker queue is asking for pain. The counter-camp flexed tuning tips: more partitions, randomized keys, concurrent handlers, idempotent processing (safe to run twice), and careful offset commits. "You can make it work." Then came the snark: "Benchmarking with naps? Cute." Critics said sleeping isn't real I/O, so the results prove little beyond "slow things slow you down."

Memes flew ("freight train delivering pizza," "HOL means 'Hold On, lad'"), along with the eternal RabbitMQ vs. Kafka cage match and a few "just use SQS" drive-bys. War stories poured in: stuck partitions, surprise replays, and beanstalkd fans calling it "old but gold." The pragmatic chorus: pick the tool for the job. If you need strict order, accept the waits; if you want work-queue speed, use a real queue.
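The contrast can be sketched with a toy simulation. This is hypothetical code, not the article's: job costs are added up arithmetically instead of slept, and round-robin assignment stands in for key hashing, but the static-partition-ownership vs. next-free-worker difference is the same one the experiment measures.

```python
import heapq

# 100 jobs, four of them slow (10 s), the rest instant, as in the experiment.
SLOW, FAST = 10.0, 0.0
jobs = [SLOW if i % 25 == 0 else FAST for i in range(100)]  # jobs 0, 25, 50, 75 are slow

def kafka_style(jobs, partitions=10, consumers=5):
    """Kafka-style: each consumer owns a fixed set of partitions and drains
    them strictly in order, so a slow job blocks everything queued behind it
    on the same consumer."""
    parts = [[] for _ in range(partitions)]
    for i, cost in enumerate(jobs):
        parts[i % partitions].append(cost)   # round-robin stands in for key hashing
    per = partitions // consumers            # two partitions per consumer, as in the test
    totals = [sum(sum(p) for p in parts[c * per:(c + 1) * per])
              for c in range(consumers)]
    return max(totals)                       # wall-clock time = slowest consumer

def work_queue_style(jobs, workers=5):
    """beanstalkd-style: the next free worker takes the next job, so a slow
    job ties up only its own worker."""
    free_at = [0.0] * workers                # time at which each worker becomes free
    heapq.heapify(free_at)
    for cost in jobs:
        start = heapq.heappop(free_at)       # earliest-free worker grabs the job
        heapq.heappush(free_at, start + cost)
    return max(free_at)

print(kafka_style(jobs))       # 20.0: two slow jobs collide behind one consumer
print(work_queue_style(jobs))  # 10.0: the four slow jobs spread across workers
```

In the simulation, Kafka's fixed ownership doubles the wall-clock time whenever two slow jobs happen to land on the same consumer's partitions; the work queue can never do worse than one slow job per free worker.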

Key Points

  • Kafka’s partitioned consumer-group model can cause head-of-line blocking when used as a job queue.
  • Beanstalkd and RabbitMQ dispatch jobs to the next available consumer, mitigating HoL blocking.
  • The experiment enqueues 100 jobs into both Kafka and beanstalkd; four jobs sleep for 10 seconds and the rest for 0 seconds.
  • Kafka is configured with 10 partitions and five consumers (two partitions per consumer); beanstalkd has five consumers.
  • Watcher processes measure total completion time by counting dummy messages emitted after each job finishes, starting a timer on the first message and stopping on the 100th.
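The watcher in the last point might look like this minimal sketch (hypothetical code, not the article's; `completion_messages` is assumed to be any iterable of the dummy messages, e.g. a consumer wrapped in a generator):

```python
import time

def watch(completion_messages, expected=100):
    """Consume one dummy message per finished job: start the clock on the
    first message, stop it on the `expected`-th, return the elapsed time."""
    start = None
    for seen, _ in enumerate(completion_messages, start=1):
        if seen == 1:
            start = time.monotonic()         # clock starts on the first completion
        if seen == expected:
            return time.monotonic() - start  # clock stops on the last one
    raise RuntimeError("stream ended before all jobs completed")
```

Measuring from the first completion rather than from enqueue time keeps producer startup out of the comparison; only the drain time of each system is timed.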

Hottest takes

“Kafka is a log, not your todo list—stop abusing it” — packetdad
“Add partitions, process concurrently, make handlers idempotent; problem solved” — threadripper
“Benchmarking with sleeps proves nothing except nap time; show real I/O” — napPolice
Made with <3 by @siedrix and @shesho from CDMX. Powered by Forge&Hive.