January 6, 2026

Bloom or doom? Pick your join poison

How to Improve a Perfect Join Algorithm

Perfect on paper, sluggish in reality — fans say Bloom filters make it fly

TLDR: A beloved “perfect” method for combining tables is too slow in practice, so adding Bloom filters is pitched as a fix to skip useless work. The community is split: theory fans applaud the elegant speed-up, while practitioners insist hash joins still rule unless the filters fit flawlessly.

Database nerds are brawling over a classic “perfect” trick for stitching tables together: Yannakakis’s algorithm. On paper it’s instance‑optimal (aka best possible in theory), but in real life it’s 2–3x slower than the go‑to method, hash join. The author suggests a speed boost using Bloom filters—tiny memory “sieves” that can quickly say “definitely not” or “maybe” so you skip junk early. Cue drama: theorists, clutching their upcoming ICDT talk, argue elegance + guarantees; engineers counter with “my pager says go fast.”

The hottest thread? People dunking on “perfect but slow.” One camp cheers: “Bloom filters fit in cache, semijoins stop wasting time—finally practical!” Another camp fires back: “False positives mean extra work; just build the hash tables and move on.” Memes are flying: “Bloom or Doom,” “Instance‑optimal, not instance‑operational,” and a galaxy‑brain vs. wrench‑emoji face‑off. Benchmark bros scream for numbers; DBAs joke about six passes vs. three like it’s a cardio routine. The vibe: fun, salty, and deeply split—half the crowd wants theory to earn its keep, the other half says if it ain’t faster, it ain’t happening. Everyone agrees on one thing: make joins fast or make excuses.

Key Points

  • Yannakakis’s algorithm is instance-optimal for acyclic joins but can be 2–3x slower than hash join in practice.
  • In a worst-case input where no tuples are eliminated, six semijoin passes become redundant.
  • On that input, Yannakakis’s approach performs ~9n lookups and 6n inserts vs. hash join’s ~3n probes and 3n inserts.
  • An on-the-fly construction of intermediate hash tables during semijoins is noted as a minor optimization.
  • Using Bloom filters for semijoin results shifts many operations to cache-friendly checks and inserts, reducing overhead.

Hottest takes

"If it’s 3x slower, it’s not perfect—it’s cosplay" — queryGrinch
"Bloom filters? Great until your ‘maybe’ becomes ‘oops’" — cacheMeOutside
"Theory flexes, prod sweats; ship the fast one" — opsGoblin
Made with <3 by @siedrix and @shesho from CDMX. Powered by Forge&Hive.