January 6, 2026
Bloom or doom? Pick your join poison
How to Improve a Perfect Join Algorithm
Perfect on paper, sluggish in reality — fans say Bloom filters make it fly
TLDR: A beloved “perfect” method for combining tables is too slow in practice, so adding Bloom filters is pitched as a fix to skip useless work. The community is split: theory fans applaud the elegant speed-up, while practitioners insist hash joins still rule unless the filters fit flawlessly.
Database nerds are brawling over a classic “perfect” trick for stitching tables together: Yannakakis’s algorithm. On paper it’s instance‑optimal (aka best possible in theory), but in real life it’s 2–3x slower than the go‑to method, hash join. The author suggests a speed boost using Bloom filters—tiny memory “sieves” that can quickly say “definitely not” or “maybe” so you skip junk early. Cue drama: theorists, clutching their upcoming ICDT talk, argue elegance + guarantees; engineers counter with “my pager says go fast.”
The hottest thread? People dunking on “perfect but slow.” One camp cheers: “Bloom filters fit in cache, semijoins stop wasting time—finally practical!” Another camp fires back: “False positives mean extra work; just build the hash tables and move on.” Memes are flying: “Bloom or Doom,” “Instance‑optimal, not instance‑operational,” and a galaxy‑brain vs. wrench‑emoji face‑off. Benchmark bros scream for numbers; DBAs joke about six passes vs. three like it’s a cardio routine. The vibe: fun, salty, and deeply split—half the crowd wants theory to earn its keep, the other half says if it ain’t faster, it ain’t happening. Everyone agrees on one thing: make joins fast or make excuses.
Key Points
- •Yannakakis’s algorithm is instance-optimal for acyclic joins but can be 2–3x slower than hash join in practice.
- •In a worst-case input where no tuples are eliminated, six semijoin passes become redundant.
- •On that input, Yannakakis’s approach performs ~9n lookups and 6n inserts vs. hash join’s ~3n probes and 3n inserts.
- •An on-the-fly construction of intermediate hash tables during semijoins is noted as a minor optimization.
- •Using Bloom filters for semijoin results shifts many operations to cache-friendly checks and inserts, reducing overhead.