Bloom filters are good for search that does not scale

Small sites cheer, academics flex, and Bing drops receipts

TLDR: The post says Bloom filters are brilliant for small-site search but shaky at web scale. Commenters fire back with real-world wins, a fresh academic algorithm, and Bing’s usage—agreeing they shine as quick “not here” checks, not full-blown web search. It matters because speed vs. size trade-offs rule search.

A throwback blog argues that Bloom filters (tiny, memory-savvy checklists that can say “definitely not here”) make sense for small-site search but fall apart at web scale. Cue the comments: the crowd splits into hype vs. hard limits, and it’s delightful. One camp, like sanskarix, loves the “nope!” power: you avoid needless work and save cash, especially in caching. Another camp storms in with academic swagger: MattPalmer1086 drops an “I invented a very fast string search” and links a 2024 paper (SEA 2024), turning the thread into a citation showdown. Then majke slaps down Big Tech receipts: Bing reportedly uses Bloom filters for its fresh index, with a paper and BitFunnel cited as evidence, and suddenly the “doesn’t scale” headline looks more like “it depends.” The vibe gets storytime cozy when susam reminisces about speeding up petabyte-scale network queries at RSA, while hijinks pops in with the eternal question: “is there a better way for haystack hunts?” Jokes fly about the blog’s “cat on a bus” example (proof that language is chaotic), and the thread lands on a spicy consensus: Bloom filters are killer for fast no’s, not perfect yes’s. Scale? Tricky. Use cases? Plenty.

Key Points

  • Per-document Bloom filters enable compact client-side full text search indexes for small sites.
  • Query time is O(number of documents), making naïve Bloom filter search unsuitable for large corpora.
  • Sorting filters to enable binary search is shown to fail via a simple counterexample.
  • A tree of aggregate Bloom filters (built via bitwise OR) does not reduce the number of filters searched, due to high-dimensional text overlap.
  • The article introduces an “Inverted Index of Bloom Filters” section but provides no further details in the excerpt.
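
To make the key points concrete, here is a minimal Python sketch of per-document Bloom filters with a linear-scan query. It is an illustration under stated assumptions, not the article's implementation: the class name, the filter size (1024 bits), the hash scheme (salted SHA-256, 3 hashes), and the sample documents are all hypothetical choices made for this example.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter; sizes and hash scheme are illustrative assumptions."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, word):
        # Derive num_hashes bit positions by salting SHA-256 with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{word}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, word):
        for pos in self._positions(word):
            self.bits |= 1 << pos

    def might_contain(self, word):
        # False means "definitely not here"; True means "maybe here".
        return all((self.bits >> pos) & 1 for pos in self._positions(word))

    def union(self, other):
        # Aggregate filter via bitwise OR, as in the tree idea above.
        merged = BloomFilter(self.num_bits, self.num_hashes)
        merged.bits = self.bits | other.bits
        return merged


def build_index(docs):
    """Build one Bloom filter per document (hypothetical helper)."""
    index = {}
    for name, text in docs.items():
        bf = BloomFilter()
        for word in text.lower().split():
            bf.add(word)
        index[name] = bf
    return index


def search(index, query):
    """Check every filter once: O(number of documents) per query."""
    words = query.lower().split()
    return [name for name, bf in index.items()
            if all(bf.might_contain(w) for w in words)]


# Hypothetical sample documents.
docs = {
    "a.html": "the cat sat on a bus",
    "b.html": "bloom filters for search",
}
index = build_index(docs)
hits = search(index, "cat bus")  # always includes "a.html"; false positives are possible
```

The loop in `search` touches every document's filter, which is the scaling problem the post describes. The `union` method also hints at why the aggregate tree struggles: each OR sets more bits, so filters higher in the tree answer "maybe" more often and rule out fewer subtrees.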

Hottest takes

"they let you say 'definitely not here' without checking everything" — sanskarix
"I invented a very fast string search algorithm based on bloom filters" — MattPalmer1086
"Bing uses bloom filters for the most-recent index" — majke