February 8, 2026
Old code, hot takes
How to squeeze a lexicon (2001) [pdf]
2001 hack to cram huge word lists has devs at war over "timeless" vs "just use a DB"
TLDR: A 2001 paper explains a compact way to check if a word is in a list using a minimal automaton, beating bulkier methods. Comments split between praising timeless efficiency and saying modern databases or Bloom filters are enough, with extra drama over DAWG vs ADFA definitions — useful, funny, and very nerdy.
An old-school 2001 paper just resurfaced, showing how to pack massive word lists into a tiny, fast structure called a minimal ADFA — basically a super-compact map that says “is this word in the list?” without storing every letter. The paper compares this to tries (tree-like word maps), hashing, and Bloom filters (space savers that sometimes guess wrong). The crowd went full popcorn. One camp swooned over “elegant, memory-cheap engineering,” claiming this beats bloated apps and modern overkill. Another camp shrugged: “Just use a database and move on,” saying today’s hardware makes all this compression theater unnecessary. Things got spicy when someone insisted a DAWG (a compressed word graph) isn’t the same as an ADFA — cue a pedant-off with diagrams and barking jokes. Bloom filter fans chimed in with “false positives are fine for most use-cases,” only to get roasted by spelling-checker purists who need guaranteed answers. Language nerds loved the bit about English vs highly inflected languages: affix stripping is cute until Polish or Finnish shows up. The memes? “2001 PDF still beats your microservice,” “Bloom filters are vibes with lies,” and a lot of “my grandma’s crossword app did this.” Classic internet energy: nostalgia, nitpicking, and surprisingly wholesome CS lore.
Key Points
- •Minimal ADFAs provide compact, fast-access representations for finite string sets (lexicons).
- •Traditional DFA minimization is resource-intensive for large collections; the paper promotes a more efficient construction algorithm for minimal ADFAs.
- •The algorithm is presented for three ADFA variants, with minor improvements and comparisons to other data structures.
- •Hashing approaches, including perfect hashing and Bloom filters, have space and certainty trade-offs and may require storing or compressing strings.
- •Tries (abbreviated and full) offer different space-time trade-offs, with memory reduction possible via lists or binary search trees for child pointers.