May 10, 2026

Small file, huge comment-section energy

Replacing a 3 GB SQLite db with a 10 MB FST (finite state transducer) binary

Dev shrinks a giant dictionary download—and the comments instantly start arguing

TLDR: A dictionary app creator cut a 3 GB dictionary download down to about 10 MB by switching to a far more specialized way of storing words. Commenters loved the result but instantly split into two camps: "great practical fix" versus "hang on, wasn't this old news, and why wasn't the original file compressed better?"

A developer behind a Finnish-English dictionary app just pulled off the kind of glow-up that makes the internet sit up straight: he replaced a 3 GB download with something closer to 10 MB, all by ditching a bloated general-purpose setup for a tiny custom-built word lookup file. In plain English, he stopped hauling around an entire filing cabinet and switched to a very smart index card. The crowd’s reaction? Equal parts "wow", "wait, didn’t we invent this ages ago?", and "okay but why was it 3 GB in the first place?"

That last part is where the comment section got juicy. One camp loved the story’s honesty: the developer first shipped the "bad easy thing"—a huge database—because it worked, then came back later with a cleaner fix. Commenters practically turned that into a life lesson about shipping first and optimizing later. But the skeptics were not letting the victory lap go unchallenged. One sharply asked why normal compression didn’t already tame that monster-sized download if so much of the data repeated itself. Ouch.

Meanwhile, nostalgia nerds barged in with major been-there energy. Multiple readers said the "new" trick felt suspiciously familiar, with one pointing to an older name for the same idea and another reminiscing about cramming a word game dictionary into a 6 MB cache years ago. There was also some wholesome side-quest energy, with people wondering if the same trick could rescue Turkish or Japanese dictionaries too. So yes, the article is about shrinking software—but the real show is the comments: half applause, half archaeology, and just enough side-eye to keep it spicy.

Key Points

  • The article describes Taskusanakirja, a Finnish-English dictionary that relies on incremental prefix-based search.
  • The first implementation used a trie in Go and could store roughly 400,000 items in about 50–60 MB with optimizations.
  • Scaling to include Finnish inflected forms increased the dataset to an estimated 40–60 million items, which the trie approach could not handle within the target memory footprint.
  • As an interim solution, the author shipped inflections in a separate SQLite database using Full Text Search, which performed well but required a 3 GB download.
  • The post revisits the problem nine months later with the goal, stated in the title, of replacing the large SQLite database with a much smaller finite state transducer binary.

Hottest takes

"DAFSA is the rediscovery of a data structure called Directed Acyclic Word Graph" — lscharen
"I chose to do the bad easy thing" — Hendrikto
"Wouldn’t vanilla compression have dealt with that" — cadamsdotcom
Made with <3 by @siedrix and @shesho from CDMX. Powered by Forge&Hive.