May 10, 2026
Small file, huge comment-section energy
Replacing a 3 GB SQLite db with a 10 MB FST (finite state transducer) binary
Dev shrinks a giant dictionary download—and the comments instantly start arguing
TLDR: A dictionary app creator cut a massive 3 GB word download down to about 10 MB by using a much more specialized way to store words. Commenters loved the result, but instantly split into two camps: "great practical fix" versus "hang on, wasn’t this old news—and why wasn’t the original file compressed better?"
A developer behind a Finnish-English dictionary app just pulled off the kind of glow-up that makes the internet sit up straight: he replaced a 3 GB download with something closer to 10 MB, all by ditching a bloated general-purpose setup for a tiny custom-built word lookup file. In plain English, he stopped hauling around an entire filing cabinet and switched to a very smart index card. The crowd’s reaction? Equal parts "wow", "wait, didn’t we invent this ages ago?", and "okay but why was it 3 GB in the first place?"
That last part is where the comment section got juicy. One camp loved the story’s honesty: the developer first shipped the "bad easy thing"—a huge database—because it worked, then came back later with a cleaner fix. Commenters practically turned that into a life lesson about shipping first and optimizing later. But the skeptics were not letting the victory lap go unchallenged. One sharply asked why normal compression didn’t already tame that monster-sized download if so much of the data repeated itself. Ouch.
Meanwhile, nostalgia nerds barged in with major been-there energy. Multiple readers said the "new" trick felt suspiciously familiar, with one pointing to an older name for the same idea and another reminiscing about cramming a word game dictionary into a 6 MB cache years ago. There was also some wholesome side-quest energy, with people wondering if the same trick could rescue Turkish or Japanese dictionaries too. So yes, the article is about shrinking software—but the real show is the comments: half applause, half archaeology, and just enough side-eye to keep it spicy.
Key Points
- The article describes Taskusanakirja, a Finnish-English dictionary that relies on incremental prefix-based search.
- The first implementation used a trie in Go and could store roughly 400,000 items in about 50–60 MB with optimizations.
- Scaling to include Finnish inflected forms increased the dataset to an estimated 40–60 million items, which the trie approach could not handle within the target memory footprint.
- As an interim solution, the author shipped inflections in a separate SQLite database using full-text search, which performed well but required a 3 GB download.
- The post revisits the problem nine months later with the goal, stated in the title, of replacing the large SQLite database with a much smaller finite state transducer binary.