May 10, 2026

Small file, huge comment-section energy

Replacing a 3 GB SQLite db with a 10 MB FST (finite state transducer) binary

Dev shrinks a giant dictionary download—and the comments instantly start arguing

TLDR: A dictionary app creator cut a 3 GB dictionary download down to about 10 MB by switching to a far more specialized way of storing words. Commenters loved the result but instantly split into two camps: "great practical fix" versus "hang on, wasn't this old news, and why wasn't the original file compressed better?"

A developer behind a Finnish-English dictionary app just pulled off the kind of glow-up that makes the internet sit up straight: he replaced a 3 GB download with something closer to 10 MB, all by ditching a bloated general-purpose setup for a tiny custom-built word lookup file. In plain English, he stopped hauling around an entire filing cabinet and switched to a very smart index card. The crowd’s reaction? Equal parts "wow", "wait, didn’t we invent this ages ago?", and "okay but why was it 3 GB in the first place?"

That last part is where the comment section got juicy. One camp loved the story’s honesty: the developer first shipped the "bad easy thing"—a huge database—because it worked, then came back later with a cleaner fix. Commenters practically turned that into a life lesson about shipping first and optimizing later. But the skeptics were not letting the victory lap go unchallenged. One sharply asked why normal compression didn’t already tame that monster-sized download if so much of the data repeated itself. Ouch.

Meanwhile, nostalgia nerds barged in with major been-there energy. Multiple readers said the "new" trick felt suspiciously familiar, with one pointing to an older name for the same idea and another reminiscing about cramming a word game dictionary into a 6 MB cache years ago. There was also some wholesome side-quest energy, with people wondering if the same trick could rescue Turkish or Japanese dictionaries too. So yes, the article is about shrinking software—but the real show is the comments: half applause, half archaeology, and just enough side-eye to keep it spicy.

Key Points

  • The article describes Taskusanakirja, a Finnish-English dictionary that relies on incremental prefix-based search.
  • The first implementation used a trie in Go and could store roughly 400,000 items in about 50–60 MB with optimizations.
  • Scaling to include Finnish inflected forms increased the dataset to an estimated 40–60 million items, which the trie approach could not handle within the target memory footprint.
  • As an interim solution, the author shipped inflections in a separate SQLite database using Full Text Search, which performed well but required a 3 GB download.
  • The post revisits the problem nine months later with the goal, stated in the title, of replacing the large SQLite database with a much smaller finite state transducer binary.

Hottest takes

"DAFSA is the rediscovery of a data structure called Directed Acyclic Word Graph" — lscharen
"I chose to do the bad easy thing" — Hendrikto
"Wouldn’t vanilla compression have dealt with that" — cadamsdotcom
Made with <3 by @siedrix and @shesho from CDMX. Powered by Forge&Hive.