Data Compression Explained

An old-school guide to shrinking files has commenters geeking out, name-dropping legends, and begging for AI tea

TLDR: The article explains the basic rules of data compression and why there’s no universal trick for making every file smaller. In the comments, readers treated it like a cult classic: praising author Matt Mahoney, arguing the benchmarks are outdated, and pushing for answers on whether AI can do better.

A dusty-but-respected internet classic about how files get smaller somehow turned into a mini fan club meeting, a history lesson, and an "AI, but make it practical" debate. The article itself is a deep explainer from compression guru Matt Mahoney, walking readers through the basics: some kinds of shrinking keep every bit intact, others throw away details you probably won’t notice, and no, sadly, there is no magic app that can shrink every file, let alone keep shrinking it forever. That last point alone feels like the kind of fact that should be printed on every sketchy "boost your storage" ad online.

But the real action is in the reactions. One commenter, rurban, set off the nostalgia siren by saying the old benchmark charts date from the "pre Fabrice Bellard days", with the spicy implication that the leaderboard is now ancient tech history. The same commenter followed up with a much hotter take: databases are "mostly very, very stupid" compared with highly tuned search methods, and the gap can be absurdly large. That’s the kind of comment that makes engineers quietly crack their knuckles.

Meanwhile, the softer side of the thread showed up too: multiple people were openly admiring Mahoney, with one simply saying "Matt is a great guy" and another reminding everyone he made ZPAQ, a niche backup tool that instantly raised his legend status. And then came the modern plot twist: someone asked for sources on AI-based compression, because if compression is really prediction, the crowd clearly wants to know when robots entered the chat.

Key Points

  • The book is a self-contained technical introduction for readers who want to understand data compression or build compression software.
  • It defines compression as reducing bits needed for storage or transmission and distinguishes between lossless and lossy methods.
  • The article states that all compression algorithms use at least a model and a coder, with optional preprocessing transforms.
  • It explains that optimal coding is known, but optimal modeling is not computable in general.
  • The information theory section says no universal lossless compressor can guarantee compression for every input, especially random data.
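Two of these claims can be shown in a few lines of Python: a compressor pairs a model with a coder, and no lossless compressor can shrink every input. This is a minimal sketch, not code from Mahoney's book, and the function names are hypothetical. It assumes an order-0 model, where each byte's probability is simply its frequency in the input. It also assumes an ideal coder that spends -log2(p) bits per symbol, which arithmetic coding approaches in practice. The counting helper gives the pigeonhole argument: there are 2^n inputs of n bits but only 2^n - 1 strictly shorter bit strings, so at least one input cannot get smaller.

```python
import math
from collections import Counter


def count_shorter_strings(n: int) -> int:
    """Number of bit strings strictly shorter than n bits (empty string included)."""
    # 2^0 + 2^1 + ... + 2^(n-1) = 2^n - 1, one fewer than the 2^n inputs of length n,
    # so a lossless (one-to-one) compressor cannot map every n-bit input to a shorter output.
    return sum(2 ** k for k in range(n))


def order0_code_length(data: bytes) -> float:
    """Ideal coded size in bits of `data` under a hypothetical order-0 model.

    Model: P(byte) = count(byte) / len(data).
    Coder: an ideal coder spends -log2(P) bits per symbol.
    The cost of storing the model itself is ignored in this sketch.
    """
    counts = Counter(data)
    total = len(data)
    return sum(-c * math.log2(c / total) for c in counts.values())


if __name__ == "__main__":
    n = 8
    print(f"{2 ** n} inputs of {n} bits, only {count_shorter_strings(n)} shorter outputs")
    sample = b"aaaaaaab"
    print(f"{len(sample) * 8} raw bits -> {order0_code_length(sample):.2f} modeled bits")
```

The order-0 estimate for `bytes(range(256))`, where every byte value appears once, comes out to the full 2048 bits. This simple model finds no skew to exploit, much like random data. A better model (longer contexts, mixing) would lower the estimate only for inputs that actually contain such structure.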

Hottest takes

"pre Fabrice Bellard days" — rurban
"Databases are mostly very, very stupid" — rurban
"Matt is a great guy" — blobbers
Data Compression Explained - Weaving News | Weaving News