March 14, 2026
Compression wars, but make it messy
An Ode to Bzip
Internet crowns bzip the unsung hero; zstd shouts back
TLDR: Bzip shrank a real Lua codebase the most, beating popular options like zstd and xz. Comments split between praising tiny file sizes, slamming bzip’s slow compression, clarifying that bzip3 isn’t bzip2, and grumbling that a Go library enabled zstd in a patch release: text savers vs speed lovers.
Today’s compression cage match didn’t go the way the crowd expected: in a real test on Lua code for a Minecraft mod, the old-school bzip family squeezed the most, beating gzip, zstd, xz, brotli, and lzip. The author credits bzip’s secret sauce, BWT (the Burrows–Wheeler Transform, a reorder-and-group trick), over the LZ77 approach (a find-and-reference trick) the others use, and notes that BWT favors text over binary. Translation: it sorts characters by the context they appear in, so identical characters end up next to each other, and simple modeling plus run-length encoding can then munch those long runs.
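To make the “reorder-and-group” idea concrete, here is a textbook naive BWT in Python: build every rotation of the input, sort the rotations, and keep the last column. This is a minimal sketch for illustration, not how bzip2 or bzip3 implement it. Real compressors compute the transform with much faster suffix sorting and work block by block. The sentinel character appended to the input is an assumption of this sketch that makes the transform invertible; bzip2 instead records which sorted row holds the original text. The helper `count_runs` is hypothetical and only measures clustering.

```python
def bwt(text: str, sentinel: str = "\0") -> str:
    """Naive Burrows-Wheeler Transform: sort all rotations, keep the last column."""
    if sentinel in text:
        raise ValueError("sentinel must not occur in the input")
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rot[-1] for rot in rotations)


def inverse_bwt(last: str, sentinel: str = "\0") -> str:
    """Rebuild the sorted rotation table one column at a time, then pick the row ending in the sentinel."""
    table = [""] * len(last)
    for _ in range(len(last)):
        table = sorted(last[i] + table[i] for i in range(len(last)))
    original = next(row for row in table if row.endswith(sentinel))
    return original[:-1]  # drop the sentinel


def count_runs(s: str) -> int:
    """Number of maximal runs of identical characters (fewer runs means more clustering)."""
    return sum(1 for i, ch in enumerate(s) if i == 0 or s[i - 1] != ch)


if __name__ == "__main__":
    print(bwt("banana", "$"))  # annb$aa
    sample = "local x = 1\nlocal y = 2\nlocal z = 3\n"  # hypothetical Lua-like input
    transformed = bwt(sample)
    assert inverse_bwt(transformed) == sample  # the transform loses nothing
    print(count_runs(sample), "runs before,", count_runs(transformed), "runs after")
```

The transform itself does not shrink anything. It only reorders the characters, and the later stages profit from the runs that the reordering creates.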
The comments went full soap opera. One fan dubbed bzip2 the “unsung coworker,” while speed lovers yelled “just use zstd” because bzip can be slow to compress. Another reader argued the piece barely compared xz and zstd (“unless I’m missing something, the only references to x…”), stirring demands for broader benchmarks. Then came the lore drop: “bzip3 has close to nothing to do with bzip2,” a reminder that it’s a separate take on the same BWT idea.
Meanwhile, a side quest erupted: Go’s klauspost/compress flipped on zstd in a patch version, prompting “wait, defaults changed in a patch?!” energy. Meme-wise, the crowd joked bzip is the “grandpa with gains,” zstd the “gym bro,” and BWT the “weird but brilliant uncle.” The vibe: if you’re squeezing text to the last byte, bzip’s back; if you want speed and convenience, the zstd crowd isn’t budging.
Key Points
- On a 327,005-byte Lua code file, BWT-based bzip variants produced the smallest compressed sizes (bzip3: 61,067 bytes; bzip2 -9: 63,727 bytes).
- LZ77-based compressors (Zopfli/gzip, zstd, xz, brotli, lzip) yielded larger outputs ranging from ~67–76 KB under high settings.
- Bzip’s use of the Burrows–Wheeler Transform groups characters by context, enabling efficient modeling and run-length encoding without storing back-reference positions the way LZ77 does.
- BWT performs best on consistent text-like data; mixed contexts (e.g., “color” vs “colour”) can reduce its efficiency compared to LZ77’s recency-based references.
- bzip3 lacks compression level parameters and bzip2’s -9 option has limited effect, whereas LZ77-based methods often rely on level tuning due to costly match searches.
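The key points say BWT output enables “efficient modeling and run-length encoding.” Below is a minimal Python sketch of that idea. It assumes a move-to-front (MTF) step between the transform and the run-length stage, loosely modeled on bzip2’s pipeline. After the transform, a character that repeats soon after itself becomes a small number, usually 0, and runs of those numbers are cheap to encode. The function names are hypothetical and the final entropy coder is omitted. The comments also note that bzip3 has little to do with bzip2, so don’t read this as bzip3’s pipeline or as either tool’s on-disk format.

```python
def bwt(text: str, sentinel: str = "\0") -> str:
    """Naive Burrows-Wheeler Transform (sort rotations, keep last column)."""
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rot[-1] for rot in rotations)


def move_to_front(s: str) -> list[int]:
    """Emit each symbol's index in a recency list, then move that symbol to the front.

    A symbol seen again right away costs 0, so clustered BWT output turns into many zeros.
    A decoder would need the same starting alphabet (here: the sorted distinct symbols).
    """
    recency = sorted(set(s))
    codes = []
    for ch in s:
        idx = recency.index(ch)
        codes.append(idx)
        recency.insert(0, recency.pop(idx))
    return codes


def run_length(codes: list[int]) -> list[tuple[int, int]]:
    """Collapse consecutive equal values into (value, count) pairs."""
    runs: list[tuple[int, int]] = []
    for c in codes:
        if runs and runs[-1][0] == c:
            runs[-1] = (c, runs[-1][1] + 1)
        else:
            runs.append((c, 1))
    return runs


if __name__ == "__main__":
    sample = "local function f() return 1 end\n" * 4  # hypothetical repetitive Lua-like text
    codes = move_to_front(bwt(sample))
    print(f"{codes.count(0)} of {len(codes)} MTF codes are 0")
    print(f"{len(run_length(codes))} runs after run-length encoding")
```

Note that nothing here searches for matches. The sort does the grouping, which helps explain why BWT tools expose few or no level knobs compared with LZ77 tools, whose levels mostly trade time for match-search effort.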
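The key-point figures are easier to compare as compression ratios, computed as compressed size over the 327,005-byte original. The block below is plain arithmetic on those numbers, rounded:

```latex
\begin{aligned}
\text{bzip3:}\quad & \frac{61{,}067}{327{,}005} \approx 0.187 \quad (\approx 5.35{:}1) \\
\text{bzip2 -9:}\quad & \frac{63{,}727}{327{,}005} \approx 0.195 \quad (\approx 5.13{:}1) \\
\text{bzip3 vs. bzip2:}\quad & \frac{63{,}727 - 61{,}067}{63{,}727} = \frac{2{,}660}{63{,}727} \approx 4.2\%\ \text{smaller}
\end{aligned}
```

If the ~67–76 KB LZ77 results are read as roughly 67,000 to 76,000 bytes, they land at about 20% to 23% of the original.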