March 1, 2026

Compression wars, serve the cake

Have your cake and decompress it too

Vortex drops “smaller, faster” claim; commenters yell speed over size and “seen it before”

TL;DR: Vortex claims smaller files and much faster reads by stacking lightweight, random-access compressors instead of heavy zip tools. Comments split: veterans say speed trumps size, and others argue the idea echoes ORC, BtrBlocks, and OpenZL—so it’s clever, but not entirely new.

Vortex just bragged it can make data files 38% smaller and read them 10–25x faster than the old guard, all by chaining a bunch of lightweight tricks instead of slapping on heavy-duty compressors like ZSTD. Think: specialized “mini-compressors” for numbers and strings, stacked smartly so you can still jump to any value instantly. The crowd loved the bold claim but immediately split into camps. One veteran waved a caution flag: “scan rate is more important than size”, arguing speed wins every time even if your files aren’t the tiniest. Cue jokes about “decompressing a whole cake just for one bite” and cheers for random-access reads.
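To make the "jump to any value instantly" point concrete, here's a minimal sketch of one of those mini-compressors: dictionary encoding for a string column. This is purely illustrative (it is not Vortex's actual API or layout); the point is that fetching one row is a single array lookup, with no bulk decompression of the page it lives in.

```python
# Hypothetical sketch, not Vortex's real code: dictionary-encode a string
# column so any single row can be read without touching its neighbors.
def dict_encode(values):
    # Map each distinct string to a small integer code.
    dictionary = sorted(set(values))
    index = {v: i for i, v in enumerate(dictionary)}
    codes = [index[v] for v in values]
    return dictionary, codes

def get_row(dictionary, codes, row):
    # Random access: one lookup, no "decompressing the whole cake".
    return dictionary[codes[row]]

col = ["GOLD", "SILVER", "GOLD", "BRONZE", "GOLD", "SILVER"]
dictionary, codes = dict_encode(col)
print(get_row(dictionary, codes, 3))  # -> BRONZE
```

With a general-purpose compressor like ZSTD wrapped around the page, that same single-row read would first require decompressing the entire page.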

The real drama? A few folks shrugged, saying this is basically a remix of ideas from the academic BtrBlocks paper and older formats like ORC. Another chimed in that it looks a lot like OpenZL, which auto-builds a custom compressor for your data—translation: cool, but not brand new. Fans teased Parquet for relying on a big final squeeze that kills quick lookups, while skeptics tossed eye-rolls at “yet another format.” The hot take meter spiked: speed-first pragmatists vs. size-obsessed minimalists, with memes about cake, frosting, and who’s actually reinventing the wheel. Delicious data drama!

Key Points

  • Vortex reports TPC-H SF10 files that are 38% smaller and decompress 10–25x faster than Parquet with ZSTD, without using general-purpose compression.
  • The approach is to try multiple lightweight encodings and compose them per column, inspired by the BtrBlocks framework.
  • Parquet employs per-page lightweight encodings followed by a general-purpose compressor (e.g., ZSTD, LZ4, Snappy) per column chunk.
  • General-purpose compression forces full-page decompression to read even one value, which destroys random access and hampers sparse lookups and late materialization.
  • Parquet’s hard-coded encoding cascade and limited repertoire hinder extensibility; discussions are underway to add encodings like ALP, while BtrBlocks advocates recursive chaining of lightweight encodings.
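The "compose them per column" idea can be sketched as a two-stage cascade: frame-of-reference (subtract a base) followed by fixed-width bit-packing. This is a toy in the spirit of BtrBlocks-style chaining, not Vortex's or BtrBlocks' actual implementation; note that decoding one value is pure arithmetic, so random access survives both stages.

```python
# Hypothetical cascade of two lightweight encodings: frame-of-reference
# (FOR) then bit-packing. Neither stage needs block-level decompression.
def for_bitpack(values):
    base = min(values)                      # stage 1: frame of reference
    deltas = [v - base for v in values]
    width = max(deltas).bit_length() or 1   # bits per packed delta
    packed = 0
    for i, d in enumerate(deltas):          # stage 2: fixed-width packing
        packed |= d << (i * width)
    return base, width, packed

def get(base, width, packed, i):
    # Random access to element i: shift, mask, add the base back.
    mask = (1 << width) - 1
    return base + ((packed >> (i * width)) & mask)

col = [1000, 1007, 1003, 1012, 1001]
base, width, packed = for_bitpack(col)
assert [get(base, width, packed, i) for i in range(5)] == col
```

Because every packed value has the same bit width, the decoder can compute where element i lives; a variable-length scheme like ZSTD has no such shortcut, which is exactly the trade-off the Key Points describe.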

Hottest takes

"scan rate is more important than size" — gopalv
"It was easier to beat Parquet's defaults" — gopalv
"Looks similar to OpenZL" — pella
Made with <3 by @siedrix and @shesho from CDMX. Powered by Forge&Hive.