March 1, 2026
Compression wars, serve the cake
Have your cake and decompress it too
Vortex drops “smaller, faster” claim; commenters yell speed over size and “seen it before”
TLDR: Vortex claims smaller files and much faster reads by stacking lightweight, random-access compressors instead of heavy zip tools. Comments split: veterans say speed trumps size, and others argue the idea echoes ORC, BtrBlocks, and OpenZL—so it’s clever, but not entirely new.
Vortex just bragged it can make data files 38% smaller and read them 10–25x faster than the old guard, all by chaining a bunch of lightweight tricks instead of slapping on heavy-duty compressors like ZSTD. Think: specialized “mini-compressors” for numbers and strings, stacked smartly so you can still jump to any value instantly. The crowd loved the bold claim but immediately split into camps. One veteran waved a caution flag: “scan rate is more important than size”, arguing speed wins every time even if your files aren’t the tiniest. Cue jokes about “decompressing a whole cake just for one bite” and cheers for random-access reads.
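To see why "lightweight" matters for that instant-jump property, here's a minimal sketch (not Vortex's actual code) of one classic mini-compressor for integers: frame-of-reference (FOR) encoding. Because every value is stored as a fixed-width offset from the column minimum, you can decode element `i` directly, with no whole-page decompression. The function names and layout here are illustrative assumptions, not the Vortex API.

```python
# Illustrative sketch (not Vortex's actual implementation): frame-of-reference
# (FOR) encoding, a lightweight integer compressor that preserves O(1) random
# access. Each value is stored as a fixed-width offset from the column minimum.

def for_encode(values):
    """Encode integers as (reference, byte_width, packed offsets)."""
    ref = min(values)
    offsets = [v - ref for v in values]
    # Fixed width per value is what preserves direct indexing.
    width = max(1, (max(offsets).bit_length() + 7) // 8)
    packed = b"".join(off.to_bytes(width, "little") for off in offsets)
    return ref, width, packed

def for_get(ref, width, packed, i):
    """Random access: decode only element i — no full-block decompression."""
    chunk = packed[i * width:(i + 1) * width]
    return ref + int.from_bytes(chunk, "little")

values = [1_000_003, 1_000_017, 1_000_001, 1_000_042]
ref, width, packed = for_encode(values)
assert for_get(ref, width, packed, 2) == 1_000_001
# 4 values pack into 4 bytes of payload (1 byte each) vs 32 bytes of raw int64.
```

Contrast that with a ZSTD-compressed page: to read one value you must inflate the whole page first, hence the "decompressing a whole cake just for one bite" jokes.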
The real drama? A few folks shrugged, saying this is basically a remix of ideas from the academic BtrBlocks paper and older formats like ORC. Another chimed in that it looks a lot like OpenZL, which auto-builds a custom compressor for your data—translation: cool, but not brand new. Fans teased Parquet for relying on a big final squeeze that kills quick lookups, while skeptics tossed eye-rolls at “yet another format.” The hot take meter spiked: speed-first pragmatists vs. size-obsessed minimalists, with memes about cake, frosting, and who’s actually reinventing the wheel. Delicious data drama!
Key Points
- Vortex reports TPC-H SF10 files that are 38% smaller and decompress 10–25x faster than Parquet with ZSTD, without using general-purpose compression.
- The approach is to try multiple lightweight encodings and compose them per column, inspired by the BtrBlocks framework.
- Parquet employs per-page lightweight encodings followed by a general-purpose compressor (e.g., ZSTD, LZ4, Snappy) per column chunk.
- General-purpose compression breaks random access, hampering sparse lookups and late materialization because it forces full-page decompression.
- Parquet’s hard-coded encoding cascade and limited repertoire hinder extensibility; discussions are underway to add encodings like ALP, while BtrBlocks advocates recursive chaining of lightweight encodings.
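The "compose them per column" and "recursive chaining" points can be illustrated with a toy cascade (again a sketch, not BtrBlocks' or Vortex's actual code): dictionary-encode a string column into small integer codes, then run-length-encode those codes. Each stage is cheap and reversible on its own, and the chain is chosen per column based on the data. All names below are hypothetical.

```python
# Illustrative sketch of cascading two lightweight encodings per column:
# dictionary encoding (strings -> integer codes), then run-length encoding
# (RLE) on the codes. Not the actual BtrBlocks or Vortex implementation.

def dict_encode(values):
    """Map each distinct value to a small integer code; return (dict, codes)."""
    dictionary = {}
    codes = []
    for v in values:
        codes.append(dictionary.setdefault(v, len(dictionary)))
    return list(dictionary), codes

def rle_encode(codes):
    """Collapse runs of identical codes into [code, run_length] pairs."""
    runs = []
    for c in codes:
        if runs and runs[-1][0] == c:
            runs[-1][1] += 1
        else:
            runs.append([c, 1])
    return runs

column = ["US", "US", "US", "DE", "DE", "US"]
dictionary, codes = dict_encode(column)   # ["US", "DE"], [0, 0, 0, 1, 1, 0]
runs = rle_encode(codes)                  # [[0, 3], [1, 2], [0, 1]]
```

A format that hard-codes one such cascade (the Parquet criticism above) can't adapt when, say, the codes have no runs and RLE is a net loss; the BtrBlocks-style approach instead samples the data and picks the chain per column.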