Efficient String Compression for Modern Database Systems

Databases are squeezing text for speed; fans hype it, skeptics ask 'but how do I search?'

TLDR: CedarDB champions compressing text to make databases faster and cheaper, echoing Snowflake’s “strings are everywhere” reality. Commenters clash over why mainstream databases lag behind, whether CedarDB has the pedigree, and the big question: can compressed data still handle simple text searches like “find me this word”?

Strings run the world, and CedarDB just reminded everyone why squeezing them down matters: smaller text means faster searches and lower bills. The post walks through compressing strings (think: turning repeated website addresses into tiny codes) and explains why this boosts performance: more data fits into the computer’s “fast lane” (its CPU caches), and less has to be hauled in from memory. It even nods to Snowflake’s finding that text columns are everywhere and heavily used in filters, which means speed really matters.

But the comments? Pure drama. One camp is shocked the big names don’t do this better: “How do SQLite, MySQL, and Postgres still not have column-level string compression?” snarled one veteran. Another camp side‑eyes CedarDB with, “Never heard of it. Just another cloud thing?” Meanwhile, the practical folks fire back with a very real worry: “Cool compression, but how do I do LIKE searches?” In other words, can you still find text easily when it’s squished? DuckDB fans jumped in waving receipts, linking to DuckDB’s lightweight compression explainer like it’s a mic drop.

The thread ping-ponged between hype (“save money, go fast!”), skepticism (“new vendor, who dis?”), and everyday pain points (“please don’t break my text searches”). Bonus meme energy: confessions about stuffing UUIDs (those long ID codes) into text columns like guilty snacks. The vibe: compression is hot, but only if it doesn’t mess with your vibes or your queries.
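For the curious, here is a minimal sketch of the "repeated website addresses into tiny codes" idea, and of one way a LIKE-style search could still work on top of it. This is not CedarDB's implementation: the function names (`dictionary_encode`, `filter_contains`), the sorted dictionary, and the sample URLs are all assumptions made up for illustration.

```python
def dictionary_encode(values):
    """Replace each string with a small integer code (illustrative sketch only)."""
    # Assumption: keep the distinct strings sorted; a real engine may order them differently.
    dictionary = sorted(set(values))
    code_of = {s: i for i, s in enumerate(dictionary)}
    codes = [code_of[v] for v in values]
    return dictionary, codes


def filter_contains(dictionary, codes, needle):
    """Return row numbers whose string contains `needle`, like SQL LIKE '%needle%'."""
    # Run the expensive substring check once per DISTINCT string...
    matching_codes = {i for i, s in enumerate(dictionary) if needle in s}
    # ...then each row only needs a cheap integer membership test.
    return [row for row, code in enumerate(codes) if code in matching_codes]


# Hypothetical sample column with repeated URLs.
urls = [
    "https://example.com/home",
    "https://example.com/cart",
    "https://example.com/home",
    "https://example.org/docs",
    "https://example.com/home",
]

dictionary, codes = dictionary_encode(urls)
rows = filter_contains(dictionary, codes, "example.com/home")
decoded = [dictionary[c] for c in codes]  # decoding is a plain lookup
```

In this sketch the five-row column keeps only three distinct strings plus five small integers, and the substring search touches each distinct string once instead of once per row. Whether a given database evaluates LIKE this way is a separate question the thread leaves open.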

Key Points

  • According to the post, approximately 50% of data is stored as strings, and string columns are heavily used in query filters.
  • Snowflake’s workload analysis found string columns are the most common and most frequently filtered.
  • In databases, compression primarily improves query performance by reducing memory footprints and enhancing cache and bandwidth efficiency.
  • As of January 22, 2026, CedarDB supports three storage schemes for strings: Uncompressed, Single Value, and Dictionary.
  • CedarDB’s dictionary compression stores offsets for efficient random access, and the article sets up a later discussion of FSST.
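The "offsets for efficient random access" point can be sketched roughly as follows. This is a toy layout under stated assumptions, not CedarDB's actual format: the class name `OffsetDictionary`, the single concatenated byte buffer, and the unsigned-int offsets array are all hypothetical choices for illustration.

```python
from array import array


class OffsetDictionary:
    """Toy dictionary: all distinct strings packed into one buffer, plus offsets."""

    def __init__(self, distinct_strings):
        self.data = bytearray()          # every string's UTF-8 bytes, back to back
        self.offsets = array("I", [0])   # offsets[i] = byte position where string i starts
        for s in distinct_strings:
            self.data += s.encode("utf-8")
            self.offsets.append(len(self.data))  # end of string i, start of string i + 1

    def lookup(self, code):
        """Fetch string number `code` directly, without scanning earlier strings."""
        start, end = self.offsets[code], self.offsets[code + 1]
        return self.data[start:end].decode("utf-8")


# Hypothetical dictionary contents.
d = OffsetDictionary(["cart", "docs", "home"])
third = d.lookup(2)
```

Because the strings have different lengths, the offsets are what let a lookup jump straight to the right bytes: two array reads and one slice, regardless of how many entries the dictionary holds.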

Hottest takes

"Never heard of CedarDB" — ForHackernews
"I wonder how one does like queries" — mbfg
"Genuinely surprised there isn't column-level string compression in SQLite, MySQL or Postgres" — crazygringo