Why German Strings Are Everywhere?

German Strings go viral: speed fans cheer while nitpickers shout “old news”

TLDR: A hyped “German Strings” text format is spreading through major data tools, promising faster lookups on short text. Commenters loved the speed idea but brawled over a missing year in the title, a stray question mark, and whether UTF-8 text full of multi-byte characters such as emoji could ruin the party. Performance, meet pedantry.

German Strings, a super-fast way to store short bits of text, are suddenly everywhere: DuckDB, Apache Arrow, Polars, and Facebook’s Velox have all adopted them. The post says the magic comes from optimizing for short strings that are read far more often than they are modified, and from peeking at just the first few characters for quick “starts with” checks. Think: less fiddling, more sprinting. But the comments? Oh, they turned it into a soap opera.
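To make the “peek at the first characters” idea concrete, here is a minimal sketch, not the actual implementation in any of the tools named above. The class name `PrefixedString`, the method names, and the 4-byte prefix width are all assumptions for illustration; the post summary only says “the first characters.”

```python
from dataclasses import dataclass

PREFIX_LEN = 4  # assumed prefix width, chosen for illustration


@dataclass(frozen=True)
class PrefixedString:
    """Hypothetical immutable string that caches its first few bytes."""

    data: bytes    # full UTF-8 payload
    prefix: bytes  # first PREFIX_LEN bytes of the payload

    @classmethod
    def from_text(cls, text: str) -> "PrefixedString":
        raw = text.encode("utf-8")
        return cls(data=raw, prefix=raw[:PREFIX_LEN])

    def starts_with(self, needle: str) -> bool:
        n = needle.encode("utf-8")
        k = min(len(n), PREFIX_LEN)
        # Cheap check first: compare only against the cached prefix.
        if self.prefix[:k] != n[:k]:
            return False
        # The prefix agrees, so fall back to comparing the full payload.
        return self.data.startswith(n)
```

Most mismatches are rejected by the short prefix comparison alone, so the full payload is touched only when the first bytes already agree.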

One commenter calmly explains the name (it traces back to the Umbra project at TU Munich) while the rest of the thread combusts over meta-drama. Cue a chorus of “add the year!” as users argue the title should be marked (2024), with one linking to an earlier HN thread and calling parts of the post outdated even then. Another goes full grammar police, insisting the question mark added in the submission title “makes little sense.” Meanwhile, a practical voice warns that UTF-8, the variable-width encoding in which characters like emoji take several bytes, could complicate all this speed talk. On the fun side, one suggestion delights the nerds: make a “reverse string” to turbocharge “ends with” searches. Niche but spicy. Verdict: the tech crowd loves the speed story, but the real heat is the pedantry, time-policing, and emoji anxiety. Classic internet.
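Two of the thread's technical points can be poked at with a small sketch. It assumes a fixed 4-byte prefix (a width picked for illustration) and uses hypothetical helper names; it shows only that a byte prefix can slice an emoji in half, and that a reversed copy turns an “ends with” test into a “starts with” test. It is not how any particular engine handles either case.

```python
PREFIX_LEN = 4  # assumed prefix width, for illustration only


def byte_prefix(text: str) -> bytes:
    """First PREFIX_LEN bytes of the UTF-8 encoding (may cut a character)."""
    return text.encode("utf-8")[:PREFIX_LEN]


def is_valid_utf8(raw: bytes) -> bool:
    """True if the bytes decode cleanly as UTF-8."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def reversed_key(text: str) -> bytes:
    """Hypothetical 'reverse string': the UTF-8 bytes in reverse order."""
    return text.encode("utf-8")[::-1]


def ends_with(rev_data: bytes, needle: str) -> bool:
    """Suffix test on the original text, done as a prefix test on reversed bytes."""
    return rev_data.startswith(reversed_key(needle))


# The party emoji takes 4 bytes, so a 4-byte prefix of "ab🎉" holds "ab"
# plus only the first half of the emoji.
party_prefix = byte_prefix("ab🎉")
prefix_is_text = is_valid_utf8(party_prefix)  # the prefix alone is not valid UTF-8
```

Byte-wise comparison of such a prefix still gives correct “starts with” answers for valid UTF-8 input; the complication shows up as soon as the prefix is treated as text on its own, or when comparisons need to be character-aware (case folding, collation).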

Key Points

  • CedarDB created a custom “German Strings” format optimized for data processing workloads.
  • The format has been adopted by DuckDB, Apache Arrow, Polars, and Facebook Velox.
  • C strings are null-terminated and pose safety and performance challenges (length calculation, manual memory management).
  • C++ (libc++) strings store size, pointer, and capacity, are mutable, and support short string optimization; SSO is noted as not possible in Rust.
  • Design observations for German Strings: most strings are short, seldom modified, and many operations examine only prefixes, favoring immutability and prefix-optimized access.
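The design observations in the last point can be sketched as a fixed-size record. Everything specific here is an assumption not stated in the summary above: a 16-byte record, a 4-byte length, up to 12 bytes stored inline for short strings, and a 4-byte prefix plus an 8-byte reference for longer ones. The function names are hypothetical, and an offset into a shared `bytearray` stands in for a real pointer.

```python
import struct

INLINE_MAX = 12  # assumed: strings up to 12 bytes live inside the record
PREFIX_LEN = 4   # assumed: longer strings keep a 4-byte prefix in the record


def pack_string(text: str, heap: bytearray) -> bytes:
    """Pack text into a 16-byte record, spilling long payloads into heap."""
    raw = text.encode("utf-8")
    if len(raw) <= INLINE_MAX:
        # Short string: 4-byte length + payload, zero-padded to 12 bytes.
        return struct.pack("<I12s", len(raw), raw)
    # Long string: 4-byte length + 4-byte prefix + 8-byte offset into heap.
    offset = len(heap)
    heap.extend(raw)
    return struct.pack("<I4sQ", len(raw), raw[:PREFIX_LEN], offset)


def unpack_string(record: bytes, heap: bytearray) -> str:
    """Recover the original text from a record produced by pack_string."""
    (length,) = struct.unpack_from("<I", record)
    if length <= INLINE_MAX:
        return record[4:4 + length].decode("utf-8")
    _prefix, offset = struct.unpack_from("<4sQ", record, 4)
    return bytes(heap[offset:offset + length]).decode("utf-8")
```

Because records are never modified after packing, no capacity field is needed, which is part of what lets both cases fit in the same fixed size in this sketch.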

Hottest takes

Title should mention (2024). Some of the info was already outdated back then — Rygian
The added question mark in the HN submission makes little sense — cubefox
UTF-8 could add complications — jmclnx