Parallelizing ClickHouse aggregation merge for fixed hash map

A tiny math tweak makes billion‑row queries fly—and the comments go wild

TLDR: A new ClickHouse tweak lets threads merge groups in parallel without converting data structures, boosting speed. The community is stunned by a 7x gap from a simple “0 +” trick, sparking debates over footguns, smarter defaults, and why billion‑row queries should never hinge on type quirks.

ClickHouse just hit a plot twist: a developer’s PR to speed up how big groups of data get merged revealed that writing 0 + in front of the grouping key makes a billion‑row query run more than 7× faster. The culprit? Type quirks. One version kept the group key as a small UInt16, which ClickHouse aggregates in a fixed hash map (a flat array), and that trapped the merge in a single “drawer.” In the other, 0 + widened the key to a bigger type, which uses a two‑level hash table whose “buckets” multiple threads can merge at once. In plain English: color‑coded bins versus one giant junk drawer. The PR lets threads merge disjoint key sets of the fixed map in place: no conversion, less drama.
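To make the fixed‑map idea concrete, here is a minimal Python sketch. It assumes a count() aggregate over UInt16 keys. All names (make_fixed_map, aggregate, merge_range, parallel_merge) are hypothetical and this is not ClickHouse’s C++ code. Each thread builds a flat 65,536‑slot array indexed directly by the key. The merge then splits the key space into disjoint ranges, so workers can fold every partial array into the first one in place without locks.

```python
from concurrent.futures import ThreadPoolExecutor

KEY_SPACE = 1 << 16  # a UInt16 key can take 65,536 distinct values


def make_fixed_map():
    """A 'fixed hash map' for UInt16 keys: a flat array indexed directly by the key."""
    return [0] * KEY_SPACE


def aggregate(keys):
    """Per-thread partial aggregation: count rows per key (like count() ... GROUP BY k)."""
    counts = make_fixed_map()
    for k in keys:
        counts[k] += 1
    return counts


def merge_range(dst, partials, lo, hi):
    """Fold slots [lo, hi) of every partial map into dst.

    Ranges never overlap, so workers write to disjoint slots of dst without locks.
    """
    for other in partials:
        for k in range(lo, hi):
            dst[k] += other[k]


def parallel_merge(partials, num_workers=4):
    """Merge all partial maps in place into partials[0], one key range per worker."""
    dst, rest = partials[0], partials[1:]
    step = -(-KEY_SPACE // num_workers)  # ceiling division
    ranges = [(lo, min(lo + step, KEY_SPACE)) for lo in range(0, KEY_SPACE, step)]
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        futures = [pool.submit(merge_range, dst, rest, lo, hi) for lo, hi in ranges]
        for f in futures:
            f.result()  # re-raise any exception from a worker
    return dst


blocks = [[1, 2, 2], [2, 65535], [1]]  # three threads' worth of UInt16 keys
merged = parallel_merge([aggregate(b) for b in blocks])
print(merged[1], merged[2], merged[65535])
```

No conversion step is involved: the partial arrays are merged exactly as they were built. In CPython the GIL serializes these Python‑level loops, so the sketch shows the partitioning scheme rather than a real speedup. The wall‑clock gains come from a native multithreaded implementation.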

Comments exploded. Performance folks called it a footgun: “Why does math trivia decide speed?” Others defended ClickHouse’s low‑level control, arguing the optimizer shouldn’t guess types for users. A middle camp begged for “auto‑promote or warn” so beginners don’t ship the ‘0+’ hack. Then came a spooky memory deallocation failure in CI; meme lords dropped “hash map ghosts” and declared the 0+ Gang. Profiling buffs debated why the flamegraph looked unchanged. The flamegraph samples CPU time, and the parallel merge does the same total work, just spread across threads that overlap in wall‑clock time. Cue explainers and smug charts. The final vibe: applause for the clever per‑thread merge, side‑eye at defaults, and a loud chorus to make fast the default, or at least shout when you’re choosing the junk drawer.
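The flamegraph puzzle comes down to what each clock measures. Here is a minimal sketch under idealized assumptions: perfectly parallel workers, zero scheduling overhead, and made‑up timings rather than measurements from the PR. The function name cpu_and_wall is hypothetical. Splitting the same merge work across threads leaves total CPU time unchanged, so a CPU‑sampled flamegraph also looks the same. Wall‑clock time, by contrast, drops to the slowest worker’s share.

```python
def cpu_and_wall(per_worker_seconds):
    """Idealized accounting for one merge phase (scheduling overhead ignored).

    CPU time is what a CPU-sampling profiler sees, summed over all threads.
    Wall-clock time is how long the query waits: until the slowest worker ends.
    """
    cpu_time = sum(per_worker_seconds)
    wall_time = max(per_worker_seconds)
    return cpu_time, wall_time


# Hypothetical numbers, not measurements from the PR:
single_thread = cpu_and_wall([8.0])           # one worker does all the merging
four_ranges = cpu_and_wall([2.0, 2.0, 2.0, 2.0])  # four equal key ranges

for name, (cpu, wall) in [("single", single_thread), ("4 ranges", four_ranges)]:
    print(f"{name}: cpu={cpu:.1f}s wall={wall:.1f}s")
```

Both cases report 8 s of CPU time, but the four‑range version finishes in 2 s of wall‑clock time. That matches “faster query, same‑looking profile.”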

Key Points

• Two similar ClickHouse queries over 1e9 rows differed greatly in runtime (≈62.8 s vs ≈8.55 s, about 7.3×) due to differing group‑by key types.
• Grouping by UInt16 used a fixed hash map (a one‑dimensional array), which hindered parallel merge; wider types used standard or two‑level hash tables, which enable parallel merge.
• An initial idea to convert the fixed array into a two‑level structure was slow, so an in‑place parallel merge over disjoint key subsets was proposed and implemented.
• Range‑based key segmentation improved wall‑clock time through parallelism but left CPU time and flamegraph profiles largely unchanged.
• CI uncovered a memory deallocation assertion failure (indicative of memory corruption) during development, logged on 2025‑09‑22.
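For contrast with the fixed map, here is a minimal sketch of why two‑level tables merge in parallel naturally. It also suggests one plausible reason converting to one first was slow. The sketch rests on three assumptions: 256 buckets, a Fibonacci‑style multiplicative hash, and Python dicts standing in for ClickHouse’s hash tables. All names are hypothetical. Every partial table splits keys by the same hash‑derived bucket index, so bucket b can be merged across all tables independently. Getting a fixed map into that shape, however, means re‑hashing and re‑inserting every non‑empty slot before any merging starts.

```python
from concurrent.futures import ThreadPoolExecutor

NUM_BUCKETS = 256  # assumed bucket count for this sketch


def bucket_of(key):
    """Pick a bucket from the top 8 bits of a 32-bit multiplicative hash."""
    return ((key * 0x9E3779B1) & 0xFFFFFFFF) >> 24


def to_two_level(fixed_map):
    """Convert a flat fixed map (index = key, value = count) into bucketed dicts.

    Every non-empty slot is re-hashed and re-inserted: an extra pass that the
    conversion approach pays before merging can even begin.
    """
    buckets = [{} for _ in range(NUM_BUCKETS)]
    for key, count in enumerate(fixed_map):
        if count:
            buckets[bucket_of(key)][key] = count
    return buckets


def merge_bucket(dst, others, b):
    """Fold bucket b of every other table into bucket b of dst."""
    target = dst[b]
    for other in others:
        for key, count in other[b].items():
            target[key] = target.get(key, 0) + count


def parallel_merge_two_level(tables, num_workers=4):
    """Merge all tables into tables[0]; each task owns one bucket, so no locks."""
    dst, rest = tables[0], tables[1:]
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        futures = [pool.submit(merge_bucket, dst, rest, b) for b in range(NUM_BUCKETS)]
        for f in futures:
            f.result()  # re-raise any exception from a worker
    return dst
```

The in‑place approach from the PR skips to_two_level entirely. It partitions the fixed map by key range instead of by hash bucket, so the per‑thread arrays can be merged as they are.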

Hottest takes

"Adding 0 to a number should not be a performance hack" — data_dramatist

"I came for SQL, stayed for the fixed-hash-map soap opera" — bitflip_bandit

"54 seconds saved by a math joke? Ship it" — ops_on_fire

December 19, 2025

Zero to hero, literally
