December 29, 2025

Hash Wars: Lookup Love, Insert Woes

High-performance C++ hash table using grouped SIMD metadata scanning

Faster lookups, slower inserts: new C++ table sparks 'niche' backlash and Rust jokes

TLDR: A new C++ hash table speeds up lookups at scale by scanning grouped slots, edging out others beyond 1M entries, but inserts are slower. Commenters called it niche due to fixed size and no deletes, questioned the benchmarks, and argued “use more memory” beats it without fancy tricks.

A new C++ hash table claims big speedups by scanning groups of 16 slots at once, promising faster lookups at scale. The numbers: tiny wins at 1–2 million entries, losses at smaller sizes, and inserts at about 0.72x the speed of ankerl::unordered_dense. The catch? No resizing, no deletion, SSE2-only, and you must pre-size the table.

Cue the comments going full popcorn mode. jeffbee rolls in with the harsh reality check: the static-size "win" makes it "pretty useless or at best niche," while zX41ZdbW pokes the bear, calling the benchmark unrealistic and pointing to production-style datasets. Meanwhile, SwissTable loyalists and Boost fans insist this is nothing new, since Google's design already does grouped metadata scans, and nly adds, "Boost.Unordered does this too." Then squirrellous drops a pragmatic bomb: just use more memory and lower the load factor; with collisions that rare, you don't need fancy SIMD at all. Rust folks chime in with "Rust port when?" and ARM users sigh at the x86-only SSE2 requirement. Academic spice enters the chat with a flex about Elastic Hashing "disproving a 40-year-old conjecture," which gets eye-roll memes and "paper hype" jokes.

Verdict from the crowd: great if you never change anything and your job is lookup-heavy. For everyone else, it's speed theater.

Key Points

  • Benchmarks show the grouped SIMD hash table is faster than ankerl::unordered_dense at 1M (71.9 ms vs 72.3 ms) and 2M elements (156.3 ms vs 157.4 ms), but slower at 100k and 500k.
  • At 1M elements and ~80% load, inserts are 0.72x the speed of ankerl::unordered_dense, while lookup hits are 1.69x faster and misses 1.21x faster.
  • The design probes contiguous groups of 16 slots and uses SSE2 intrinsics to scan 16 metadata bytes in a few instructions, filtering with a 1-byte tag (occupied flag + 7-bit hash fragment).
  • Recommended use cases are tables larger than ~500k elements with lookup-heavy workloads; the table is header-only, fixed-capacity, and requires C++17 with SSE2 on x86-64.
  • Limitations include no deletion or resizing, underperformance on small tables, and lack of an ARM NEON version; related research includes Elastic Hashing challenging Yao’s uniform probing conjecture.

Hottest takes

"Static size, no deleting... pretty useless or at best niche" — jeffbee
"The test does not look realistic" — zX41ZdbW
"You don’t even need SIMD in that case since collisions are rare" — squirrellous
Made with <3 by @siedrix and @shesho from CDMX. Powered by Forge&Hive.