April 19, 2026

Caches vs SIMD: choose your fighter

A cache-friendly IPv6 LPM with AVX-512 (linearized B+-tree, real BGP benchmarks)

New IPv6 speed trick drops — old guard yells “cache!” while language wars ignite

TLDR: A portable C++ library reimplements a fast IPv6 lookup method and posts real BGP benchmarks. Commenters argue a simple trie can rival the fancy version due to cache, quibble over AVX‑512 build flags, and groan about C++—highlighting the battle between shiny speedups and practical portability.

A new open-source library just hit: a clean-room, portable C++ take on the PlanB IPv6 lookup idea that uses wide vector instructions (AVX‑512) but can fall back to ordinary scalar code. Translation: it finds the most specific network prefix a given address belongs to, fast, at tens of millions of lookups per second, and even ships Python bindings. But the comments? Pure chaos. One top commenter cites tests on the public internet’s real routing table and drops a bomb: sometimes the “boring” old Patricia trie can tie or beat the fancy vector trick because it makes better use of the CPU cache. Cue a cache vs SIMD cage match.
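For readers who want the "which network does this address belong to" part spelled out, here is a minimal brute-force longest-prefix-match sketch, in the spirit of the reference the library's tests are said to check against. The `Addr` and `Route` types and the function names are hypothetical, invented for illustration; this is not the library's API or its actual test code.

```cpp
#include <array>
#include <cstdint>
#include <vector>

// Hypothetical types for illustration only; not the library's API.
using Addr = std::array<std::uint8_t, 16>;  // IPv6 address, network byte order

struct Route {
    Addr prefix;
    int len;       // prefix length in bits, 0..128
    int next_hop;  // opaque identifier
};

// True if the first `len` bits of `a` equal the first `len` bits of `p`.
inline bool matches(const Addr& a, const Addr& p, int len) {
    const int full = len / 8;  // whole bytes to compare
    const int rem = len % 8;   // leftover bits in the next byte
    for (int i = 0; i < full; ++i)
        if (a[i] != p[i]) return false;
    if (rem == 0) return true;  // also covers len == 128, so a[16] is never read
    const std::uint8_t mask = static_cast<std::uint8_t>(0xFF << (8 - rem));
    return (a[full] & mask) == (p[full] & mask);
}

// Linear scan: next hop of the longest matching prefix, or -1 if none matches.
inline int lpm_bruteforce(const std::vector<Route>& fib, const Addr& a) {
    int best_len = -1;
    int best_hop = -1;
    for (const Route& r : fib) {
        if (r.len > best_len && matches(a, r.prefix, r.len)) {
            best_len = r.len;
            best_hop = r.next_hop;
        }
    }
    return best_hop;
}
```

A scan like this costs one comparison per route on every lookup, which is why it only makes sense as a correctness oracle. The trie and B+-tree approaches in the thread exist to avoid exactly that cost.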

Build nerds showed up swinging, nitpicking “why detect AVX‑512 in CMake instead of a #ifdef?” like it’s 2009, while RISC‑V fans asked for a version using their favorite vector instructions. Someone deadpanned the feature name like a spell (“IPv6 longest-prefix-match”), and another sighed, “Sad it is C++,” kicking off a mini language war. Meanwhile, the project’s measured, reproducible benchmarks on a normal laptop impressed lurkers who just want fast, portable tools they can actually use. The vibe: ambitious engineering meets practical reality, with half the crowd chanting “SIMD all the things,” and the other half whispering “cache is king.” Read the paper? Sure. But the comments are the show.
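To make the "#ifdef" side of that argument concrete, here is a rough sketch of compile-time selection between an AVX-512 path and a scalar fallback. The function `rank_in_node` and the 8-key node width are assumptions made up for this example, not the library's code. The `__AVX512F__` macro reflects the flags the compiler was given at build time, not the CPU the binary later runs on, which is one plausible reason a project might handle detection in the build system instead.

```cpp
#include <cstddef>
#include <cstdint>
#if defined(__AVX512F__)
#include <immintrin.h>
#endif

// Hypothetical node width: 8 unsigned 64-bit keys, i.e. one 512-bit register.
constexpr std::size_t kKeys = 8;

// Count how many of the node's 8 keys are <= target (unsigned comparison).
// In a B+-tree-style search over sorted keys, this count picks the child
// to descend into.
inline int rank_in_node(const std::uint64_t* keys, std::uint64_t target) {
#if defined(__AVX512F__)
    // Vector path: compare all 8 keys against the target in one instruction.
    const __m512i k = _mm512_loadu_si512(keys);
    const __m512i t = _mm512_set1_epi64(static_cast<long long>(target));
    unsigned mask = _mm512_cmple_epu64_mask(k, t);  // bit i set if keys[i] <= target
    int n = 0;
    for (; mask != 0; mask &= mask - 1) ++n;  // count the set bits
    return n;
#else
    // Scalar fallback: same result, one comparison at a time.
    int n = 0;
    for (std::size_t i = 0; i < kKeys; ++i) n += (keys[i] <= target) ? 1 : 0;
    return n;
#endif
}
```

Both branches return the same count, so callers never see which one was compiled in. The trade-off is that a binary built with AVX-512 enabled will fault on a CPU without it unless a runtime check is added on top.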

Key Points

  • planb-lpm is a portable, MIT-licensed C++17 reimplementation of the PlanB IPv6 LPM algorithm using a linearized B+-tree.
  • It supports AVX-512 SIMD acceleration with a transparent scalar fallback and provides a dynamic FIB with wait-free lookups via a rebuild-and-swap model.
  • The library includes Python bindings (pybind11), correctness tests against a brute-force LPM reference, and a sample FIB/trace generator.
  • Benchmarks use a synthetic 100k-prefix FIB shaped to RIPE rrc00-25 and a 1M uniform-random address trace, with 20 timed passes and a warmup, pinned to a single core.
  • On an Intel i5-1035G7 (Ice Lake) with Ubuntu 24.04 on WSL and GCC 13.3, median throughput ranged roughly 47–76 MLPS across single and batched lookup modes.
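
The "rebuild-and-swap" model in the key points can be pictured roughly like this: writers build a fresh immutable snapshot off to the side and publish it with a single atomic pointer store, so readers never take a lock. Everything below (`Snapshot`, `SwappableFib`, the exact-match `std::map` standing in for the real prefix structure) is a hypothetical illustration, not the library's implementation. It also sidesteps memory reclamation by keeping old snapshots alive until the object is destroyed.

```cpp
#include <atomic>
#include <map>
#include <memory>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical immutable snapshot. A real FIB would hold the search tree;
// an exact-match map is used here only as a stand-in.
struct Snapshot {
    std::map<int, int> routes;  // key -> next hop
};

class SwappableFib {
public:
    SwappableFib() { publish(std::make_unique<Snapshot>()); }

    // Reader path: one atomic load, then read-only access to the snapshot.
    int lookup(int key) const {
        const Snapshot* s = current_.load(std::memory_order_acquire);
        const auto it = s->routes.find(key);
        return it == s->routes.end() ? -1 : it->second;
    }

    // Writer path: copy the current snapshot, modify the copy, publish it.
    void insert(int key, int hop) {
        std::lock_guard<std::mutex> guard(writer_mu_);  // serializes writers only
        auto next = std::make_unique<Snapshot>(
            *current_.load(std::memory_order_relaxed));
        next->routes[key] = hop;
        publish(std::move(next));
    }

private:
    void publish(std::unique_ptr<Snapshot> s) {
        current_.store(s.get(), std::memory_order_release);
        // Simplification: snapshots are never freed while the FIB lives,
        // so a reader can never observe a dangling pointer.
        retired_.push_back(std::move(s));
    }

    std::atomic<const Snapshot*> current_{nullptr};
    std::mutex writer_mu_;
    std::vector<std::unique_ptr<Snapshot>> retired_;
};
```

Keeping every old snapshot around is the lazy way out. A production design would need some scheme for freeing old snapshots safely once no reader can still hold them, and the source does not say which one the library uses.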

Hottest takes

"a plain Patricia trie can sometimes match or beat the SIMD tree" — debugga
"Why detect avx512 in build system instead of using #ifdef ?" — ozgrakkurt
"Sad it is c++." — sylware