What Are Skiplists Good For?

Skiplists: from nerd cult to secret sauce—devs split as the “use something else” crowd pushes back

TLDR: A “skiptree” spin on skiplists helped a team speed through huge histories in Google BigQuery by skipping slow one‑by‑one lookups. Comments split between fans saying skiplists quietly power modern databases, skeptics asking “why not classic indexes,” and pragmatists sharing real‑world wins and even a GPU‑powered twist.

Skiplists just went from “hipster data structure” to surprise hero after one team said a souped‑up version—dubbed a “skiptree”—saved their massive bug‑hunting runs. In plain speak, a skiplist is a list with express lanes so you don’t check every item one by one. The twist: they used it to avoid BigQuery’s slow “find this exact row” lookups by letting them jump through huge test histories faster.

Cue the comment war. One camp beams “told you so,” with ahartmetz praising the simple code and even name‑dropping that Qt’s old QMap used skiplists (link). Another camp flexes receipts: mrjn says skiplists underpin in‑memory tables for LSM trees—“Log‑Structured Merge,” the engine behind many modern databases—so this isn’t just a niche toy, it’s everywhere. But the skeptics roll in hard: locknitpicker asks the million‑query question—why not the classic B‑tree indexes (the usual database tree) before inventing new tricks?

Then the thread gets spicy‑nerdy. winwang drops a GPU‑friendly skiplist (think graphics cards crunching searches in parallel), while tooltower shares a down‑to‑earth win: using skiplists to keep a running account balance fast—aka “folds,” but for your money. The memes write themselves: Team Express Lane vs Team Checkout Line. Is it cult status, secret sauce, or both? The crowd can’t stop arguing—and we’re here for it.

Key Points

  • Skiplists are randomized, hierarchical linked structures serving as drop-in replacements for binary search trees with similar asymptotic performance.
  • Nodes are probabilistically promoted (e.g., 50% chance per level), enabling O(log n) search by traversing from top levels down.
  • Skiplists are favored by some for simpler, understandable lock-free concurrent implementations.
  • Antithesis faced a traversal problem on a large branching tree of fuzzing timelines stored in Google BigQuery.
  • BigQuery’s scan-optimized design makes iterative parent_id lookups expensive, effectively causing O(depth) full table scans for tree history queries.

Hottest takes

"Skiplists have some nice properties - the code is fairly short and easy to understand, for one." — ahartmetz
"skiplists form the basis of in-memory tables used by LSM trees, which are themselves the basis of most modern DBs (written post 2005)." — mrjn
"The article makes no mention of b-trees if any kind." — locknitpicker
Made with <3 by @siedrix and @shesho from CDMX. Powered by Forge&Hive.