Show HN: Keybench – Scriptable, extensible performance tool for key value stores

A new speed-testing tool for key-value databases has commenters cheering, nitpicking, and joking about benchmark wars

TL;DR: Keybench is a new tool that lets developers run the same speed test against different key-value stores, which makes comparisons fairer. Commenters liked the idea but immediately launched the usual benchmark debate, with skeptics warning that the results could still fuel endless fan fights.

A new Show HN post introduced Keybench, a tool for measuring how fast different key-value databases really are, and the crowd promptly did what the internet does best: turned a neat utility into a full-blown benchmark drama thread. The basic pitch is simple enough for non-experts: you write a short script describing the work you want a database to do, and Keybench runs that same job across different systems and compares their throughput and latency (how much work each completes per second and how long individual operations take). Fans loved the idea, saying it promises a fairer fight because the same test runs everywhere instead of each database getting its own flattering setup.

But of course, the comments didn’t just clap politely. The hottest reaction was the classic tech-world battle cry: "benchmarks are always suspicious." Some readers were excited about the flexibility and the fact that it can test multiple engines with the same setup; others immediately warned that every performance contest becomes a mess the second someone picks the “wrong” workload. In other words, people agreed the tool is useful, while also arguing that any result it produces will probably start fights.

The jokes came fast, too. A few commenters basically treated this like the latest entry in the endless “my database can beat up your database” saga, while others poked fun at the idea that developers will now spend their weekends writing tiny scripts just to prove their favorite storage engine is secretly a race car. The mood was equal parts impressed, skeptical, and gleefully ready for chaos.

Key Points

  • Keybench benchmarks sorted key-value stores using Lua-defined workloads that can run unchanged across multiple engines.
  • It reports both workload units per second and primitive operations per second, distinguishing high-level workload rate from raw key-touch rate.
  • Latency is measured as per-operation histograms with p50, p99, p99.9, and maximum values for operations such as put, get, del, range, mget, mput, and mdel.
  • The system is modular, with replaceable components for engine execution, workload definition, backend plugins, and reporters.
  • The default build includes an in-memory skiplist engine, while RocksDB and TidesDB backends can be enabled with backend flags passed to make at build time.
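The distinction between workload rate and primitive-operation rate, and the per-operation latency percentiles, can be illustrated with a small Python sketch. This is not Keybench's code or API: Keybench defines workloads in Lua, and the two toy engines, the `workload` function, and the helpers `run_benchmark` and `percentile` are hypothetical. The sketch assumes one workload unit is two puts, one get, and one short range scan, counts every engine call as one primitive operation (Keybench's exact counting rule, for example whether a range scan counts once or once per key, is not described here), and computes nearest-rank percentiles from raw samples rather than from a histogram.

```python
import bisect
import math
import time
from fractions import Fraction
from typing import Callable, Dict, List, Optional


class DictEngine:
    """Hypothetical hash-map engine: fast point ops, but range scans must sort."""

    def __init__(self) -> None:
        self.data: Dict[str, str] = {}

    def put(self, key: str, value: str) -> None:
        self.data[key] = value

    def get(self, key: str) -> Optional[str]:
        return self.data.get(key)

    def range(self, lo: str, hi: str) -> List[str]:
        # Scan every key, then sort the matches: O(n log n) per range call.
        return sorted(k for k in self.data if lo <= k < hi)


class SortedListEngine:
    """Hypothetical ordered engine: keys stay sorted, so a range is a slice."""

    def __init__(self) -> None:
        self.keys: List[str] = []
        self.values: Dict[str, str] = {}

    def put(self, key: str, value: str) -> None:
        if key not in self.values:
            bisect.insort(self.keys, key)
        self.values[key] = value

    def get(self, key: str) -> Optional[str]:
        return self.values.get(key)

    def range(self, lo: str, hi: str) -> List[str]:
        # Binary-search both bounds of the half-open interval [lo, hi).
        i = bisect.bisect_left(self.keys, lo)
        j = bisect.bisect_left(self.keys, hi)
        return self.keys[i:j]


def workload(engine, timed: Callable, i: int) -> None:
    """One hypothetical workload unit: 2 puts, 1 get, 1 short range scan."""
    timed("put", engine.put, f"k{i:06d}a", "v")
    timed("put", engine.put, f"k{i:06d}b", "v")
    timed("get", engine.get, f"k{i:06d}a")
    timed("range", engine.range, f"k{i:06d}", f"k{i + 1:06d}")


def percentile(samples: List[int], p: float) -> int:
    """Nearest-rank percentile over raw samples (no histogram bucketing)."""
    ordered = sorted(samples)
    # Fraction(str(p)) keeps values like 99.9 exact, avoiding float rounding in the rank.
    rank = math.ceil(Fraction(str(p)) * len(ordered) / 100)
    return ordered[max(1, rank) - 1]


def run_benchmark(engine, work: Callable, units: int) -> dict:
    latencies: Dict[str, List[int]] = {}
    ops = 0

    def timed(name: str, fn: Callable, *args):
        # Time one primitive call and record its latency under its op name.
        nonlocal ops
        start = time.perf_counter_ns()
        result = fn(*args)
        latencies.setdefault(name, []).append(time.perf_counter_ns() - start)
        ops += 1
        return result

    begin = time.perf_counter()
    for i in range(units):
        work(engine, timed, i)
    elapsed = time.perf_counter() - begin

    return {
        "ops": ops,
        "units_per_sec": units / elapsed,  # high-level workload rate
        "ops_per_sec": ops / elapsed,      # primitive operation rate
        "latency_ns": {
            name: {"p50": percentile(s, 50), "p99": percentile(s, 99),
                   "p99.9": percentile(s, 99.9), "max": max(s)}
            for name, s in latencies.items()
        },
    }


if __name__ == "__main__":
    for engine in (DictEngine(), SortedListEngine()):
        report = run_benchmark(engine, workload, units=1000)
        print(type(engine).__name__, f"{report['units_per_sec']:.0f} units/s",
              f"{report['ops_per_sec']:.0f} ops/s", report["latency_ns"]["range"])
```

Because the same `workload` function runs unchanged against both engines, a gap in `range` latency reflects each engine's data layout (the dict engine scans and sorts on every call) rather than a differently tuned test, which is the fairness argument fans made in the thread. With four primitive calls per unit, `ops_per_sec` is four times `units_per_sec` here; absolute timings vary by machine, so no expected output is shown.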

Hottest takes

"my database can beat up your database" — anon
"benchmarks are always suspicious" — anon
"now people can automate their benchmark arguments" — anon
Made with <3 by @siedrix and @shesho from CDMX. Powered by Forge&Hive.