January 9, 2026
Pretty pics, spicy logs
Sorted string tables (SST) from first principles
Gorgeous charts, hot log debate, and an icon war
TLDR: The post explains how sorted tables help databases read just the right data from SSDs that fetch fixed‑size pages. Comments split between awe at the visuals, surprise that append‑only logs still shine on SSDs, and a bold “no index needed” link drop—making storage design newly dramatic and relevant.
Almog Gavra explains how databases store stuff on disk, turning Sorted String Tables (think tidy, alphabetized lists) into a bingeable explainer. He shows that solid‑state drives (SSDs) read in fixed‑size chunks called pages, so a smart on‑disk layout reduces wasted reads. But the comment section stole the show. craftkiller swooned over the visuals ("stunning!"), asked what tool made the charts, and sparked an unexpected mini‑war over their tiny window buttons. Cue design snobs vs. pragmatists.
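The core SST idea can be sketched as a toy, in‑memory model. Everything here is an assumption for illustration, not Gavra's implementation or any real SST format: a 4 KB page, a made‑up entry size (key + value + 2 separator bytes), a sparse index holding the first key of each block, and the hypothetical helpers `build_sst` and `get`. Records are sorted, packed into page‑sized blocks, and a point lookup reads exactly one block.

```python
import bisect

PAGE_SIZE = 4096  # assumed page size, matching the "typically 4KB" SSD page


def build_sst(records, page_size=PAGE_SIZE):
    """Pack (key, value) string pairs, sorted by key, into page-sized blocks.

    Returns (blocks, index) where index[i] is the first key in blocks[i].
    Hypothetical layout for illustration only.
    """
    blocks, index = [], []
    current, size = [], 0
    for key, value in sorted(records):
        # Assumed encoding: key bytes + value bytes + 2 separator bytes.
        entry_size = len(key.encode()) + len(value.encode()) + 2
        if current and size + entry_size > page_size:
            blocks.append(current)
            index.append(current[0][0])
            current, size = [], 0
        current.append((key, value))
        size += entry_size
    if current:
        blocks.append(current)
        index.append(current[0][0])
    return blocks, index


def get(blocks, index, key):
    """Return (value or None, number of blocks read)."""
    # The sparse index is small enough to keep in memory: find the last
    # block whose first key is <= key.
    i = bisect.bisect_right(index, key) - 1
    if i < 0:
        return None, 0  # key sorts before every block; no read needed
    block = blocks[i]  # in a real SST this would be one page-sized disk read
    keys = [k for k, _ in block]
    j = bisect.bisect_left(keys, key)
    if j < len(keys) and keys[j] == key:
        return block[j][1], 1
    return None, 1
```

Because the keys are sorted, the in‑memory index only needs one key per page, and any lookup touches a single page no matter how large the table grows.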
The spiciest thread? epistasis gasped that the "append‑only log" (just keep adding entries and never rewrite) isn’t just for old spinning disks; it still shines on SSDs. That surprised people who assumed SSDs had changed the rules, and kicked off log‑heads vs. index‑heads banter. Then mac3n dropped a link to ksip with the flex: "binary search on mmap’d sorted text files, no index needed." Translation: use memory‑mapped files (access a file as if it were an in‑memory array) and skip extra lookup structures. Some cheered the simplicity; others eye‑rolled "please, not on my production."
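The linked tool's code isn't shown here, so the following is only a hedged Python sketch of the general technique the comment describes, similar in spirit to the Unix `look` utility: memory‑map a text file whose lines are sorted bytewise (for example with `LC_ALL=C sort`), then binary search over byte offsets, snapping each probe back to the start of its line. `lookup_prefix` and its helpers are hypothetical names.

```python
import mmap
import os


def _line_start(mm, pos):
    """Offset of the first byte of the line that contains byte `pos`."""
    # rfind returns -1 when no newline precedes pos, giving offset 0.
    return mm.rfind(b"\n", 0, pos) + 1


def _read_line(mm, start):
    """Return (line without its newline, offset of the next line)."""
    end = mm.find(b"\n", start)
    if end == -1:  # last line has no trailing newline
        end = len(mm)
    return mm[start:end], end + 1


def lookup_prefix(path, prefix):
    """Return every line starting with `prefix` from a bytewise-sorted file."""
    with open(path, "rb") as f:
        if os.fstat(f.fileno()).st_size == 0:
            return []  # mmap cannot map an empty file
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            n = len(mm)
            # Find the first byte whose line sorts >= prefix. Each byte belongs
            # to exactly one line and lines are sorted, so this is monotone.
            lo, hi = 0, n
            while lo < hi:
                mid = (lo + hi) // 2
                line, _ = _read_line(mm, _line_start(mm, mid))
                if line < prefix:
                    lo = mid + 1
                else:
                    hi = mid
            if lo == n:
                return []  # every line sorts before prefix
            # Lines sharing a prefix are contiguous: scan until one doesn't match.
            results = []
            pos = _line_start(mm, lo)
            while pos < n:
                line, pos = _read_line(mm, pos)
                if not line.startswith(prefix):
                    break
                results.append(line)
            return results
```

The sort order itself is the only "index": each probe costs a short scan to the nearest newline, and the trade‑off is that the file must stay fully sorted, so writes mean re‑sorting or rewriting it.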
Jokes flew: chart envy, "Swiss Army Knife" puns, and threats to print the diagrams as wall art. The vibe: nerdy disk physics, gorgeous charts, and a shock twist—logs are back, baby.
Key Points
- SSTs are explained in the context of SSD-based storage and on-disk database layouts.
- SSD I/O is page-based (typically 4KB), so reading only part of a page still costs a full page read; this waste is called read amplification.
- An experiment using Direct I/O shows similar latency for 1KB and 4KB reads (~9.2µs), indicating that fixed per-read overhead, not transfer size, dominates.
- Data transfer is a small portion (1–2%) of SSD read latency; command processing and address translation dominate.
- Databases reduce read amplification by exploiting spatial and temporal locality to co-locate frequently accessed data.
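The read‑amplification and locality points can be made concrete with a small model. It is a sketch under stated assumptions, not a measurement of any real device: a fixed 4 KB page, each page read at most once per batch (a stand‑in for caching), and the hypothetical helpers `pages_for` and `read_amplification`.

```python
PAGE_SIZE = 4096  # assumed page size; the post says "typically 4KB"


def pages_for(offset, length, page_size=PAGE_SIZE):
    """Page numbers the device must read to return bytes [offset, offset + length)."""
    if length <= 0:
        return range(0)
    return range(offset // page_size, (offset + length - 1) // page_size + 1)


def read_amplification(requests, page_size=PAGE_SIZE):
    """Bytes read from the device divided by bytes the caller asked for.

    `requests` is a list of (offset, length) byte ranges. A page needed by
    several requests is counted once, as if it stayed cached after the first
    read (a simplifying assumption standing in for temporal locality).
    """
    pages = set()
    wanted = 0
    for offset, length in requests:
        wanted += max(length, 0)
        pages.update(pages_for(offset, length, page_size))
    return len(pages) * page_size / wanted if wanted else 0.0


if __name__ == "__main__":
    print(read_amplification([(0, 1024)]))     # 4.0: a 1 KB record costs a full page
    print(read_amplification([(3584, 1024)]))  # 8.0: the record straddles two pages
    print(read_amplification([(i * 1024, 1024) for i in range(4)]))  # 1.0: co-located
    print(read_amplification([(i * 8192, 1024) for i in range(4)]))  # 4.0: scattered
```

Using the post's own figures, 1 to 2% of a ~9.2µs read is only about 0.1 to 0.2µs of transfer time, which is why the model counts pages rather than bytes: fetching a whole 4KB page costs about the same as fetching 1KB of it, so packing records that are read together into the same page is what pays off.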