LLM Doesn't Write Correct Code. It Writes Plausible Code

Dev forum erupts: AI churns “plausible” code that runs 20,000× slower

TL;DR: An AI-built database looked legit but ran about 20,000× slower than SQLite on a simple lookup test. Commenters split three ways: “plausible isn’t correct,” corporate cynicism about buying vibes, and defenders impressed that the AI produced a working database at all. Either way, speed and scrutiny still matter.

The internet just watched an AI write a “working” database that looks legit on paper… then face-plant in reality. The author ran a simple test: find 100 rows by ID. Trusty SQLite took a blink (0.09 ms). The AI’s Rust rewrite? An eye-watering 1,815.43 ms, about 20,000× slower. Cue the comment section turning into a gladiator arena. One camp screamed “LLMs optimize for vibes, not truth”, tossing around the author’s line “LLMs lie. Numbers don’t.” Another camp argued the test was fair because the code compiled and passed, while skeptics like flerchin wondered whether benchmarking was ever in the prompt: “hidden requirements” drama unlocked.
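For a sense of what a test like this measures, here is a minimal sketch using Python's built-in sqlite3 module. It is not the author's harness (per the Key Points, that was a C benchmark); the table name `items`, the 10,000-row table size, and the helper `time_lookups` are assumptions made up for illustration, and the timing it prints will vary by machine. The last lines simply recompute the slowdown ratio from the two timings reported above.

```python
import sqlite3
import time


def time_lookups(conn, ids):
    """Return (rows_found, elapsed_ms) for one-by-one lookups by primary key."""
    start = time.perf_counter()
    found = 0
    for row_id in ids:
        row = conn.execute(
            "SELECT payload FROM items WHERE id = ?", (row_id,)
        ).fetchone()
        if row is not None:
            found += 1
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return found, elapsed_ms


# Hypothetical schema and data, chosen only for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, payload TEXT)")
with conn:  # one transaction for the bulk insert
    conn.executemany(
        "INSERT INTO items (id, payload) VALUES (?, ?)",
        ((i, f"row-{i}") for i in range(1, 10_001)),
    )

# Look up 100 rows by ID, as in the reported test.
found, elapsed_ms = time_lookups(conn, range(1, 101))
print(f"{found} rows in {elapsed_ms:.3f} ms")

# Slowdown ratio from the timings quoted in the article.
sqlite_ms = 0.09
rewrite_ms = 1815.43
slowdown = rewrite_ms / sqlite_ms
print(f"slowdown: {slowdown:,.0f}x")  # prints "slowdown: 20,171x"
```

The ratio is why the headline's "about 20,000×" and the more precise 20,171× figure are the same claim.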

The funniest theme? Plausibility culture. lukeify quipped, “Most humans also write plausible code,” while FrankWilhoit went full corporate roast: “Enterprise buys plausible code.” Meanwhile, marginalia_nu shared a war story about Claude Code spending an hour trying to draw a fleur-de-lis: “it’s not good at tasks it hasn’t seen.” But defenders showed up too, like cat_plus_plus, who marveled that the AI even built a database that correctly stores and reads data and urged profiling before judging, because, hey, SQLite has had decades to get fast. The author points to outside research like METR’s study and GitClear’s analysis to say this pattern isn’t a fluke. TL;DR: the code looked right, the numbers screamed “no,” and the comments turned it into a meme-fest about plausible vs correct.

Key Points

• A benchmark showed SQLite performing a 100-row primary key lookup in 0.09 ms, while an LLM-generated Rust reimplementation took 1,815.43 ms (≈20,171× slower).
• Both libraries were tested with the same C benchmark, compiler flags, WAL mode, schema, and queries for a fair comparison.
• The TRANSACTION batch baseline was already 298× slower than SQLite; INSERT without transactions was 1,857× slower; SELECT by ID was 20,171× slower; UPDATE and DELETE were more than 2,800× slower.
• Code review found a missing INTEGER PRIMARY KEY (ipk) check in the planner, preventing direct B-tree lookups and causing full table scans.
• The author argues LLMs favor plausibility over correctness and cites METR and GitClear studies showing similar issues when outputs aren’t rigorously verified.
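The planner point is easier to see in stock SQLite itself. The sketch below asks SQLite for its query plan on two hypothetical tables (`with_ipk` and `no_key`, names invented here): one where `id` is declared INTEGER PRIMARY KEY, and one where `id` is a plain column with no key or index. It shows SQLite's behavior only; it says nothing about how the Rust rewrite's planner is structured beyond what the Key Points report. The exact plan wording differs between SQLite versions, so no expected output is shown.

```python
import sqlite3


def plan_for(conn, sql):
    """Return the 'detail' strings from EXPLAIN QUERY PLAN for a statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The human-readable plan text is the fourth column of each row.
    return [row[3] for row in rows]


conn = sqlite3.connect(":memory:")

# Here id is INTEGER PRIMARY KEY, so it aliases the table's rowid
# and a lookup can go straight down the table B-tree.
conn.execute("CREATE TABLE with_ipk (id INTEGER PRIMARY KEY, payload TEXT)")

# Here id is an ordinary column with no key or index, so the only
# way to find a matching row is to visit every row.
conn.execute("CREATE TABLE no_key (id INTEGER, payload TEXT)")

ipk_plan = plan_for(conn, "SELECT payload FROM with_ipk WHERE id = 42")
scan_plan = plan_for(conn, "SELECT payload FROM no_key WHERE id = 42")

print("with INTEGER PRIMARY KEY:", ipk_plan)
print("without a key:           ", scan_plan)
```

On the first table the plan is a SEARCH using the integer primary key; on the second it is a SCAN of the whole table. A planner that never recognizes the first case treats every lookup by ID like the second, which is the failure the code review describes.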

Hottest takes

"Enterprise customers don't buy correct code, they buy plausible code" — FrankWilhoit

"Most humans also write plausible code" — lukeify

"Your LLM actually wrote a correct code for a full relational database" — cat_plus_plus

March 6, 2026

Vibes vs benchmarks
