LLM Doesn't Write Correct Code. It Writes Plausible Code

It compiles, it passes… then it crawls: AI’s ‘good enough’ showdown

TLDR: An AI-generated database remake looked correct but ran up to 20,171x slower on simple searches. Commenters split: some warn “plausible” is dangerous without deep checks, others say humans do the same and fast iteration makes it fine. It matters because many trust AI code that can silently fail.

An AI-built remake of a popular database looked like a winner on paper: it compiled, passed tests, and claimed fancy features. Then it clocked in up to 20,171x slower on basic lookups. Cue the internet meltdown. The original author says LLMs (the large language models behind AI coding tools) chase “plausible,” not “correct,” and the crowd instantly split into camps. The legal crowd chimed in with “This is my job now,” warning that plausible-but-wrong output is a time bomb when busy reviewers rubber-stamp work. The philosophers piled on with “How is that different from humans?” Meanwhile, pragmatists joked that the AI can “fake it,” and that’s fine if you iterate fast.

There was even meta-drama: someone yelled “dupe!” and dropped a link, while others pointed out that the project was being confused with Turso (a legit fork that runs within 1.2x of the original). The vibe: numbers don’t lie, but vibes do. The post’s advice to define success before you code got love, especially since the AI remake shipped all the “right” parts (planner, storage, indexing) yet missed key checks, which turned quick searches into full table crawls. Skeptics cite studies (METR, GitClear) suggesting this isn’t a one-off; boosters say “good enough” plus feedback is still a superpower. Drama score: high, jokes: spicy, stakes: real.

Key Points

  • An LLM‑generated Rust rewrite of SQLite compiles, passes tests, and claims compatibility but is dramatically slower than system SQLite.
  • Benchmarks run under identical settings show a best‑case 298× slowdown, with SELECT by ID at ~20,171×, UPDATE/DELETE >2,800×, and INSERT without a transaction ~1,857×.
  • A key cause is a missing integer primary key (ipk) check in the planner, forcing full table scans instead of O(log n) B‑tree lookups.
  • The reimplementation is large (~576k lines across 625 files) and includes modules like parser, planner, VDBE, B‑tree, pager, and WAL, but critical logic is flawed.
  • The article distinguishes the project from Turso/libsql (a C SQLite fork within ~1.2× of SQLite) and provides reproducible benchmarks; external studies (METR, GitClear) indicate the issues are not isolated.
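
To make the integer primary key point concrete, here is a minimal sketch using Python's built-in sqlite3 module against stock SQLite (not the LLM-generated rewrite). The table name, column names, and row count are made up for illustration. It asks SQLite for its query plan on two lookups: one by an INTEGER PRIMARY KEY, which the planner can resolve with a B-tree search on the rowid, and one by an unindexed column, which forces a full table scan. A planner that misses the primary key check would, in effect, treat the first query like the second.

```python
import sqlite3

# Hypothetical table for illustration; not taken from the benchmark.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO users (id, name) VALUES (?, ?)",
    [(i, f"user{i}") for i in range(1, 1001)],
)


def plan(sql, params=()):
    """Return the 'detail' column of EXPLAIN QUERY PLAN as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The human-readable description is the fourth column of each row.
    return " | ".join(row[3] for row in rows)


# Lookup by INTEGER PRIMARY KEY: an alias for the rowid, so the planner
# can seek directly in the table's B-tree.
by_id = plan("SELECT name FROM users WHERE id = ?", (500,))

# Lookup by an unindexed column: every row has to be visited.
by_name = plan("SELECT id FROM users WHERE name = ?", ("user500",))

print("by id:  ", by_id)
print("by name:", by_name)
```

The exact wording of the plan text varies between SQLite versions, but the first plan should report a SEARCH using the integer primary key and the second a SCAN of the table. Checking the plan this way is one cheap way to turn "define success before you code" into something testable.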

Hottest takes

“plausible — but may be, and often is, invalid, unsound, and/or ill-advised.” — treetalker
“Ok, I’ll bite: how is that different from humans?” — seanmcdirmid
“No, but if you hum a few bars I can fake it!” — bitwize