December 18, 2025
Screenshots or it didn’t run
Your job is to deliver code you have proven to work
Prove it or just send it? Commenters feud over “working code”
TLDR: Simon Willison says developers must show proof—manual demos and automated tests—that their code truly works, even with AI tools. Commenters split: some want customer outcomes and “good enough,” others reject manual-first or warn AI-written tests can mask bad code, while a few wonder if this problem is overblown.
Simon Willison lit up dev-land by saying the quiet part loud: your job is to deliver code you’ve proven works—with real manual testing and automated tests, even when using AI helpers. He points to step-by-step demos, screen recordings, and “the perfect commit” playbook (link) as the gold standard. It’s basically: don’t make your coworkers debug your homework.
Cue the comments cage match. One voice sighed, “Maybe in an ideal world,” while 9rx pushed a spicy counter: solving customer problems sometimes means no code, or even “good enough” code that isn’t perfect. Testing drama escalated when allcentury declared manual-first testing “not very productive,” arguing automation should lead and humans should sanity-check at the end. Then andy99 poured gasoline on the fire: even if you demand proof, people can slap AI-written tests onto giant code dumps and call it “verified.” Meanwhile, dfxm12 asked the question everyone’s thinking: are these AI-fueled chaos PRs actually happening in your org—or is this a boogeyman?
Memes flew: “Screenshots or it didn’t run,” “LLM is my intern vs LLM is my alibi,” and a new office rule: no Friday uploads without receipts. The only thing everyone agreed on? If you’re going to bring AI to work, bring proof too.
Key Points
- Willison argues developers must submit code they have personally proven to work, not rely on reviewers to validate it.
- Proof requires two non-optional steps: manual testing (including happy paths and edge cases) and automated testing.
- Manual testing should be reproducible and evidenced (e.g., command sequences with output or screen capture videos).
- Automated tests must accompany changes and be designed to fail if the implementation is reverted; integrate an effective test harness.
- With the rise of coding agents (e.g., Claude Code, Codex CLI), developers should ensure these tools also verify changes by executing and testing code, using frameworks like Click's `CliRunner` for CLI projects.
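To make the last two points concrete, here is a minimal sketch of the pattern: a trivial Click command plus an automated test driven by Click's real `click.testing.CliRunner`. The `greet` command itself is hypothetical, invented for illustration; the key property is that the assertions fail if the greeting logic is reverted.

```python
import click
from click.testing import CliRunner


@click.command()
@click.argument("name")
def greet(name):
    """Print a greeting for NAME."""
    click.echo(f"Hello, {name}!")


# CliRunner invokes the command in-process and captures exit code and output,
# so an agent (or CI) can verify the change without a real terminal session.
runner = CliRunner()
result = runner.invoke(greet, ["world"])

# These checks break if the implementation is removed or reverted,
# which is exactly what makes the test meaningful evidence.
assert result.exit_code == 0
assert result.output == "Hello, world!\n"
print("test passed")
```

Pairing a check like this with a recorded manual run (the command sequence and its output) covers both halves of the "prove it" standard.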