You Hired the AI to Write the Tests. Of Course They Pass

Developers split: AI is grading its own homework—should anyone trust the A+?

TLDR: An engineer warns that letting the same AI write both the code and the tests is like grading your own exam, and suggests writing clear acceptance checklists first, then verifying with outside tools. Commenters split: some demand “frozen” tests and better planning; others say weak tests are better than none—or pit multiple AIs against each other.

The post that lit up dev-land: one engineer admits his AI agents write code while he sleeps—and yes, the same AI writes the tests. The punchline? Of course they pass. He calls it a “self‑congratulation machine,” and pitches an old‑school fix with a modern twist: write clear checklists first (what the feature should do), then let tools like Playwright run the checks like a robot browser jury. It’s basically Test‑Driven Development (write tests first) sped up by AI.

Cue the crowd. The top vibe is “fox guarding the henhouse.” BeetleB begs for a way to freeze the tests so the code can’t quietly rewrite the rules mid-game. digitalPhonix slams the whole setup as cart-before-horse chaos: how did we get to “50 pull requests a week” before deciding if any of it is right? Others get philosophical: RealityVoid says this happens with humans too—if you pay folks just to write tests, they often just bless whatever exists; the real cure is better, clearer specs.

Meanwhile, the pragmatists are here for vibes and velocity. Havoc argues even weak AI tests beat nothing and swears by external test suites—the big, third‑party checklists you don’t control—as the final boss. And then there’s the multiverse crowd: lateforwork runs Gemini vs Claude vs ChatGPT like an AI cage match, claiming cross‑model reviews keep everyone honest. The memes write themselves: “group project where the group gives itself an A,” “AI marking its own exam,” and “hope is not QA.”

Key Points

  • Autonomous AI coding tools increase code throughput, but reliable verification becomes a bottleneck.
  • Having the same AI write both code and tests risks validating the AI’s interpretation rather than the intended spec.
  • A test-first approach inspired by TDD is recommended: define acceptance criteria in plain English before coding.
  • Frontend verification uses Playwright to run browser checks, capture screenshots, and produce per-criterion results.
  • Backend verification checks observable API behavior (status codes, headers, messages) using tools like curl; this confirms conformance to specs but not spec correctness.
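The backend half of that checklist can be mechanized with plain HTTP calls and a per-criterion report. The post mentions tools like curl; the sketch below uses Python's stdlib instead so it is self-contained, and the endpoint, stub service, and criteria are hypothetical stand-ins for a real spec:

```python
import json
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

class StubHandler(BaseHTTPRequestHandler):
    """Stand-in service so the sketch runs anywhere; point real checks at the API under test."""
    def do_GET(self):
        if self.path == "/health":
            body = json.dumps({"status": "ok"}).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_response(404)
            self.end_headers()

    def log_message(self, *args):  # silence per-request logging
        pass

def run_checks(base_url: str) -> dict[str, bool]:
    """Evaluate each plain-English criterion against observable API behavior."""
    results = {}
    resp = urllib.request.urlopen(base_url + "/health")
    results["GET /health returns 200"] = resp.status == 200
    results["response is JSON"] = resp.headers.get("Content-Type", "").startswith("application/json")
    results["body reports status ok"] = json.loads(resp.read())["status"] == "ok"
    try:
        urllib.request.urlopen(base_url + "/nope")
        results["unknown route returns 404"] = False
    except urllib.error.HTTPError as e:
        results["unknown route returns 404"] = e.code == 404
    return results

if __name__ == "__main__":
    server = HTTPServer(("127.0.0.1", 0), StubHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    for criterion, passed in run_checks(f"http://127.0.0.1:{server.server_port}").items():
        print(("PASS" if passed else "FAIL"), criterion)
    server.shutdown()
```

Note what this does and doesn't buy you, per the key point above: it confirms the service conforms to the checklist, not that the checklist itself is the right spec.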

Hottest takes

"I wish there was a way to "freeze" the tests." — BeetleB
"That’s really putting the cart before the horse." — digitalPhonix
"You can have Gemini write the tests and Claude write the code." — lateforwork
Made with <3 by @siedrix and @shesho from CDMX. Powered by Forge&Hive.