June 20, 2026

Robot, meet your hall monitor

Human Judgment as a Specification

AI can write the rules, but the comments say humans still have to babysit it

TLDR: Researchers say AI-written software needs human judgment to verify what was actually intended, and their PICK tool tries to make that easier with simple choices instead of blind trust. Commenters turned it into a heated referendum on programmer responsibility, with some demanding people still read the code and others warning most won’t.

The big idea in "Human Judgment as a Specification" is simple enough for non-experts: if you’re going to let AI help write software, you also need a clear way to check whether it did what you actually meant. The researchers pitch a tool called PICK that doesn’t just ask people to trust one AI answer. Instead, it shows several possible answers and makes humans choose between them using concrete examples. In plain English: don’t just vibe-check the robot—make it prove itself.
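To make the "choose between them using concrete examples" idea tangible, here is a minimal Python sketch for the regular-expression case. It is not PICK's actual implementation: the function names (`matches`, `distinguishing_example`, `pick`), the fixed pool of test strings, and the `judge` callback standing in for the human are all assumptions made for illustration.

```python
import re
from typing import Callable, List, Optional, Sequence


def matches(pattern: str, text: str) -> bool:
    """True if the whole string is accepted by the candidate regex."""
    return re.fullmatch(pattern, text) is not None


def distinguishing_example(candidates: Sequence[str],
                           pool: Sequence[str]) -> Optional[str]:
    """Return the first pool string on which the candidates disagree."""
    for text in pool:
        verdicts = {matches(p, text) for p in candidates}
        if len(verdicts) > 1:  # at least one accepts and one rejects
            return text
    return None


def pick(candidates: Sequence[str],
         pool: Sequence[str],
         judge: Callable[[str], bool]) -> List[str]:
    """Narrow the candidates by asking the judge about splitting examples.

    `judge` is a hypothetical stand-in for the human: it answers whether a
    concrete string should be accepted under the intended specification.
    """
    remaining = list(candidates)
    while len(remaining) > 1:
        example = distinguishing_example(remaining, pool)
        if example is None:
            break  # survivors behave identically on this pool
        wanted = judge(example)
        remaining = [p for p in remaining if matches(p, example) == wanted]
    return remaining


# Hypothetical intent: "one or more digits".
candidates = [r"\d+", r"\d*", r"[0-9]{2,}"]
pool = ["", "7", "42", "4a"]
survivors = pick(candidates, pool, judge=lambda s: s in {"7", "42"})
```

Here the judge is asked about only two strings, the empty string and `"7"`, before a single candidate (`\d+`) is left. Because each question splits the surviving candidates, this simplified loop always keeps at least one of them; it has no way to report that none of the candidates fits, which a fuller workflow would need.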

But the comments? That’s where the real fireworks started. One camp was firmly in the "good, finally, some adult supervision" lane. jMyles got philosophical about PICK being allowed to fail, arguing it’s better for the tool to admit “none of these answers work” than to quietly ship nonsense. That sparked the thread’s most practical and spiciest clash: should programmers still be expected to read the code AI spits out? remywang said yes, full stop, calling code the real source of truth and pushing for smaller, easier-to-read output. Then ekidd came in with pure doom-posting energy, saying they were “heartbroken” that reading code is now considered too much to ask of students or professionals.

And then there was the unexpected arts-and-humanities plot twist. otekengineering argued this whole mess is really about judgment, metaphor, and even philosophy—skills more often taught outside engineering. So yes, the article is about checking AI work, but the comments turned it into a mini culture war over laziness, craftsmanship, and whether software’s future belongs to careful readers or exhausted button-clickers.

Key Points

  • The article argues that increased use of generative AI in programming requires stronger use of formal methods to verify outputs against intended requirements.
  • It says using LLMs to translate prose into formal specifications does not by itself solve correctness problems caused by ambiguity, misconceptions, or context-dependent intent.
  • The article proposes that humans must remain involved in specification formalization, and that the workflow should be both meaningful and moderate in effort.
  • It describes PICK as a tool that compares multiple LLM-generated candidates by asking users to judge concrete examples that distinguish them.
  • The article reports experimental results indicating that the PICK workflow works well, and says PICK has been implemented for regular expressions, linear temporal logic (LTL), and attribute-based access control (ABAC).

Hottest takes

"Better to know than to ship the regex anyway" — jMyles
"The code is the best source of truth" — remywang
"I am honestly heartbroken to live in a world where reading the code is seen as an unreasonable ask" — ekidd