LLM-as-a-Courtroom

AI puts on a judge’s wig to fix stale docs, and the internet yells “Objection”

TLDR: Falconer launched an “AI courtroom” to decide when company docs should update after code changes, swapping numeric scores for argued cases. Commenters are split between praising the adversarial rigor and slamming the cost, the latency, and the idea that an AI can judge “user harm” at all, making this both bold and controversial.

In today’s episode of “AI goes full courtroom drama,” Falconer says it can stop “documentation rot” — those out‑of‑date company docs everyone hates — by having an AI courtroom decide when to update them. Instead of shaky 1–10 ratings, their system brings in a Prosecutor, Defense, a mini‑Jury, and a Judge to argue it out in seconds. The community? Absolutely split.
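
To make the shape of the idea concrete, here is a minimal sketch of an adversarial pass in Python. The ask_llm helper and the role prompts are hypothetical stand-ins for illustration, not Falconer’s actual implementation.

```python
def ask_llm(system: str, prompt: str) -> str:
    """Placeholder for a real chat-completion call to your LLM provider."""
    raise NotImplementedError("wire this up to an actual LLM client")

def courtroom_verdict(code_diff: str, doc_excerpt: str) -> str:
    # Prosecution: argue the doc is now stale because of this change.
    prosecution = ask_llm(
        "You are the prosecutor. Argue this document is outdated by the change.",
        f"DIFF:\n{code_diff}\n\nDOC:\n{doc_excerpt}",
    )
    # Defense: argue the doc is still accurate despite the change.
    defense = ask_llm(
        "You are the defense. Argue this document is still accurate.",
        f"DIFF:\n{code_diff}\n\nDOC:\n{doc_excerpt}",
    )
    # Judge: weigh the two arguments and return a reasoned verdict
    # (UPDATE or KEEP) rather than a shaky 1-10 score.
    return ask_llm(
        "You are the judge. Decide UPDATE or KEEP and explain briefly.",
        f"PROSECUTION:\n{prosecution}\n\nDEFENSE:\n{defense}",
    )
```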

Supporters cheer the adversarial vibes. One dev noted LLMs (large language models, the chatty AIs) are much better at arguing positions than spitting out numbers, so a courtroom makes sense. But the skeptics came in hot: one commenter deadpanned that an AI can’t understand “user harm,” calling the whole thing cosplay for code. Others questioned the bill: why spin up a full legal drama per code change when a simpler model might predict updates and a quick AI summary could finish the job? One user balked at the “massive token overhead” versus a basic RAG setup (retrieval-augmented generation, where the AI looks up relevant facts as it answers).
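
For contrast, the kind of “standard RAG check” that commenter seems to have in mind is roughly an embedding lookup: flag any doc whose text sits close to the diff in vector space. The embed helper and the 0.8 threshold below are illustrative assumptions, not anyone’s production setup.

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    """Placeholder for a real embedding call (e.g., a sentence-transformer)."""
    raise NotImplementedError("wire this up to an actual embedding model")

def docs_touched_by_diff(code_diff: str, docs: dict[str, str],
                         threshold: float = 0.8) -> list[str]:
    # Embed the diff once, then flag docs whose embeddings are close to it.
    diff_vec = embed(code_diff)
    flagged = []
    for name, body in docs.items():
        doc_vec = embed(body)
        similarity = float(np.dot(diff_vec, doc_vec) /
                           (np.linalg.norm(diff_vec) * np.linalg.norm(doc_vec)))
        if similarity >= threshold:  # likely affected, so a candidate for update
            flagged.append(name)
    return flagged
```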

Then came the memes: a mock trial with a judge sustaining an objection “on whichever grounds you find most compelling” had everyone picturing a literal kangaroo court for pull requests. Big promise, big questions: time saver or courtroom circus?

Key Points

  • Falconer is building a shared memory layer and automating documentation updates based on code changes.
  • Determining document updates after PR merges requires contextual judgment beyond simple pattern matching.
  • Falconer’s agent performs end-to-end review of PRs and documents, completing tasks in seconds that would take humans days.
  • The company built infrastructure to process tens of thousands of PRs daily for enterprise customers.
  • An initial numeric scoring approach proved unreliable, leading to a new “LLM-as-a-Courtroom” framework focused on reasoned arguments.

Hottest takes

"On whichever grounds you find most compelling" — test6554
"An LLM does not understand what 'user harm' is" — emsign
"a massive token overhead compared to a standard RAG check" — nader24
Made with <3 by @siedrix and @shesho from CDMX. Powered by Forge&Hive.