December 16, 2025
Proof or spoof? You be the judge
Prediction: AI will make formal verification go mainstream
AI promises bug‑proof code; commenters split between ‘prove it’ and ‘it just argues’
TLDR: A bold prediction says AI will make math-style checks for code cheap and common, shifting effort to writing clear specs. Commenters are split: some want bots to self-test and prove code, others report gaslighting and bad proofs, and a vocal camp says just let AI play QA for everything.
The article claims a bold future: AI won’t just write code, it’ll prove the code is correct, like a math teacher stamping “A+” on your app. The pitch? Let large AI models crank out the boring proof stuff so everyday engineers can finally use the once‑elitist “formal verification” tools. Cue the comments section turning into a courtroom drama.
On one side, Team Pro‑Proof cheers. One dev says the trick is giving bots a sandbox to run and validate their own code, not just spit it out. Another dreams of tricky languages like Rust and Haskell feeling less like a wall of thorns, because an AI tutor can nudge you out of dead ends. The meme of the day: ditch “artisanal bugs,” embrace robot‑checked builds.
But Team Skeptic brought receipts. One commenter recounts trying Google’s Gemini and getting gaslit: it nitpicked their spec, dismissed real output as “invalid,” then generated its own broken spec. Others warn that there aren’t enough example proofs to learn from, and that chatbots still lack the deep understanding to prove anything that matters. And then there’s Team Test‑Everything, arguing that if AI can prove stuff, it might as well be your full‑time QA intern running every test after every change. Verdict? The jury’s still fighting, but the trial is pure tech theater.
Key Points
- Formal verification has historically been difficult and costly, limiting its use in industrial software engineering.
- Proof assistants/languages (Rocq, Isabelle, Lean, F*, Agda) enable formal specifications and proofs of code correctness.
- The seL4 microkernel verification required 20 person-years and 200,000 lines of Isabelle for 8,700 lines of C, roughly 23 lines of proof per line of code.
- LLM-based coding assistants can generate proof scripts; a proof checker rejects any invalid proof, so the assistant can simply retry.
- Automation shifts the challenge from writing proofs to defining specifications correctly; AI may help translate between natural and formal language.
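
The generate-check-retry loop from the key points can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how any particular tool works: `prove_with_retries`, `toy_propose`, and `toy_check` are hypothetical names, the "LLM" is a canned list of guesses, and the "proof checker" is a stand-in that accepts exactly one script. The point it shows is that the checker, not the model, decides what counts as a proof.

```python
from typing import Callable, Optional


def prove_with_retries(
    propose: Callable[[str, list[str]], str],
    check: Callable[[str, str], Optional[str]],
    spec: str,
    max_attempts: int = 5,
) -> Optional[str]:
    """Return the first candidate proof the checker accepts, or None."""
    feedback: list[str] = []  # checker errors fed back to the proposer
    for _ in range(max_attempts):
        candidate = propose(spec, feedback)
        error = check(spec, candidate)
        if error is None:
            return candidate  # accepted by the checker
        feedback.append(error)  # rejected: remember why, then retry
    return None  # gave up; nothing unverified is ever returned


def toy_propose(spec: str, feedback: list[str]) -> str:
    # Hypothetical stand-in for an LLM: tries canned scripts in order.
    candidates = ["sorry", "simp", "rfl"]
    return candidates[min(len(feedback), len(candidates) - 1)]


def toy_check(spec: str, candidate: str) -> Optional[str]:
    # Hypothetical stand-in for a proof checker: accepts only "rfl".
    return None if candidate == "rfl" else f"rejected: {candidate}"


if __name__ == "__main__":
    print(prove_with_retries(toy_propose, toy_check, "2 + 2 = 4"))
```

A wrong guess costs only a retry, because a rejected script never leaves the loop. What the loop cannot catch is a wrong `spec`, which is why the effort shifts to writing specifications.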