February 21, 2026
Proof or Poof?
Mathematics in the Library of Babel
From 'mind-blowing' to 'hard to verify,' readers split on AI math
TL;DR: A leading researcher says AI is getting surprisingly good at writing math proofs and may hit top-tier results sooner than he thought. Commenters split between awe and alarm: some cheer the progress, others warn that AI’s proofs are hard to verify and could flood math with slick mistakes.
Mathematicians are sweating and celebrating after a reflective essay claimed today’s AI can already write respectable proofs and might outpace human researchers sooner than expected. The author admits he once underestimated the machines, even betting they wouldn’t match top-tier papers by 2030—now he expects to lose. Cue the comments section, where the vibe swings between fireworks and fire alarms.
On Team Goosebumps: one reader gushed that the piece—and the classic inspiration behind the title, Borges’ Library of Babel—is “one of the best.” The mood: awe, wonder, and a little literary swoon. On Team Red Pen: skeptics latched onto the AI “First Proof” bake-off, cheering the progress but warning that AI-written arguments are “extremely hard to verify” and often messy. The drama centers on a worry the author also raised: a flood of sloppy, wrong-but-convincing math that pollutes the field. Commenters turned the baking pun into a meme—“the dough is rising, but is it fully baked?”—while others joked that AI is great at proofs until you ask it to prove it actually proved anything.
The split is stark: dazzled optimists versus verification hawks. Everyone agrees on one thing: the pace is scorching, and the quality-control oven had better be preheated, fast.
Key Points
- LLMs have progressed from early experiments in 2020 to producing correct mathematical proofs by 2022.
- Reasoning models like o3-mini-high (assessed Feb 2025) are deemed genuinely useful for research despite errors.
- ChatGPT 5.2 Pro (Dec 2025) can often provide reasonable proofs of expert-level but routine lemmas, with mistakes.
- The author uses OpenAI’s Codex for advanced scientific computing tasks, indicating expanded practical utility.
- A pilot project (“First Proof”) tested models on 10 unpublished lemmas, and the author now expects to lose a 2030 capability bet with Tamay Besiroglu.