March 12, 2026

Proofs, pop‑ups, and pitchforks

lf-lean: The frontier of verified software engineering

AI ports 1,276 proofs in hours—fans cheer, skeptics say it’s just a tutorial

TLDR: An AI ported 1,276 textbook proofs with a claimed 350x speed-up using a “one rule checks many tasks” approach, sparking debate over whether that’s a real milestone or just a flashy demo. Commenters split between “impressive verification” and “it’s a tutorial,” while a surprise phone permissions pop-up fueled extra drama.

Theorem’s new project, lf-lean, claims an AI blitzed through 1,276 textbook proofs, translating them from one proof system (Rocq) to another (Lean), while staying correct by design. The team says it used “task-level specification generators” (think: one big rule that checks tons of similar tasks) and saw a wild 350x speed-up with only a couple of days of human help. They even point to a METR timeline suggesting verified software engineering (code shipped with machine-checked proofs that it does what it should) is arriving faster than expected. Big talk, big chart, big vibes.

But the comments? Spicy. One early reply shrugs: “Is this impressive?” Another calls the textbook a beginner tutorial and says AI should handle it. Meanwhile, defenders argue this is a legit benchmark: 97% done autonomously, with proofs that actually verify—no flaky tests, no hand-waving. The crowd splits between “marketing flex” and “milestone moment.”

Then the thread swerved into chaos when a user on mobile saw a permissions prompt on the site and panicked. Cue memes: “Do these proofs need my location?”, “Rocq and Roll into Lean,” and “350x more pedantry.” Some want harder tests—real-world refactors, not classroom exercises. Others say you’ve got to start somewhere. Bottom line: cool demo, hotter comment section, and a pop-up that stole the show. Read the post on Theorem or the blog, if your phone lets you.

Key Points

  • lf-lean delivers a verified translation of 1,276 Logical Foundations statements from Rocq to Lean.
  • Using rocq-dove, the system auto-generates task-level correctness specifications and evaluates translations and proofs.
  • Frontier AI models autonomously produced verified translations for 97% of statements; six difficult cases were solved manually.
  • The effort required ~2 person-days of human work versus an estimated ~2.75 person-years manually, implying a ~350x speed-up.
  • Positioned on METR’s time horizon framework, the result suggests verified software engineering may be advancing faster than expected.
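
For the curious, the ~350x figure in the key points can be sanity-checked with back-of-the-envelope arithmetic. The sketch below is not the team's calculation: the number of working days in a person-year is an assumption (the source does not state one), and the function name is made up for illustration.

```python
# Rough check of the claimed speed-up: ~2.75 person-years (manual estimate)
# versus ~2 person-days (actual human effort), both figures from the key points.

# Assumption: about 250 working days per person-year. The source gives no
# conversion factor, so this value is illustrative only.
WORKING_DAYS_PER_YEAR = 250


def speedup(manual_person_years, assisted_person_days,
            working_days_per_year=WORKING_DAYS_PER_YEAR):
    """Return how many times less human effort the assisted run needed."""
    manual_person_days = manual_person_years * working_days_per_year
    return manual_person_days / assisted_person_days


if __name__ == "__main__":
    for days_per_year in (250, 260):
        ratio = speedup(2.75, 2, days_per_year)
        print(f"{days_per_year} working days/year -> about {ratio:.0f}x")
```

Under either working-year assumption the ratio lands in the mid-300s, which is consistent with a rounded "~350x". Note that it compares human effort only and says nothing about compute time.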

Hottest takes

"They just ported a bunch of theorems/proofs" — ngruhn
"This website is asking me for permissions on my phone. Why?" — akkad33