January 13, 2026

Proofs, Preemptibles, and Popcorn

Running Lean at Scale

Math AI goes thrift shopping: 500k cheap machines spark comment war

TLDR: A new service lets math AI test proof steps across swarms of cheap, interruptible machines, scaling beyond pricey graphics chips. Comments clash over the real cost (is 500k machines for $5k legit?), hype around an IMO‑level AI, and whether this mostly just double‑checks chatbot answers.

An infrastructure team claims they’ve taught their “proof-bots” to crank through math theorems using Lean—think a language that helps computers check every step of a math proof—with a souped‑up switchboard called a REPL service. Translation: they can try tons of tiny proof steps across swarms of cheap, interruptible machines (preemptible instances) while keeping the fancy graphics chips free for training. Cue the comments turning into an accounting showdown. One user did the napkin math and asked whether an hour on 500,000 machines really costs just $5,000, or whether hidden fees will bite. The vibe: “coupon‑clipping AI” vs “you forgot about network, storage, and chaos tax.”
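The napkin math checks out at a plausible spot-market rate. The per-hour price below is an assumption for illustration (roughly what cheap preemptible CPU capacity can go for), not the team's actual bill:

```python
# Commenter's claim: 500,000 instances for 1 hour ≈ $5,000.
# The rate is an assumed ballpark for preemptible/spot capacity;
# real pricing varies by provider and machine type.
instances = 500_000
hours = 1
price_per_instance_hour = 0.01  # assumed: $0.01 per instance-hour

compute_cost = instances * hours * price_per_instance_hour
print(f"${compute_cost:,.0f}")
```

That figure covers raw compute only; the "chaos tax" the skeptics raised (network egress, storage, retries after preemption) would land on top.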

Hype arrived fast, with another commenter linking this work to Aristotle, an AI said to have “performed at Gold level at IMO,” the International Mathematical Olympiad—paper here. Then a tinkerer dropped a DIY twist: a Claude Code plugin so you can poke Aristotle yourself—GitHub link. Meanwhile, the curious crowd asked the existential question: is this basically fact‑checking large language models (LLMs), or does distributed Lean have bigger uses? Jokes landed about “stateless vibes” and the keyword sorry meaning “we’ll fix it later,” while skeptics wondered if 500k tiny tasks equals 500k tiny headaches. The strongest split: budget bragging vs practical pain, with a side of math‑contest flex and plug‑and‑play tinkering.
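For the uninitiated, sorry is a real Lean keyword: it closes any remaining goal as a placeholder, and the compiler accepts the proof while warning that it is incomplete—hence the "we'll fix it later" jokes. A two-line illustration:

```lean
-- `sorry` lets an unfinished proof compile, with a warning.
theorem add_comm' (a b : Nat) : a + b = b + a := by
  sorry  -- placeholder: "we'll fix it later"
```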

Key Points

  • A custom automated reinforcement learning system is used to improve models that prove theorems in Lean.
  • The REPL service acts as a stateless Lean execution layer, mediating all interactions between models and Lean proofs.
  • The REPL provides operations (run Lean code, run tactic, export/import state) enabling distributed tree exploration of proof states.
  • The service scales to hundreds of thousands of CPUs and uses preemptible instances to reduce compute costs.
  • Robustness requirements include handling Lean errors, network issues, and timeouts, with client reconnection and resumption supported.

Hottest takes

"running 500000 instances for 1 hour can be done for about $5000" — auggierose
"performed at Gold level at IMO" — RGamma
"used to validate the output of llms?" — ncgl
Made with <3 by @siedrix and @shesho from CDMX. Powered by Forge&Hive.