July 3, 2026
Proof, panic, and petty comments
Leanstral 1.5: Proof Abundance for All
This free proof bot looks powerful, but the comments instantly turned into a bug-hunt brawl
TLDR: Leanstral 1.5 is a new free AI tool that’s claiming top-tier results at checking difficult math and code correctness, while costing far less than some rivals. Commenters were split between impressed and skeptical, with the biggest fight over whether its flashy bug-finding example was genuinely clever or just overhyped.
Leanstral 1.5 just arrived waving a giant scorecard: it’s free, open-source, and apparently absurdly good at checking hard math proofs and spotting mistakes in code. The makers say it hit 100% on one major test set, topped rivals on several advanced benchmarks, and even found bugs in open-source software. In plain English: this is a robot built to do the kind of painfully careful checking humans usually avoid, and it’s now cheaper and more accessible than many competing systems.
But the real show was in the comment section, where applause quickly turned into a mini food fight. One camp basically said, “Cool results, but that bug example is not the flex you think it is.” User boulos openly questioned whether the highlighted bug was really some exotic edge case, while moonset piled on with a killer comparison: they claim Codex with GPT-5.5 High found the same bug from a simple prompt, arguing it wasn’t especially sneaky so much as “too few eyeballs” had looked at the code. Ouch.
Then came the classic AI-model user frustration: what exact prompt are we supposed to use? Commenter nullc wanted concrete examples and even a guide written for other language models, complaining tiny prompt changes can cause wildly different results. And because no internet debate is complete without tone policing, another user snapped back that one criticism felt like a “lazy, straw-manning comment.” So yes, Leanstral dropped a serious technical release — but the crowd was busy arguing whether the headline bug was impressive, reproducible, or just good old-fashioned marketing glitter.
Key Points
- •Leanstral 1.5 was released as a free Apache-2.0 licensed formal reasoning model for Lean 4 with 119B total parameters and 6B active parameters.
- •The article reports benchmark results including 100% on miniF2F, 587/672 on PutnamBench, and new state-of-the-art scores of 87 on FATE-H and 34 on FATE-X.
- •The model is trained in three stages: mid-training, supervised fine-tuning, and reinforcement learning with CISPO.
- •Training uses two RL environments: a multiturn theorem-proving loop with Lean compiler feedback and a code-agent workflow using filesystem access, bash commands, and the Lean language server.
- •The release also fully open sources FLTEval, a benchmark based on real pull requests from the Fermat’s Last Theorem repository.