March 16, 2026
Proofs on a budget? Hold my theorem
Mistral Releases Leanstral
Budget math-bot or cut‑rate compromise? Commenters clash
TLDR: Mistral’s Leanstral is an open, low-cost agent for checking math and code proofs, with impressive results for the price. Commenters are split: some cheer the accessible, cheaper option, while others argue that for high‑stakes correctness, Anthropic’s pricier Opus still wins—and is worth every dollar.
Mistral just dropped Leanstral, an open-source “code agent” that helps write and check math and software proofs inside Lean 4. The pitch: it’s fast, cheap, and open. The reaction: budget hero vs. bargain-bin risk. One camp is thrilled the Apache-licensed model and free API put serious proof tools in everyone’s hands. Another camp says, hold up—if the job is correctness, is cheaper what you want?
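For readers new to Lean: "checking a proof" here means the Lean 4 kernel mechanically verifies that a proof term actually establishes the stated theorem, so a verified proof cannot be wrong. A minimal illustration (not from Leanstral, just standard Lean 4 using the core library's `Nat.add_comm`):

```lean
-- Lean 4: the kernel type-checks this proof; if the term
-- didn't prove the statement, compilation would fail.
theorem add_comm_example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b
```

This is what makes "cheap but verified" a coherent pitch: even a weaker model's output is trustworthy once Lean accepts it.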
The data flex from Mistral: Leanstral beats some open mega-models while costing far less, and even outpaces Claude Sonnet on a key test when given two tries (about $36 vs. $549). But Anthropic's top model, Opus, still scores higher overall, at an eye-watering $1,650. That's where the comments catch fire. andai jokes about "trustworthy vibe coding," then pokes at the logic: if Leanstral scores lower than the top-tier models, why lead with price? jasonjmcghee doubles down: paying more for Opus is "totally worth it" for critical work. lefrenchy wonders whether any Mistral model can truly challenge Opus at all.
Meanwhile, tinkerers are already theory-crafting “AI Avengers” strategies—mixing models across multiple attempts to squeeze out extra wins. Bottom line: the community loves the open and affordable angle, but a loud faction argues that when the stakes are high, only the absolute best scores—and maybe the priciest—will do.
Key Points
- Mistral released Leanstral, an open-source Lean 4 code agent with weights under Apache 2.0, available via Mistral Vibe and a free API.
- Leanstral uses a sparse architecture with 6B active parameters, supports MCPs (notably lean-lsp-mcp), and leverages parallel inference with Lean as a verifier.
- Mistral introduced FLTEval and benchmarked Leanstral on real PRs to the FLT project, focusing on completing formal proofs and defining new concepts.
- Against open-source models, Leanstral achieved higher FLTEval scores with fewer passes, e.g., 26.3 at pass@2 and 29.3 at comparable cost levels.
- Compared to Claude models, Leanstral offers strong cost-performance (e.g., 26.3 at $36 vs. Sonnet's 23.7 at $549), while Opus 4.6 scores highest but at much higher cost.