MaxProof

AI math machine beats gold-medal level, and the comments immediately turn into chaos

TLDR: MaxProof pushed a math AI past the score usually needed for a gold medal in elite competitions, a big milestone for machine reasoning. Commenters were split between impressed and suspicious, joking about score ties with teenagers while arguing whether the real winner was the AI or the testing strategy.

A new math-focused AI system called MaxProof just posted a score that would clear the gold-medal bar at the International Mathematical Olympiad, the ultra-hard world championship for teen math stars. In plain English: this model doesn’t just blurt out one answer and hope for the best. It creates lots of possible proofs, checks them, fixes them, and then makes them battle it out until one winner survives. That big headline result — 35/42 on IMO 2025 and 36/42 on USAMO 2026 — had readers doing the online equivalent of dropping their coffee.

But the real party was in the comments. One camp zoomed in on the funniest twist: even with a gold-medal-level score, the AI still got caught in the same messy score tie as a bunch of human contestants. As one commenter joked, the “real AGI test” might not be solving the math, but surviving the same scoring traffic jam as 46 teenagers. Another crowd immediately started a more serious food fight: is the secret sauce actually the testing setup, not the model itself? In other words, did the brains win, or did the strategy win?

And because this is the internet, the jokes landed fast too. “Not a good day to be named Max,” one person deadpanned, while another used the moment to push the eternal nerd agenda: more formal verification, please. Even the medal math became drama, with one commenter noting that 2025 had an unusually high share of gold medals, adding a delicious wrinkle to the victory lap. You can practically hear the comment section yelling, “Amazing result… but let’s argue about what it really means.”

Key Points

  • The article introduces MaxProof as a population-level test-time scaling framework for mathematical proof in the MiniMax-M3 series.
  • M3 is trained for three capabilities: proof generation, proof verification, and critique-conditioned proof repair.
  • These capabilities are merged into a single released M3 model and used at test time in multiple roles, including generator, verifier, refiner, and ranker.
  • MaxProof searches across a population of candidate proofs and selects a final proof through tournament selection.
  • The article reports M3 reaching 35/42 on IMO 2025 and 36/42 on USAMO 2026, above the stated human gold-medal threshold on both.
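
The loop described in the points above (generate a population, verify, repair from critiques, then pick a winner by tournament) can be sketched roughly as follows. This is a toy illustration, not MaxProof's actual implementation: the function names, the population size, the number of repair rounds, and the single-elimination bracket are all assumptions, and the four callables stand in for one model being called in its generator, verifier, refiner, and ranker roles.

```python
from typing import Callable, List, TypeVar

P = TypeVar("P")  # a candidate proof, in whatever representation the caller uses


def population_search(
    generate: Callable[[], P],        # generator role: produce one candidate proof
    verify: Callable[[P], str],       # verifier role: return a critique, "" if no issue found
    refine: Callable[[P, str], P],    # refiner role: repair a proof given its critique
    compare: Callable[[P, P], P],     # ranker role: return the preferred of two proofs
    population_size: int = 8,         # hypothetical default
    repair_rounds: int = 2,           # hypothetical default
) -> P:
    if population_size < 1:
        raise ValueError("population_size must be at least 1")

    # 1. Sample a population of independent candidates.
    population: List[P] = [generate() for _ in range(population_size)]

    # 2. Verify each candidate and repair the ones that draw a critique.
    for _ in range(repair_rounds):
        repaired: List[P] = []
        for proof in population:
            critique = verify(proof)
            repaired.append(refine(proof, critique) if critique else proof)
        population = repaired

    # 3. Single-elimination tournament: pair candidates off until one remains.
    while len(population) > 1:
        winners = [
            compare(population[i], population[i + 1])
            for i in range(0, len(population) - 1, 2)
        ]
        if len(population) % 2 == 1:
            winners.append(population[-1])  # odd one out gets a bye
        population = winners
    return population[0]


def toy_demo() -> int:
    # Toy stand-in: a "proof" is just its number of unresolved gaps.
    gaps = iter([5, 3, 4, 6, 7, 3, 9, 8])
    return population_search(
        generate=lambda: next(gaps),
        verify=lambda p: "has gaps" if p > 0 else "",
        refine=lambda p, critique: p - 1,   # each repair closes one gap
        compare=min,                        # fewer gaps wins
    )
```

In the toy run, two repair rounds take the best candidate from 3 gaps down to 1, and the tournament returns it. With a real model, each callable would be a prompt to the same weights in a different role, which is why commenters asked how much of the result comes from this search loop rather than from the model.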

Hottest takes

"the real AGI test ... the same scoring traffic jam as 46 teenagers" — pfannl
"Is the harness more valuable than the weights?" — thierrydamiba
"not a good day to be named Max" — minimaxir