MiniMax M2.5 released: 80.2% on SWE-bench Verified

Cheaper than coffee, faster than Claude? Devs split over MiniMax M2.5

TLDR: MiniMax’s M2.5 promises speedy, low-cost coding help (about $1/hour) and hits 80.2% on a major bug-fixing benchmark. Comments split between bargain hunters eager to try it (some note it’s currently free on OpenCode) and devs held back by Claude loyalty or company rules, fueling a workflow-versus-wallet showdown.

MiniMax just dropped M2.5 and tossed a price grenade into the dev world: it claims “intelligence too cheap to meter,” roughly $1/hour for a nonstop session, and clocked 80.2% on SWE-bench Verified (a bug-fixing benchmark) while matching the speed of top models like Claude Opus. The company brags it now “plans like an architect” and handles 10+ programming languages, browsing, and tool use, meaning it can click around the web and apps on its own to get work done. Fans are already making wallet memes: “Cheaper than my latte, faster than my intern!”

But the comments are where the real fight is. Power user mythz is already team MiniMax, calling it “fast, cheap and excellent” and saying he even prefers Chinese open-source tools. Deal hunters point to jhack and denysvitali, who note it’s on the cheapest coding plan and even free on OpenCode right now. Meanwhile, workflow purists like turnsout are chained to their Claude Code setup and asking how to actually plug MiniMax into real projects. One frustrated dev says company rules keep them stuck with the Big Three (OpenAI, Anthropic, Google), “burning credit” in a week. The vibe: benchmark bragging vs. real-world lock-in, budget thrill vs. comfort-zone loyalty, with a side of GLM-vs-MiniMax rivalry for good measure.

Key Points

  • MiniMax released MiniMax-M2.5, trained via reinforcement learning in hundreds of thousands of real-world environments.
  • M2.5 reports SOTA benchmark results: 80.2% on SWE-bench Verified, 51.3% on Multi-SWE-bench, and 76.3% on BrowseComp (with context management).
  • M2.5 completes SWE-bench Verified 37% faster than M2.1 and matches the speed of Claude Opus 4.6; operational cost is cited as ~$1/hour at 100 tokens/s (see the back-of-envelope conversion after this list).
  • The model is trained on over 10 programming languages and covers the full development lifecycle across Web, Android, iOS, and Windows projects, planning by writing a spec before it codes.
  • Evaluations show improved generalization and agentic efficiency, with fewer agentic rounds than M2.1; new or upgraded benchmarks include VIBE Pro and RISE.
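
For a rough sense of what that cost claim implies, here is a back-of-envelope conversion using only the two figures cited above. This is a sketch, not MiniMax’s actual price sheet; real billing will depend on their published input/output token rates.

```python
# Back-of-envelope: what "$1/hour at 100 tokens/s" implies per token.
# Both figures come from the announcement; actual pricing may differ
# (e.g., separate input vs. output rates), so treat this as an estimate.

cost_per_hour_usd = 1.0    # cited operational cost
tokens_per_second = 100    # cited generation speed

tokens_per_hour = tokens_per_second * 3600  # 360,000 tokens in a nonstop hour
cost_per_million_tokens = cost_per_hour_usd / tokens_per_hour * 1_000_000

print(f"{tokens_per_hour:,} tokens/hour")
print(f"~${cost_per_million_tokens:.2f} per million tokens")
# -> 360,000 tokens/hour, ~$2.78 per million tokens
```

At roughly $2.78 per million tokens generated nonstop, the “cheaper than my latte” framing does hold for an hour-long session, at least under these idealized assumptions.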

Hottest takes

"fast, cheap and excellent at tool calling" — mythz
"tied to my Claude Code workflow" — turnsout
"stuck with OpenAi, Anthropic and Google LLMs… they burn my credit" — 3adawi
Made with <3 by @siedrix and @shesho from CDMX. Powered by Forge&Hive.