December 9, 2025
Vibes vs Benchmarks!
Mistral Releases Devstral 2 (72.2% SWE-bench Verified) and Vibe CLI
EU fans cheer Mistral, but 'Vibe CLI' name divides devs
TLDR: Mistral launched Devstral 2, an open coding AI claiming strong bug-fixing scores, plus a new Vibe CLI tool. The crowd split between EU-flavored cheering and name-based eye-rolls, with skeptics demanding links and proof for benchmark and cost claims — a hype-filled release with receipts requested.
Mistral dropped Devstral 2, a big brainy 123B coding model, plus the Vibe CLI to automate code changes in your terminal. They claim 72.2% on SWE-bench Verified (a test where AI fixes real bugs) and say it's up to 7x more cost-efficient than rivals like Claude Sonnet. It's openly licensed and currently free via API, and a smaller 24B version, Devstral Small 2, can run on consumer hardware.
But the comments turned this into a vibes check. "I'm so glad Mistral never sold out," cheered EU fans, framing Mistral as the homegrown hero. Others rolled their eyes at the name: "'Vibe CLI' sounds like an unserious tool," snarked one, while another said "vibe-coding is fun, but my manager wants bugs fixed." Power users are excited to compare it to local favorites; one uses GPT-OSS-120b and wonders if Devstral 2 can replace it.
Then came the receipts brigade: "Where are their SWE-bench results?" asked skeptics, pointing to missing links and mismatched leaderboards. Confusion popped up over model size vs tokens, and cost claims sparked debate about real-world tasks versus marketing slides. The mood? Hype meets side-eye: people love the open, EU energy, but they want proof, and a CLI name that doesn't sound like a beach playlist.
Key Points
- Mistral AI released Devstral 2 (123B) and Devstral Small 2 (24B) with a 256K context window, under modified MIT and Apache 2.0 licenses, respectively.
- Devstral 2 scores 72.2% and Devstral Small 2 scores 68.0% on SWE-bench Verified; Devstral 2 is currently free via API.
- The models are significantly smaller than DeepSeek V3.2 and Kimi K2, and Mistral claims Devstral 2 is up to 7x more cost-efficient than Claude Sonnet on real-world tasks.
- Devstral 2 supports production workflows (multi-file edits, dependency tracking, failure detection and retries) with on-prem deployment and fine-tuning options; Small 2 supports image inputs for multimodal agents.
- Mistral launched the open-source Mistral Vibe CLI (Apache 2.0) for terminal-based code automation, with IDE integration via the Agent Communication Protocol; human evals show Devstral 2 beats DeepSeek V3.2 but trails Claude Sonnet 4.5.
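
For a sense of scale, the SWE-bench Verified percentages in the key points can be turned into rough task counts. This is a minimal sketch, assuming the Verified set contains 500 tasks (a figure not stated in the summary above) and that the score is simply the fraction of tasks resolved; the function name `resolved_tasks` is illustrative and is not part of any Mistral or SWE-bench tooling.

```python
# Assumption: SWE-bench Verified contains 500 tasks (not stated in the summary).
VERIFIED_TASKS = 500


def resolved_tasks(score_percent: float, total: int = VERIFIED_TASKS) -> int:
    """Convert a percentage score into an approximate count of resolved tasks."""
    return round(score_percent / 100 * total)


# Scores as reported in the key points.
devstral_2 = resolved_tasks(72.2)
devstral_small_2 = resolved_tasks(68.0)
gap = devstral_2 - devstral_small_2

print(devstral_2, devstral_small_2, gap)
```

Under that assumption, the 4.2-point gap between the two models works out to roughly 21 tasks, which helps explain why commenters asked for exact leaderboard links before reading much into the headline numbers.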