AI2: Open Coding Agents

Open coding bots for everyone: fans cheer cheap training, skeptics shout “you ignored Meta”

TLDR: AI2 launched SERA, open coding bots you can train cheaply on your own code, with fast performance and a $400 recipe. The community split fast: critics point to Meta’s higher scores and call out the marketing, while supporters say low cost and fully open tooling trump leaderboard flexing.

AI2 just dropped SERA, a set of open coding bots (think autocorrect for code) plus a DIY recipe to train them on your own projects for as little as $400. They're touting fast output speeds and total openness: models and weights on Hugging Face, a full training report, and a one-line CLI. Benchmarks claim SERA-32B solves 54.2% of a well-known coding test, and NVIDIA-backed tuning boasts wild throughput numbers. Sounds like a feel‑good open-source moment… until the comments lit up.

Enter the fact-checkers. One user swung in with receipts: Meta’s CWM models reportedly hit 65% on the same test, accusing AI2 of “conveniently” leaving that out. Another commenter summed up the confusion with meme-ready energy: “color me confused”—if SERA is “smaller and better,” why is it 32B and scoring lower than some rivals? The leaderboard police were out in force.

But a counter‑wave rallied around the price tag. The hottest take: the big number isn't accuracy—it's $400 to reproduce strong open performance. Folks building internal tools say the real pain isn't a few percentage points; it's cost, compliance, and integration. Others cheered the "open everything" vibe—model, weights, training pipeline, even the corpus. Bottom line: it's a classic internet split—scoreboard vs. accessibility—while speed‑fans chant "crazy fast," skeptics cry "marketing spin," and everyone grabs popcorn for round two.

Key Points

  • AI2 released Open Coding Agents and the SERA model family, providing open models, training recipes, and data for repo-adaptive coding agents.
  • SERA-32B is reported to solve 54.2% of SWE-Bench Verified tasks and trains in roughly 40 GPU-days or less on small NVIDIA GPU clusters.
  • The training method claims major cost reductions: matching SWE-smith's results at 57× lower cost and SkyRL's at 26× lower cost.
  • AI2 collaborated with NVIDIA to optimize inference; benchmarks show up to ~8,600 tokens/sec on 4×B200 (NVFP4) and ~3,700 tokens/sec on 4×H100 (FP8).
  • All components—including models, Claude Code integration, and training data—are open and can be launched with a single command for easy use and customization.

Hottest takes

"Claims in the article are incorrect." — ahmadyan
"color me confused." — khimaros
"The interesting number here isn't accuracy. It's the $400" — augusteo
Made with <3 by @siedrix and @shesho from CDMX. Powered by Forge&Hive.