June 30, 2026
Smart model, messy receipts
Claude Sonnet 5 – benchmark results
Big score, big doubts: fans roast the missing stats and token appetite
TLDR: Claude Sonnet 5 posted a standout intelligence score and a huge context window, but the community fixated on missing benchmark data, heavy token usage, and frustrating access limits. The real debate wasn’t whether it’s smart on paper, but whether it’s reliable and worth using in real life.
Claude Sonnet 5 arrives with a flashy resume: a top-tier intelligence score, support for text and images, and an enormous memory window big enough to swallow what feels like a small library. On paper, it looks like one of the smartest models around. In the comments, though, the vibe is less "wow" and more "wait, what am I even looking at?"
The loudest reaction was pure distrust. One commenter joked that so much of the benchmark page was missing or contradictory that it felt like the model itself had generated the report and then made half of it up. That line basically became the mood of the thread. People weren’t just debating whether Claude Sonnet 5 is good; they were side-eyeing the scoreboard itself.
Then came the cost-and-effort outrage. Multiple users complained that the model seems to burn through far more tokens, and therefore more of their usage, than rivals. In plain English: critics think it talks a lot, eats up a lot of usage doing it, and still runs into frustrating limits. One person flatly called it "yet another mediocre model," while another said they were tired of Anthropic’s caps and token-hungry behavior.
The spiciest drama was about real-world usability versus shiny test results. A commenter argued that if requests get blocked by provider rules for mysterious reasons, those should count as total failures in benchmarks, because that’s the actual user experience. Translation: the community isn’t just asking "Is it smart?" They’re asking "Can I actually use this thing without it eating my budget and refusing to work?"
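To see why that commenter's point matters, here is a minimal sketch of the two ways a benchmark could treat blocked requests. Everything in it is hypothetical: the `Result` type, the function names, and the sample numbers are made up for illustration and do not come from Artificial Analysis or any real evaluation.

```python
from dataclasses import dataclass


@dataclass
class Result:
    """One benchmark item. Hypothetical structure, for illustration only."""
    correct: bool
    blocked: bool = False  # True if the provider refused or filtered the request


def accuracy_excluding_blocked(results: list[Result]) -> float:
    """Score only the items the model actually answered."""
    answered = [r for r in results if not r.blocked]
    if not answered:
        return 0.0
    return sum(r.correct for r in answered) / len(answered)


def accuracy_counting_blocked_as_failures(results: list[Result]) -> float:
    """Score every item; a blocked request counts as a wrong answer."""
    if not results:
        return 0.0
    return sum(r.correct and not r.blocked for r in results) / len(results)


# Made-up sample: 7 correct answers, 1 wrong answer, 2 blocked requests.
sample = (
    [Result(correct=True)] * 7
    + [Result(correct=False)]
    + [Result(correct=False, blocked=True)] * 2
)

print(accuracy_excluding_blocked(sample))             # 7 of 8 answered items
print(accuracy_counting_blocked_as_failures(sample))  # 7 of all 10 items
```

With these invented numbers, the same run scores 0.875 if blocked requests are quietly dropped and 0.7 if they count as failures, which is the gap the commenter wants benchmarks to show.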
Key Points
- Artificial Analysis lists Claude Sonnet 5 (Adaptive Reasoning, Max Effort) as a proprietary reasoning model released in June 2026.
- The article says the model supports text and image input, outputs text, and has a 1 million-token context window.
- The page reports an Artificial Analysis Intelligence Index score of 53, compared with an average of 8 among comparable models.
- Artificial Analysis states that the model generated 300 million tokens during Intelligence Index evaluation, versus an average of 37 million tokens.
- The benchmark methodology says Intelligence Index v4.1 is based on nine evaluations and compares proprietary models by price range using a blended 3:1 input/output price ratio.
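
For a rough sense of the arithmetic behind the last two points, here is a small sketch. It assumes "blended 3:1" means a weighted average with three parts input price to one part output price; the page summary does not spell out the formula, so treat that as an assumption. The prices in the example are placeholders, not Claude Sonnet 5's actual pricing, and the function names are made up.

```python
def blended_price(input_price: float, output_price: float,
                  input_weight: float = 3.0, output_weight: float = 1.0) -> float:
    """Weighted average price per million tokens.

    Assumes a 3:1 blend means 3 parts input to 1 part output.
    """
    total_weight = input_weight + output_weight
    return (input_weight * input_price + output_weight * output_price) / total_weight


def token_ratio(model_tokens: float, average_tokens: float) -> float:
    """How many times more tokens the model generated than the average."""
    return model_tokens / average_tokens


# Placeholder prices in dollars per million tokens (hypothetical).
print(blended_price(input_price=3.0, output_price=15.0))  # (3*3 + 1*15) / 4

# Token counts reported in the key points: 300 million vs. a 37 million average.
print(round(token_ratio(300e6, 37e6), 1))
```

On the reported figures, the model generated about 8.1 times the average number of tokens during evaluation, which is the "token appetite" commenters were complaining about.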