April 29, 2026
Ug brief. Plugin sad.
I benchmarked Claude Code's Caveman plugin against "be brief."
Plugin gets humbled after two tiny words do the exact same job
TLDR: A benchmark found that a popular chatbot-shortening add-on didn’t do better than simply saying “be brief.” Commenters split between mocking the hype, joking that at least it was funny, and wondering if AI tools are now changing how real people write.
A tiny showdown over a chatbot add-on turned into a full-blown comments-section roast. The test itself was simple: could a popular tool called Caveman, built to make Claude reply in ultra-short bursts, beat the world’s most boring instruction — just telling it to “be brief”? After 24 prompts and five different test setups, the answer was basically nope. The short version: the fancy add-on and the two-word command delivered almost the same quality and almost the same amount of text. That immediately sent the crowd into detective mode, dunk mode, and existential-crisis mode all at once.
The strongest reaction was pure disbelief. One commenter flat-out said they “still can’t believe that people take Caveman seriously,” arguing that saving a few words at the end barely matters when people burn through mountains of text in long coding sessions anyway. Ouch. But not everyone was sharpening knives — some readers treated the whole thing like a lovable joke, with one admitting that Caveman made them laugh, which honestly may be the add-on’s biggest win of the day.
Then came the sneaky side-drama: not just whether AI tools are useful, but whether they’re changing how humans write. One commenter said the article’s phrasing had that unmistakable AI smell, then dramatically announced they’d been enjoying classic fiction more lately — a wonderfully petty literary subplot. Others were more practical, saying the benchmark was actually useful because now they want to stick “be brief” into their own default instructions. In other words, the plugin may have lost the battle, but the comments turned it into a referendum on hype, humor, and whether we’re all starting to sound like robots.
Key Points
- The article benchmarks Caveman against a simple “be brief.” instruction using 24 prompts across six categories and five test arms.
- All five arms were scored with per-prompt rubrics covering required facts, required terms, and prohibited wrong claims, using claude-opus-4-7 for generation and claude-sonnet-4-6 for evaluation.
- Reported quality scores were tightly clustered: baseline 0.985, brief 0.985, lite 0.976, full 0.975, and ultra 0.970.
- Every arm reportedly achieved 100% key-point coverage, with zero must_avoid triggers across 120 responses.
- The article says Caveman’s main differentiator is not better compression than “be brief.”, but structured output and safety-aware behavior such as Auto-Clarity.
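The rubric-based scoring described above can be sketched in a few lines. This is a minimal illustration, not the article's actual evaluation harness: the field names (`required_facts`, `required_terms`, `must_avoid`) and the substring-matching logic are assumptions for the sake of the example; the real benchmark used a model (claude-sonnet-4-6) as the judge rather than string matching.

```python
def score_response(response: str, rubric: dict) -> float:
    """Score one response against a per-prompt rubric (illustrative only).

    The rubric has three hypothetical fields:
      - "required_facts": substrings that must appear (key-point coverage)
      - "required_terms": terminology that must appear
      - "must_avoid":     wrong claims that must NOT appear
    Any must_avoid hit zeroes the score; otherwise the score is the
    fraction of required items found in the response.
    """
    text = response.lower()
    if any(bad.lower() in text for bad in rubric.get("must_avoid", [])):
        return 0.0
    required = rubric.get("required_facts", []) + rubric.get("required_terms", [])
    if not required:
        return 1.0
    hits = sum(1 for item in required if item.lower() in text)
    return hits / len(required)


# Example: a terse reply that still covers every required item scores 1.0,
# which is how a two-word instruction can tie an elaborate plugin.
rubric = {
    "required_facts": ["24 prompts", "five arms"],
    "required_terms": ["baseline"],
    "must_avoid": ["Caveman compresses better"],
}
reply = "Across 24 prompts and five arms, the baseline tied the plugin."
print(score_response(reply, rubric))  # → 1.0
```

Under a rubric like this, brevity is never penalized on its own, which matches the article's finding that all five arms hit 100% key-point coverage with zero must_avoid triggers.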