March 30, 2026

Shhh… Claude’s on a word diet

Universal CLAUDE.md – cut Claude output tokens by 63%

One file puts Claude on a word diet — devs split on real savings

TLDR: A drop-in CLAUDE.md claims to cut Claude’s chatter by ~63%, but it only saves money when there’s lots of output. Commenters debate whether input still dominates costs, call the benchmarks cherry-picked, and argue that quality and personalization may matter more than token thrift.

A new drop-in file called CLAUDE.md promises to stop Claude’s polite rambling and shave replies by about 63%: fewer “Sure!” openings, no emoji-and-curly-quote chaos, and less unsolicited advice. It works by sitting in your project root and telling the AI to be brief and plain… but even the creator admits most costs come from what you feed the model, not what it says back. Translation: this only pays off if you generate lots of output.
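The article does not reproduce the file’s actual contents, but the mechanism is just Claude Code reading instructions from a CLAUDE.md in the project root. An illustrative sketch of the kind of directives it describes (fewer filler openings, no emoji, no unsolicited advice) might look like this; the exact wording here is hypothetical, not the real file:

```markdown
# Response style (illustrative sketch, not the actual CLAUDE.md)

- Be brief and plain. Answer directly, with no preamble.
- Do not open with filler such as "Sure!" or "Great question!".
- No emoji, no curly quotes, no decorative formatting.
- Do not offer unsolicited advice or follow-up suggestions.
- Prefer short sentences and concrete wording over hedging.
```

Because the file only shapes what the model says back, none of this touches the input side of the bill.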

That caveat lit the comments on fire. yieldcrv bluntly warned that your “agents and servers will still eat the bill,” and the mood shifted from “nice hack” to “is this a rounding error?” Meanwhile, btown accused the benchmarks of being cherry-picked for simple explainers, not real-world coding loops. rcleveng chimed in that their Claude (Opus 4.6) isn’t even that chatty under Claude Code, asking if this is meant for other setups. The vibe: interesting tweak, questionable savings.

Then came the culture war. sillysaurusx argued Claude already has a “personalization” switch and said they’d pay extra for better writing over shorter writing. And Tostino raised the scary flag: the rules push “post-hoc reasoning,” which could make some models dumber, not sharper. The memes wrote themselves: “Claude on keto,” “No more ‘I hope this helps!’ detox,” and “Em-dash intervention.” Fans love the cleaner output; skeptics say show us real savings—or it’s just quiet Claude cosplay.

Key Points

  • CLAUDE.md is a single file that, when placed in a project root, reduces Claude’s output verbosity and tokens by about 63%.
  • The approach primarily targets output behavior (verbosity, sycophancy, formatting noise) and does not address input token costs.
  • Benchmarks on five prompts show a reduction from 465 to 170 words (~384 output tokens saved per four prompts), but results are directional, not statistically controlled.
  • Best suited for high-output, persistent workflows (automation, repeated structured tasks, consistent parseable output needs); not cost-effective for short or infrequent queries.
  • For strict parsing reliability, the article recommends using JSON mode or tool schemas; results on local models like llama.cpp and Mistral are untested.
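The “is this a rounding error?” debate is easy to check with back-of-envelope math. The sketch below uses the article’s 465-to-170-word benchmark figures; the ~1.3 tokens-per-word ratio and the $15-per-million-output-tokens price are placeholder assumptions, not numbers from the article:

```python
# Back-of-envelope cost saving from the article's benchmark figures.
WORDS_BEFORE = 465       # total benchmark output, stock Claude (from the article)
WORDS_AFTER = 170        # same prompts with CLAUDE.md in place (from the article)
TOKENS_PER_WORD = 1.3    # rough English-text heuristic; an assumption
PRICE_PER_MTOK = 15.0    # USD per million output tokens; placeholder, check current pricing

words_saved = WORDS_BEFORE - WORDS_AFTER                 # 295 words
tokens_saved = round(words_saved * TOKENS_PER_WORD)      # ~384 output tokens
dollars_saved = tokens_saved / 1_000_000 * PRICE_PER_MTOK

print(f"{words_saved} words ≈ {tokens_saved} tokens ≈ ${dollars_saved:.6f} per benchmark run")
```

Under these assumptions the per-run saving is a fraction of a cent, which is why the commenters argue the trick only matters at high output volume.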

Hottest takes

so everyone, that means your agents, skills and mcp servers will still take up everything — yieldcrv
heavily biased towards single-shot explanatory tasks — btown
Isn’t this what Claude’s personalization setting is for? — sillysaurusx
Made with <3 by @siedrix and @shesho from CDMX. Powered by Forge&Hive.