Stop Burning Your Context Window – How We Cut MCP Output by 98% in Claude Code

Dev crowd cheers token diet while skeptics ask what gets tossed

TL;DR: Context Mode cuts tool output in Claude Code by about 98%, keeping chats lean and sessions running longer. The crowd cheers the savings; skeptics worry about losing useful bits, and others push "backtracking" to prune mistakes, sparking a lively debate over how to keep AI memory clean without throwing away the good stuff.

The dev world is buzzing over Context Mode, a new "context diet" for Claude Code that claims a 98% cut in bloated tool output. In simple terms: AI tools usually dump heaps of raw data into your chat's memory (the "context window"), but Context Mode routes that noisy output through a sandbox so only the useful text gets saved. Result: 315 KB → 5.4 KB, and sessions stretch from ~30 minutes to ~3 hours. Cue cheering, side-eye, and memes.
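The core trick is simple enough to sketch. Here is a minimal illustration (not the project's actual code, and the example command and summary line are invented) of running a noisy tool in a subprocess and letting only its printed summary reach the conversation:

```python
import subprocess
import sys

# Hypothetical sketch: run a chatty tool in an isolated subprocess.
# Only what the wrapper script chooses to print on stdout would be
# handed back to the model; the raw dump never enters the context.
result = subprocess.run(
    [sys.executable, "-c", "print('3 failing tests: test_a, test_b, test_c')"],
    capture_output=True,  # capture stdout/stderr instead of inheriting them
    text=True,
    timeout=30,
)
summary = result.stdout.strip()  # only this short line enters the context
```

In this sketch the subprocess stands in for a tool like Playwright or `gh`; the point is that the conversation sees one summary line instead of tens of kilobytes of raw output.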

On HN, fans shout "finally!" while jamiecode calls the 98% cut "the real story," warning that the bigger issue is how messy multi-step workflows keep hauling old baggage forward. mvkel asks if this is basically "pre-compaction," nervously wondering if it might toss something important (like a tiny utility function) before the AI knows it needs it. nr378 fires a hot take: just prune failed attempts once the model gets it right; classic "clean up your room" energy.

Meanwhile, happy users like agrippanux say it’s already slashing token bills. Jokes roll in: “token keto,” “stop feeding the AI carbs,” and “Marie Kondo your context.” Even non-nerds get it: less junk in the chat memory means the AI stays sharp longer. And yes, JS folks are grinning about Bun auto-speeding their scripts. Drama? Plenty. Savings? Real.

Key Points

  • Context Mode is an MCP server for Claude Code that routes tool executions through isolated subprocesses, allowing only stdout into the conversation to prevent raw data bloat.
  • Its knowledge base indexes markdown with SQLite FTS5, uses BM25 ranking and Porter stemming, and returns exact code blocks and headings; URLs can be fetched and indexed without dumping full pages into context.
  • Across 11 real-world scenarios, tool outputs were reduced by up to 98% (e.g., Playwright 56 KB → 299 B; 20 GitHub issues 59 KB → 1.1 KB; access log 45 KB → 155 B).
  • Session performance improved: 315 KB of raw output becomes 5.4 KB, extending session time from ~30 minutes to ~3 hours and preserving ~99% of context after 45 minutes.
  • Setup is via Claude’s Plugin Marketplace or MCP-only npx; a PreToolUse hook auto-routes outputs through the sandbox, and subagents use batch_execute to minimize context usage; authenticated CLIs work via credential passthrough.
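The knowledge-base point above (SQLite FTS5 with BM25 ranking and Porter stemming) maps directly onto stock SQLite features. A minimal sketch, using Python's built-in `sqlite3` with invented sample docs rather than anything from the project:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# FTS5 virtual table with the Porter stemming tokenizer, so
# "routing" and "routes" both reduce to the same stem.
conn.execute(
    "CREATE VIRTUAL TABLE docs USING fts5(heading, body, tokenize='porter')"
)
conn.executemany(
    "INSERT INTO docs VALUES (?, ?)",
    [
        ("Install", "Run npx to install the MCP server"),
        ("Hooks", "The PreToolUse hook routes outputs through the sandbox"),
    ],
)

# bm25() is FTS5's built-in ranking function; lower scores rank better,
# so ascending order returns the best match first.
rows = conn.execute(
    "SELECT heading FROM docs WHERE docs MATCH ? ORDER BY bm25(docs)",
    ("routing",),
).fetchall()
```

The query for "routing" matches the doc containing "routes" thanks to stemming, and only the matching heading (not the whole indexed page) needs to be returned into context.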

Hottest takes

"The 98% reduction is the real story here" — jamiecode
"Is this not in effect a kind of 'pre-compaction,' deciding ahead of time what's relevant?" — mvkel
"Backtracking strikes me as another promising direction to avoid context bloat and compaction" — nr378
Made with <3 by @siedrix and @shesho from CDMX. Powered by Forge&Hive.