February 28, 2026
Slim tokens, thick drama
Stop Burning Your Context Window – How We Cut MCP Output by 98% in Claude Code
Dev crowd cheers token diet while skeptics ask what gets tossed
TLDR: Context Mode cuts tool output in Claude Code by about 98%, keeping chats lean and sessions longer. The crowd cheers the savings, while skeptics worry about losing useful bits and others push “backtracking” to prune mistakes—sparking a lively debate over how to keep AI memory clean without throwing away the good stuff.
The dev world is buzzing over Context Mode, a new “context diet” for Claude Code that claims a 98% cut in bloated tool output. In simple terms: AI tools usually dump heaps of raw data into your chat’s memory (“context window”), but this routes those noisy files through a sandbox so only the useful text gets saved. Result: 315 KB → 5.4 KB, and sessions stretch from ~30 minutes to ~3 hours. Cue cheering, side-eye, and memes.
On HN, fans shout “finally!” while jamiecode calls the 98% cut “the real story,” warning the bigger issue is how messy multi-step workflows keep hauling old baggage forward. mvkel asks if this is basically “pre-compaction,” nervously wondering if it might toss something important—like a tiny utility function—before the AI knows it needs it. nr378 fires a hot take: just prune failed attempts after the model gets it right—classic “clean up your room” energy.
Meanwhile, happy users like agrippanux say it’s already slashing token bills. Jokes roll in: “token keto,” “stop feeding the AI carbs,” and “Marie Kondo your context.” Even non-nerds get it: less junk in the chat memory means the AI stays sharp longer. And yes, JS folks are grinning about Bun auto-speeding their scripts. Drama? Plenty. Savings? Real.
Key Points
- Context Mode is an MCP server for Claude Code that routes tool executions through isolated subprocesses, allowing only stdout into the conversation to prevent raw-data bloat.
- Its knowledge base indexes markdown with SQLite FTS5, uses BM25 ranking and Porter stemming, and returns exact code blocks and headings; URLs can be fetched and indexed without dumping full pages into context.
- Across 11 real-world scenarios, tool outputs were reduced by up to 98% (e.g., Playwright 56 KB → 299 B; 20 GitHub issues 59 KB → 1.1 KB; an access log 45 KB → 155 B).
- Session performance improved: 315 KB of raw output becomes 5.4 KB, extending session time from ~30 minutes to ~3 hours and preserving ~99% of context after 45 minutes.
- Setup is via Claude’s Plugin Marketplace or an MCP-only npx install; a PreToolUse hook auto-routes outputs through the sandbox, subagents use batch_execute to minimize context usage, and authenticated CLIs work via credential passthrough.
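The PreToolUse hook in the last point would live in a Claude Code settings file. A sketch based on Claude Code's documented hooks schema; the matcher and the `context-mode-route` command are placeholders, not the project's actual configuration:

```json
{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Bash",
        "hooks": [
          {
            "type": "command",
            "command": "context-mode-route"
          }
        ]
      }
    ]
  }
}
```

The hook fires before the matched tool runs, which is what lets the sandbox intercept output before it ever lands in the conversation.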
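The isolation pattern in the first point is simple to picture: run the tool in a child process, forward only a bounded slice of its stdout, and let everything else stay in the sandbox. A minimal sketch in Python (the real Context Mode server is an MCP implementation; the helper name and cap here are illustrative, not its API):

```python
import subprocess
import sys

def run_tool_isolated(cmd: list[str], max_bytes: int = 4096) -> str:
    """Run a tool in a child process, returning only capped stdout.

    Hypothetical helper illustrating the pattern: the raw stream never
    reaches the parent's context; only a bounded slice of stdout does.
    """
    result = subprocess.run(
        cmd,
        capture_output=True,  # swallow stdout/stderr instead of inheriting them
        text=True,
        timeout=30,
    )
    # Forward stdout only, truncated so a chatty tool can't flood the context.
    return result.stdout[:max_bytes]

# A 10,000-character dump comes back capped at 100 characters.
noisy = [sys.executable, "-c", "print('x' * 10_000)"]
print(len(run_tool_isolated(noisy, max_bytes=100)))  # 100
```

The same idea scales from a 56 KB Playwright dump to a 45 KB access log: the conversation only ever sees the capped summary.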
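The knowledge-base point is easy to reproduce in miniature with Python's bundled sqlite3, assuming its SQLite build ships the FTS5 extension (most do). The table name, columns, and sample rows below are illustrative, not Context Mode's actual schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# FTS5 virtual table with Porter stemming, as described above.
conn.execute(
    "CREATE VIRTUAL TABLE notes USING fts5(heading, body, tokenize='porter')"
)
conn.executemany(
    "INSERT INTO notes (heading, body) VALUES (?, ?)",
    [
        ("Install", "Run npx to set up the MCP server."),
        ("Sandbox", "Tool executions run in isolated subprocesses."),
        ("Hooks", "A PreToolUse hook routes outputs through the sandbox."),
    ],
)

# FTS5 exposes a BM25-based `rank` column (lower is better), so ordering
# ascending returns the most relevant hit first. Thanks to Porter stemming,
# a query for "executing" also matches "executions".
rows = conn.execute(
    "SELECT heading FROM notes WHERE notes MATCH 'executing' ORDER BY rank"
).fetchall()
print(rows[0][0])  # Sandbox
```

Returning just the matching heading or code block, rather than the whole indexed page, is exactly the trade that keeps context lean.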