Stop Burning Your Context Window – How We Cut MCP Output by 98% in Claude Code

Dev crowd cheers token diet while skeptics ask what gets tossed

TL;DR: Context Mode cuts tool output in Claude Code by about 98%, keeping chats lean and sessions running longer. The crowd cheers the savings; skeptics worry about losing useful bits, and others push "backtracking" to prune mistakes, sparking a lively debate over how to keep AI memory clean without throwing away the good stuff.

The dev world is buzzing over Context Mode, a new "context diet" for Claude Code that claims a 98% cut in bloated tool output. In simple terms: AI tools usually dump heaps of raw data into your chat's memory (the "context window"), but Context Mode routes that noisy output through a sandbox so only the useful text gets saved. Result: 315 KB → 5.4 KB, and sessions stretch from ~30 minutes to ~3 hours. Cue cheering, side-eye, and memes.
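The core trick is simple enough to sketch. Here is a minimal illustration (not the project's actual code, and the example command and summary line are invented) of running a noisy tool in a subprocess and letting only its printed summary reach the conversation:

```python
import subprocess
import sys

# Hypothetical sketch: run a chatty tool in an isolated subprocess.
# Only what the wrapper script chooses to print on stdout would be
# handed back to the model; the raw dump never enters the context.
result = subprocess.run(
    [sys.executable, "-c", "print('3 failing tests: test_a, test_b, test_c')"],
    capture_output=True,  # capture stdout/stderr instead of inheriting them
    text=True,
    timeout=30,
)
summary = result.stdout.strip()  # only this short line enters the context
```

In this sketch the subprocess stands in for a tool like Playwright or `gh`; the point is that the conversation sees one summary line instead of tens of kilobytes of raw output.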

On HN, fans shout "finally!" while jamiecode calls the 98% cut "the real story," warning that the bigger issue is how messy multi-step workflows keep hauling old baggage forward. mvkel asks if this is basically "pre-compaction," nervously wondering if it might toss something important (like a tiny utility function) before the AI knows it needs it. nr378 fires a hot take: just prune failed attempts once the model gets it right; classic "clean up your room" energy.

Meanwhile, happy users like agrippanux say it’s already slashing token bills. Jokes roll in: “token keto,” “stop feeding the AI carbs,” and “Marie Kondo your context.” Even non-nerds get it: less junk in the chat memory means the AI stays sharp longer. And yes, JS folks are grinning about Bun auto-speeding their scripts. Drama? Plenty. Savings? Real.

Key Points

  • Context Mode is an MCP server for Claude Code that routes tool executions through isolated subprocesses, allowing only stdout into the conversation to prevent raw data bloat.
  • Its knowledge base indexes markdown with SQLite FTS5, uses BM25 ranking and Porter stemming, and returns exact code blocks and headings; URLs can be fetched and indexed without dumping full pages into context.
  • Across 11 real-world scenarios, tool outputs were reduced by up to 98% (e.g., Playwright 56 KB → 299 B; 20 GitHub issues 59 KB → 1.1 KB; access log 45 KB → 155 B).
  • Session performance improved: 315 KB of raw output becomes 5.4 KB, extending session time from ~30 minutes to ~3 hours and preserving ~99% of context after 45 minutes.
  • Setup is via Claude’s Plugin Marketplace or MCP-only npx; a PreToolUse hook auto-routes outputs through the sandbox, and subagents use batch_execute to minimize context usage; authenticated CLIs work via credential passthrough.
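The knowledge-base point above (SQLite FTS5 with BM25 ranking and Porter stemming) maps directly onto stock SQLite features. A minimal sketch, using Python's built-in `sqlite3` with invented sample docs rather than anything from the project:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# FTS5 virtual table with the Porter stemming tokenizer, so
# "routing" and "routes" both reduce to the same stem.
conn.execute(
    "CREATE VIRTUAL TABLE docs USING fts5(heading, body, tokenize='porter')"
)
conn.executemany(
    "INSERT INTO docs VALUES (?, ?)",
    [
        ("Install", "Run npx to install the MCP server"),
        ("Hooks", "The PreToolUse hook routes outputs through the sandbox"),
    ],
)

# bm25() is FTS5's built-in ranking function; lower scores rank better,
# so ascending order returns the best match first.
rows = conn.execute(
    "SELECT heading FROM docs WHERE docs MATCH ? ORDER BY bm25(docs)",
    ("routing",),
).fetchall()
```

The query for "routing" matches the doc containing "routes" thanks to stemming, and only the matching heading (not the whole indexed page) needs to be returned into context.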

Hottest takes

"The 98% reduction is the real story here" — jamiecode
"Is this not in effect a kind of 'pre-compaction,' deciding ahead of time what's relevant?" — mvkel
"Backtracking strikes me as another promising direction to avoid context bloat and compaction" — nr378
Made with <3 by @siedrix and @shesho from CDMX. Powered by Forge&Hive.