March 13, 2026
Cache me outside
Prompt-caching – auto-injects Anthropic cache breakpoints (90% token savings)
Plugin promises 90% AI bill cuts — devs argue it's already built in
TLDR: A plugin auto-caches repeated prompt pieces to slash AI costs, claiming up to 90% savings. The crowd argues it’s redundant now that Anthropic added auto-caching, while defenders say it’s still useful for your own apps and for tracking what’s cached — making it a savings tool with real visibility.
A new plugin is shouting “90% off your AI bill!” by auto-dropping cache breakpoints — basically sticky notes that tell the chatbot to reuse the same info for five minutes, so repeats cost just 1/10th. The creator flaunts big numbers (up to 92% savings), with modes for bug fixes and refactors, and a one-command install while it awaits marketplace approval. Sounds like coupon day for prompts, right?
Cue the comments: the top chorus is, “Wait, isn’t this already a thing?” One camp points to Anthropic’s new auto-caching — pass ephemeral caching and the system places breakpoints for you — with links to the official docs: automatic caching and Claude Code costs. Others stress the fine print: Claude Code (Anthropic’s coding agent) caches its own prompts, but your own apps or scripts calling the Anthropic SDK directly don’t — that’s where this plugin lives.
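For readers who haven’t wired this up themselves, here’s roughly what a manual cache breakpoint looks like in a raw Anthropic Messages API request — a minimal sketch, not the plugin’s actual code. The `build_request` helper and the model string are illustrative; `cache_control: {"type": "ephemeral"}` is the documented marker.

```python
# Sketch: manually placing a cache breakpoint on a large, stable system prompt.
# The marker tells the API to cache everything up to and including that block
# for ~5 minutes, so later calls sharing the prefix are billed at the cheap
# cached-read rate instead of full price.

def build_request(system_prompt: str, user_message: str) -> dict:
    """Build Messages API params with a cache breakpoint (hypothetical helper)."""
    return {
        "model": "claude-sonnet-4-5",  # illustrative model name
        "max_tokens": 1024,
        "system": [
            {
                "type": "text",
                "text": system_prompt,                    # big, repeated prefix
                "cache_control": {"type": "ephemeral"},   # the cache breakpoint
            }
        ],
        "messages": [{"role": "user", "content": user_message}],
    }

params = build_request("...long project context...", "Fix the login bug.")
# With the real SDK: anthropic.Anthropic().messages.create(**params)
```

The whole debate boils down to who adds that `cache_control` field: in Claude Code it’s done for you; in your own scripts, either you add it by hand or something like this plugin does.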
The drama escalates with one hot take calling the plugin “overhead at the wrong layer,” while another user pleads, “Please don’t let bots write the comments.” Meanwhile, practical folks ask if it works with Cowork and other clients; supporters say the real win now is observability: see cache hit rates, savings, and where your prompts actually get cached. So yes, it’s savings… with a side of “didn’t we already have this?” energy.
Key Points
- Anthropic’s caching API stores stable content for five minutes; cache creation costs 1.25×, and cache reads cost 0.1× of normal.
- The plugin auto-inserts cache breakpoints for Anthropic SDK calls, with modes for bug fixes, refactors, file tracking, and a turn-count threshold.
- Benchmarks show large reductions in billed tokens (≈80–92%) across coding scenarios, with break-even at turn two.
- Installation works in Claude Code via plugin commands and across MCP-compatible clients through a simple MCP config.
- Anthropic’s automatic caching (cache_control: {"type":"ephemeral"}) handles breakpoint placement, while the plugin adds observability via get_cache_stats and analyze_cacheability.
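The break-even-at-turn-two claim checks out with back-of-envelope math, using the rates above (cache write = 1.25× normal input price, cache read = 0.1×, uncached = 1.0×). The function below is just that arithmetic, not anything from the plugin:

```python
# Relative cost of re-sending the same prompt prefix over a conversation.
# Rates from Anthropic's pricing: writes cost 1.25x, reads 0.1x, uncached 1.0x.

def cumulative_cost(turns: int, cached: bool) -> float:
    """Total relative input cost of the shared prefix after `turns` turns."""
    if not cached:
        return 1.0 * turns               # pay full price every turn
    return 1.25 + 0.1 * (turns - 1)      # pay the write once, then cheap reads

# Turn 1: caching costs MORE (1.25 vs 1.0).
# Turn 2: caching already wins (1.25 + 0.1 = 1.35 vs 2.0).
assert cumulative_cost(2, cached=True) < cumulative_cost(2, cached=False)
```

One turn in, caching is a small loss; by turn two you’re ahead, and as the conversation grows the savings approach the 90% read discount — which is where the plugin’s headline number comes from.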