April 12, 2026
Now you see tokens, now you don’t
Pro Max 5x Quota Exhausted in 1.5 Hours Despite Moderate Usage
Subscribers fume as “Max” plan vanishes in 90 minutes, demand transparency and threaten to bail
TLDR: A user says their “Pro Max 5x” allowance drained in 90 minutes, blaming cached prompts that still count at full price toward limits and background tasks silently eating usage. Comments erupted with demands for clear accounting, cancellation threats, and endorsements of competitors, making transparency the battleground that could decide where devs spend their money.
The community is in full meltdown mode after a Pro Max 5x user said their “Max” quota vanished in about 90 minutes of what they call moderate use. The claim: cached prompts, which are supposed to be cheaper, still count like full price toward limits, while background tabs and auto clean‑ups burn tokens when you’re not looking. Translation for non‑devs: tokens are the tiny pieces of text these AIs meter like phone minutes, and people suspect the meter keeps running even when they thought they were on Wi‑Fi.
Commenters lit the beacons. One camp is furious, calling for clear, itemized accounting and joking that the “Max” in Pro Max stands for “maximum disappearing act.” Another camp is upvoting issues and linking receipts like this one marked “Closed as not planned,” which only amped up the panic. The cancellation threats are loud (“fair means fair”), while the brand‑jumpers flex: one says they’ve switched to “Codex,” another swears by “GPT‑5.4 + Swival,” dubbing it their daily driver.
The hot debate: Is this a bug, unclear policy, or user setup gone wild with background sessions quietly chugging away? Memes call the cache a “token vampire” and the plan a “Max plan speedrun.” But the loudest chorus is simple: show us exactly what’s counted and when—before more users ghost and go elsewhere.
Key Points
- A user on a Pro Max 5x (Opus) plan saw their quota deplete within 1.5 hours of reset despite moderate usage.
- Token logs from session JSONL files show, in Window 2, 691 API calls totaling 103.9M cache_read, 13.1M cache_create, and 387k output tokens across sessions.
- The analysis suggests cache_read tokens likely count at full rate against rate limits, contrary to the expected 1/10 rate reduction.
- Background sessions (e.g., compacts, retros, hooks) kept making API calls and consuming the shared quota without active user interaction.
- Auto-compact events caused cache_creation spikes of roughly 966k tokens, and the 1M context window drove high per-call input costs as contexts grew before compaction.
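The totals above come from summing usage fields across session JSONL logs. Here is a minimal sketch of that kind of aggregation, assuming each log line is a JSON object carrying a `usage` dict with `cache_read_input_tokens`, `cache_creation_input_tokens`, and `output_tokens` fields; the directory layout and field names are assumptions for illustration, not confirmed by the post.

```python
import json
from pathlib import Path

def aggregate_usage(log_dir: str) -> dict:
    """Sum token usage across all session JSONL files in a directory.

    Field names ("usage", "cache_read_input_tokens", etc.) are
    hypothetical stand-ins for whatever the real logs contain.
    """
    totals = {"calls": 0, "cache_read": 0, "cache_create": 0, "output": 0}
    for path in Path(log_dir).glob("*.jsonl"):
        for line in path.read_text().splitlines():
            if not line.strip():
                continue  # skip blank lines between records
            usage = json.loads(line).get("usage")
            if not usage:
                continue  # record without a usage block (not an API call)
            totals["calls"] += 1
            totals["cache_read"] += usage.get("cache_read_input_tokens", 0)
            totals["cache_create"] += usage.get("cache_creation_input_tokens", 0)
            totals["output"] += usage.get("output_tokens", 0)
    return totals

def effective_input_tokens(totals: dict, cache_read_discount: float = 0.1) -> float:
    """Input-token load if cache reads counted at a reduced rate
    (the 1/10 rate the poster expected); pass discount=1.0 for the
    full-rate accounting the logs appear to show."""
    return totals["cache_read"] * cache_read_discount + totals["cache_create"]
```

With the reported Window 2 figures (103.9M cache_read + 13.1M cache_create), full-rate counting yields roughly 117M input tokens, while a 1/10 cache-read rate would yield about 23.5M, which is the gap behind the "token vampire" complaint.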