January 26, 2026

API bills go brrr, then vanish

Tell HN: I cut Claude API costs from $70/month to pennies

Solo dev cuts AI costs to pennies; crowd screams 'batch it, go local'

TLDR: A solo builder slashed AI costs from about $70/month to pennies by switching models, batching tasks, and trimming junk. Comments split between "run it locally," "use cheaper providers," and "enable caching": proof that small, practical tweaks can save real money without hurting results.

A scrappy builder ran the numbers on their AI bill and saw a gut-punch: around $70 a month just to summarize community chatter. After ditching the pricier model for a cheaper one that actually did better, batching work instead of pinging the robot every hour, and filtering out "lol"s before the AI ever saw them, the bill dropped to pennies a day. Cue the comments section turning into a budgeting boot camp. One camp shouted "run it locally": use a free model on your own machine if you don't need instant replies. Another camp warned: enable prompt caching (saving repeated instructions), because "the provider doesn't do it for you." And the bargain hunters yelled: try other providers like z.ai for even cheaper rates.
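The caching warning refers to the fact that Anthropic's prompt caching is opt-in: the reusable part of the prompt has to be explicitly marked with a `cache_control` attribute on each request. A minimal sketch of what such a request body might look like (the model id and prompt text here are illustrative placeholders, and no request is actually sent):

```python
# Sketch: marking a static system prompt as cacheable so repeated calls
# reuse it instead of re-billing its tokens at the full input rate.
# Model id and prompt text are illustrative; this only builds the body.

SYSTEM_PROMPT = (
    "You summarize community chat messages. "
    "Be concise and ignore greetings and small talk."
)

def build_request(messages_text: str) -> dict:
    """Build a messages-API request body with the static system prompt
    flagged for caching via cache_control."""
    return {
        "model": "claude-haiku-latest",  # placeholder model id
        "max_tokens": 300,
        "system": [
            {
                "type": "text",
                "text": SYSTEM_PROMPT,
                # The opt-in marker: without this, the prompt is not cached.
                "cache_control": {"type": "ephemeral"},
            }
        ],
        "messages": [{"role": "user", "content": messages_text}],
    }

req = build_request("msg1: shipped the fix\nmsg2: deploy at 5pm")
```

Because only the system block is marked, the per-call user content stays dynamic while the large repeated instructions get the discounted cached rate on subsequent calls.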

Then came the drama: "Show us the architecture," demanded the skeptics, hungry for a play-by-play. The pragmatists cheered the small tweaks (shorter outputs, stripping code) while the meme squad dropped Mandalorian energy with "This is the way." A helpful commenter even linked a toolbox of checks for keeping your AI honest: llm-sanity-checks. Translation for the non-tech crowd: the community just turned this cost crisis into a group project, mixing frugality, DIY pride, and a dash of snark to prove you can have smart AI without emptying your wallet.

Key Points

  • Initial cost check showed $2.30/day (~$70/month, $840/year) for one instance using the Claude API.
  • A bug contributed to costs, but most spending came from frequent and unoptimized API requests.
  • Switching from Claude Sonnet to Claude Haiku improved results on the same data at about one-third the cost.
  • Batching requests, pre-filtering trivial messages, shortening outputs, and removing code snippets reduced token usage.
  • After the changes, costs dropped to pennies per day, leaving room for roughly 3x growth within the same pricing tier, and intermittent quality checks were added.
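The two biggest levers above, pre-filtering trivial messages and batching the survivors into fewer calls, can be sketched in a few lines. The "trivial" rule and the size cap below are assumptions for illustration, not the author's exact logic:

```python
# Sketch of two cost levers: drop filler messages before they reach the
# API, then pack the rest into as few prompts as possible so one call
# covers many messages. Thresholds here are illustrative assumptions.

import re

# Filler patterns: "lol", "ok", "+1", thanks, or punctuation/emoji-only.
TRIVIAL = re.compile(r"lol+|ok+|\+1|ty|thanks!?|[\W_]*", re.IGNORECASE)

def keep(message: str) -> bool:
    """Return True for messages worth summarizing; drop short filler."""
    text = message.strip()
    return len(text) >= 4 and not TRIVIAL.fullmatch(text)

def batch_prompt(messages: list[str], max_chars: int = 8000) -> list[str]:
    """Filter messages, then pack them into batches capped at max_chars
    each, so each batch becomes a single API request."""
    batches, current = [], ""
    for m in filter(keep, messages):
        if current and len(current) + len(m) + 1 > max_chars:
            batches.append(current)
            current = ""
        current = f"{current}\n{m}" if current else m
    if current:
        batches.append(current)
    return batches
```

Each returned batch would be sent as one request, replacing the one-call-per-message pattern that drove the original bill.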

Hottest takes

"Have you considered running a local llm? May not need to pay for api calls" — LTL_FTC
"Are you also adding the proper prompt cache control attributes?" — dezgeg
"This is the way" — 44za12
Made with <3 by @siedrix and @shesho from CDMX. Powered by Forge&Hive.