October 28, 2025

Spread thin or saving your bread?

Show HN: Butter – A Behavior Cache for LLMs

Butter says it’ll slash AI bills — commenters ask “middleman tax or magic”

TLDR: Butter caches repeat AI answers to cut costs and keep replies consistent, acting as a drop‑in endpoint you can try now. Commenters are split: some love the savings, others see a middleman, question legality and pricing longevity, and ask how it works with local models.

Butter just slid onto the scene promising to “cache” your AI’s repeat answers so you spend less and get more consistent replies. Translation: it remembers common responses and serves them back, fast and cheap. It’s live — you can try it out here — and it even drops in as a Chat Completions endpoint, so your current tools don’t freak out. The price? 5% of whatever it saves you (it’s free for now). But the community’s kitchen got hot fast.

Skeptics smelled a “middleman tax,” with one top comment asking if we’re now paying Butter instead of OpenAI. DIY tinkerers flexed that they already built this at home — a simple “if we’ve seen it, reuse it” file of answers — calling Butter a shiny wrapper on a common trick. Then came the drama: is this even allowed? One commenter asked point‑blank if it’s legal, hinting at terms-of-service gray zones. Meanwhile, startup watchers liked the cut of the pricing model but guessed it won’t stick once the growth phase ends. And the practical crowd wanted receipts on local models and what it’ll cost when you’re not paying OpenAI at all.

In short: half the thread is cheering a smart coupon-clipper for AI; the other half is asking if we’re just buttering our bills with a new brand of spread. Either way, the debate is sizzling.

Key Points

  • Butter is a behavior cache for LLMs that identifies patterns in responses and serves cached outputs to reduce token costs.
  • The system is deterministic, enabling consistent repetition of past behaviors.
  • Butter exposes a Chat Completions–compatible API endpoint and shows how to route OpenAI client requests through it.
  • It integrates with multiple AI tools and frameworks, including LangChain, Mastra, Crew AI, Pydantic AI, AI Suite, Helicone, LiteLLM, Martian, Browser Use, and DSPy.
  • Pricing is set at 5% of token cost savings, and the service is currently free.

Hottest takes

“So instead of OpenAI I should pay butter?” — robofanatic
“Interesting... is it legal?” — ronbenton
“I like the pricing model but I’m skeptical it will last.” — puppycodes
Made with <3 by @siedrix and @shesho from CDMX. Powered by Forge&Hive.