April 5, 2026
Bring receipts, bot
LLMs can't justify their answers, so this CLI forces them to
Dev crowd chants “Evals or GTFO” as the bot brings receipts
TLDR: A new tool makes chatbots justify decisions with real tests, leading to a cautious “new endpoints only” move to GraphQL and smaller‑than‑hyped gains. The comments lit up with tool fatigue and a rallying cry—“Evals or GTFO”—as devs split between loving the accountability and dreading yet another AI gizmo to review.
A new command‑line tool called wheat just tried to give AI the one thing it’s notoriously bad at: receipts. The team asked a simple question: should they switch their app’s plumbing from REST (the old, URL-based way) to GraphQL (a newer, pick‑exactly‑what‑you‑need menu)? Instead of vibes, wheat read their code, searched the web, tagged every claim by evidence strength, built a quick prototype, and spat out a decision doc. The verdict? Try GraphQL only for new endpoints and keep the existing REST ones for now. The promised 40–60% payload savings shrank to 15–25% once the tool checked real data, and caching remains a headache.
But the code wasn’t the headline; the comments were. One user summed up the mood with a meme‑ready mic drop: “Evals or GTFO.” Another sighed that they can’t even keep up with the flood of AI tools anymore. That split defined the thread: half the crowd cheered “finally, proof over blog posts,” while the other half groaned “another AI wrapper to audit?” A few jokers called wheat the “PM that actually tests things,” and the skeptics clapped back that if every decision needs a prototype, teams will drown in homework.
Still, even cynics admitted this tool did something rare: it caught hype and downgraded it, on the record. Whether that’s the future of engineering—or just today’s meme—depends on how many more of these tools you can keep up with.
Key Points
- “Wheat” structures decisions by collecting claims, grading evidence, and building tested prototypes.
- Initial findings cited large GraphQL payload reductions, but adversarial review revised benefits to ~15–25% due to existing REST field filtering.
- Prototype tests showed DataLoader resolves GraphQL N+1 issues and p95 latency is within 12% of REST.
- REST’s CDN/HTTP caching does not directly translate to GraphQL, requiring custom solutions.
- Final recommendation: use GraphQL for new endpoints, migrate existing REST endpoints opportunistically, and address caching before broader adoption.
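
For readers wondering what "DataLoader resolves N+1" means in practice, here is a minimal sketch of the batching idea in Python. It is not wheat's prototype and not the real DataLoader library (which is promise-based and schedules batches automatically). The `FakeDb`, `BatchLoader`, and sample data are all hypothetical names made up for illustration. The point is only that collecting keys first and fetching them once turns one query per item into a single query.

```python
# Hypothetical sample data standing in for a real database.
AUTHORS = {1: "Ada", 2: "Grace", 3: "Linus"}
POSTS = [
    {"id": 10, "author_id": 1},
    {"id": 11, "author_id": 2},
    {"id": 12, "author_id": 1},
    {"id": 13, "author_id": 3},
]


class FakeDb:
    """Illustrative stand-in for a database that counts round trips."""

    def __init__(self):
        self.query_count = 0

    def fetch_author(self, author_id):
        # One round trip per author: the source of the N+1 pattern.
        self.query_count += 1
        return AUTHORS[author_id]

    def fetch_authors(self, author_ids):
        # One round trip for a whole batch of authors.
        self.query_count += 1
        return {i: AUTHORS[i] for i in author_ids}


def resolve_naive(db, posts):
    """Resolve each post's author separately (N queries for N posts)."""
    return [db.fetch_author(p["author_id"]) for p in posts]


class BatchLoader:
    """Simplified, synchronous take on the DataLoader batching idea."""

    def __init__(self, batch_fn):
        self._batch_fn = batch_fn
        self._pending = []
        self._cache = {}

    def load(self, key):
        # Queue the key (deduplicated) and return a thunk to read it later.
        if key not in self._cache and key not in self._pending:
            self._pending.append(key)
        return lambda: self._cache[key]

    def dispatch(self):
        # Fetch every queued key in a single call, then clear the queue.
        if self._pending:
            self._cache.update(self._batch_fn(self._pending))
            self._pending = []


def resolve_batched(db, posts):
    """Resolve all authors with one batched query."""
    loader = BatchLoader(db.fetch_authors)
    thunks = [loader.load(p["author_id"]) for p in posts]
    loader.dispatch()
    return [read() for read in thunks]


if __name__ == "__main__":
    naive_db, batched_db = FakeDb(), FakeDb()
    resolve_naive(naive_db, POSTS)
    resolve_batched(batched_db, POSTS)
    print("naive queries:", naive_db.query_count)
    print("batched queries:", batched_db.query_count)
```

With the four sample posts, the naive resolver makes four lookups while the batched one makes a single lookup for the three distinct authors. Batching removes the query explosion but does nothing for the caching gap noted above, which is a separate problem.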