LLMs can't justify their answers: this CLI forces them to

A new command-line tool called wheat just tried to give AI the one thing it’s notoriously bad at: receipts. The team asked a simple question: should they switch their app’s plumbing from REST (the old, URL-based way) to GraphQL (a newer, pick-exactly-what-you-need menu)? Instead of vibes, wheat read their code, searched the web, tagged every claim by evidence strength, built a quick prototype, and spit out a decision doc. The verdict? Try GraphQL only for new stuff and keep REST everywhere else for now. The promised 40–60% speed savings shrank to 15–25% once the tool checked real data, and caching remains a headache.

But the code wasn’t the headline; the comments were. One user summed up the mood with a meme-ready mic drop: “Evals or GTFO.” Another sighed that they can’t even keep up with the flood of AI tools anymore. That split defined the thread: half the crowd cheering “finally, proof over blog posts,” the other half groaning “another AI wrapper to audit?” A few jokers called wheat the “PM that actually tests things,” while the skeptics clapped back that if every decision needs a prototype, teams will drown in homework.

Still, even cynics admitted this tool did something rare: it caught hype and downgraded it, on the record. Whether that’s the future of engineering—or just today’s meme—depends on how many more of these tools you can keep up with.

April 5, 2026

Bring receipts, bot

Dev crowd chants “Evals or GTFO” as the bot brings receipts
