Claude Code's binary reveals silent A/B tests on core features

Paying devs say they’re lab rats; defenders fire back with “read the fine print”

TLDR: A user claims Claude Code ran silent experiments that changed how its planning feature works, triggering outrage from paying customers who felt used as test subjects. The community split fast: pros demand transparency and opt‑outs, while others say evolving AI tools can change under the terms—raising trust stakes for work-critical software.

The internet is in full meltdown mode over a bombshell claim: a paying user says they decompiled the Claude Code app and found secret A/B tests silently changing a core feature called “plan mode.” In plain English: some customers allegedly got a “cap” version that strips context, blocks prose, and slams plans to 40 lines—no conversation, just a wall of bullets. Cue rage. One camp is fuming: “This isn’t Instagram—I use this to work,” as posts demand transparency, opt-outs, and “no mystery switches” in a $200/month tool.

But the backlash has a backlash. A loud chorus is waving the terms of service, with one commenter snapping, “Did you assume a $200 subscription meant a frozen product?” Meanwhile, the cynics are having a field day: “Professional tool? Large language models aren’t even consistent.” Others are memeing the codename—“tengu_pewter_ledger” sounds like a Dark Souls boss—and joking that “cap” is fitting because it feels like, well, “cap.” Razengan drops the “I knew it” receipt, and the thread turns into a tug-of-war over trust versus iteration.

Amid the jokes, a practical question cuts through: is the test tied to the app install or the user account? Either way, the community’s headline is clear: don’t A/B test our workflow in silence—tell us and let us toggle it.

Key Points

  • A user decompiled the Claude Code binary and found a GrowthBook-managed A/B test called “tengu_pewter_ledger.”
  • The test controls how plan mode writes its final plan, with four variants: null, trim, cut, and cap.
  • The “cap” variant limits plans to 40 lines, forbids context and prose, and prioritizes keeping file paths over narrative text.
  • The user reports being assigned the “cap” variant, which produced an auto-generated, terse plan with no Q&A phase and no chance to steer it.
  • A telemetry event named “tengu_plan_exit” records the assigned variant along with metrics such as plan length and outcome, which makes it possible to correlate each variant with results.
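
For readers wondering how a test like this typically works, here is a minimal sketch of deterministic variant bucketing plus an exit event. It is an illustration under assumptions, not the decompiled code and not GrowthBook's actual hashing: the function names, the SHA-256 bucketing, the equal split across variants, and the event field names are all hypothetical. Only the experiment key, the four variant names, and the "tengu_plan_exit" event name come from the report.

```python
import hashlib

# Variant names as listed in the report; the equal split is an assumption.
VARIANTS = ["null", "trim", "cut", "cap"]


def assign_variant(unit_id: str, experiment_key: str = "tengu_pewter_ledger") -> str:
    """Map a unit ID to a variant deterministically (hypothetical scheme).

    unit_id could be an install ID or an account ID; which one the real
    test uses is exactly the open question raised in the thread.
    """
    digest = hashlib.sha256(f"{experiment_key}:{unit_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % len(VARIANTS)
    return VARIANTS[bucket]


def build_plan_exit_event(unit_id: str, plan_text: str, outcome: str) -> dict:
    """Build a telemetry payload tying the variant to plan metrics.

    Field names are illustrative, not taken from the binary.
    """
    return {
        "event": "tengu_plan_exit",
        "variant": assign_variant(unit_id),
        "plan_length_lines": len(plan_text.splitlines()),
        "outcome": outcome,
    }
```

The point of the sketch is that the same unit ID always lands in the same bucket. If the ID is tied to the install, reinstalling could change the variant; if it is tied to the account, the variant would follow the user across machines.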

Hottest takes

“I knew it” — Razengan
“LLMs offer none of this, and A/B testing is just further proof” — reconnecting
“Did you assume a $200 subscription meant a frozen product?” — handfuloflight