Comparing Fable and 10 other LLMs on refactoring a LangGraph god node

AI Refactor Face-Off Turns Into a Fable Backlash and Comment-Section Food Fight

TLDR: A developer tested 11 AI models on how to untangle a giant, messy app brain and published the full results. But readers zeroed in on a different drama: whether the much-hyped Fable is actually flaky, over-restricted, or already losing ground to newer rivals.

A programmer ran a nerdy but revealing showdown: 11 artificial intelligence models were asked to clean up a bloated chunk of app logic that had turned into a so-called “god node,” basically one giant control room doing way too many jobs at once. The experiment compared American and Chinese models, had them review each other’s plans, and even ranked which models’ judgments seemed most trustworthy. But while the write-up is all about careful testing, the comment section immediately made it personal.

The loudest reaction? Fable was supposed to be the chosen one, and people are clearly not feeling the magic. One commenter flat-out called it “unusable” because it allegedly complained about policy issues over a simple interface change. Another described a messy experience where the model appeared available, then disabled, then required a fresh login, and then reportedly refused to do a server security audit after barely getting started. Ouch. And then came the disbelief: if gpt5.5 beat Fable, does that mean the hype train has hit a wall? One user basically asked whether Fable secretly hands work off to another model behind the curtain.

There was also side-quest chaos: one person said the author’s site was blocked for them by the UK’s national cyber security system, with a warning that it “may be associated with malicious activity or malware,” which is the kind of detail that makes any discussion instantly feel ten times more scandalous. And naturally, another commenter demanded a rerun with the newest models, because in AI land, last week’s rankings are already ancient history. The vibe is equal parts lab test, fandom civil war, and “please update the leaderboard before I form an opinion.”

Key Points

  • The article documents an experiment comparing 11 LLMs on how to refactor a complex LangGraph "god node" from a real agent.
  • The experiment workflow included proposal generation, peer evaluation across models, and three different methods for deciding which model judgments to trust.
  • The central `plan` node in the LangGraph graph contained about 350 lines of logic and was described as an anti-pattern because it hid orchestration inside a single node.
  • The agent’s purpose was to collect parameters for downstream calculations using a mix of web search and user questions, with behavior that varied by conversation context.
  • The article lists multiple responsibilities embedded in the `plan` node, including iteration control, bootstrap questions, decomposition, recipe assembly, schema preparation, calculator limit checks, and blocked-calculation recovery.
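
To make the "god node" complaint concrete, here is a minimal sketch in plain Python of what pulling orchestration out of a single node can look like. It is not the author's code and not any model's proposal: the step names are borrowed loosely from the responsibilities listed above, and the state fields, the `route` function, and the `run` loop are all hypothetical. The point is only that the "what happens next" decision lives in one small, visible function instead of being buried inside 350 lines.

```python
from typing import Any, Callable, Dict

# Hypothetical state shape; the real agent's state is not described in the source.
State = Dict[str, Any]
Step = Callable[[State], State]


def _mark(state: State, name: str) -> State:
    """Return a copy of the state with the step name appended to the trace."""
    return {**state, "trace": state["trace"] + [name]}


# Each former responsibility of the big node becomes one small function.
def bootstrap_questions(state: State) -> State:
    return {**_mark(state, "bootstrap_questions"), "bootstrapped": True}


def recover_blocked(state: State) -> State:
    return {**_mark(state, "recover_blocked"), "blocked": False}


def decompose(state: State) -> State:
    return _mark(state, "decompose")


def assemble_recipe(state: State) -> State:
    return _mark(state, "assemble_recipe")


STEPS: Dict[str, Step] = {
    "bootstrap_questions": bootstrap_questions,
    "recover_blocked": recover_blocked,
    "decompose": decompose,
    "assemble_recipe": assemble_recipe,
}


def route(state: State) -> str:
    """Explicit orchestration: decide the next step from the state alone."""
    if state["iteration"] >= state["max_iterations"]:
        return "end"  # iteration control
    if not state["bootstrapped"]:
        return "bootstrap_questions"
    if state["blocked"]:
        return "recover_blocked"
    for name in ("decompose", "assemble_recipe"):
        if name not in state["trace"]:
            return name
    return "end"


def run(state: State) -> State:
    """Drive the steps until the router says to stop."""
    while (name := route(state)) != "end":
        state = STEPS[name](state)
        state = {**state, "iteration": state["iteration"] + 1}
    return state
```

In LangGraph terms, each of these functions would presumably become its own node, with a router like `route` wired in through conditional edges, so the control flow shows up in the graph itself rather than inside one node's body.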

Hottest takes

"This model is unusable" — holoduke
"gpt5.5 is better than fable. I thought fable was the endtimes?!" — andersmurphy
"This site may be associated with malicious activity or malware" — azalemeth