June 20, 2026
From clean brains to AI spaghetti
LLMs Are Complicated Now
AI was supposed to get smarter, not turn into a wiring nightmare everyone argues about
TLDR: Modern AI chat models are getting far more complicated because speed, cost, and scale now matter as much as the core idea. Commenters split between nodding along that this is the natural next phase and complaining the article stacked the comparison to make the chaos look bigger.
The big mood in the comments? "Congrats, we rebuilt the mess." The article says today’s large language models (the tech behind chatbots and AI assistants) have gone from neat, stackable designs to sprawling, hard-to-manage systems packed with special tricks, shortcuts, and performance hacks. In plain English: what looked elegant a couple of years ago now looks a lot more like a machine held together with genius, duct tape, and very expensive electricity. The author argues this happened for the same reason it happened in recommendation engines: once speed and cost become life-or-death issues, the clean version of the idea stops being enough.
But readers immediately turned the spotlight onto the comparison itself. One of the spiciest complaints was that comparing Llama 3 to Nemotron 3 Ultra felt like picking two different families and then acting shocked they look different. That got the classic comment-section side-eye: is this a deep industry truth, or just a dramatic setup?
Then came the bigger, more philosophical take: this is just the "bitter lesson" all over again. Early on, everyone wins by scaling up the obvious stuff. Later, the easy gains dry up, and suddenly every extra bit of progress demands painful engineering wizardry. That idea landed because it explains the whole vibe: AI didn’t become messy by accident; it became messy because the cheap, easy era is over. Even the joke name "Claude Telenovela" made it sound like commenters know this whole saga has officially become a soap opera.
Key Points
- The article says early LLM systems such as the work leading to Llama were architecturally cleaner than recommendation systems, but modern LLMs have become much more complex.
- It identifies multiple sources of LLM complexity, including diverse attention variants, Mixture-of-Experts routing, multimodal encoder integration, and inference across multiple GPUs.
- The article compares this trend to recommendation systems, where relatively simple core architectures became operationally complex because of capability and efficiency pressures.
- It argues that once performance optimizations become necessary, evaluating new architectural ideas requires partially optimized implementations, not just clean baseline definitions.
- The article presents FlexAttention in PyTorch, built with Triton templates, as an example of designing for composability and verification to support faster model experimentation.
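For readers wondering what "composability and verification" could look like in practice, here is a toy sketch in plain Python. It is not FlexAttention and not the PyTorch API: the names `attention`, `causal`, `sliding_window`, and `compose` are hypothetical, and the only idea borrowed is that an attention variant can be expressed as a small function that rewrites each score. Under that assumption, variants can be stacked and then checked against a slow, obviously correct baseline.

```python
import math
from typing import Callable, List

Matrix = List[List[float]]
# Hypothetical callback shape: (score, query_index, key_index) -> new score.
ScoreMod = Callable[[float, int, int], float]


def identity(score: float, q_idx: int, kv_idx: int) -> float:
    """Plain attention: leave every score untouched."""
    return score


def causal(score: float, q_idx: int, kv_idx: int) -> float:
    """Hide keys that come after the query position."""
    return score if kv_idx <= q_idx else -math.inf


def sliding_window(window: int) -> ScoreMod:
    """Hide keys more than `window` positions away from the query."""
    def mod(score: float, q_idx: int, kv_idx: int) -> float:
        return score if abs(q_idx - kv_idx) <= window else -math.inf
    return mod


def compose(*mods: ScoreMod) -> ScoreMod:
    """Apply several score modifications in order."""
    def mod(score: float, q_idx: int, kv_idx: int) -> float:
        for m in mods:
            score = m(score, q_idx, kv_idx)
        return score
    return mod


def attention(q: Matrix, k: Matrix, v: Matrix,
              score_mod: ScoreMod = identity) -> Matrix:
    """Slow reference attention; assumes each query keeps at least one key."""
    scale = math.sqrt(len(q[0]))
    out = []
    for i, q_row in enumerate(q):
        scores = [
            score_mod(sum(a * b for a, b in zip(q_row, k_row)) / scale, i, j)
            for j, k_row in enumerate(k)
        ]
        top = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        out.append([
            sum(w * v_row[c] for w, v_row in zip(weights, v)) / total
            for c in range(len(v[0]))
        ])
    return out


# Tiny made-up inputs: 3 positions, 2 dimensions.
q = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
k = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
v = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

causal_out = attention(q, k, v, causal)
local_out = attention(q, k, v, compose(causal, sliding_window(0)))
```

The verification half is the easy part to miss: because each variant is just a function, the causal result can be compared row by row against unmasked attention over a truncated key list, which is the kind of check that gets harder once a variant only exists as a hand-tuned kernel.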