June 25, 2026
No ticket? No problem
Puzzling Success of Overparameterization: Lottery Tickets or Escape Dimensions?
Big AI Myth Gets Roasted as commenters argue the old story was too neat to be true
TLDR: The paper argues that big AI models probably work not because they hide a lucky “winning” mini-model, but because extra size gives training more ways to escape bad outcomes. Commenters turned that into a mini-brawl, with some declaring the old idea basically dead and others shrugging that the real unanswered question lies elsewhere.
A cozy old explanation for why giant AI models work is getting dragged by its own fanbase. The paper says the popular “lottery ticket” story — the idea that huge models win because they secretly contain one lucky little mini-model — is a cute teaching tool, but not the real reason. Instead, the authors argue that bigger models succeed because they have more room to move during training, making it easier to avoid getting stuck in bad solutions. In plain English: it may be less about finding a magic winner and more about having a bigger dance floor.
And the comments? Absolutely not calm. One of the strongest reactions came from users saying this isn’t even shocking anymore, with one commenter casually dropping that the original lottery-ticket champion basically walked it back — a spicy little academic betrayal if true. Another commenter went full math-warrior, arguing the real story is about the model getting trapped only when every direction is bad, which becomes less likely as models get bigger. Then came the classic internet drive-by: “Isn’t this trivial?” Ouch. That same commenter immediately swerved to what they think is the real mystery: why AI performance sometimes gets worse before it gets better again as models grow.
So yes, the research is about geometry and training. But the real show is the community mood: one camp says the old analogy is misleading, another says this update is obvious, and everyone seems delighted to kick over a beloved AI metaphor.
Key Points
- The article challenges the common lottery-ticket analogy used to explain the success of overparameterized neural networks.
- It says the lottery-ticket view is misleading because it treats subnetworks as if they can learn independently of the rest of the network.
- The article argues that this leads to an incorrect interpretation of learning in wide networks as multi-start optimization over many subnetworks.
- As evidence against the isolated-subnetwork view, it states that winning tickets can fail when the rest of the network is perturbed.
- It proposes that wider networks succeed because extra optimization dimensions help training escape bad local minima and because bad minima become rarer as width increases.
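
For readers who want to poke at the "trapped only when every direction is bad" argument themselves, here is a rough toy sketch. It is not the paper's method. It assumes, purely for illustration, that the curvature at a stuck point can be modeled as a random symmetric matrix with Gaussian entries, and it counts how often every direction curves upward, which is the only case where training has no way out. The helper names (`is_positive_definite`, `random_symmetric`, `trapped_fraction`) are made up for this example.

```python
import math
import random


def is_positive_definite(matrix):
    """Return True if a symmetric matrix is positive definite.

    Uses a plain Cholesky factorization: it succeeds only when every
    pivot is strictly positive, i.e. every direction curves upward.
    """
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            partial = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                pivot = matrix[i][i] - partial
                if pivot <= 0.0:
                    return False  # at least one escape direction exists
                lower[i][j] = math.sqrt(pivot)
            else:
                lower[i][j] = (matrix[i][j] - partial) / lower[j][j]
    return True


def random_symmetric(dim, rng):
    """Toy stand-in for a Hessian (an assumption, not taken from the paper):
    a symmetric matrix whose entries are standard Gaussian draws."""
    m = [[0.0] * dim for _ in range(dim)]
    for i in range(dim):
        for j in range(i + 1):
            value = rng.gauss(0.0, 1.0)
            m[i][j] = value
            m[j][i] = value
    return m


def trapped_fraction(dim, trials=5000, seed=0):
    """Estimate how often a random stuck point is a true trap in `dim` dimensions."""
    rng = random.Random(seed)
    hits = sum(
        is_positive_definite(random_symmetric(dim, rng)) for _ in range(trials)
    )
    return hits / trials


if __name__ == "__main__":
    # Print the estimated trap rate for a few small dimensions.
    for dim in (1, 2, 3, 4, 6):
        print(dim, trapped_fraction(dim))
```

In this toy model the trap rate falls off quickly as the dimension grows, which matches the intuition in the comments. Real loss landscapes are not random Gaussian matrices, so treat it as a picture of the argument and not as evidence for it.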