January 5, 2026

Jackpot or just lucky weights?

The Lottery Ticket Hypothesis: finding sparse trainable NNs with 90% less params

Tiny networks, big claims — and the comments aren’t buying it

TLDR: A 2018 hypothesis claims that big neural nets contain tiny subnetworks which, trained on their own, match the full model's accuracy with roughly 90% fewer parameters. Commenters are split: some call it outdated and unproven, others say it boils down to lucky starting values, and transformer fans argue the whole debate matters less for modern models.

A blast from 2018/19 just resurfaced: the “Lottery Ticket Hypothesis” says big neural nets hide tiny “winning tickets” you can train on their own to reach the same accuracy with 90% fewer weights. The paper even claims some pruned models (10–20% of the original size) train faster and score higher on small benchmarks like MNIST (handwritten digits) and CIFAR-10 (tiny images). If true, that’s a storage and speed jackpot for AI. Here’s the twist: the community’s vibe is pure chaos. laughingcurve rolls in with “this is still just a hypothesis” energy, citing newer evidence that pokes holes in it. observationist goes full math wizard, dropping “gauge invariant” and fuzzy-function talk and turning the thread into a philosophy-of-AI seminar. rob_c says it’s basically a fancy way of admitting deep nets depend heavily on their starting values, and suggests transformers (the architecture behind ChatGPT-style systems) don’t have the same problem. Meanwhile, readers joke about AI “scratch-offs,” fantasizing about an algorithm that finds winning tickets before training, like a cheat code. The funniest moment? A grammar snipe from choult: “Fewer.” The post claims a jackpot; the comments say: show me the receipts. For more, see the original paper.

Key Points

  • Pruning can reduce trained neural network parameters by over 90% without degrading accuracy.
  • Sparse architectures from pruning are typically hard to train from scratch, limiting training-time benefits.
  • The Lottery Ticket Hypothesis posits dense networks contain trainable “winning ticket” subnetworks that achieve comparable test accuracy when trained alone.
  • The paper’s algorithm finds winning tickets by training the dense network, pruning its lowest-magnitude weights, and resetting the surviving weights to their original initial values; experiments support both the tickets’ effectiveness and the importance of those favorable initializations (see the sketch after this list).
  • Winning tickets on MNIST and CIFAR-10 are consistently less than 10–20% of the original network’s size; tickets above that size learn faster than the original dense network and reach higher test accuracy.
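
For the curious, the recipe behind that algorithm bullet is iterative magnitude pruning with a rewind to the original initialization: train, cut the smallest weights, reset the survivors to their starting values, repeat. Here is a minimal sketch, assuming PyTorch, a toy two-layer MLP, and synthetic data standing in for MNIST so it runs without downloads; the layer sizes, pruning fraction, and training loop are illustrative choices, not the paper’s exact setup.

    # Minimal sketch of iterative magnitude pruning with a "rewind" to the
    # original initialization, in the spirit of the Lottery Ticket Hypothesis.
    # Assumptions not in the post: PyTorch, a toy MLP, synthetic data.
    import copy
    import torch
    import torch.nn as nn

    torch.manual_seed(0)

    # Toy stand-in for a classification dataset (784 features, 10 classes).
    X = torch.randn(512, 784)
    y = torch.randint(0, 10, (512,))

    model = nn.Sequential(nn.Linear(784, 300), nn.ReLU(), nn.Linear(300, 10))
    init_state = copy.deepcopy(model.state_dict())  # theta_0, kept for rewinding
    masks = {name: torch.ones_like(p)               # prune weight matrices only
             for name, p in model.named_parameters() if p.dim() > 1}

    def train(model, masks, steps=200, lr=0.1):
        """Train while keeping pruned weights pinned at zero; return final loss."""
        opt = torch.optim.SGD(model.parameters(), lr=lr)
        loss_fn = nn.CrossEntropyLoss()
        for _ in range(steps):
            opt.zero_grad()
            loss = loss_fn(model(X), y)
            loss.backward()
            opt.step()
            with torch.no_grad():                   # re-apply the pruning masks
                for name, p in model.named_parameters():
                    if name in masks:
                        p.mul_(masks[name])
        return loss.item()

    prune_fraction = 0.2                            # drop 20% of survivors per round
    for round_idx in range(5):
        loss = train(model, masks)
        with torch.no_grad():
            for name, p in model.named_parameters():
                if name not in masks:
                    continue
                # Prune the smallest-magnitude weights that are still alive.
                alive = p[masks[name].bool()].abs()
                k = int(prune_fraction * alive.numel())
                if k > 0:
                    threshold = alive.kthvalue(k).values
                    masks[name] = (p.abs() > threshold).float() * masks[name]
        # Rewind surviving weights to their original initial values: the sparse
        # structure plus theta_0 is the candidate "winning ticket".
        model.load_state_dict(init_state)
        with torch.no_grad():
            for name, p in model.named_parameters():
                if name in masks:
                    p.mul_(masks[name])
        left = sum(m.sum().item() for m in masks.values())
        total = sum(m.numel() for m in masks.values())
        print(f"round {round_idx}: loss {loss:.3f}, weights remaining {left / total:.1%}")

    # Train the final rewound ticket on its own, as the hypothesis suggests.
    print("ticket loss:", train(model, masks))

With five rounds at 20% per round the sketch keeps about 33% of the weights (0.8^5), so reaching the paper’s 10–20% regime would take a few more rounds or a higher pruning fraction.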

Hottest takes

“remains just a hypothesis with plenty of evidence against it” — laughingcurve
“Neural networks are effectively gauge invariant” — observationist
“avoid pure DNNs due to their strong reliance on initialization” — rob_c
Made with <3 by @siedrix and @shesho from CDMX. Powered by Forge&Hive.