May 6, 2026

AI theory or just theory-core?

A Theory of Deep Learning

Big brain claim sparks eye-rolls, praise, and a surprise font side quest

TLDR: A researcher claims he can finally explain why giant AI systems work so well even when older theory says they should fail. Commenters were intrigued but mostly felt the title oversold it, and the thread hilariously drifted into complaints about copy-paste and praise for the website’s font.

A bold blog post called “A Theory of Deep Learning” rolled into the internet with serious main-character energy: here, at last, might be an answer to the giant question of why today’s artificial intelligence systems work so well even when older textbook ideas say they really shouldn’t. The author reaches for Borges, memory, forgetting, and the weird fact that huge AI models can soak up mountains of data, fit it perfectly, and still do well on new tasks. In plain English: these systems seem too big and too messy to behave, yet somehow they often do.

But the comments? Instant reality check. The loudest reaction was basically: love the writing, not buying the title. One reader called it a beautiful way of saying some memorized stuff matters and some doesn’t, then delivered the killer line that this is not “the grand unified theory” because a real theory should actually explain that, not just rename it. Another commenter said the whole thing feels more like we’re still in the era of collecting observations than making true predictions. Translation: nice framework, but don’t pop the champagne yet.

And because the internet refuses to stay on one topic, the thread also swerved gloriously into side quests. One person got distracted by the site’s elegant font. Another was deeply, personally offended that the sidenotes couldn’t be copied and pasted. So yes, a sweeping attempt to explain modern AI sparked skepticism, curiosity, and a mini design-review riot. Classic comment-section behavior.

Key Points

  • The article argues that deep learning works in practice without a unified explanatory theory, comparing the field to chemistry before Lavoisier.
  • It describes deep learning theory as fragmented across approaches such as uniform convergence, the neural tangent kernel (NTK), PAC-Bayes, algorithmic stability, optimization analyses, and mean-field methods.
  • The article says classical statistical learning theory predicts overfitting for highly expressive, overparameterized neural networks that can perfectly fit training data.
  • It cites prior results showing neural networks can memorize completely random labels, which the article presents as evidence that traditional capacity-based explanations are inadequate (a minimal sketch of this experiment follows the list).
  • The article highlights benign overfitting, double descent, and gradient descent’s tendency to select interpolating solutions that still generalize as the key phenomena a real theory would need to explain (a second sketch below illustrates double descent).
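
For the curious, the “memorize random labels” experiment mentioned above is easy to reproduce. The following is a minimal sketch in the spirit of those prior results, not the article’s own code: a small over-parameterized network drives training accuracy to 100% on labels that are pure noise. Model sizes and hyperparameters here are our illustrative choices.

    # Minimal sketch of a random-label memorization run (illustrative sizes
    # and hyperparameters; not taken from the article).
    import torch
    import torch.nn as nn

    torch.manual_seed(0)
    n, d, num_classes = 512, 32, 10
    X = torch.randn(n, d)                    # random inputs: nothing to "learn"
    y = torch.randint(0, num_classes, (n,))  # labels are pure noise

    # Far more parameters than training points -- classical capacity bounds
    # would call this hopeless, yet the network fits the noise perfectly.
    model = nn.Sequential(
        nn.Linear(d, 512), nn.ReLU(),
        nn.Linear(512, 512), nn.ReLU(),
        nn.Linear(512, num_classes),
    )

    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()

    for step in range(2000):
        opt.zero_grad()
        loss_fn(model(X), y).backward()
        opt.step()

    with torch.no_grad():
        acc = (model(X).argmax(dim=1) == y).float().mean().item()
    print(f"train accuracy on random labels: {acc:.0%}")  # typically 100%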
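
Double descent itself can be reproduced with nothing fancier than least squares. The sketch below (again our own illustrative setup, not the article’s) fits random cosine features with the minimum-norm solution and sweeps the feature count past the point where the model can interpolate the training set.

    # Minimal double-descent sketch: minimum-norm least squares on random
    # Fourier-style features. The setup is purely illustrative.
    import numpy as np

    rng = np.random.default_rng(0)
    n_train, n_test = 40, 500

    def target(x):
        return np.sin(2 * np.pi * x)

    x_tr = rng.uniform(-1, 1, n_train)
    y_tr = target(x_tr) + 0.1 * rng.standard_normal(n_train)
    x_te = rng.uniform(-1, 1, n_test)
    y_te = target(x_te)

    # A fixed bank of random frequencies and phases; using the first p of
    # them gives a model family whose size we can sweep past n_train.
    freqs = rng.uniform(0.0, 12.0, 200)
    phases = rng.uniform(0.0, 2 * np.pi, 200)

    def features(x, p):
        return np.cos(np.outer(x, freqs[:p]) + phases[:p])

    for p in (5, 20, 40, 80, 200):          # interpolation threshold at p = 40
        Phi_tr, Phi_te = features(x_tr, p), features(x_te, p)
        w = np.linalg.pinv(Phi_tr) @ y_tr   # minimum-norm solution
        mse = np.mean((Phi_te @ w - y_te) ** 2)
        print(f"p = {p:3d}   test MSE = {mse:.3f}")
    # Test error typically spikes near p = n_train, then falls again as the
    # model grows past the interpolation point: the double-descent shape.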

Hottest takes

“that’s not a theory of deep learning, the grand unified theory would explain that” — refulgentis
“the post title might be a bit of an overreach” — prideout
“Why is user select turned off on the sidenotes?” — airza