Six (and a half) intuitions for KL divergence

Surprise math, decoded: readers cheer, newbies want training wheels, pros point to AI

TLDR: A popular explainer breaks down KL divergence—aka “how wrong your guess is”—with six-and-a-half simple angles. Readers split three ways: cheering the clarity, asking for a gentler intro, and pointing out real-world stakes like AI model quality, making an abstract idea feel urgent and useful.

Callum McDougall’s “Six (and a half) intuitions for KL divergence” just turned a mouthful of math into popcorn reading, and the comments are the show. The post frames KL divergence—think “how wrong your mental model is” measured in surprise—using friendly angles: expected surprise, evidence in testing, maximum-likelihood (how you fit a model), wasted bits in compression, casino/lottery analogies, and a geometry-ish take. It’s all about why this “distance” isn’t symmetrical, and why that’s okay.
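
For the code-curious, that asymmetry is easy to see in a few lines of Python. This is a minimal sketch with made-up distributions (not from the post): D_KL(P‖Q) sums p·log(p/q), the expected extra surprise from believing Q when P is true.

```python
import math

def kl_divergence(p, q):
    """D_KL(P || Q) in nats: expected extra surprise from using model Q
    when samples actually come from P. Terms with p_i == 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# A fair coin (P) vs. a heavily biased guess (Q).
p = [0.5, 0.5]
q = [0.9, 0.1]

print(kl_divergence(p, q))  # ~0.511 nats
print(kl_divergence(q, p))  # ~0.368 nats: a different number, so KL is not symmetric
```

Swapping the arguments changes the answer because the expectation is taken under the first distribution; that is the whole reason this “distance” refuses to be symmetrical.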

The crowd split fast. Fans like ttul applauded the multi-angle tour, while newcomers like RickHull begged for a softer on-ramp. One commenter, abetusk, dropped an ISP-and-compression story to make it concrete—cue nods from readers who like their math with a plot twist. Meanwhile, practitioner dist-epoch brought the “why should I care?” heat: this is how people compare small-bit and big-bit versions of AI models (yes, the stuff behind your chatbot), linking KL directly to 4-bit vs 8-bit model quality. That woke up the “real world” crowd.
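
dist-epoch’s point can be sketched concretely: treat the full-precision model’s next-token distribution as P, the quantized model’s as Q, and score the drift with KL. The logits below are toy numbers and the round-to-nearest-half “quantization” is a stand-in for real schemes, purely for illustration:

```python
import math

def softmax(logits):
    """Convert raw logits to a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl(p, q):
    """D_KL(P || Q) in nats over a discrete distribution."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Toy next-token logits from a "full-precision" model...
full = [2.0, 1.0, 0.1, -1.0]
# ...and the same logits after a crude round-to-nearest-0.5 "quantization".
quant = [round(x * 2) / 2 for x in full]

drift = kl(softmax(full), softmax(quant))
print(f"KL drift from quantization: {drift:.6f} nats")
```

A lower drift, averaged over many contexts, is how people argue a 4-bit model “behaves like” its 8-bit parent.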

The most “mind blown” reaction came from notrealyme123, who finally saw how maximum-likelihood estimation is secretly just minimizing KL. Jokes flew about the mysterious “half intuition,” and readers claimed they only read the summary—fitting, since the author admits the summary is over 50% of the value. High-brow math, low-key drama, and a dash of casino fantasy: what’s not to love? Check the LessWrong post for the full buffet.

Key Points

  • The post compiles six (and a half) intuitive interpretations of KL divergence from coding, inference, estimation, and decision-making perspectives.
  • KL can be seen as the expected extra surprise when using model Q while the true distribution is P.
  • In hypothesis testing, KL quantifies the expected evidence gained for P over Q if P is true.
  • Minimizing D_KL(P‖Q) over a model family Q, with P the empirical data distribution, recovers the maximum-likelihood estimate.
  • KL is characterized as a Bregman divergence induced by entropy, explaining its structure and asymmetry.
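
The MLE bullet, the one behind notrealyme123’s reaction, can be checked numerically: with P the empirical distribution of the data, the Q that minimizes KL is exactly the Q that maximizes likelihood, because the two objectives differ only by the (constant) entropy of P. A toy Bernoulli grid search, not taken from the post:

```python
import math

def kl(p, q):
    """D_KL(P || Q) in nats over a discrete distribution."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def avg_neg_log_likelihood(data, theta):
    """Average negative log-likelihood of coin flips under Bernoulli(theta)."""
    return -sum(math.log(theta if x else 1 - theta) for x in data) / len(data)

data = [1, 1, 1, 0, 1, 0, 1, 0, 1, 1]  # 7 heads, 3 tails
heads = sum(data) / len(data)
p_emp = [heads, 1 - heads]             # empirical distribution of the data

# Grid-search theta two ways: by KL to the empirical distribution, and by NLL.
thetas = [i / 100 for i in range(1, 100)]
best_by_kl = min(thetas, key=lambda t: kl(p_emp, [t, 1 - t]))
best_by_mle = min(thetas, key=lambda t: avg_neg_log_likelihood(data, t))

print(best_by_kl, best_by_mle)  # 0.7 0.7 -- same winner either way
```

Both searches land on theta = 0.7, the empirical head rate: minimizing KL to the data distribution and maximizing likelihood are the same optimization.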

Hottest takes

"This is great. I had only ever seen the expected surprise explanation." — ttul
"Is there a gentler intro to this topic?" — RickHull
"For those wondering where is this practically relevant - this is the basic metric used to compare quantization of various LLM models" — dist-epoch
Made with <3 by @siedrix and @shesho from CDMX. Powered by Forge&Hive.