January 4, 2026
Bayes or B.S.?
Attention Is Bayesian Inference
Internet splits: genius math or post-hoc hype? Professors, skeptics, and meme lords pile on
TLDR: Vishal Misra claims attention in AI models naturally performs Bayesian inference, calling Transformers “Bayesian by geometry.” The community split hard: some praised the math, others called it post‑hoc hype or even AI “slop,” debating whether the argument glosses over sampling, whether it has real-world value, and whether it will curb hallucinations.
AI professor and cricket bot builder Vishal Misra just dropped a spicy claim on [Medium]: attention, the core trick in modern AI, acts like Bayesian inference, meaning “updating beliefs when new facts arrive.” He says Transformers are “Bayesian by geometry,” and points to his AskCricinfo RAG setup for ESPNcricinfo (fetch facts first, then write) as the spark.
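For readers who want the one-equation version, here is a minimal sketch of that reading (not Misra’s own derivation, just the standard way softmax attention can be cast as a posterior update): assume a uniform prior over the n keys and a likelihood for the query proportional to exp(q·k_i); Bayes’ rule then reproduces the attention weights exactly, and the attention output is the posterior mean of the values.

$$
p(i \mid q) \;=\; \frac{p(q \mid i)\,p(i)}{\sum_{j} p(q \mid j)\,p(j)}
\;=\; \frac{\exp(q \cdot k_i)}{\sum_{j} \exp(q \cdot k_j)} \;=\; a_i,
\qquad
\text{output} \;=\; \sum_{i} a_i\, v_i .
$$

Under that reading, “updating beliefs when new facts arrive” just means the posterior weights get recomputed as new key–value pairs enter the context.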
The comments? Chaos. Supporters cheered the math: CuriouslyC called the “posterior matching” idea a big deal, while bristling at what they saw as hand‑waving about bigger models. The skeptic camp went nuclear: danielscrubs slammed the piece as AI “slop” and roasted any hint of ChatGPT-assisted writing. Others rolled their eyes at grand theories after the fact; behnamoh called it “post hoc” justification, not revelation. Purists like kianN argued the “X is Bayesian” meme skips the hard part: sampling and “detailed balance,” the rules that keep probabilities honest.
Practical folks popped in asking if simple Bayesian output layers are still king, or if this trilogy actually helps stop AI from making stuff up. Meanwhile, meme lords declared “Bayes or B.S.?” and joked Misra went from wickets to wind tunnels. Whether you’re team math or team meh, the thread was a masterclass in nerd drama, spicy one‑liners, and chaos.
Key Points
- Misra built a natural language interface for ESPNcricinfo’s StatsGuru and found GPT-3 unreliable for precise statistical answers.
- He repurposed the LLM to translate queries into SQL, retrieve exact data, and generate responses, anticipating RAG in Fall 2020 (see the sketch after this list).
- AskCricinfo achieved a three-orders-of-magnitude increase in usage compared to StatsGuru.
- Misra, Sid Dalal, and Naman Agarwal produced a trilogy of papers claiming transformer attention inherently performs Bayesian inference.
- The article cites interpretability challenges stemming from the lack of ground truth and describes geometric structure in attention that enables Bayesian updating.
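The second key point describes the pipeline shape behind AskCricinfo. Here is a minimal sketch of that query → SQL → retrieve → generate loop, using a toy in-memory table and a hypothetical `translate_to_sql` stub in place of the real GPT-3 call; the table, numbers, and function names are illustrative, not the production system.

```python
import sqlite3

# Toy stand-in for StatsGuru: one table of batting totals (toy numbers, not real stats).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE batting (player TEXT, runs INTEGER)")
conn.executemany("INSERT INTO batting VALUES (?, ?)",
                 [("Player A", 15000), ("Player B", 12000)])

def translate_to_sql(question: str) -> str:
    """Hypothetical stand-in for the LLM step that maps a natural-language
    question to SQL; AskCricinfo reportedly used GPT-3 for this."""
    if "most runs" in question.lower():
        return "SELECT player, runs FROM batting ORDER BY runs DESC LIMIT 1"
    raise ValueError("question not understood")

def answer(question: str) -> str:
    sql = translate_to_sql(question)      # 1. model turns text into SQL
    player, runs = conn.execute(sql).fetchone()  # 2. exact data comes from the database
    # 3. a second generation step would phrase the reply; a template stands in here
    return f"{player} has the most runs: {runs}."

print(answer("Who scored the most runs?"))
```

The design point the article credits to this setup: the language model never has to remember the statistics, it only translates the question and phrases the answer, while the exact numbers come from the database.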