March 24, 2026

Bayes vs. Bots: Internet Cage Match

Transformers Are Bayesian Networks

Paper says Transformers = Bayes; fans hype, skeptics cry “fake cite”

TL;DR: A paper claims modern AI transformers are doing classic Bayesian reasoning under the hood, even tying “hallucinations” to missing concepts. The community split fast: some cheered the unifying theory, others questioned the real-world impact, and one user sparked drama by alleging a fake citation, igniting a credibility brawl.

A new paper boldly declares: transformers—the tech behind today’s chatty AI—are basically Bayesian networks, a classic way of doing probability math. The authors say each layer is a “message-passing” step, attention acts like AND, the feed‑forward part is OR, and yes, they claim formal proofs and experiments to back it. They even argue AI “hallucinations” won’t vanish with bigger models because the systems lack explicit, grounded concepts.
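The “attention as AND, feed-forward as OR” intuition can be sketched numerically with plain sigmoids (a toy illustration of soft Boolean gates, not the paper’s actual construction; the weight `w` and bias choices here are assumptions for the demo):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def soft_and(inputs, w=10.0):
    # A sigmoid with bias -(n - 0.5) * w fires only when
    # (roughly) all n inputs are on: a soft AND gate.
    n = len(inputs)
    return sigmoid(w * (sum(inputs) - (n - 0.5)))

def soft_or(inputs, w=10.0):
    # Bias -0.5 * w makes the unit fire when any input is on:
    # a soft OR gate.
    return sigmoid(w * (sum(inputs) - 0.5))

print(soft_and([1, 1]))  # near 1
print(soft_and([1, 0]))  # near 0
print(soft_or([1, 0]))   # near 1
print(soft_or([0, 0]))   # near 0
```

Sharpening `w` pushes these outputs toward exact 0/1 Boolean behavior, which is the kind of limit arguments like the paper’s tend to rely on.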

Cue the internet fireworks. On Hacker News, the “Bayes is back” crowd is thrilled, cracking memes about LLMs being “spicy Clippy with priors.” Others poked fun at the irony that old-school Naive Bayes assumes independence between features—“so… we’re doing Bayes on non‑independent stuff now?” Meanwhile, the skeptics rolled in hard: one user asked why calling it Bayesian explains anything when Bayesian networks “never matched transformer performance” before. The biggest drama? A commenter claimed they reported the paper for citing a non-existent ICML reference, sparking a mini witch-hunt over fake citations.

In the middle, some took the “hallucinations are structural” line seriously, asking for hybrid systems that mix learned patterns with explicit knowledge. Verdict: half the thread is philosophy class, half is detective squad, and everyone’s arguing whether this is a grand unifying theory—or just retro math cosplay.

Key Points

  • The article claims every sigmoid transformer implements weighted loopy belief propagation on an implicit factor graph, with each layer equal to one BP iteration.
  • It provides a constructive proof that transformers can perform exact belief propagation on any declared knowledge base; acyclic KBs yield provably correct node probabilities.
  • A uniqueness theorem asserts that any sigmoid transformer producing exact posteriors must have BP-equivalent weights.
  • The transformer layer is mapped to a Boolean AND/OR structure: attention as AND, FFN as OR, aligning with Pearl’s gather/update algorithm.
  • Experiments are reported to corroborate the formal results and to show practical viability of loopy BP; the article argues verifiable inference requires a finite concept space and that hallucination stems from lack of grounding.
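One message-passing step of the kind the paper equates with a transformer layer can be illustrated on a tiny two-variable factor graph (a generic sum-product sketch with made-up numbers, not the authors’ construction):

```python
# Toy belief propagation on a chain A -- f -- B, both binary.

# Prior on A: P(A=0) = 0.3, P(A=1) = 0.7.
prior_a = [0.3, 0.7]

# Pairwise factor f(A, B), here a conditional P(B | A):
# B tends to agree with A.
factor = [[0.9, 0.1],   # A=0 row
          [0.2, 0.8]]   # A=1 row

# Message from A through f to B: sum out A.
msg_to_b = [sum(prior_a[a] * factor[a][b] for a in range(2))
            for b in range(2)]

# Normalize the incoming message to get B's belief (marginal).
z = sum(msg_to_b)
belief_b = [m / z for m in msg_to_b]
print(belief_b)  # [0.41, 0.59]
```

On a tree-structured graph, iterating such steps gives exact marginals; on graphs with cycles, the same updates run as “loopy” BP, which is the approximate regime the article’s experiments reportedly address.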

Hottest takes

"Bayesian networks existed long before transformers and never achieved their performance" — getnormality
"ended up reporting it for citing fake sources" — warypet
"Ironic then, because if transformers are Bayesian networks then we're using Bayesian networks for non-independent ..." — westurner
Made with <3 by @siedrix and @shesho from CDMX. Powered by Forge&Hive.