Chomsky and the Two Cultures of Statistical Learning

Internet brawl: Did Chomsky call it, or did the internet pass him by?

TLDR: Chomsky’s decades-old critique of statistical language models is back in the spotlight via Norvig’s rebuttal, and the internet is split. One side says modern AI proves Chomsky wrong; the other says prediction isn’t understanding—causality matters. It’s a battle over what counts as real progress in AI.

Back in 2011 at MIT, Noam Chomsky threw shade at purely statistical language models, basically saying, “Congrats, you copied the vibes, not the meaning.” Today, that old fight just got reheated—hard. Peter Norvig’s classic response, “On Chomsky and the Two Cultures of Statistical Learning”, is making the rounds again, and the comments are pure fireworks.

On one side: the “causality or bust” crowd, insisting that curve‑fitting isn’t science. One top comment gripes that Norvig “confuses the map for the territory,” demanding real explanations over clever predictions. Another asks if this is just a statistics turf war—Bayes vs. frequentist—like it’s Team Red vs. Team Blue for math nerds.

On the other side: the “LLMs made it real” squad, gleefully resurfacing a Chomsky line from 1969 dismissing the “probability of a sentence” as useless. With today’s chatbots and auto‑translate everywhere, they say that take has “aged terribly.” Cue the memes: “OK boomer linguistics,” “butterfly collecting vs. bug‑squashing,” and screenshots of chatbots writing passable essays.

There’s also meta‑drama: folks asking “Is this from 2011?” like they stumbled into a time capsule, and one commenter tries to inject unrelated personal scandals—swiftly flagged as off‑topic. Beneath the snark, the fight is real: Is building useful systems enough, or must we explain language like a scientist? Norvig says both. The crowd can’t agree—and that’s why they won’t stop clicking.

Key Points

  • The essay responds to Noam Chomsky’s critique of statistical methods in linguistics, delivered at an MIT symposium in 2011.
  • Chomsky argues statistical models define success as approximating unanalyzed data and lack scientific insight.
  • He claims engineering success is irrelevant to science and that language is generated via semantic-to-syntax processes without probability.
  • The essay counters that engineering success can indicate scientific validity and that science depends on both data and theory.
  • It advocates for probabilistic models that integrate words, syntax, semantics, context, and discourse, especially for interpretation tasks like speech recognition.
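For anyone wondering what a “probability of a sentence” even looks like in practice, here is a minimal sketch of a bigram model with add-one smoothing. It is a toy illustration only, not the method from Norvig’s essay or anything Chomsky discussed: the corpus is invented, the function names `train_bigram` and `sentence_logprob` are hypothetical, and the smoothing scheme is simply the easiest one to show.

```python
from collections import Counter
import math

# Toy corpus, invented purely for illustration.
CORPUS = [
    "the dog barks",
    "the cat sleeps",
    "the dog sleeps",
]

def train_bigram(sentences):
    """Count contexts and word pairs, padding each sentence with <s> and </s>."""
    contexts, bigrams = Counter(), Counter()
    for s in sentences:
        tokens = ["<s>"] + s.split() + ["</s>"]
        contexts.update(tokens[:-1])             # every token that has a successor
        bigrams.update(zip(tokens, tokens[1:]))  # adjacent pairs
    # Words that can be predicted: everything in the corpus plus the end marker.
    vocab = {t for s in sentences for t in s.split()} | {"</s>"}
    return contexts, bigrams, vocab

def sentence_logprob(sentence, contexts, bigrams, vocab):
    """Log-probability of a sentence under the add-one-smoothed bigram model."""
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    v = len(vocab)
    total = 0.0
    for prev, word in zip(tokens, tokens[1:]):
        # Add-one smoothing keeps unseen pairs from getting probability zero.
        total += math.log((bigrams[(prev, word)] + 1) / (contexts[prev] + v))
    return total

contexts, bigrams, vocab = train_bigram(CORPUS)
natural = sentence_logprob("the dog sleeps", contexts, bigrams, vocab)
scrambled = sentence_logprob("sleeps dog the", contexts, bigrams, vocab)
print(natural > scrambled)  # the familiar word order scores higher
```

The model only counts which words follow which, so it prefers familiar word order without any notion of why that order is grammatical. That gap between prediction and explanation is the one the two camps are arguing about.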

Hottest takes

“confusing the map … for the territory” — intalentive
“his position has aged terribly” — tripletao
“Is this essay from 2011?” — bo1024