The Reversal Curse: LLMs trained on "A is B" fail to learn "B is A"

AI flunks the obvious, and the comments section is absolutely not having it

TLDR: Researchers found that chatbots can learn a fact one way and still fail when the same fact is asked in reverse, which could make them miss surprisingly basic answers. Commenters instantly turned it into a grammar cage match, arguing the paper confuses broken AI with the messy way humans use “is.”

A new paper dropped a deceptively simple bombshell: teach a chatbot a fact like “Valentina Tereshkova was the first woman in space,” and it may still totally blank on “Who was the first woman in space?” The researchers call it the Reversal Curse, and the results are eyebrow-raising: even top systems can know one direction of a fact but fail when you flip the question around. In the reported real-world test, GPT-4 answered forward celebrity questions correctly 79% of the time but managed only 33% on the reversed versions, which is the kind of stat that makes people stare at their screens and whisper, “Wait… seriously?”

But the real fireworks were in the comments, where the crowd split into two camps: “Wow, that’s a huge problem” versus “Hold on, that sentence doesn’t even logically reverse.” Several commenters basically put the paper on trial, arguing that “A is B” does not automatically mean “B is A” in everyday language. Cue the parade of gotchas: “A square is a rectangle” does not mean “a rectangle is a square,” and one commenter dryly noted that “Socrates is alive” definitely does not imply “alive is Socrates.” Ouch.

There was also some delicious snark. Asked whether the field has moved on since the paper came out, one user replied with a devastatingly short “(2023)”, a tiny timestamp that landed like a full eye-roll. Others said the so-called failure may just show that language is messy, and that models need clearer context, not a public shaming. So yes, the paper says AI can forget the obvious, but the comments say humans are still undefeated at arguing about grammar on the internet.

Key Points

  • The paper identifies a behavior called the Reversal Curse, where autoregressive LLMs trained on “A is B” often fail to answer the reverse query “Who is B?”
  • In the Valentina Tereshkova example, the paper says the correct reverse answer may receive no higher likelihood than a random name.
  • The paper reports that when the original fact appears in context, models can infer the reverse relationship more successfully.
  • Researchers finetuned GPT-3 and Llama-1 on fictitious facts and found they failed on reversed questions, with the effect persisting across model sizes and families.
  • In reported real-world tests, GPT-4 answered forward celebrity questions correctly 79% of the time versus 33% for reversed versions.
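
For readers who want to see the shape of the forward-versus-reverse comparison, here is a minimal sketch. It is not the researchers' code: `FACTS`, `toy_model`, and `accuracy` are hypothetical names, and the "model" is just a lookup table that only recognizes the name side of a fact, standing in for a system that learned "A is B" in one direction.

```python
from typing import Optional

# Hypothetical one-directional fact store: name -> description.
# This is a toy stand-in, not a real language model.
FACTS = {
    "Valentina Tereshkova": "the first woman in space",
}

def toy_model(question: str) -> Optional[str]:
    """Answer only when the question mentions a stored name (the 'A' side)."""
    for name, description in FACTS.items():
        if name in question:
            return description
    return None  # the description alone does not lead back to the name

def accuracy(pairs) -> float:
    """Fraction of (question, expected_answer) pairs answered exactly."""
    correct = sum(1 for question, expected in pairs if toy_model(question) == expected)
    return correct / len(pairs)

forward = [("Who was Valentina Tereshkova?", "the first woman in space")]
reverse = [("Who was the first woman in space?", "Valentina Tereshkova")]

forward_acc = accuracy(forward)
reverse_acc = accuracy(reverse)
print(f"forward: {forward_acc:.0%}, reverse: {reverse_acc:.0%}")
```

The toy scores perfectly on the forward question and zero on the reverse one by construction. Real models are far less clear-cut, which is why the reported gap is 79% versus 33% rather than all or nothing.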

Hottest takes

“A is B,” in natural language, does not imply “B is A” — gipp
“A square is a rectangle” does not entail “a rectangle is a square” — zmgsabst
“(2023)” — turzmo
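
The commenters' objection can be sketched with sets, under the assumption that "is" sometimes means "belongs to the class of" and sometimes means "is the same individual as." The set contents below are made up for illustration.

```python
# Class membership: "a square is a rectangle" reads as a subset claim.
squares = {"unit square"}
rectangles = {"unit square", "2x3 rectangle"}

every_square_is_rectangle = squares <= rectangles   # subset holds
every_rectangle_is_square = rectangles <= squares   # the reverse does not

# Identity: both descriptions pick out the same single individual,
# so the statement reads the same in either direction.
first_woman_in_space = {"Valentina Tereshkova"}
person_named_tereshkova = {"Valentina Tereshkova"}

identity_reverses = first_woman_in_space == person_named_tereshkova

print(every_square_is_rectangle, every_rectangle_is_square, identity_reverses)
```

On this reading, the square-and-rectangle gotcha is a subset claim that really does not flip, while the Tereshkova example looks like an identity claim that does.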