Why do AI models use so many em-dashes?

Dash drama: readers rage, writers mourn, Wikipedia blamed

TLDR: Nobody knows why AI loves em-dashes; theories about training data, efficiency, and African English get poked full of holes. Commenters split between mourning a favorite punctuation mark and cheering it as an AI tell. They blame literature, Wikipedia style cops, and prestige media, and some are changing their own writing because of it.

AI keeps dropping em-dashes like confetti, and the community is in full-on punctuation war. In one corner, the dash romantics—like iansteyn—mourn the backlash, calling it “a real pity” as they retire beloved shortcuts. In the other, gleeful spotters treat the em-dash as a neon “bot was here” sign, with throwaway81523 roasting Wikipedia’s “MOStafarians” for birthing an AI tell. Memes ensued: “hyphen detox,” “en–dash vs em–dash cage match,” and plenty of side-eye for AI “slop.”

So why the dash deluge? Theories flew. Some blame training on highbrow books (Fricken notes literature leans dashy) plus prestige mags like The New Yorker and The Atlantic, which lordnacho says set the vibe. Others point to multilingual habits; Etheryte sees em-dashes in European papers. The article pokes holes: token efficiency isn’t the culprit, and prompts still can’t stop the habit (see the OpenAI forums). The spicy RLHF angle (reinforcement learning from human feedback, with raters in Kenya and Nigeria supposedly passing on their style) gets debunked: Nigerian English shows only 0.022% em-dashes versus roughly 0.25% in general English, and historic usage even peaked around 1860. Translation: we still don’t really know.

The real story? A punctuation mark turned culture war. Some humans ditch dashes to avoid AI suspicion; others double down, adding three just to make the bots sweat.

Key Points

  • The article highlights that em-dashes are widely perceived as a hallmark of AI-generated prose and are difficult to avoid via prompting.
  • It rejects explanations based on how common em-dashes are in training data and on their predictive flexibility compared to other punctuation.
  • Token-efficiency claims are tested with the OpenAI tokenizer, which shows em-dashes are not inherently more efficient and can often be replaced by commas.
  • RLHF is examined as a potential source of stylistic bias, with emphasis on African English from workers in Kenya and Nigeria.
  • An analysis of a Nigerian English dataset shows a low em-dash rate (0.022%) versus general English estimates (~0.25–0.275%), undermining the African English hypothesis.
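
The rate comparison in the last point can be made concrete. Here is a minimal sketch, assuming the percentages mean em-dashes per 100 words (the summary does not say whether the base is words or characters). The function name `em_dash_rate` and the sample string are hypothetical, and this is not the article's own script.

```python
EM_DASH = "\u2014"  # Unicode code point for the em-dash


def em_dash_rate(text: str) -> float:
    """Return em-dashes per 100 words (assumed definition of the rate)."""
    dashes = text.count(EM_DASH)
    # Replace each em-dash with a space so "word\u2014word" counts as two words
    # and the dash itself is not counted as a word.
    words = text.replace(EM_DASH, " ").split()
    if not words:
        return 0.0
    return 100.0 * dashes / len(words)


if __name__ == "__main__":
    sample = "AI loves dashes\u2014lots of them"  # hypothetical sample text
    print(f"{em_dash_rate(sample):.3f}% em-dashes per word")
```

Run over a large corpus, a figure like this is what would be compared against the roughly 0.25% baseline quoted for general English. A character-based denominator would give different numbers, so the choice of base matters when comparing datasets.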

Hottest takes

“It’s a real pity that em-dashes are becoming disliked” — iansteyn
“MOStafarians … inadvertently created an AI-detection marker” — throwaway81523
“Em-dashes appear often in prestige publications” — lordnacho
Made with <3 by @siedrix and @shesho from CDMX. Powered by Forge&Hive.