AIs can generate near-verbatim copies of novels from training data

Internet erupts: is your chatbot a sneaky e‑book pirate or just predictable?

TLDR: A new study shows top AI chatbots can reproduce large chunks of famous novels, challenging claims that they don’t store texts. Commenters split between “obvious and overblown” and “copyright and privacy nightmare,” with jailbreak caveats and search‑engine comparisons stoking the fight over what counts as copying.

The internet’s book club turned courtroom today after a new Stanford/Yale study showed big-name chatbots can cough up near-verbatim chunks of bestselling novels when nudged just right. Gemini reportedly reproduced 76.8% of Harry Potter and the Philosopher’s Stone, Grok hit 70.3%, and researchers even jailbroke Claude into spilling almost entire books. Cue chaos: half the comment section yelled “no duh,” arguing that if a bot learned from books, of course it can finish their sentences. The other half screamed copyright alarm, asking whether this turns your friendly assistant into a stealth e‑book machine.

The hottest fight is over intent: some argue it’s like asking a trivia whiz to recall a passage, not theft. Others say if it can spit out pages on command, that’s not “transformative”—that’s copying, with real legal and privacy fallout. One camp shrugged it off as a “nothing burger,” noting the models needed carefully crafted prompts or even jailbreaks to behave badly. Another called the “AI-as-library” analogy cute but misleading—libraries have licenses; LLMs just ate the shelves.

Humor broke through the tension with memes about wizard bots “summoning Harry Potter paragraphs,” and quips comparing LLMs to search engines on steroids. Legal vibes were heavy—people warned the fair‑use defense might be wobbling. Expect more lawsuits, more guardrails, and a lot more drama every time someone types “continue this sentence…”

Key Points

  • Studies show LLMs from leading AI firms can reproduce near-verbatim text from copyrighted books when strategically prompted.
  • Gemini 2.5 regurgitated 76.8% of Harry Potter and the Philosopher’s Stone, and Grok 3 generated 70.3% of it via text-continuation requests.
  • Claude 3.7 Sonnet was jailbroken to extract almost entire novels near-verbatim, revealing guardrail limitations.
  • Prior research found open models like Meta’s Llama memorize large portions of specific books.
  • Findings challenge industry claims that models do not store copies of training data and raise legal and privacy concerns.

Hottest takes

"This feels like a 'no shit' moment" — bena
"This seems like a total nothing burger" — rowanG077
"You can also do this with most search engines" — xnx
Made with <3 by @siedrix and @shesho from CDMX. Powered by Forge&Hive.