January 10, 2026
AI karaoke meets copyright chaos
Extracting books from production language models (2026)
Chatbots can spill whole books — crowd screams memory, devs promise filters
TLDR: Researchers show big chatbots can reproduce long stretches of real books, especially when jailbroken. Commenters split: some want word-sequence filters or synthetic training data, others say "obvious, they remember" and point to existing recitation blocks. Either way, it raises urgent copyright questions for how these systems are built and used.
Hold onto your e-readers: researchers say production AI chatbots can spit out big chunks of real books, even with safety locks on. In tests, Gemini 2.5 Pro and Grok 3 didn’t even need a jailbreak to pour out 70–77% of Harry Potter, while a jailbroken Claude 3.7 Sonnet went nearly 96% verbatim. GPT‑4.1 fought back, needed tons of retries, then basically said “nope,” landing around 4%. Cue the internet calling it AI karaoke.
The comments lit up. One camp is shouting “just block the copy-pasta” with a giant word-sequence filter, like a Bloom filter that stops unique 10-word strings from escaping. User visarga pushes a cleaner diet: train on summaries and Q&A so AIs learn ideas, not exact lines. The other camp? “This was obvious,” says orbital-decay, claiming big models do remember and companies already run “regurgitation filters” that flag and even cite the source with a RECITATION error. Drama brewed between the engineers (“we can patch this”) and the realists (“it’s baked into how these models work”).
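The word-sequence filter idea from the comments can be sketched concretely. The snippet below is a minimal illustration, not anyone's production system: a tiny Bloom filter is populated with every 10-word shingle of a protected text, and a model's output is flagged if any 10-word window hits the filter. All names (`BloomFilter`, `flag_recitation`) and parameters (bit-array size, hash count, shingle length) are illustrative assumptions.

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter: k hash positions in an m-bit array.
    Illustrative only; real deployments would tune m and k for the
    corpus size and acceptable false-positive rate."""
    def __init__(self, m_bits=1 << 20, k=4):
        self.m = m_bits
        self.k = k
        self.bits = bytearray(m_bits // 8)

    def _positions(self, item):
        # Derive k positions by salting a SHA-256 hash.
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.m

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(self.bits[pos // 8] >> (pos % 8) & 1
                   for pos in self._positions(item))

def shingles(text, n=10):
    """Yield every n-word window of the text (lowercased)."""
    words = text.lower().split()
    return (" ".join(words[i:i + n]) for i in range(len(words) - n + 1))

def flag_recitation(output, bloom, n=10):
    """True if any n-word window of the output matches the filter."""
    return any(s in bloom for s in shingles(output, n))
```

Building the filter over a book's shingles and calling `flag_recitation` on each model response would catch verbatim runs of 10+ words; Bloom filters can return false positives (never false negatives), so a hit would typically trigger a slower exact check.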
Jokes flew: AI as a "walking photocopier," "Harry Potter DLC," and the "Best-of-N jailbreak" as a slot machine you pull 20 times until a chapter drops. The mood: equal parts alarm, resignation, and popcorn.
Key Points
- A two-phase procedure (probe with optional BoN jailbreak, then iterative continuation prompts) was used to test LLM data extraction.
- Four production LLMs were evaluated: Claude 3.7 Sonnet, GPT-4.1, Gemini 2.5 Pro, and Grok 3.
- Extraction success was measured using an nv-recall metric based on a block-approximation of longest common substring.
- Gemini 2.5 Pro and Grok 3 did not require jailbreaking to achieve high nv-recall scores (76.8% and 70.3%).
- Claude 3.7 Sonnet (with jailbreak) produced near-verbatim outputs (nv-recall = 95.8%), while GPT-4.1 needed many BoN attempts and ultimately refused (nv-recall = 4.0%).
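To make the metric above concrete: the paper's exact nv-recall definition isn't reproduced here, but the "block-approximation of longest common substring" idea can be sketched as splitting the original text into fixed-size word blocks and scoring the fraction recovered verbatim in the model's output. The function name and the block size are assumptions for illustration.

```python
def nv_recall(original, generated, block_words=50):
    """Hedged sketch of a block-approximated nv-recall: split the
    original into fixed-size word blocks and return the fraction of
    blocks that appear verbatim as substrings of the generated text."""
    orig_words = original.split()
    # Normalize whitespace so substring matching works on word sequences.
    gen = " ".join(generated.split())
    blocks = [" ".join(orig_words[i:i + block_words])
              for i in range(0, len(orig_words), block_words)]
    if not blocks:
        return 0.0
    hits = sum(1 for block in blocks if block in gen)
    return hits / len(blocks)
```

Under this sketch, a score of 95.8% would mean almost every block of the source text surfaced verbatim somewhere in the extracted output, while 4.0% means nearly all blocks were missing or paraphrased.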