Nvidia Contacted Anna's Archive to Access Books

Readers cry piracy, NVIDIA says fair use, and the comments roast them both

TLDR: An amended lawsuit says NVIDIA sought high-speed access to millions of books from a shadow library to train AI. Comments clash over fair use, mock “asking permission,” and worry about Kindle data—raising questions about how AI is trained and who pays for the words.

An amended class-action lawsuit claims NVIDIA didn’t just stumble across shadow libraries: it allegedly emailed Anna’s Archive asking for “high-speed access” to millions of books to juice its AI training. Cue the comment section chaos. One camp is furious, calling it a corporate book heist; another shrugs that “everyone’s models eat the same diet.” The line that lit the fuse: NVIDIA’s fair use defense that books become “statistical correlations.” The commenter skilled shot back: “Are the copyright laws so bad that a statement like this would actually be in NVIDIA’s favor?” That turned into a meme: “books aren’t words, they’re vibes.” Meanwhile, rtbruhan00’s deadpan “It’s generous of them to ask for permission” painted NVIDIA as the polite pirate.

Then came the hypocrisy alarms: poulpy123 noted it’s “quite something” to chase Anna’s Archive while many big AI players allegedly used it. Speculation flew to Amazon—what happens when a tech giant has a vault of Kindle books? Commenters debated whether training is “reading” or “copying,” and whether a promised 500 terabytes—allegedly including works normally behind the Internet Archive’s lending system—sounds more like a library or a loot drop. Important note: these are allegations from an amended complaint; NVIDIA says its training is lawful. The internet, however, is treating it like Ocean’s Eleven with GPUs and a very fast library card.

Key Points

  • An amended class-action lawsuit alleges NVIDIA sought high-speed access to Anna’s Archive’s pirated books for AI training.
  • Plaintiffs cite internal NVIDIA emails and documents indicating competitive pressures led to downloading millions of copyrighted books.
  • Earlier suits claimed NVIDIA trained models on the Books3 dataset sourced from the pirate site Bibliotik without permission.
  • NVIDIA defended its training practices as fair use, describing books as statistical correlations in its models.
  • The complaint states Anna’s Archive provided NVIDIA access to about 500 TB of data after internal approval, including millions of books.

Hottest takes

"Are the copyright laws so bad that a statement like this would actually be in NVIDIA’s favor?" — skilled
"It's generous of them to ask for permission" — rtbruhan00
"going after Anna’s archive while most of the big AI players intensely used it is quite something" — poulpy123