We can't have nice things because of AI scrapers

AI data gobblers crash the party; MetaBrainz locks the door

TLDR: MetaBrainz tightened access (required tokens, removed endpoints, and login gates) to stop AI scrapers from overwhelming MusicBrainz and ListenBrainz. Commenters battle over tactics: tarpit the bots, let market costs curb bad scraping, or route everyone through Common Crawl. It matters because open data sites are getting crushed and real users are getting locked out.

Cue the tiny violins: AI scrapers are hammering MusicBrainz and ListenBrainz like they’re speed-reading every page one at a time. Instead of using the polite bulk download or respecting robots.txt, they’re flooding servers and booting real users from the dance floor. The MetaBrainz crew hit the brakes: key API (application programming interface) endpoints now need an Authorization token, some debug endpoints are gone, and LB Radio is members-only for now. The crowd? Loud, divided, and meme-ready.

The top cheer goes to SchemaLoad, who wants Cloudflare to “tarpit” bad bots with endless junk pages: serve the scrapers spam, baby! Meanwhile, lysace sighs that dumb scraping will fade as costs bite, dropping a burn: today’s AI crawlers feel like “Googlebot in like 2001.” That sparked a mini-brawl over IP blocking and platform bans. On the idealist side, lep_qq blasts AI firms for ignoring open data etiquette and not supporting projects like MetaBrainz. Then tensegrist pitches a controversial fix: funnel everyone through Common Crawl so small sites don’t get wrecked but the web stays scrapable for regular folks. The vibe? A tug-of-war between keeping the web open and slapping locks on the doors, with jokes about feeding bots nonsense and LB Radio turning into a VIP lounge.

Key Points

  • MetaBrainz reports AI scrapers ignoring robots.txt and scraping MusicBrainz page-by-page.
  • ListenBrainz APIs are being hit by scrapers, causing service load issues.
  • /metadata/lookup GET and POST endpoints now require an Authorization token.
  • ListenBrainz Labs endpoints for mbid-mapping, mbid-mapping-release, and mbid-mapping-explain have been removed.
  • LB Radio now requires login and an Authorization header; error messaging will be refined after Year in Music work.
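
For anyone wondering what "now requires an Authorization token" might look like from the client side, here is a minimal sketch in Python. It is not MetaBrainz's official client. The host name, the `/1/` path prefix, the `Token <value>` header scheme, and the `artist_name` / `recording_name` query parameters are all assumptions made for illustration, and `build_lookup_request` and `fetch_lookup` are hypothetical helper names. Check the ListenBrainz API documentation before relying on any of it.

```python
import json
import urllib.parse
import urllib.request

# ASSUMPTION: base URL and versioned path are illustrative, not confirmed by the post.
API_ROOT = "https://api.listenbrainz.org"
LOOKUP_PATH = "/1/metadata/lookup/"


def build_lookup_request(token, artist_name, recording_name):
    """Build (but do not send) a GET request carrying an Authorization header."""
    # ASSUMPTION: the query parameter names below are placeholders.
    query = urllib.parse.urlencode(
        {"artist_name": artist_name, "recording_name": recording_name}
    )
    url = f"{API_ROOT}{LOOKUP_PATH}?{query}"
    # ASSUMPTION: the "Token <value>" scheme; the post only says a token is required.
    headers = {"Authorization": f"Token {token}"}
    return urllib.request.Request(url, headers=headers, method="GET")


def fetch_lookup(token, artist_name, recording_name):
    """Send the request and decode the JSON body; raises HTTPError on a rejected token."""
    request = build_lookup_request(token, artist_name, recording_name)
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)
```

Building the request separately from sending it makes it easy to inspect the header without touching the network. For anything beyond one-off lookups, the bulk data dumps mentioned above are the politer option.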

Hottest takes

"detect AI scrapers and send them to a tarpit of infinite AI generated nonsense pages" — SchemaLoad
"Much of the scraping... is still very dumb/repetitive. Like Googlebot in like 2001" — lysace
"force everyone to have to go through something like common crawl" — tensegrist
Made with <3 by @siedrix and @shesho from CDMX. Powered by Forge&Hive.