AI scrapers request commented scripts

Sneaky bots are reading your site’s HTML comments and pretending to be real browsers

TLDR: A site owner caught bots fetching a link hidden in an HTML comment, revealing scrapers that ignore robots.txt and mimic real browsers. Commenters debated whether it’s just efficient or outright shady, with many pushing for “honeypot” traps to expose and mess with sloppy crawlers.

A quiet Sunday turned spicy when a site owner spotted bots requesting a JavaScript file that was referenced only inside an HTML comment, basically the “invisible ink” of web pages. Real browsers ignore anything inside a comment, but the logs told a different story: scrapers sporting fake Chrome/Firefox user-agent strings, plus blunt instruments like python-httpx and Go clients, all knocking on a door that shouldn’t exist. The community split fast. One camp shrugged, saying it’s simply faster to grab anything that looks like a link than to parse pages properly; cue rokkamokka’s practical take and old-school devs reminiscing about regex (pattern matching) as their ride-or-die. Noumenon72 says it doesn’t seem that abusive, but another camp calls it shady: bots ignoring “do not crawl” signs and hoovering up scraps to train chatbots. The snark peaked with “regex vs real parsing” jokes and the suggestion to plant honeypot links inside comments to lure sloppy scrapers, courtesy of OhMeadhbh. Meanwhile, sharkjacobs dropped a link to related research, adding academic spice. The vibe: bots act differently than humans, and the crowd is ready to booby-trap them. Whether it’s lazy or greedy scraping, folks want receipts, and maybe revenge.
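To make the “regex vs real parsing” split concrete, here is a minimal Python sketch of how a commented-out script could end up on a scraper’s fetch list. The page markup, the file paths, and the function names are all made up for illustration, and nothing here is claimed to be how any particular scraper actually works.

```python
import re
from html.parser import HTMLParser

# Hypothetical page: one script tag exists only inside an HTML comment.
PAGE = """<html><head>
<!-- <script src="/js/unused-widget.js"></script> -->
<script src="/js/app.js"></script>
</head><body><a href="/about">About</a></body></html>"""


def naive_extract(html: str) -> list[str]:
    """Grab anything that looks like a src/href value, comments included."""
    return re.findall(r'(?:src|href)="([^"]+)"', html)


class LinkParser(HTMLParser):
    """Collect src/href values from real elements only.

    html.parser hands comment text to handle_comment(), so tags written
    inside <!-- ... --> never reach handle_starttag().
    """

    def __init__(self) -> None:
        super().__init__()
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("src", "href") and value:
                self.links.append(value)


def parsed_extract(html: str) -> list[str]:
    parser = LinkParser()
    parser.feed(html)
    parser.close()
    return parser.links


print(naive_extract(PAGE))   # includes the commented-out script
print(parsed_extract(PAGE))  # skips it, as a browser would
```

The pattern-matching version is shorter and needs no parser state, which is the efficiency argument in a nutshell; the price is that it follows links no browser would ever request.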

Key Points

  • A commented-out script tag triggered requests for a JavaScript file that was never deployed, each answered with a 404.
  • Server logs showed requests from automated user-agents (python-httpx, Go-http-client, Gulper Web Bot) despite robots.txt disallowing crawling.
  • Many requests used browser-like user-agent strings (Firefox, Chrome, Safari) but behaved unlike real browsers, suggesting spoofing.
  • The author posits scrapers either parse comment text for URLs or use naive text pattern matching to extract links.
  • Observed behavioral differences from human users suggest such bots can be detected and potentially countered with targeted interventions.
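
The detection idea in the last point can be sketched as a small log filter. This is only an illustration under stated assumptions: the decoy path, the sample log lines, and the IP addresses (taken from documentation-reserved ranges) are invented, and the server is assumed to write the common “combined” access log format.

```python
import re
from collections import defaultdict

# Hypothetical decoy path that appears only inside an HTML comment,
# so no real browser should ever request it.
HONEYPOT_PATH = "/js/decoy-comment-only.js"

# Combined log format:
# ip ident user [time] "METHOD path proto" status size "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] '
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<ua>[^"]*)"$'
)


def find_honeypot_hits(lines):
    """Map each client IP that requested the decoy to the user-agents it sent."""
    hits = defaultdict(set)
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if not match:
            continue  # skip lines that are not in the assumed format
        path = match.group("path").split("?", 1)[0]  # ignore query strings
        if path == HONEYPOT_PATH:
            hits[match.group("ip")].add(match.group("ua"))
    return dict(hits)


# Invented sample lines, not taken from the site owner's logs.
SAMPLE = [
    '203.0.113.7 - - [01/Jan/2025:10:00:00 +0000] "GET /js/decoy-comment-only.js HTTP/1.1" 404 153 "-" "python-httpx/0.27.0"',
    '198.51.100.4 - - [01/Jan/2025:10:00:05 +0000] "GET /index.html HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (X11; Linux x86_64; rv:128.0) Gecko/20100101 Firefox/128.0"',
    '203.0.113.7 - - [01/Jan/2025:10:00:09 +0000] "GET /js/decoy-comment-only.js?v=2 HTTP/1.1" 404 153 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) Chrome/126.0 Safari/537.36"',
]

for ip, agents in find_honeypot_hits(SAMPLE).items():
    print(ip, sorted(agents))
```

Grouping by IP rather than by user-agent is a deliberate choice here: a client that sends a browser-like user-agent string but still requests the decoy is exactly the spoofing case described above. A single IP can front many clients, though, so a hit is a signal to investigate rather than proof.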

Hottest takes

"faster to search the text for http/https than parse the DOM" — rokkamokka
"It doesn't seem that abusive" — Noumenon72
"I blame modern CS programs that don't teach kids about parsing" — OhMeadhbh