March 10, 2026
Crawling into chaos
Cloudflare Crawl Endpoint
Cloudflare unveils one-call site crawler — fans cheer, skeptics cry AI backdoor?
TLDR: Cloudflare launched a tool that can scan an entire site and return content in multiple formats while following site rules. Commenters split between applause for a polite, legit crawler and worries about AI-scraping loopholes, with a hot take suggesting Cloudflare just expose cached pages as a single feed.
Cloudflare just rolled out a one‑call website crawler in open beta, and the comments came in hot. With Browser Rendering and a new /crawl endpoint, you drop in a starting link and it finds pages and spits them back as HTML, Markdown, or structured JSON. It runs in the background, can skip unchanged pages, and even has a “well‑behaved bot” mode that honors a site’s rules file (robots.txt). Translation: faster site scans for AI training, search tools, uptime checks, and research, on both free and paid plans.
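For a sense of what “one call” means, here is a minimal sketch of assembling a crawl request body. The account-scoped URL path and the field names (`url`, `formats`) are assumptions based on the article’s description, not confirmed against the official API reference:

```python
import json

# Placeholder account ID -- in real use this comes from your Cloudflare dashboard.
ACCOUNT_ID = "YOUR_ACCOUNT_ID"

# Assumed path: Browser Rendering's REST routes are account-scoped, so a
# /crawl endpoint would plausibly live here. Check the docs before relying on it.
CRAWL_URL = (
    "https://api.cloudflare.com/client/v4/accounts/"
    f"{ACCOUNT_ID}/browser-rendering/crawl"
)

def build_crawl_request(start_url: str, formats: list[str]) -> dict:
    """Assemble a crawl request: one starting link plus desired output formats."""
    return {
        "url": start_url,          # the single starting link you "drop in"
        "formats": formats,        # e.g. ["html", "markdown", "json"]
    }

body = build_crawl_request("https://example.com", ["markdown"])
print(json.dumps(body))
```

In real use you would POST this body (with an API token in the `Authorization` header) and read back the rendered content in the formats you asked for.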
Fans are thrilled. triwats wants to use Cloudflare’s global edge to watch pages for real‑world changes, and memothon cheers that this is a good‑citizen crawler in a world of “scummy” scrapers. But the spice hits fast: jasongill drops a hot take—why not just expose a site’s cached content as a single JSON file?—cue raised eyebrows. And 8cvor6j844qw_d6 asks the question of the hour: “Does this bypass their own anti‑AI crawl measures?” One commenter even name‑drops a “labyrinth” test, like they’re unleashing a Minotaur on scrapers. Meanwhile, archivists like Imustaskforhelp dream of mirroring tech forums before they vanish. The vibe: part celebration, part side‑eye, and a whole lot of “Crawl me, maybe” energy.
Key Points
- Cloudflare released an open beta /crawl endpoint in Browser Rendering to crawl entire websites via a single API call.
- The service auto-discovers pages from links and sitemaps, renders them in a headless browser, and returns HTML, Markdown, and structured JSON.
- Crawl jobs are asynchronous, returning a job ID to poll for results as pages are processed.
- Scope controls include crawl depth, page limits, and wildcard include/exclude patterns, plus incremental crawling via modifiedSince and maxAge.
- The crawler honors robots.txt (including crawl-delay) and offers a static mode (render: false) for faster non-rendered crawls; available on Workers Free and Paid plans.
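The asynchronous flow in the points above (submit a job, get a job ID, poll for pages) can be sketched like this. The parameter names `depth`, `limit`, `modifiedSince`, and `render` come from the article; the response fields (`jobId`, `state`, `pages`) and the in-memory fetchers are assumptions standing in for real HTTP calls:

```python
import time
from typing import Callable

def submit_crawl(post: Callable[[dict], dict], start_url: str) -> str:
    """Submit a crawl job; the endpoint returns a job ID to poll later."""
    body = {
        "url": start_url,
        "depth": 2,                                  # crawl depth (scope control)
        "limit": 50,                                 # page limit
        "include": ["https://example.com/blog/*"],   # wildcard include pattern
        "modifiedSince": "2026-03-01T00:00:00Z",     # incremental: skip unchanged pages
        "render": False,                             # static mode: no headless render
    }
    return post(body)["jobId"]

def poll_results(get: Callable[[str], dict], job_id: str,
                 interval: float = 1.0) -> list[dict]:
    """Poll the job until it completes, collecting pages as they are processed."""
    pages: list[dict] = []
    while True:
        status = get(job_id)
        pages.extend(status.get("pages", []))
        if status["state"] == "completed":
            return pages
        time.sleep(interval)

# --- tiny in-memory stand-in for the API so the sketch is runnable ---
_polls = {"n": 0}

def fake_post(body: dict) -> dict:
    return {"jobId": "job-123"}

def fake_get(job_id: str) -> dict:
    _polls["n"] += 1
    if _polls["n"] < 2:
        return {"state": "running",
                "pages": [{"url": "https://example.com/"}]}
    return {"state": "completed",
            "pages": [{"url": "https://example.com/blog/a"}]}

job = submit_crawl(fake_post, "https://example.com")
results = poll_results(fake_get, job, interval=0)
print(job, len(results))  # job-123 2
```

The design choice worth noting: because jobs run in the background, the client never blocks on rendering; it just re-checks the job ID and drains whatever pages have finished since the last poll.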