How Other Link Checkers Do Recursion

Turns out the ‘missing trick’ never existed—and commenters loved the plot twist

TLDR: The author found that rival link checkers handle recursion easily because they were built as crawlers from the start, while lychee was built in a straight line and has been fighting that design ever since. In the comments, the standout reaction was a calm-but-spicy suggestion: stop overcomplicating it and consider a simpler single-threaded approach.

The big reveal in this coding saga is almost hilariously simple: other website link checkers aren’t secretly smarter. They just started life as full-on site crawlers, while lychee was built more like a straight conveyor belt: links go in, results come out, end of story. That difference sounds small, but it’s basically why adding recursion (checking a page, then following the links found on it) has been such a five-year headache. The author went source-diving through rival tools and came back with a verdict that feels equal parts confession and vindication: there was no magic shortcut, just a totally different blueprint from day one.

And yes, the community absolutely pounced on that. The vibe is a mix of “aha, so that’s why this was such a nightmare” and “Rust architecture drama strikes again.” The hottest reaction came from ameliaquining, who jumped in with a very practical curveball: if thread-safety rules are making life miserable, why not use a single-threaded setup in Tokio (Rust’s async runtime) so the code doesn’t have to jump through as many hoops? In plain English, it’s a suggestion to make the whole thing less fussy by keeping it in one lane instead of many. That comment has the energy of someone walking into a chaos-filled kitchen and saying, “Have you tried turning down the stove?”

The jokes practically write themselves: dragons in the pipeline, years of failed attempts, and the dawning realization that competitors didn’t solve the puzzle—they just bought a different puzzle. It’s less “genius hack discovered” and more “the house needed a staircase, but we built a hallway.”

Key Points

  • The article concludes that other recursive link checkers were designed as crawlers from their first commit, unlike lychee’s original stream-based architecture.
  • It identifies three common components in recursive link checkers: a mutable frontier queue, a visited set updated at enqueue time, and a completion mechanism for detecting when all work is done.
  • The article contrasts crawler architectures with lychee’s one-shot DAG pipeline, explaining that recursion requires a back-edge where discovered URLs become new inputs.
  • It says the critical fix for deduplication races is to atomically check and mark the visited set before scheduling or fetching a URL.
  • Muffet is presented as an example implementation that combines deduplication and scheduling, recursively re-enqueues discovered pages, and uses sync.WaitGroup-based tracking for termination.
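
To make the three components above concrete, here is a minimal single-threaded sketch in Rust. It is not lychee’s or Muffet’s code. The `LinkGraph` type, `example_site`, and `crawl` are hypothetical names, and an in-memory map stands in for real HTTP fetches. The frontier is a `VecDeque`, the visited set is marked at enqueue time, and the back-edge is the `push_back` that turns a discovered link into new input. Because nothing runs concurrently here, an empty frontier is enough to signal completion. A concurrent crawler would also need to track in-flight work, which is the role the key points assign to Muffet’s `sync.WaitGroup`.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Hypothetical in-memory "site": page URL -> links found on that page.
/// A real checker would fetch and parse each page instead.
type LinkGraph = HashMap<&'static str, Vec<&'static str>>;

/// Small example site with cycles ("/" <-> "/a", "/a" <-> "/b").
fn example_site() -> LinkGraph {
    HashMap::from([
        ("/", vec!["/a", "/b"]),
        ("/a", vec!["/", "/b", "/c"]),
        ("/b", vec!["/a"]),
        ("/c", vec![]),
    ])
}

/// Crawl from `start` and return pages in the order they were "fetched".
fn crawl(graph: &LinkGraph, start: &'static str) -> Vec<&'static str> {
    let mut frontier: VecDeque<&'static str> = VecDeque::new();
    let mut visited: HashSet<&'static str> = HashSet::new();
    let mut fetched = Vec::new();

    // Mark the seed as visited at enqueue time, not at fetch time.
    visited.insert(start);
    frontier.push_back(start);

    // Completion: single-threaded, so an empty frontier means all work is done.
    while let Some(url) = frontier.pop_front() {
        fetched.push(url);
        if let Some(links) = graph.get(url) {
            for &link in links {
                // `insert` returns true only the first time a URL is seen,
                // so the check and the mark happen in one step.
                if visited.insert(link) {
                    // Back-edge: a discovered URL becomes new input.
                    frontier.push_back(link);
                }
            }
        }
    }
    fetched
}
```

Calling `crawl(&example_site(), "/")` visits each page exactly once despite the cycles, because a URL is rejected at enqueue time if it was already marked.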

Hottest takes

"you can have a Node-style single-threaded event loop in Tokio" — ameliaquining
"spawn_local has no Send bounds" — ameliaquining
"the Rust compiler won't require you to make your code thread-safe" — ameliaquining
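
The type-system point behind these quotes can be sketched with the standard library alone. The block below does not use Tokio, and both function names are hypothetical. On one thread, shared state can live in `Rc<RefCell<...>>`, which is not `Send` and never has to be. Once work is handed to `std::thread::spawn`, the closure must be `Send`, so the same visited set has to move into `Arc<Mutex<...>>`. The second function also shows the atomic check-and-mark from the key points: several workers race to claim one URL and only one wins.

```rust
use std::cell::RefCell;
use std::collections::HashSet;
use std::rc::Rc;
use std::sync::{Arc, Mutex};
use std::thread;

/// Single-threaded: the visited set is shared through Rc<RefCell<..>>.
/// Returns how many URLs were newly claimed.
fn claim_single_threaded(urls: &[&str]) -> usize {
    let visited = Rc::new(RefCell::new(HashSet::<String>::new()));
    // A second handle to the same set, as a task on the same thread would hold.
    let handle = Rc::clone(&visited);
    let claim = move |url: &str| handle.borrow_mut().insert(url.to_string());

    let mut newly_claimed = 0;
    for &url in urls {
        if claim(url) {
            newly_claimed += 1;
        }
    }
    // The original handle sees the same state as the closure's handle.
    debug_assert_eq!(visited.borrow().len(), newly_claimed);
    newly_claimed
}

/// Multi-threaded: thread::spawn requires Send, so the set moves into
/// Arc<Mutex<..>>. Returns how many workers won the claim on `url`.
fn claim_multi_threaded(url: &str, workers: usize) -> usize {
    let visited = Arc::new(Mutex::new(HashSet::<String>::new()));
    let handles: Vec<_> = (0..workers)
        .map(|_| {
            let visited = Arc::clone(&visited);
            let url = url.to_string();
            // Check and mark under one lock, so two workers cannot both
            // observe "not visited" for the same URL.
            thread::spawn(move || visited.lock().unwrap().insert(url))
        })
        .collect();

    handles
        .into_iter()
        .map(|h| h.join().unwrap())
        .filter(|&won| won)
        .count()
}
```

Swapping the `Rc<RefCell<...>>` into the threaded version fails to compile, which is the friction the commenter suggests avoiding by keeping everything on one thread.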