June 6, 2026
Crawler? I Hardly Know Her
How Other Link Checkers Do Recursion
Turns out the ‘missing trick’ never existed—and commenters loved the plot twist
TLDR: The author found that rival link checkers handle recursion easily because they were built as crawlers from the start, while lychee was built in a straight line and has been fighting that design ever since. In the comments, the standout reaction was a calm-but-spicy suggestion: stop overcomplicating it and consider a simpler single-threaded approach.
The big reveal in this coding saga is almost hilariously simple: other website link checkers aren’t secretly smarter. They just started life as full-on site crawlers, while lychee was built more like a straight conveyor belt—links go in, results come out, end of story. That difference sounds small, but it’s basically why adding “keep following links forever” has been such a five-year headache. The author went source-diving through rival tools and came back with a verdict that feels equal parts confession and vindication: there was no magic shortcut, just a totally different blueprint from day one.
And yes, the community absolutely pounced on that. The vibe is a mix of “aha, so that’s why this was such a nightmare” and “Rust architecture drama strikes again.” The hottest reaction came from ameliaquining, who jumped in with a very practical curveball: if thread-safety rules are making life miserable, why not run Tokio, the Rust async runtime, in a single-threaded setup so the code doesn’t have to jump through as many hoops? In plain English, it’s a suggestion to make the whole thing less fussy by keeping it in one lane instead of many. That comment has the energy of someone walking into a chaos-filled kitchen and saying, “Have you tried turning down the stove?”
The jokes practically write themselves: dragons in the pipeline, years of failed attempts, and the dawning realization that competitors didn’t solve the puzzle—they just bought a different puzzle. It’s less “genius hack discovered” and more “the house needed a staircase, but we built a hallway.”
Key Points
- The article concludes that other recursive link checkers were designed as crawlers from their first commit, unlike lychee’s original stream-based architecture.
- It identifies three common components in recursive link checkers: a mutable frontier queue, a visited set updated at enqueue time, and a completion mechanism for detecting when all work is done.
- The article contrasts crawler architectures with lychee’s one-shot DAG pipeline, explaining that recursion requires a back-edge where discovered URLs become new inputs.
- It says the critical fix for deduplication races is to atomically check and mark the visited set before scheduling or fetching a URL.
- Muffet is presented as an example implementation that combines deduplication and scheduling, recursively re-enqueues discovered pages, and uses sync.WaitGroup-based tracking for termination.