Fast regex search: indexing text for agent tools

Cursor’s “faster grep” claim sparks backlash — missing flags, hidden costs, huge repos

TLDR: Cursor says it can speed up agent coding by indexing code for faster regex searches, reviving a classic search trick. Commenters clap back, alleging missing ripgrep flags, hidden indexing costs, and questionable “15-second” repo claims—turning it into a benchmark brawl that matters for anyone betting on AI dev tools.

1973 called and it wants its regex back — and Cursor picked up. The company pitched a throwback idea with a modern spin: index your code so agents can run lightning‑fast regex searches, instead of brute‑forcing every file with ripgrep. It’s classic “inverted index” stuff with n‑grams — and the comments section immediately turned into Grep Wars.

The loudest heckle: “Just use the flag.” Commenter mpalmer roasted the post for ignoring ripgrep’s -g (--glob) option, which restricts a search to files whose paths match a glob pattern, saying that alone would “mostly obviate this entire exercise.” open‑paren brought the receipts on cost, accusing Cursor of convenient benchmarks and claiming they left out index build and rebuild time, plus the CPU and memory hit. Meanwhile, boyter questioned the headline stat that some searches take 15+ seconds, quipping that this only happens on “100–200GB repos or spinning rust” (read: old, slow hard drives). Translation: the community thinks the demo was a bit… selective.
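
To make the "just use the flag" argument concrete, here is a minimal Python sketch of a brute-force regex scan with an optional glob filter. It is a toy model of the idea, not ripgrep's implementation: the `scan` function and the in-memory `files` dict are hypothetical, and Python's `fnmatch` only approximates ripgrep's glob rules.

```python
import fnmatch
import re

def scan(files, pattern, glob=None):
    """Brute-force regex scan, optionally restricted to paths matching a glob."""
    rx = re.compile(pattern)
    # Narrowing by glob shrinks the set of files the regex ever has to touch.
    names = files if glob is None else fnmatch.filter(files, glob)
    return sorted(n for n in names if rx.search(files[n]))

# Hypothetical miniature "repo": path -> file contents.
files = {
    "src/app.py": "TODO: fix\n",
    "docs/readme.md": "TODO: write\n",
    "src/util.py": "ok\n",
}

everything = scan(files, r"TODO")
only_python = scan(files, r"TODO", glob="*.py")
```

The filter helps only when the caller already knows which paths matter. A query that has to cover the whole repository still reads every file.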

Not everyone piled on. A quieter camp agreed that agents (automated coding helpers) need faster text lookups and that indexing is how search engines do it anyway. But joke‑smiths had a field day: “Time is a flat circle,” “We reinvented grep again,” and memes about 1973 engineers rising from the grave. Verdict from the peanut gallery: intriguing tech, messy receipts, and a benchmark brawl in full swing.

Key Points

  • AI coding agents still rely on fast regex search for certain queries despite advances in semantic indexing.
  • ripgrep is widely used for agent search because it is fast, but with no index it must read every file it has not filtered out on each query, which slows searches in large monorepos.
  • Some searches in massive codebases can exceed 15 seconds, disrupting interactive agent workflows.
  • The article proposes building indexes specifically for regex search, analogous to IDE syntactic indexes.
  • It reviews the classic n‑gram inverted index approach from a 1993 paper and references Russ Cox’s 2012 explanation.
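
The n‑gram inverted index idea in the last point can be sketched in a few lines of Python. This is an illustrative toy under stated assumptions, not Cursor's implementation: it uses trigrams (n = 3), keeps everything in memory, and asks the caller to supply a literal that every match must contain. A real tool would derive that literal from the regex itself. All function names here are hypothetical.

```python
import re
from collections import defaultdict

def trigrams(text):
    """Return the set of all 3-character substrings of text."""
    return {text[i:i + 3] for i in range(len(text) - 2)}

def build_index(files):
    """Inverted index: map each trigram to the set of file names containing it."""
    index = defaultdict(set)
    for name, body in files.items():
        for gram in trigrams(body):
            index[gram].add(name)
    return index

def search(index, files, literal, pattern):
    """Pick candidate files via trigrams of `literal`, then verify with the regex."""
    grams = trigrams(literal)
    if not grams:
        # Literal shorter than 3 characters: the index cannot help, scan everything.
        candidates = set(files)
    else:
        # A file can match only if it contains every trigram of the literal.
        candidates = set.intersection(*(index.get(g, set()) for g in grams))
    rx = re.compile(pattern)
    # The index can return false positives, so the regex is always re-run.
    return sorted(n for n in candidates if rx.search(files[n]))

# Hypothetical miniature "repo": path -> file contents.
files = {
    "a.py": "def parse_config(path):\n    return open(path).read()\n",
    "b.py": "def parse_args(argv):\n    return argv[1:]\n",
    "c.txt": "notes about configuration parsing\n",
}
index = build_index(files)
hits = search(index, files, "parse_", r"parse_\w+\(")
```

Here `c.txt` is dropped before the regex runs because it lacks some trigrams of `parse_`, which is where the speedup comes from. The sketch also shows the cost the commenters raise: `build_index` has to read every file up front and must be rerun or updated whenever files change.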

Hottest takes

"The omission of rg's `-g` parameter is unsurprising" — mpalmer
"Anysphere/Cursor is being somewhat disingenuous and does not include the index-creation and recreation time" — open-paren
"The only way that works is if you are running it over repos 100-200 gigabytes in size" — boyter