November 16, 2025
Ctrl‑F? More like Ctrl‑Feud
Building a Simple Search Engine That Works
Dev’s DIY search sparks ‘just use Lucene’ vs ‘fix Google’ flame war
TLDR: A developer built a simple, database-backed search by chopping words into pieces and scoring matches. Commenters split between celebrating DIY control, urging "just use Lucene," and venting about today's bad web search—some begging for a Google replacement—showing a real hunger for simpler, trustworthy search that actually works.
A lone dev just shipped a “simple search engine” that runs inside your existing database—no pricey add‑ons, no mystery black boxes—and the comments went feral. DIY fans applauded, saying modern full‑text search got “complexified” and praising the back‑to‑basics recipe: break text into pieces, store them, score matches. The author’s trick: tokenize words (chop them into bits), stash them in two small tables, and rank results by a clear score. It’s explainable, fixable, and skips the learning curve of giants like Elasticsearch or Algolia.
Then the camps formed. Pragmatists said “just use Lucene” to avoid reinventing the wheel. Curious minds wondered how optimized the big players’ tokenizers really are, while nostalgics rolled in with tales of PhD buddies geeking out about search and shout‑outs to Toby Segaran’s classic Programming Collective Intelligence.
And, of course, it morphed into group therapy about web search. A plea begged someone to replace Google, with side‑eye at DuckDuckGo and Qwant for missing results. Joke of the day: “DIY engine today, fix the internet tomorrow.” The mood: hopeful chaos—equal parts tinkering pride and search‑fatigue snark.
Key Points
- The article outlines a simple search engine that uses an existing database instead of external services.
- Content is tokenized at indexing time, and queries are tokenized the same way for matching and scoring.
- Two tables are used: index_tokens (tokens with tokenizer-specific weights) and index_entries (token-document links with final weights).
- Final weight is computed as field_weight × tokenizer_weight × ceil(sqrt(token_length)).
- Multiple tokenizers (word, prefix, n-grams, singular) are supported via a common interface, enabling exact, partial, and typo-tolerant matches.
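For the curious, the scheme in the key points can be sketched in a few lines of Python. This is a hypothetical reconstruction based only on the summary above—the tokenizer names, weights, and in-memory dictionaries standing in for the index_entries table are assumptions, not the article's actual code:

```python
import math
from collections import defaultdict

def word_tokenizer(text):
    """Exact-match tokenizer: lowercase words at full weight (assumed 1.0)."""
    for word in text.lower().split():
        yield word, 1.0

def prefix_tokenizer(text, min_len=3):
    """Partial-match tokenizer: word prefixes at a reduced weight (assumed 0.4)."""
    for word in text.lower().split():
        for i in range(min_len, len(word)):
            yield word[:i], 0.4

def token_weight(field_weight, tokenizer_weight, token):
    # Final weight per the summary: field_weight * tokenizer_weight * ceil(sqrt(token_length))
    return field_weight * tokenizer_weight * math.ceil(math.sqrt(len(token)))

# Stand-in for the index_entries table: token -> {doc_id: accumulated weight}
index_entries = defaultdict(lambda: defaultdict(float))

def index_document(doc_id, text, field_weight=1.0):
    """Run every tokenizer over the content and store weighted token-document links."""
    for tokenizer in (word_tokenizer, prefix_tokenizer):
        for token, tok_weight in tokenizer(text):
            index_entries[token][doc_id] += token_weight(field_weight, tok_weight, token)

def search(query):
    """Tokenize the query (word tokenizer here, for exact matches) and sum weights per doc."""
    scores = defaultdict(float)
    for token, _ in word_tokenizer(query):
        for doc_id, weight in index_entries[token].items():
            scores[doc_id] += weight
    return sorted(scores.items(), key=lambda kv: -kv[1])

index_document(1, "Simple search engines")
index_document(2, "Search inside your database")
print(search("search database"))  # doc 2 matches both terms, so it outranks doc 1
```

Longer or rarer tokens score higher via the ceil(sqrt(·)) term, and the per-tokenizer weight lets exact word matches outrank prefix or n-gram hits—which is how the design gets partial and typo-tolerant matching without an external engine.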