November 28, 2025
Chunk Wars: Keywords vs Vectors
So you wanna build a local RAG?
Privacy dream or overkill? Comments clash over chunks, keywords, and going fully local
TL;DR: Skald shows you can run a private, local chatbot that searches your files using open tools, and deploy it fast. The comments erupt over whether fancy “semantic” search is worth it, with some saying simple keyword search works fine and others demanding better document chunking and careful local setups.
Skald says it’s possible to build a fully local, privacy-first “RAG” (a retrieval-augmented chatbot) with open tools like Postgres + pgvector, Sentence Transformers, and Docling—and claims it takes just 8 minutes. The comment section promptly turned into a tech reality show. One camp cheered the privacy angle; the other asked, “Do we even need all this?”
The spiciest thread: semantic vs keyword search. simonw rolled in with a mic drop: skip fancy “vector databases” and use good old full-text search (think basic keyword matching, even command-line tools). nilirl backed it up, saying their tests showed semantic search didn’t beat classic keyword search in a meaningful way. Meanwhile, mips_avatar insisted the real issue is chunking—breaking big documents into smart sections using tools like spaCy so the bot doesn’t get confused.
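To make the “grep gang” position concrete, here is a minimal, standard-library-only sketch of keyword retrieval: score each document by how often the query’s terms appear in it, and return the best matches. The function names and scoring (raw term counts, no BM25 weighting) are illustrative assumptions, not anything from Skald or the thread.

```python
import re
from collections import Counter

def tokenize(text):
    # Lowercase and split on non-alphanumerics; a real full-text
    # engine would also stem words and drop stopwords.
    return re.findall(r"[a-z0-9]+", text.lower())

def keyword_search(query, docs, top_k=3):
    # docs: {doc_id: text}. Score = total count of query terms in the doc.
    query_terms = set(tokenize(query))
    scored = []
    for doc_id, text in docs.items():
        counts = Counter(tokenize(text))
        score = sum(counts[t] for t in query_terms)
        if score:
            scored.append((score, doc_id))
    scored.sort(reverse=True)
    return [doc_id for _, doc_id in scored[:top_k]]
```

Postgres’s built-in `tsvector`/`tsquery` full-text search gives you a production-grade version of the same idea without any embedding model at all.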
Pragmatists like barbazoo argued for baby steps: keep your documents and the index local first, don’t try to self-host a giant AI model on day one. And _joel dropped a “plug-and-play” curveball, shouting out AnythingLLM for quick local setups. The memes? “Vector cult vs grep gang,” “semantic is astrology,” and “8-minute deploy speedrun.” Result: a fun, fiery split between privacy purists, simplicity stans, and chunking crusaders—aka the perfect internet drama.
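The chunking crusaders’ point is that retrieval quality depends on splitting documents at sensible boundaries. A hedged, stdlib-only sketch of sentence-aware chunking follows; the regex sentence split is a crude stand-in for spaCy’s sentencizer, and `max_chars` is an arbitrary illustrative limit.

```python
import re

def chunk_sentences(text, max_chars=200):
    # Naive sentence split on ., !, ? followed by whitespace;
    # spaCy's sentencizer handles abbreviations etc. far better.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        # Start a new chunk if adding this sentence would overflow,
        # so no sentence is ever cut in half mid-thought.
        if current and len(current) + len(sentence) + 1 > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = f"{current} {sentence}".strip()
    if current:
        chunks.append(current)
    return chunks
```

Chunks like these are what get embedded and indexed, so boundaries that respect sentences keep each vector semantically coherent.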
Key Points
- Skald proposes a fully local, open-source RAG setup to preserve data privacy without relying on third-party APIs.
- It maps common RAG components to cloud options and open-source/self-hosted alternatives across vector DBs, embeddings, LLMs, rerankers, and document parsing.
- Skald’s current stack uses Postgres + pgvector, Sentence Transformers (all-MiniLM-L6-v2), optional bge-m3, user-managed LLMs (tested GPT-OSS 20B on EC2), and Docling via docling-serve.
- Reranking defaults to a Sentence Transformers cross-encoder, with bge-reranker-v2-m3 and mmarco-mMiniLMv2-L12-H384-v1 also tested for multilingual support.
- A production instance of the full stack was deployed in 8 minutes; further benchmarks and client-driven optimization are planned.
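For context on what the pgvector side of the stack actually does at query time: pgvector’s `<=>` operator returns cosine distance, and an `ORDER BY embedding <=> query LIMIT k` query returns the nearest chunks. The sketch below reimplements that distance and nearest-neighbor step in plain Python for illustration; the row shape and function names are assumptions, not Skald’s code.

```python
import math

def cosine_distance(a, b):
    # pgvector's `<=>` operator: cosine distance = 1 - cosine similarity.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (norm_a * norm_b)

def nearest(query_vec, rows, top_k=2):
    # rows: list of (chunk_id, embedding) pairs, like an indexed table.
    # Equivalent to: SELECT id FROM chunks ORDER BY embedding <=> $1 LIMIT k
    return sorted(rows, key=lambda r: cosine_distance(query_vec, r[1]))[:top_k]
```

In the real stack, the query vector would come from the all-MiniLM-L6-v2 encoder and the top hits would then go to the cross-encoder reranker.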