March 26, 2026
One chunk to rule them all
From zero to a RAG system: successes and failures
A terabyte of chaos, frozen laptops, and a ‘RAG is dead’ brawl
TLDR: An engineer wrestled a 1‑terabyte document swamp into a private AI chat by filtering junk and using a RAG setup. Commenters brawled over whether RAG is still needed now that models read huge texts, corrected a “ChromaDB is Google’s” mix‑up, and waxed nostalgic about old‑web site vibes—stakes and snark included.
An engineer tried to build a private AI chat that answers questions about a decade of company projects—using a terabyte of messy files—and the comments went feral. The tale: local AI (no cloud), a smart index called RAG (think: supercharged search that feeds the chatbot), and one laptop-defying flood of videos, simulations, and backup junk that froze everything until filters saved the day.
But the real action? The peanut gallery. One early correction shot in: “ChromaDB isn’t Google’s database,” scolded a commenter, fact-checking the record. Then the main fight broke out: Is RAG even needed now? One camp cheered giant “context windows” (the amount of text an AI can read at once), bragging models can swallow all of “The Lord of the Rings.” The other camp, led by realists, fired back: try stuffing a law library—or 451 GB—into that mouth. Cue memes about “one model to rule them all.”
Meanwhile, a side quest emerged: chunking drama—how to slice giant documents so the AI doesn’t get lost. Veterans shared battle scars, while another user dropped a buffet of research tools like NotebookLM and Connected Papers. And in a plot twist, a commenter gushed about the site’s old-school “51 visitors online” counter. AI chaos, meet early-internet vibes.
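The chunking drama boils down to one mechanical question: how to slice long text into pieces small enough to embed, with overlap so a sentence cut at a boundary still appears whole in a neighboring chunk. A minimal character-based sketch (real pipelines such as LlamaIndex split on sentences or tokens; `chunk_text` and its defaults here are hypothetical):

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character chunks with overlap,
    so content cut at one boundary survives intact in the next chunk."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks = []
    step = chunk_size - overlap  # how far the window advances each time
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # last window already reached the end of the text
    return chunks
```

Tuning `chunk_size` and `overlap` is exactly the "battle scars" territory: too small and the AI loses context, too large and retrieval gets fuzzy.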
Key Points
- The author was tasked with building a fast, local LLM-based chat that answers questions across a decade of company projects with references, focusing on OrcaFlex files.
- A confidentiality constraint led to a local stack: Ollama (running LLaMA models), Python, LlamaIndex for RAG, and nomic-embed-text for embeddings.
- The data source was ~1 TB of mixed, unstructured documents on Azure, spanning many formats and minimal organization.
- Initial large-scale indexing overwhelmed system RAM because massive non-text files were processed as text and loaded fully into memory.
- The solution was to filter by file extension and name to exclude large, irrelevant files, drop expensive formats that added little value, and convert common office documents (PDF, DOCX, XLSX) into plain text for indexing.
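The filtering fix described above can be sketched as a simple allow-list check run before anything is loaded into memory. All thresholds and name markers here are hypothetical stand-ins, not the author's actual values:

```python
from pathlib import Path

# Hypothetical allow-list mirroring the fix: index only common office
# formats, and skip junk-named files and oversized binaries (videos,
# simulation output) before they ever reach the text loader.
INDEXABLE = {".pdf", ".docx", ".xlsx", ".txt", ".md"}
SKIP_NAME_MARKERS = ("backup", "~$")   # junk markers in file names
MAX_BYTES = 50 * 1024 * 1024           # skip anything over 50 MB

def should_index(path: Path) -> bool:
    if path.suffix.lower() not in INDEXABLE:
        return False
    if any(marker in path.name.lower() for marker in SKIP_NAME_MARKERS):
        return False
    try:
        if path.stat().st_size > MAX_BYTES:
            return False
    except OSError:
        return False  # unreadable file: safer to skip than to crash
    return True
```

The key design point is that the check costs only a `stat` call per file, so a terabyte of paths can be triaged without reading a single byte of content.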
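The RAG loop in the stack above reduces to: embed the question, find the nearest chunks, paste them into the prompt. A toy sketch of the retrieval step, where a bag-of-words `embed` is a hypothetical stand-in for a real embedding model like nomic-embed-text:

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy stand-in for a real embedding model (e.g. nomic-embed-text):
    # a sparse bag-of-words vector keyed by lowercase tokens.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(question: str, chunks: list[str], k: int = 2) -> list[str]:
    # Rank indexed chunks by similarity to the question; return the top k.
    q = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]
```

In the real stack, the top chunks would then be concatenated into the prompt sent to the local LLaMA model via Ollama, with source paths attached so answers carry references.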