Document poisoning in RAG systems: How attackers corrupt AI's sources

Three fake files fooled the AI—and the comments melted down

TLDR: A demo shows three planted documents can steer an AI helper into confidently reporting fake company numbers. Commenters split: some dismiss it as an insider-only failure of design, others warn real-world data and social media spam can poison models—fueling a loud call for provenance and tighter defenses.

An engineer slipped three fake “CFO-approved” docs into a local AI search-and-answer setup (called RAG: it retrieves files, then generates an answer), and the bot proudly announced a made-up revenue crash. The lab is fully reproducible with code, and the community went DEFCON-1. Skeptics like sidrag22 called it a nothingburger—if a bad actor already has write access and the bot shows no sources, “that’s just a flawed product.” Others shot back that this is exactly how it breaks in the wild: public databases, regulatory filings, and messy archives get polluted—then AIs swallow it whole. One commenter even flagged engagement-farm networks on X mass-posting “whitepaper-style” text to game future AI ingest, and yes, that set off conspiracy alarms.

Amid the chaos, pragmatic voices said every company’s doc pile is already a junk drawer—old truths, contradictions, and half-baked drafts. The real fix, they argue, is layered defenses: strong source scoring, quarantine loops, and forcing the bot to show receipts. Meanwhile, classicists waved the “nothing new here” flag: this is the same trick humans fall for—hand someone a convincing “CORRECTED” memo and watch them misreport. The drama crown? The idea that a few spicy keywords can shove real numbers out of context. AI, meet office politics, but automated.

Key Points

  • A local RAG system was manipulated by injecting three fabricated documents into a ChromaDB knowledge base, leading to false financial answers.
  • The legitimate Q4 2025 figures ($24.7M revenue, $6.5M profit) were displaced by fabricated values ($8.3M revenue, –47% YoY, layoffs, acquisition talks).
  • The setup used LM Studio with Qwen2.5-7B-Instruct, all-MiniLM-L6-v2 embeddings via sentence-transformers, ChromaDB, and a custom Python pipeline.
  • The attack leverages PoisonedRAG’s two conditions: poisoned documents must rank higher in retrieval and drive the LLM to generate the attacker’s answer.
  • Reproducible code and commands are provided; success is defined over 20 runs at temperature 0.1 where only the fabricated figure appears.

Hottest takes

"Seems just like a flawed product at that point." — sidrag22
"needs many more dimensions with scoring to model true adversaries" — alan_sass
"This attack is not 'new', only the vector is new 'AI'." — altruios
Made with <3 by @siedrix and @shesho from CDMX. Powered by Forge&Hive.