Sub-Millisecond RAG on Apple Silicon. No Server. No API. One File

One‑file AI memory on your Mac: fans swoon, skeptics shout “we already had this”

TLDR: Wax promises blazingly fast, one‑file AI memory on Apple Silicon, with no servers and everything local. The crowd is split: fans love the privacy and simplicity, skeptics say this already exists and question the performance chart, and others just want a CLI and app connectors so they can use it now.

Meet Wax, the “one file” memory for AI that promises sub‑millisecond lookups on Apple Silicon. Translation for non‑tech folks: it’s like giving your AI a local brain that can instantly look up your notes, photos, and videos to answer better—no cloud, no servers, just a single file. The dev’s pitch is pure therapy for anyone with Docker PTSD: “No vector databases, no DevOps, just open a file and go.” Cue the crowd cheering privacy and speed, with links flying to the repo: Wax on GitHub.

Then the drama hits. One commenter swings in with a haymaker: “sqlite_vec is already the sqlite for AI memory.” The subtext? Wax isn’t revolutionary, it’s a remix. Others pile on with “is this like zvec?” confusion, while the chart police question a performance graph where the 9.2ms bar looks longer than the 104ms one. Memes ensue, including “fastest chart to lose trust.” Meanwhile, agent‑people ask for a CLI (a simple command tool) and MCP support (a connector that lets apps talk to local tools) so their AI assistants can mine their PDFs.
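For anyone wondering what the chart police expected to see, here is a minimal sketch of linear bar scaling using the two numbers from the complaint. The function name and the 50-character width are made up for illustration, and this says nothing about how the actual chart was drawn.

```python
def bar_widths(values_ms, max_width=50):
    """Scale each value linearly so the largest one fills max_width characters."""
    largest = max(values_ms.values())
    # max(1, ...) keeps very small values visible as at least one character.
    return {
        label: max(1, round(value / largest * max_width))
        for label, value in values_ms.items()
    }


# The two bars the commenters argued about (labels are illustrative).
timings = {"9.2 ms": 9.2, "104 ms": 104.0}
widths = bar_widths(timings)

for label, width in widths.items():
    print(f"{label:>8} | {'#' * width}")
```

On a linear axis the 104 ms bar comes out more than ten times longer than the 9.2 ms one, which is why a chart showing the opposite got flagged so quickly.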

Overall vibe: Mac crowd hyped, privacy nerds thrilled, engineers split between “finally, sane local RAG” and “it’s just SQLite with vibes.” If Wax’s GPU speed and crash‑proof promises hold, this could be the “Photos app for AI memory,” but the community wants receipts.

Key Points

  • Wax is a single-file, on-device RAG system for Apple Silicon that removes the need for servers, vector databases, or network calls.
  • Reported benchmarks on an M1 Pro: 0.84 ms for GPU vector search over 10K × 384-dim vectors, 9.2 ms for the first GPU query after a cold open, 105 ms on CPU, and 150 ms for SQLite FTS5.
  • The .mv2s file stores documents, embeddings, BM25 (FTS5), HNSW (USearch), a write-ahead log, and metadata/entity graph in an append-only, crash-safe format.
  • Wax provides text, photo (OCR + CLIP via Core ML), and video (segments + transcripts) memory types with simple Swift APIs.
  • The system emphasizes durability, portability, and privacy, with deterministic results and 100% on-device operation.
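
To make "vector search over 384-dim embeddings" concrete, here is a rough sketch of the underlying operation: score every stored vector against a query by cosine similarity and keep the best few. This is not Wax's code or API. Per the key points, Wax uses an HNSW index (USearch) and the GPU; the version below is a brute-force, pure-Python stand-in with random vectors in place of real embeddings, and every name in it is an assumption for illustration. It uses 1,000 vectors rather than 10K so it runs quickly.

```python
import heapq
import math
import random


def normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def top_k(query, corpus, k=3):
    """Return the k (score, index) pairs with the highest cosine similarity.

    Assumes every vector in corpus is already unit length.
    """
    q = normalize(query)
    scored = (
        (sum(a * b for a, b in zip(q, doc)), index)
        for index, doc in enumerate(corpus)
    )
    return heapq.nlargest(k, scored)


random.seed(0)
DIM = 384          # embedding width quoted in the benchmark
NUM_DOCS = 1000    # smaller than the 10K benchmark corpus, for speed

# Random unit vectors stand in for real document embeddings.
corpus = [
    normalize([random.gauss(0, 1) for _ in range(DIM)])
    for _ in range(NUM_DOCS)
]

# Query with a stored vector: it should be its own nearest neighbor.
query = corpus[42]
results = top_k(query, corpus)

for score, index in results:
    print(f"doc {index}: similarity {score:.3f}")
```

Brute force scans every vector on each query, so cost grows linearly with the corpus. An approximate index such as HNSW trades a little recall for far fewer comparisons, which is the usual route to very low query latencies.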

Hottest takes

“I wanted the SQLite of RAG — import a library, open a file, query” — ckarani
“sqlite_vec is already the sqlite for AI memory” — kleton
“Why is the 9.2ms bar longer than the 104ms bar” — simlevesque