Asymmetric Quantization: Near-Lossless Retrieval with 97% Storage Reduction

They shrank AI search storage by 97%, and the comments instantly turned feral

TLDR: Mixedbread says it found a way to make AI-powered search vastly cheaper to store while keeping almost the same result quality, which matters because these systems store data for billions of pages. Commenters bounced between impressed, skeptical, and hilariously doomy, arguing over whether “near-lossless” means anything at all.

A search startup just dropped a very nerdy flex with a very non-nerdy promise: keep the smart search quality, slash the storage bill. In plain English, they found a way to store the massive piles of data used by AI search in a much tinier form, cutting average document storage from 393 KiB to 12.28 KiB while barely denting result quality. That’s a huge deal when you’re storing information for billions of documents, and the company’s whole pitch is basically: why pay mansion rent for your files when a studio apartment gets almost the same job done?

But the real fireworks were in the comments, where readers immediately split into camps. One side was impressed, with one commenter practically sliding into the company’s inbox to brag about their own “99+% compression unlock” for giant game replays. Another wanted receipts, asking what the small quality drop actually looks like in real life: if results get worse, how do they get worse? That opened the classic tech-thread tension between benchmark numbers and what normal humans actually notice.

Then came the chaos agents. One drive-by declared, “there is no such thing as ‘near lossless’”, which is the kind of nitpick that can power an entire internet argument. Another deadpanned, “The Pi compression algorithm is better,” because no comment section is complete without a meme grenade. And the darkest joke of the bunch imagined a future of 100% cost reduction where every thought is precomputed by giant AI companies and the government. So yes, the company says it made AI search cheaper. The crowd heard that and replied: cool story, now show us the catch — and also, lol.

Key Points

• Mixedbread says late-interaction retrieval improves precision but significantly increases storage because each document may generate hundreds or thousands of vectors.
• Its Silo retrieval engine stores vectors for more than 2.5 billion documents in object storage and loads them into faster tiers as queries require.
• The article proposes asymmetric quantization: document vectors are stored as 1-bit signs while query vectors remain at int8 precision.
• In Mixedbread's internal benchmarks, average raw document-vector storage fell from 393 KiB to 12.28 KiB per document, about a 32x reduction, while NDCG@10 changed from 90.26 to 89.65.
• The article states that compressing document vectors yields the main production benefits in storage, I/O, cache use, and cold-start time, whereas fully binarizing queries harms ranking quality more.
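
For anyone wondering what "1-bit documents, int8 queries" looks like mechanically, here is a minimal Python sketch. It is not Mixedbread's code: every function name is made up for illustration, the float32 baseline and the max-abs int8 scaling are assumptions, and the scoring is the usual late-interaction MaxSim sum, also assumed here.

```python
import random

def binarize_doc(vec):
    """Keep only the sign of each dimension, packed into one int (1 bit per dimension)."""
    bits = 0
    for i, x in enumerate(vec):
        if x >= 0:
            bits |= 1 << i
    return bits

def quantize_query(vec):
    """Assumed scheme: symmetric int8, scaled so the largest magnitude maps to 127."""
    scale = max(abs(x) for x in vec) or 1.0
    return [round(127 * x / scale) for x in vec]

def asym_dot(q_int8, doc_bits):
    """Dot product of an int8 query with a sign vector (+1 where the bit is set, else -1)."""
    return sum(q if (doc_bits >> i) & 1 else -q for i, q in enumerate(q_int8))

def maxsim(query_vecs, doc_bitvecs):
    """Late-interaction score: each query vector takes its best document match, then sum."""
    return sum(max(asym_dot(q, d) for d in doc_bitvecs) for q in query_vecs)

def bytes_per_vector(dim, bits_per_dim):
    """Raw storage for one vector at the given precision."""
    return dim * bits_per_dim / 8

# Toy data (hypothetical): one document that resembles the query, one that does not.
rng = random.Random(0)
DIM = 64

def rand_vec():
    return [rng.gauss(0, 1) for _ in range(DIM)]

query = [rand_vec() for _ in range(3)]
relevant = [[x + 0.3 * rng.gauss(0, 1) for x in q] for q in query] + [rand_vec()]
unrelated = [rand_vec() for _ in range(4)]

q8 = [quantize_query(q) for q in query]
score_relevant = maxsim(q8, [binarize_doc(v) for v in relevant])
score_unrelated = maxsim(q8, [binarize_doc(v) for v in unrelated])

print("relevant doc ranks first:", score_relevant > score_unrelated)
# Assuming a float32 baseline: 32 bits vs 1 bit per dimension.
print(bytes_per_vector(DIM, 32) / bytes_per_vector(DIM, 1))  # 32.0
```

The 32x on the last line is just 32 bits versus 1 bit per dimension. Under that assumption it lines up with the reported 393 KiB to 12.28 KiB (393 / 32 ≈ 12.28) and with the headline 97% (1 − 1/32 ≈ 96.9%).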

Hottest takes

"there is no such thing as 'near lossless'" — functionmouse

"The Pi compression algorithm is better." — rq1

"100% storage/cost/compute reduction for LLMs" — Ameo

July 2, 2026

Small files, huge feelings
