January 13, 2026

Did they just bucket the whole web?

Exa-d: How to store the web in S3

Exa-d puts the internet in a giant bucket — and comments explode

TLDR: Exa unveiled exa-d, a DIY system that keeps the web’s data fresh by storing it in Amazon S3 and updating it with spreadsheet-style formulas. Comments split between praising the engineering, debating “S3 isn’t a database,” speculating about Ray/Anyscale, and warning to add strict rate limits before anything melts.

Exa just dropped a behind-the-scenes look at exa-d, their homegrown system for keeping a live index of the whole web by storing everything in Amazon’s S3 (think: a giant online hard drive). Instead of messy scripts, engineers use formula-like rules so the system knows how pieces of data relate. It can tweak tiny parts when a page changes or rebuild massive chunks when a new model lands, all while fanning work across lots of machines to go fast.

The community? Spicy. swyx shows love for the write-up but immediately asks if this is really just another path that ends in Ray (a popular tool for running jobs across many computers) and whether Anyscale (the company behind Ray) is scooping up the market. Meanwhile, the peanut gallery revives the eternal meme: “S3 isn’t a database,” while fans clap back that “databases are just feelings — buckets are forever.” Some cheer the spreadsheet-style simplicity; others roll their eyes: “Congrats, you reinvented Airflow with vibes.” The practical crowd echoes swyx’s caution, begging for rate limits and anomaly detection so a bad update doesn’t melt the cluster. The drama centers on Team Build vs Team Buy, with spectators placing bets on whether exa-d is brilliant engineering or a future migration story. Either way, the bucket jokes are overflowing.

Key Points

  • Exa developed exa-d, an in-house framework to manage web-scale search data and updates.
  • The framework uses typed columns and declarative dependencies, allowing engineers to define relationships instead of procedural steps.
  • exa-d supports both surgical updates to specific rows/columns and full rebuilds for large-scale changes like new embeddings.
  • The system is designed for efficient, parallel execution across CPUs, nodes, and clusters to process petabytes of data.
  • Exa evaluated data warehouses, SQL layers, and orchestrators but built exa-d to better meet dynamic update and scalability needs.
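
To make the "spreadsheet formulas" idea from the key points concrete, here is a minimal sketch of declarative column dependencies with surgical updates and full rebuilds. It is not exa-d's actual API: `Column`, `Table`, `set`, `rebuild`, and the toy formulas are all hypothetical names made up for illustration, and a plain in-memory dict stands in for S3.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class Column:
    """A derived column: a name, the columns it reads, and a formula (hypothetical)."""
    name: str
    deps: Tuple[str, ...]
    fn: Callable[..., object]


class Table:
    """Toy in-memory table; a dict stands in for S3 (assumption for this sketch)."""

    def __init__(self, columns: Iterable[Column]) -> None:
        self.columns: Dict[str, Column] = {c.name: c for c in columns}
        self.rows: Dict[str, Dict[str, object]] = {}
        self.compute_count = 0  # counts formula evaluations, to show how little work is done

    def _order(self) -> List[Column]:
        """Derived columns sorted so dependencies come first (no cycle detection)."""
        ordered: List[Column] = []
        seen = set()

        def visit(name: str) -> None:
            if name in seen or name not in self.columns:
                return  # raw input columns have no formula
            seen.add(name)
            for dep in self.columns[name].deps:
                visit(dep)
            ordered.append(self.columns[name])

        for name in self.columns:
            visit(name)
        return ordered

    def _refresh(self, key: str, dirty: Iterable[str], force: Iterable[str] = ()) -> None:
        """Recompute only the columns downstream of `dirty` or listed in `force`."""
        row = self.rows[key]
        dirty, force = set(dirty), set(force)
        for col in self._order():
            stale = col.name in force or dirty.intersection(col.deps)
            if stale and all(d in row for d in col.deps):
                row[col.name] = col.fn(*(row[d] for d in col.deps))
                dirty.add(col.name)
                self.compute_count += 1

    def set(self, key: str, column: str, value: object) -> None:
        """Surgical update: write one cell, then refresh that row's dependents only."""
        self.rows.setdefault(key, {})[column] = value
        self._refresh(key, dirty={column})

    def rebuild(self, name: str, fn: Callable[..., object]) -> None:
        """Full rebuild: swap a column's formula and recompute it for every row."""
        self.columns[name] = Column(name, self.columns[name].deps, fn)
        for key in self.rows:
            self._refresh(key, dirty=(), force={name})


# Declare relationships instead of writing procedural steps.
table = Table([
    Column("text", ("html",), lambda html: html.replace("<p>", "").replace("</p>", "")),
    Column("word_count", ("text",), lambda text: len(text.split())),
    Column("embedding", ("text",), lambda text: [len(text)]),  # toy stand-in for a model
])

table.set("a.com", "html", "<p>hello world</p>")
table.set("b.com", "html", "<p>one two three</p>")

# One page changed: only a.com's derived cells are recomputed.
table.set("a.com", "html", "<p>hi</p>")

# A "new model" lands: the embedding column is rebuilt for every row.
table.rebuild("embedding", lambda text: [len(text), text.count(" ")])
```

The point of the design is that callers never say which cells to recompute; the dependency declarations decide. A real system would also need cycle detection, persistence, and parallel execution across machines, none of which this sketch attempts.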

Hottest takes

"why everyone seems to converge on using Ray" — swyx
"how well is Anyscale capturing the Ray market" — swyx
"set up a lot of rate limits/anomaly detection" — swyx
Made with <3 by @siedrix and @shesho from CDMX. Powered by Forge&Hive.