October 31, 2025

TBs in your tab—what could go wrong?

Use DuckDB-WASM to query TB of data in browser

No servers, just your browser: miracle hack or madness?

TLDR: A new tool lets your web browser search massive public datasets without a traditional server, slashing costs and complexity. The community is split between awe at the no-backend magic and concerns about crashes, security, and whether the browser is the right place for heavy-duty data work.

Harvard’s Library Innovation Lab just dropped a wild experiment: a search site that lets your web browser query giant public-data files without a traditional server. Translation for civilians: they’re using WebAssembly (WASM, a way to run speedy code in your browser) and DuckDB to turn your laptop into a mini data center—cheap, fast, and hosted as simple files. The crowd immediately split into teams. The cheer squad, led by mlissner, is dazzled: no backend, all browser—“Amazing.” The skeptics rolled in hot: why are we cramming 1 terabyte into a tab? SteveMoody73 says this is peak “everything must be in the browser.” Then came the ops crowd clutching crash logs. One commenter recalled out-of-memory meltdowns and mystery index bugs, basically saying, I tried it, my laptop cried. Others waved the classic fix-it meme: “Just put it behind Cloudflare,” shorthand for “good luck with DDoS.”

Meanwhile, DuckDB is having a Main Character Week. Another post floated serverless “DuckLakes,” fueling the sense this is either the future of low-cost public data or a browser-shaped fad. The vibe: bold, clever, and genuinely useful for archives—but also a little like juggling chainsaws in a coffee shop. Are we impressed? Absolutely. Are we nervous? Also yes.

Key Points

  • LIL built Data.gov Archive Search to enable dynamic discovery of a nearly 18 TB archive via a statically hosted, browser-based system.
  • The approach avoids dedicated servers by running a database engine (DuckDB-Wasm) in the browser and using HTTP range requests for partial data access.
  • Catalog metadata (~1 GB) is stored as sorted, compressed Parquet files on Source.coop for efficient static hosting.
  • The design is informed by prior experience with the Caselaw Access Project (11 TB) and moving case.law to a static site for long-term maintainability.
  • Client-side tools like DuckDB-Wasm, sql.js-httpvfs, and Protomaps, powered by WebAssembly and web workers, enable querying large remote datasets without full downloads.
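The load-bearing trick in the points above is the HTTP range request: rather than downloading an 18 TB archive, the browser asks the static host for only the byte ranges a query needs (first the Parquet footer, then just the relevant row groups). Here's a minimal, self-contained sketch of that pattern in plain Python. This is not the LIL codebase and not DuckDB-Wasm itself; the server, file contents, and byte offsets are all illustrative stand-ins for what the browser-side engine does against S3 or Source.coop.

```python
# Illustrative sketch of the HTTP range-request pattern that lets
# DuckDB-Wasm query huge remote files without a full download.
# All names and sizes here are made up for the demo.
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import Request, urlopen

DATA = bytes(range(256)) * 16  # 4 KiB stand-in for a remote "Parquet file"

class RangeHandler(BaseHTTPRequestHandler):
    """Tiny static server that honors 'Range: bytes=start-end' headers."""
    def do_GET(self):
        range_header = self.headers.get("Range")
        if range_header and range_header.startswith("bytes="):
            start, end = (int(x) for x in range_header[6:].split("-"))
            chunk = DATA[start:end + 1]
            self.send_response(206)  # 206 Partial Content
            self.send_header("Content-Range",
                             f"bytes {start}-{end}/{len(DATA)}")
            self.send_header("Content-Length", str(len(chunk)))
            self.end_headers()
            self.wfile.write(chunk)
        else:  # no Range header: serve the whole file
            self.send_response(200)
            self.send_header("Content-Length", str(len(DATA)))
            self.end_headers()
            self.wfile.write(DATA)

    def log_message(self, *args):  # silence per-request logging
        pass

server = HTTPServer(("127.0.0.1", 0), RangeHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]

# Fetch only the final 8 bytes, the way a Parquet reader first grabs the
# footer to learn where each row group and column chunk lives.
req = Request(f"http://127.0.0.1:{port}/data.parquet",
              headers={"Range": f"bytes={len(DATA) - 8}-{len(DATA) - 1}"})
with urlopen(req) as resp:
    footer = resp.read()
    print(resp.status, len(footer))

server.shutdown()
```

Because the catalog Parquet files are sorted and compressed, a query can binary-search its way to the right row groups with a handful of these small range reads, which is why a ~1 GB catalog over ~18 TB of data stays usable from a static host.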

Hottest takes

"Put all of that together, and you get a website that queries S3 with no backend at all. Amazing." — mlissner
"why query 1TB of data in a browser" — SteveMoody73
"Life is too short for..." — wewewedxfgdf
Made with <3 by @siedrix and @shesho from CDMX. Powered by Forge&Hive.