October 31, 2025
TBs in your tab—what could go wrong?
Use DuckDB-WASM to query TB of data in browser
No servers, just your browser: miracle hack or madness?
TLDR: A new tool lets your web browser search massive public datasets without a traditional server, slashing costs and complexity. The community is split between awe at the no-backend magic and concerns about crashes, security, and whether the browser is the right place for heavy-duty data work.
Harvard’s Library Innovation Lab just dropped a wild experiment: a search site that lets your web browser query giant public-data files without a traditional server. Translation for civilians: they’re using WebAssembly (Wasm, a way to run fast, compiled code in your browser) and DuckDB to turn your laptop into a mini data center. The whole thing is cheap, fast, and hosted as plain static files, and the browser fetches only the slices of data a query actually needs.
The crowd immediately split into teams. The cheer squad, led by mlissner, is dazzled: no backend, all browser. “Amazing.” The skeptics rolled in hot: why are we cramming 1 terabyte into a tab? SteveMoody73 says this is peak “everything must be in the browser.” Then came the ops crowd clutching crash logs. One commenter recalled out-of-memory meltdowns and mystery index bugs, basically saying: I tried it, and my laptop cried. Others waved the classic fix-it meme, “Just put it behind Cloudflare,” shorthand for “good luck with DDoS.”
Meanwhile, DuckDB is having a Main Character Week. Another post floated serverless “DuckLakes,” fueling the sense that this is either the future of low-cost public data or a browser-shaped fad. The vibe: bold, clever, and genuinely useful for archives, but also a little like juggling chainsaws in a coffee shop. Are we impressed? Absolutely. Are we nervous? Also yes.
Key Points
- LIL built Data.gov Archive Search to enable dynamic discovery of a nearly 18 TB archive via a statically hosted, browser-based system.
- The approach avoids dedicated servers by running a database engine (DuckDB-Wasm) in the browser and using HTTP range requests for partial data access.
- Catalog metadata (~1 GB) is stored as sorted, compressed Parquet files on Source.coop for efficient static hosting.
- The design is informed by prior experience with the Caselaw Access Project (11 TB) and moving case.law to a static site for long-term maintainability.
- Client-side tools like DuckDB-Wasm, sql.js-httpvfs, and Protomaps, powered by WebAssembly and web workers, enable querying large remote datasets without full downloads.
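For the curious, here is roughly what “HTTP range requests for partial data access” means in practice. This is a minimal TypeScript sketch, not LIL’s or DuckDB-Wasm’s code; the helper names (`rangeHeader`, `parquetFooterRange`, `fetchBytes`) are hypothetical. It leans on two well-documented facts: a `Range: bytes=start-end` header asks a server for just those bytes (a server that honors it replies `206 Partial Content`), and every Parquet file ends with a 4-byte little-endian footer length followed by the magic bytes `PAR1`. So a Parquet reader can grab the last 8 bytes, learn where the metadata lives, and then request only that.

```typescript
// Minimal sketch of range-based reads against a statically hosted Parquet file.
// Helper names are hypothetical; this is not LIL's or DuckDB-Wasm's actual code.

/** Build a Range header for bytes [start, start + length); HTTP ranges are inclusive. */
function rangeHeader(start: number, length: number): string {
  if (!Number.isInteger(start) || !Number.isInteger(length) || start < 0 || length <= 0) {
    throw new RangeError("invalid byte range");
  }
  return `bytes=${start}-${start + length - 1}`;
}

/**
 * Given a Parquet file's size and its last 8 bytes, return where the footer
 * metadata lives. Tail layout: [footer][4-byte LE footer length]["PAR1"].
 */
function parquetFooterRange(fileSize: number, tail: Uint8Array): { start: number; length: number } {
  if (tail.length !== 8) throw new Error("expected exactly the last 8 bytes");
  const magic = String.fromCharCode(tail[4], tail[5], tail[6], tail[7]);
  if (magic !== "PAR1") throw new Error("not a Parquet file");
  // Read the little-endian footer length from the first 4 bytes of the tail.
  const footerLength = new DataView(tail.buffer, tail.byteOffset, 4).getUint32(0, true);
  const start = fileSize - 8 - footerLength;
  // The file also begins with a 4-byte "PAR1", so the footer cannot start before byte 4.
  if (start < 4) throw new Error("footer length exceeds file size");
  return { start, length: footerLength };
}

/** Fetch only `length` bytes starting at `start`; requires a host that honors Range. */
async function fetchBytes(url: string, start: number, length: number): Promise<Uint8Array> {
  const res = await fetch(url, { headers: { Range: rangeHeader(start, length) } });
  // 206 Partial Content: the server sent just the requested slice.
  // A plain 200 would mean it ignored Range and is sending the whole file.
  if (res.status !== 206) throw new Error(`expected 206, got ${res.status}`);
  return new Uint8Array(await res.arrayBuffer());
}
```

To read a remote file this way, you would call `fetchBytes` for the last 8 bytes (the file size might come from a `HEAD` request, omitted here), pass them to `parquetFooterRange`, and then fetch just that footer slice before deciding which data pages to pull.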
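Why store the catalog as *sorted* Parquet? One standard reason: Parquet splits a file into row groups and can record min/max statistics per column chunk, so a reader may skip any row group whose range cannot contain the value it is looking for. When rows are sorted on the search key, those ranges barely overlap, so a lookup touches few groups and therefore needs few range requests. The toy below models that pruning with hypothetical `Row`/`RowGroup` types and a made-up `title` column; it illustrates the idea and is not how DuckDB or LIL actually implement it.

```typescript
// Toy model of Parquet row-group pruning; types and the `title` column are hypothetical.
interface Row { title: string }
interface RowGroup { min: string; max: string; rows: Row[] }

/** Split rows into fixed-size row groups, recording min/max of `title` for each. */
function toRowGroups(rows: Row[], groupSize: number): RowGroup[] {
  if (groupSize < 1) throw new RangeError("groupSize must be at least 1");
  const groups: RowGroup[] = [];
  for (let i = 0; i < rows.length; i += groupSize) {
    const chunk = rows.slice(i, i + groupSize);
    const titles = chunk.map((r) => r.title);
    groups.push({
      min: titles.reduce((a, b) => (a < b ? a : b)),
      max: titles.reduce((a, b) => (a > b ? a : b)),
      rows: chunk,
    });
  }
  return groups;
}

/** Count row groups whose [min, max] could contain `key`: each one means another fetch. */
function groupsToFetch(groups: RowGroup[], key: string): number {
  return groups.filter((g) => g.min <= key && key <= g.max).length;
}

// Same ordering as the < / > comparisons used for the statistics above.
const byTitle = (a: Row, b: Row) => (a.title < b.title ? -1 : a.title > b.title ? 1 : 0);
const rows: Row[] = ["apple", "kiwi", "banana", "lemon", "cherry", "mango", "date", "nectarine"]
  .map((title) => ({ title }));

console.log(groupsToFetch(toRowGroups(rows, 2), "lemon"));                    // 3 (unsorted)
console.log(groupsToFetch(toRowGroups([...rows].sort(byTitle), 2), "lemon")); // 1 (sorted)
```

The effect grows with the number of row groups: an unsorted column can leave a reader unable to rule out most groups, while a sorted one narrows a lookup to a handful.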