Quack-Cluster: A Serverless Distributed SQL Query Engine with DuckDB and Ray

DuckDB goes flock mode: serverless hype vs Ray bill shock

TLDR: Quack-Cluster runs fast SQL on cloud files by fanning work across Ray workers with DuckDB. Commenters love the speed but question the “serverless” claim, Ray’s scaling costs, and whether clustering DuckDB defeats its simplicity—turning a clever idea into a showdown between convenience and cloud bills.

Quack-Cluster splashes in promising fast, “serverless” SQL over cloud files—think Amazon S3 or Google’s storage—by splitting the work across many little worker nodes using the Ray framework and the ultra-speedy DuckDB. Sounds dreamy: no servers to babysit, queries that fly, and data read straight from files without heavy prep.

But the pond erupted. Curious newbies like dogman123 asked how this thing handles “blocking” steps (the kind of operations that must see all the data before producing any result) and whether it’s basically doing big shuffle moves like Spark, pointing to DuckDB’s own performance notes. The finance ducks quacked loudest: mgaunard warned that Ray clusters “don’t scale well” and can burn cash, calling for shared, elastic setups instead of per-user clusters. Meanwhile, fodkodrasz dropped the purist bomb: DuckDB’s magic is doing biggish analytics without a cluster… so why bolt it onto a cluster? That comment hit a nerve for the simplicity crowd.

And the branding squad? nevalainen chirped that “Cluster-Quack” was the obvious name, which immediately became the meme of the thread. The final splash: rfonseca grilled the “serverless” claim—do workers shut down to zero when idle, or is someone quietly paying for ducks to idle? The vibe: bold idea, fast tech, but the community’s split between speed thrills and wallet chills.

Key Points

• Quack-Cluster is a serverless, distributed SQL engine that runs queries directly on object storage (e.g., AWS S3, Google Cloud Storage).
• It uses DuckDB for in-memory, vectorized SQL execution on each worker and Apache Arrow for efficient data interchange.
• A FastAPI + SQLGlot coordinator parses SQL, plans distributed execution, and aggregates partial results from Ray workers.
• Ray orchestrates parallel execution, with each worker as a Ray Actor running an embedded DuckDB instance on data partitions.
• Deployment is simplified via Docker and make, including sample data generation and Ray Dashboard monitoring.
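
For readers wondering what "aggregates partial results" looks like in practice, here is a rough sketch of the scatter-gather pattern the key points describe. It is not Quack-Cluster's code. To stay runnable with only the Python standard library, it swaps in sqlite3 for DuckDB and a thread pool for Ray actors, and the partition data, `run_partial`, and `distributed_avg` are all made up for illustration. Each worker computes a mergeable partial state (sum and count) over its own partition, and the coordinator combines those states into the final average.

```python
import sqlite3
from concurrent.futures import ThreadPoolExecutor

# Hypothetical partitions. In the system described above these would be
# files on object storage; here they are small in-memory row lists.
PARTITIONS = [
    [("a", 10.0), ("b", 4.0)],
    [("a", 20.0), ("b", 6.0), ("b", 2.0)],
    [("a", 30.0)],
]


def run_partial(rows):
    """Worker step: aggregate one partition in its own embedded database.

    sqlite3 is a stand-in (assumption) for the embedded DuckDB instance.
    """
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (k TEXT, v REAL)")
    con.executemany("INSERT INTO t VALUES (?, ?)", rows)
    # AVG cannot be merged directly, so return SUM and COUNT instead.
    out = con.execute("SELECT k, SUM(v), COUNT(*) FROM t GROUP BY k").fetchall()
    con.close()
    return out


def distributed_avg(partitions):
    """Coordinator step: fan out, then merge the partial (sum, count) states."""
    # The thread pool is a stand-in (assumption) for Ray workers.
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(run_partial, partitions))

    merged = {}
    for part in partials:
        for key, part_sum, part_count in part:
            total, count = merged.get(key, (0.0, 0))
            merged[key] = (total + part_sum, count + part_count)

    # Only now, with every partition seen, can the average be finalized.
    return {key: total / count for key, (total, count) in merged.items()}


result = distributed_avg(PARTITIONS)
print(result)
```

The final merge is the "blocking" part dogman123 asked about: the coordinator cannot emit an average until every worker has reported. Aggregates that decompose this way are cheap to distribute. Operations that do not, such as an exact median or a join on unpartitioned keys, are where shuffle-style data movement usually comes in.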

Hottest takes

"ray clusters don't scale well and end up costing you more money" — mgaunard

"and we now put it to a cluster?" — fodkodrasz

"what is the scalability / scale-to-zero story that makes this serverless?" — rfonseca

January 30, 2026

Quacks, hacks, and cloud bill attacks
