November 4, 2025

Ducks, icebergs & database drama

Pg_lake: Postgres with Iceberg and data lake access

Postgres dives into cloud files; crowd shouts 'open source Snowflake'

TLDR: pg_lake lets Postgres query cloud files and Apache Iceberg tables like a modern “lakehouse,” with DuckDB doing the fast crunching. Commenters are hyped, calling it “open source Snowflake,” while others ask what Snowflake has to do with it, fear it cannibalizes Snowflake’s own product, and question why pgduck_server runs as a separate process. Either way, it’s a big, buzzy shift.

Postgres just jumped into the data lake with pg_lake, an extension that lets the trusty database read and write cloud files like Parquet, CSV, and JSON in S3 and even use Apache Iceberg catalogs—basically turning Postgres into a “lakehouse” (data lake plus warehouse). Underneath, it taps DuckDB for fast analytics and can mix regular tables with file data in one transaction. The crowd went wild: “open source Snowflake” was the loudest cheer, with ozgune calling Crunchy’s work “the most ahead,” while half the thread congratulated both Crunchy and, somehow, Snowflake. Cue confusion: did Snowflake do this?

The plot thickened as ayhanfuat declared Iceberg is winning, pointing to S3 Table Buckets and Cloudflare’s catalog. dkdcio loved that running DuckDB behind Postgres is already baked in: “looks like that’s what it does!” Meanwhile, beoberha asked why pgduck_server is a separate process, sparking ops anxiety. Then the business take: dharbin wondered if this cannibalizes Snowflake, spawning jokes about ducks ramming icebergs and Postgres wearing a warehouse crown. Skeptics worry about complexity; fans say one SQL to rule them all. Either way, it’s database drama with cloud vibes.

Key Points

  • pg_lake extends PostgreSQL to act as a lakehouse, enabling transactional Iceberg tables and fast queries via DuckDB.
  • It can query/import Parquet, CSV, JSON, and Iceberg files from S3-compatible stores and export results back to S3 using COPY.
  • Supports geospatial formats through GDAL, compression (.gz, .zst), a built-in map type, and schema inference from external sources.
  • Setup is available via Docker or from source; CREATE EXTENSION pg_lake CASCADE installs required sub-extensions.
  • pgduck_server implements the Postgres wire protocol, uses DuckDB, listens on port 5332, and leverages AWS/GCP credential chains for object storage.
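
To make the setup and COPY points above concrete, here is a minimal Python sketch that only builds the SQL text involved. It assumes, as the key points suggest, that COPY accepts an s3:// URL as its target or source; the exact options pg_lake supports are not confirmed here. The bucket, table, and file names are hypothetical placeholders, and the helper functions are illustrative, not part of pg_lake.

```python
# Sketch only: builds SQL strings, does not connect to a database.
# Bucket, table, and path names are hypothetical placeholders.


def quote_literal(value: str) -> str:
    """Quote a string as a SQL literal by doubling single quotes."""
    return "'" + value.replace("'", "''") + "'"


def quote_ident(name: str) -> str:
    """Quote a SQL identifier by doubling double quotes."""
    return '"' + name.replace('"', '""') + '"'


def setup_sql() -> str:
    # From the key points: CASCADE installs the required sub-extensions.
    return "CREATE EXTENSION pg_lake CASCADE;"


def import_sql(table: str, s3_url: str) -> str:
    # Assumption: COPY ... FROM accepts an object-store URL under pg_lake.
    return f"COPY {quote_ident(table)} FROM {quote_literal(s3_url)};"


def export_sql(query: str, s3_url: str) -> str:
    # Assumption: COPY (query) TO accepts an object-store URL under pg_lake.
    return f"COPY ({query}) TO {quote_literal(s3_url)};"


statements = [
    setup_sql(),
    import_sql("events", "s3://example-bucket/raw/events.parquet"),
    export_sql("SELECT * FROM events", "s3://example-bucket/out/events.parquet"),
]

for stmt in statements:
    print(stmt)
```

The printed statements could then be run through any Postgres client against a database where pg_lake is installed. Keeping the SQL as plain strings makes it easy to review before anything touches object storage.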

Hottest takes

'open source Snowflake.' — ozgune
Iceberg seems to be winning. — ayhanfuat
Doesn't this cannibalize their main product? — dharbin