January 14, 2026
So many columns, so little chill
Ask HN: Distributed SQL engine for ultra-wide tables
A coder says you can query a 'million-column' table fast; the crowd wants receipts
TLDR: A developer claims sub-second queries on tables with up to a million columns by ditching joins and distributing columns across machines instead of rows. The community is split: some point to existing tools like ClickHouse and Iceberg, others demand a concrete design, and everyone worries about metadata chaos at extreme width.
A bold poster claims they flipped databases sideways: no joins, no transactions, columns spread across machines, and blazing-fast reads from a table with up to a million columns. Cue the Hacker News popcorn. The first reaction? Skeptical applause-turned-interrogation: “What is the design?” asked icsa, echoing a chorus of readers who want more than vibes and benchmarks. Others demanded to know why anyone would avoid joins—database speak for stitching tables together—with remywang asking, “Why don’t you want joins?” The mood: impressed by the numbers, but suspicious of the missing details.
Another camp showed up wielding receipts: “ClickHouse and Scuba address this,” minitoar wrote, pointing to columnar systems that read only the columns a query asks for. kentm suggested modern “lakehouse” formats (think Parquet files plus a metadata layer that tracks them, like Iceberg) and engines such as Trino/Presto, while warning that metadata handling might melt down at extreme width. Meanwhile, mamcx went full inventor mode, dreaming of custom data types to compress columns and a better way to build the storage layer. The comment section turned into a tug-of-war: pragmatists saying “use existing tools,” purists asking for architecture specifics, and comedians dubbing this the “spreadsheet from hell” and “social distancing for data: no joins allowed.” Verdict: huge promise, but the crowd wants the blueprint before the bravo.
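For readers who want to see what minitoar is pointing at, here is a minimal sketch of column pruning in Python with pyarrow: a columnar format deserializes only the columns a query names, so a two-column read of a 1,000-column file stays cheap. The file name and toy schema below are invented for illustration, not taken from the thread.

```python
import pyarrow as pa
import pyarrow.parquet as pq

# Build a toy "wide" table: 1,000 columns of integers.
n_rows, n_cols = 1_000, 1_000
table = pa.table({f"c{i}": list(range(n_rows)) for i in range(n_cols)})
pq.write_table(table, "wide.parquet")

# Reading a subset deserializes only those column chunks from disk;
# the other 998 columns are never touched.
subset = pq.read_table("wide.parquet", columns=["c7", "c42"])
print(subset.num_columns, subset.num_rows)  # 2 1000
```

This is the core trick the "use existing tools" camp leans on: in a columnar layout, query cost scales with the columns you read, not the columns the table has.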
Key Points
- Ultra-wide datasets in ML and multi-omics shift the scaling challenge from row count to column count.
- Standard SQL databases typically cap out at about 1,000–1,600 columns, limiting schema width.
- Columnar formats like Parquet can store wide data but often require Spark/Python pipelines, and OLAP engines assume narrower schemas.
- Extreme width introduces bottlenecks in metadata handling, query planning, and SQL parsing.
- An experimental approach distributes columns (not rows) across machines, supports SELECT-only operations, and reports sub-second latency for subset-column access in benchmarks on a small AMD EPYC cluster; a toy sketch of the column-sharding idea follows this list.
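The post itself does not publish a design (that was the crowd's complaint), but the "distribute columns, not rows" idea can be sketched in a few lines of Python. Everything below is an illustrative guess: the class and variable names are invented, and a real engine would add networking, distributed metadata, and parallel fetches.

```python
from collections import defaultdict

NODES = 4  # pretend cluster size; the post mentions a small AMD EPYC cluster

class ColumnShardedTable:
    """Hypothetical sketch: columns are hashed to nodes, so a SELECT on
    k columns touches at most k nodes, regardless of total table width."""

    def __init__(self):
        # node id -> {column name -> list of values}
        self.nodes = defaultdict(dict)

    def _node_for(self, col: str) -> int:
        # Stable within one process run, which is all this toy needs.
        return hash(col) % NODES

    def add_column(self, name: str, values: list) -> None:
        self.nodes[self._node_for(name)][name] = values

    def select(self, cols: list[str]) -> dict:
        # SELECT-only, no joins: fetch each requested column from its
        # owning node; a real engine would issue these fetches in parallel.
        return {c: self.nodes[self._node_for(c)][c] for c in cols}

# Usage: a 1,000-column table where a query reads only two columns.
t = ColumnShardedTable()
for i in range(1_000):
    t.add_column(f"c{i}", list(range(100)))
result = t.select(["c7", "c42"])
print({k: v[:3] for k, v in result.items()})  # {'c7': [0, 1, 2], 'c42': [0, 1, 2]}
```

The hard parts the commenters flagged (metadata handling, query planning, SQL parsing at extreme width) live precisely in everything this sketch leaves out.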