Hardwood: A New Parser for Apache Parquet

New “Hardwood” promises faster data reads; fans cheer while skeptics demand proof

TLDR: Hardwood is a new open-source Java tool to read Parquet data faster with fewer add-ons. Commenters applaud the dependency diet but demand benchmarks, comparing it to DuckDB + Arrow and questioning whether it can beat parquet‑java’s infamous 74,000‑line bit‑unpacker—proof first, party later.

A new open‑source tool called Hardwood just crashed the data party, promising to read Parquet files (a popular way to store big tables) faster and with less baggage. Think: fewer add‑ons, modern Java, and finally using all your computer’s cores. The crowd reaction? Equal parts applause and side‑eye. One camp is thrilled to ditch dependency sprawl—no more hauling the big, creaky Hadoop suitcase just to open a file. Another camp wants receipts, chanting the classic: “benchmarks or it didn’t happen.”

The loudest spark came from a developer who’s been in the trenches: “Respect,” says willtemperley, who built a Parquet reader in Swift and calls it “the hardest bit of coding I’ve done.” Then the zinger: is Hardwood’s “bit unpacking” (aka squeezing numbers out of tightly packed bits) actually faster than the hulking 74,000‑line legacy in parquet‑java? That number alone became the meme of the thread: “74K lines to open a spreadsheet?!”
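For readers wondering what "bit unpacking" actually involves, here is a minimal sketch, assuming values are packed least-significant-bit first, as in the bit-packed runs of Parquet's RLE/bit-packing hybrid encoding. The class and method names are hypothetical. This is not code from Hardwood or parquet-java, and it is written for readability rather than speed.

```java
import java.util.Arrays;

// Illustrative only: hypothetical names, not taken from Hardwood or parquet-java.
final class BitUnpackSketch {

    /** Unpacks `count` unsigned values of `bitWidth` bits each, packed LSB-first. */
    static int[] unpack(byte[] packed, int bitWidth, int count) {
        if (bitWidth < 1 || bitWidth > 32) {
            throw new IllegalArgumentException("bitWidth must be in 1..32");
        }
        long neededBytes = ((long) count * bitWidth + 7) / 8;
        if (count < 0 || packed.length < neededBytes) {
            throw new IllegalArgumentException("input too short for requested count");
        }
        int[] out = new int[count];
        long mask = (1L << bitWidth) - 1; // lowest bitWidth bits set
        long buffer = 0;                  // bits read from input but not yet consumed
        int bitsInBuffer = 0;
        int pos = 0;
        for (int i = 0; i < count; i++) {
            // Pull in whole bytes until one full value is available.
            while (bitsInBuffer < bitWidth) {
                buffer |= (packed[pos++] & 0xFFL) << bitsInBuffer;
                bitsInBuffer += 8;
            }
            out[i] = (int) (buffer & mask);
            buffer >>>= bitWidth;
            bitsInBuffer -= bitWidth;
        }
        return out;
    }

    public static void main(String[] args) {
        // The values 0..7 at 3 bits each fit in three bytes.
        byte[] packed = {(byte) 0x88, (byte) 0xC6, (byte) 0xFA};
        System.out.println(Arrays.toString(unpack(packed, 3, 8)));
        // prints [0, 1, 2, 3, 4, 5, 6, 7]
    }
}
```

A generic loop like this handles any width from 1 to 32 bits. Production readers commonly trade that generality for one specialized routine per bit width, which is one way such code can grow to tens of thousands of lines.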

Meanwhile, the pragmatists waved a different flag: “We already fly with DuckDB + Apache Arrow,” notes uwemaurer, claiming it’s blazing fast without new toys. So the vibe: minimal‑dependency detox vs. prove‑it performance. Hardwood might be the sleek, modern answer, but until head‑to‑head numbers land, the parquet purists aren’t rolling out the red carpet just yet.

Key Points

• Hardwood is an open-source Java parser for Apache Parquet, licensed under Apache License 2.0.
• It targets minimal dependencies: the only extra libraries are optional compression codecs (snappy-java, zstd-jni, lz4-java, brotli4j).
• Hardwood implements a multi-threaded decoding pipeline that spreads parsing work across all CPU cores.
• The library supports Java 21+ and is available on Maven Central; the release version cited is 1.0.0.Alpha1.
• Hardwood provides both row-oriented (RowReader) and columnar APIs, with examples showing typed accessors and handling of nested structures.
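The multi-threaded decoding point is easier to picture with code. Parquet stores each column's data separately, so columns can be decoded independently of one another. The following is a rough sketch of that idea using a plain thread pool. It assumes each column chunk is simply a run of little-endian 32-bit integers; every name here is hypothetical, and it does not show Hardwood's actual pipeline or API.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Illustrative only: hypothetical names, not Hardwood's pipeline.
final class ParallelDecodeSketch {

    /** Stand-in decoder: treats the chunk as little-endian 32-bit integers. */
    static int[] decodeColumn(byte[] chunk) {
        ByteBuffer buf = ByteBuffer.wrap(chunk).order(ByteOrder.LITTLE_ENDIAN);
        int[] values = new int[chunk.length / Integer.BYTES];
        for (int i = 0; i < values.length; i++) {
            values[i] = buf.getInt();
        }
        return values;
    }

    /** Decodes every chunk on a pool sized to the machine's core count. */
    static List<int[]> decodeAll(List<byte[]> chunks) {
        ExecutorService pool =
                Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
        try {
            // Submit one task per column so the columns decode concurrently.
            List<Future<int[]>> pending = new ArrayList<>();
            for (byte[] chunk : chunks) {
                pending.add(pool.submit(() -> decodeColumn(chunk)));
            }
            // Collect results in submission order, so column order is preserved.
            List<int[]> decoded = new ArrayList<>();
            for (Future<int[]> f : pending) {
                try {
                    decoded.add(f.get());
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while decoding", e);
                } catch (ExecutionException e) {
                    throw new IllegalStateException("column decode failed", e.getCause());
                }
            }
            return decoded;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<byte[]> chunks = List.of(
                new byte[] {1, 0, 0, 0, 2, 0, 0, 0},
                new byte[] {3, 0, 0, 0});
        for (int[] column : decodeAll(chunks)) {
            System.out.println(Arrays.toString(column));
        }
    }
}
```

A real reader would also have to decompress pages, handle several encodings, and reassemble nested structures, so per-column tasks like these are only the coarsest way to split the work.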

Hottest takes

“is it faster than the 74 KLOC parquet-java bit unpacker?” — willtemperley

“DuckDB + Arrow is also very fast” — uwemaurer

March 1, 2026

Flooring the data nerds
