Show HN: Docker pulls more than it needs to - and how we can fix it

Tiny changes, massive downloads: users rant, robots cry, and Docker’s motives get grilled

TLDR: A Show HN project promises smarter Docker downloads that fetch only changed files, saving bandwidth. Commenters split: some share horror stories of timeouts on slow lines, others note Docker already has partial fixes like COPY --link, and cynics argue Docker's business incentives favor inefficiency.

A Show HN post lit up devland by claiming Docker re-downloads mountains of data for tiny tweaks (adding one byte, installing vim), and folks building for robots, farms, and warehouse devices shouted "same!" One commenter, PaulHoule, dropped a flashback: on a slow 2 Mbps line, Docker images simply timed out, turning updates into rage-quits.

The proposed fix? A smarter pull that tracks file changes, not just layers, plus a registry that deduplicates identical files so we don't pay twice for the same Python binary trapped in a tarball.

Cue drama. Some users cheered, while others rolled their eyes and said the future already exists, in bits. theamk name-dropped OSTree and casync (tools that grab only what's different) as an "aren't we there yet?" moment. danudey countered with "use what Docker has," pointing to COPY --link as a partial, drop-in fix. But the spiciest take came from Havoc, who argued this is a business-strategy problem: Docker is in no rush to cut bandwidth when being the hub is its leverage. Memes flew: "My robot pulled a gigabyte to get vim," "datacenters in space need thrift," and "teach the tool, not every dev 24 Docker tricks." The vibe: pain, hope, and a side of capitalism.
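For context on the partial fix danudey mentions: `COPY --link` (a real BuildKit feature, requires the `dockerfile:1.4` syntax or newer) creates the copied layer independently of the layers beneath it, so swapping the base image or editing earlier steps doesn't force that layer to be rebuilt and re-pushed. A minimal sketch, with hypothetical file names and app layout:

```dockerfile
# syntax=docker/dockerfile:1.4
FROM python:3.10-slim

# --link builds this layer without depending on the base image's
# filesystem, so rebasing onto a newer python:3.10-slim later can
# reuse the layer instead of re-creating and re-uploading it.
COPY --link requirements.txt /app/requirements.txt
RUN pip install -r /app/requirements.txt

COPY --link . /app
CMD ["python", "/app/main.py"]
```

Note the limits the thread implies: `RUN` steps still depend on what came before them, and a one-byte change to the copied files still re-creates that whole layer. It helps with cache invalidation across layers, not with file-granular downloads.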

Key Points

  • The article identifies inefficiency in Docker’s layer-based pulls, where small changes invalidate subsequent layers and trigger large re-downloads.
  • It proposes making Docker pulls file-aware so only changed files are transferred, reducing bandwidth and storage waste.
  • Reproducible builds with compilers like clang can produce identical outputs, enabling effective deduplication across images.
  • Bandwidth-constrained environments (e.g., field robots and remote devices) are especially impacted by current Docker pull behavior.
  • Registries could coalesce identical files (e.g., the same Python 3.10 binary shipped inside many images) to reduce storage costs, and the authors are building a smarter docker pull to match.
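The dedup idea in the last bullet is essentially content addressing: hash each file, store each unique hash once, and have image manifests reference hashes instead of carrying the bytes. A toy sketch of that scheme (class names, manifest shape, and the `missing_digests` helper are all illustrative, not the OCI registry API):

```python
import hashlib


def file_digest(data: bytes) -> str:
    """Content address: identical bytes always map to the same key."""
    return hashlib.sha256(data).hexdigest()


class DedupStore:
    """Toy registry: each unique file is stored once, no matter how
    many images reference it."""

    def __init__(self):
        self.blobs = {}      # digest -> bytes, stored exactly once
        self.manifests = {}  # image name -> {path: digest}

    def put_image(self, name, files):
        manifest = {}
        for path, data in files.items():
            digest = file_digest(data)
            self.blobs.setdefault(digest, data)  # no-op if already present
            manifest[path] = digest
        self.manifests[name] = manifest

    def missing_digests(self, name, have):
        """Digests a client would actually need to download, given the
        set it already has locally."""
        return {d for d in self.manifests[name].values() if d not in have}


# Two images sharing one Python binary: 3 unique blobs stored, not 4,
# and a client that already pulled service-a fetches only one new file
# for service-b.
store = DedupStore()
python_binary = b"...stand-in for the python 3.10 binary..."
store.put_image("service-a", {"/usr/bin/python3": python_binary,
                              "/srv/a.py": b"print('a')"})
store.put_image("service-b", {"/usr/bin/python3": python_binary,
                              "/srv/b.py": b"print('b')"})
print(len(store.blobs))  # 3
```

A file-aware `docker pull` is the client side of the same idea: compare the manifest's digests against what's on disk and transfer only the difference, which is what OSTree and casync already do for filesystem trees.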

Hottest takes

“any attempt to download images would time out” — PaulHoule
“Docker already supports this to a degree” — danudey
“they’re actively incentivized to not make this more efficient” — Havoc
Made with <3 by @siedrix and @shesho from CDMX. Powered by Forge&Hive.