Case study: recovery of a corrupted 12 TB multi-device pool

One power cut, 14 custom tools, and a comment‑section riot

TLDR: A sudden power loss trashed a 12TB Btrfs setup; the author built 14 tools to recover nearly everything and shared fixes. The comments split between applauding the rescue and blasting Btrfs reliability, with an AI-authorship accusation and calls for an official developer response. It matters to anyone trusting their data to Btrfs.

A Linux tinkerer posted a blow-by-blow of rescuing a 12TB storage pool after a sudden power cut. The built-in Btrfs repair, `btrfs check --repair`, ran in circles for 46,000+ commits without making progress, so they wrote 14 custom C tools to stitch the data back together. Result: about 7.2MB lost out of 4.59TB. They shared a deep analysis and the tools, stressing this was meant to help, not blame.

And then the comments exploded. The loudest vibe: if your “production-ready” system needs 14 homebrew tools after a power blip, hard pass. One user begged, “please don’t be btrfs”; another demanded to know in what possible situation it is not a bug that a power cycle can lose the whole pool. There’s applause for the engineering marathon, but just as much fear that trust was shaken. People want big, bold warnings and quick, official fixes.

Drama twist: one commenter claimed “this is obviously LLM output” and begged for a Btrfs developer to weigh in. Memes flew—“14‑tool DLC for your file system”—as the storage wars reignited. Is Btrfs a brave new future or a science fair project that bites back? The author stays polite and constructive; the crowd wants clear answers, guarantees, and fewer heroics next time. Popcorn, anyone?

Key Points

  • A hard power cycle corrupted a three-device Btrfs pool’s extent and free space trees (data single, metadata DUP on DM-SMR disks).
  • `btrfs check --repair` entered an infinite loop (46,000+ commits) with no progress and rotated `backup_roots` past rollback points.
  • Recovery was achieved using 14 custom C tools built on the internal `btrfs-progs` API, with ~7.2 MB data loss out of 4.59 TB (~0.00016%).
  • The author provides a full analysis and proposes nine upstream improvements across repair logic, delayed refs, extent tree handling, and internal operations.
  • A reference implementation and a one-line patch to `alloc_reserved_tree_block` are published; tools default to read-only with opt-in `--write`.
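
The last point, tools that stay read-only unless `--write` is passed, is a common safety pattern for repair utilities. Below is a minimal sketch in C of how such a gate might look. The names `tool_opts`, `parse_opts`, and `may_write` are hypothetical and are not taken from the published tools, which may handle options differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical options struct; not taken from the published tools. */
struct tool_opts {
    bool write_enabled;   /* false unless --write is passed explicitly */
    const char *device;   /* first non-option argument, or NULL */
};

/* Parse argv; returns 0 on success, -1 on an unknown option. */
static int parse_opts(int argc, char **argv, struct tool_opts *opts)
{
    assert(opts != NULL);
    opts->write_enabled = false;  /* safe default: read-only */
    opts->device = NULL;

    for (int i = 1; i < argc; i++) {
        if (strcmp(argv[i], "--write") == 0) {
            opts->write_enabled = true;
        } else if (strncmp(argv[i], "--", 2) == 0) {
            fprintf(stderr, "unknown option: %s\n", argv[i]);
            return -1;
        } else if (opts->device == NULL) {
            opts->device = argv[i];
        }
    }
    return 0;
}

/* Gate for every mutation: returns true only if writes were opted into. */
static bool may_write(const struct tool_opts *opts, const char *what)
{
    if (!opts->write_enabled) {
        printf("dry run: would %s (pass --write to apply)\n", what);
        return false;
    }
    return true;
}
```

A tool built this way would call `parse_opts` once at startup and wrap each on-disk change in `if (may_write(&opts, "..."))`, so a run without `--write` only reports what it would have done.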

Hottest takes

at the cost of 14 custom C tools is a hard pass from me — phoronixrly
in what possible situation is it not a bug that a power cycle can lose the pool? — yjftsjthsd-h
This is obviously LLM output — Retr0id