Incremental Backups of Gmail Takeouts

Dev chops 20 years of Gmail into tiny pieces; commenters yell “just delete it”

TLDR: A developer split Gmail’s giant export into small pieces so backups add only new emails. Commenters clash: some say it’s vital because Google accounts can get locked, others say just delete old mail, and a third camp argues backup tools already solve this. Plus, one commenter wishes Google offered a simple SQLite option.

A longtime Gmail user with 20 years of messages (just 5.7GB!) built a tool to turn Google’s chaotic Gmail Takeout file into bite‑size chunks so backups only add what’s new. The twist: the giant email export arrives as one jumbled file whose contents are not simply appended from one export to the next, so each backup looks like a whole new monster. His fix splits the file at every line that starts with “From ” and tags each piece with a hash fingerprint, so you can rebuild the original while saving only the fresh bits.

The comments? Pure popcorn. Team Backup shouts that Google can lock you out, so protect your inbox, while Team Marie Kondo drops the minimalism mic: “have you ever needed an email from even 5 years ago?” Fans of the backup app Restic barged in: “Restic will do hash-based chunking,” implying the whole thing might be overengineering, sparking a nerdy skirmish over whether Takeout’s random order ruins dedup magic.

Then the DIY brigade flexed: one commenter runs an old laptop with Thunderbird grabbing mail via POP3, summing up the Gmail side as “set it to spew out POP3.” And the crowd-pleaser wish list: “Wouldn’t it be nice if Google just dumped the takeout into a sqlite file?” The vibe: backup or purge, boomer inbox vs. digital declutter, and a dash of “men will write a chunker instead of unsubscribing.”

Key Points

  • Gmail Takeout exports the mailbox as a single text-based mbox file, totaling 5.7 GiB for ~20 years of email.
  • Successive Takeout exports are not append-only, making incremental backups with tools like restic inefficient.
  • An initial attachment-extraction approach worked but required complex parsing due to email format variability.
  • The adopted method splits on leading 'From ' lines, stores each chunk by its MD5 hash, and records the chunk sequence for exact mbox reconstruction.
  • The approach produced ~99.8K chunks for ~50.6K threads and suggests reducing chunking frequency to mitigate filesystem concerns for larger accounts.
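For the curious, the split-hash-record idea from the key points can be sketched in a few lines of Python. This is a rough illustration under stated assumptions, not the developer's actual tool: the function names (`backup_mbox`, `restore_mbox`), the `chunks/` directory, and the `manifest.txt` file are all made up for this example.

```python
import hashlib
from pathlib import Path


def _store_chunk(chunk: bytes, chunks_dir: Path) -> tuple[str, bool]:
    """Write a chunk under its MD5 hex digest; report whether it was new."""
    digest = hashlib.md5(chunk).hexdigest()
    path = chunks_dir / digest
    if path.exists():
        return digest, False  # already backed up by an earlier export
    path.write_bytes(chunk)
    return digest, True


def backup_mbox(mbox_path, store_dir) -> int:
    """Split an mbox on lines starting with b'From ' and store each chunk.

    Returns the number of chunks that were not already in the store.
    The layout (chunks/ plus manifest.txt) is an assumption of this sketch.
    """
    chunks_dir = Path(store_dir) / "chunks"
    chunks_dir.mkdir(parents=True, exist_ok=True)
    order = []       # chunk digests in file order, for exact reconstruction
    new_chunks = 0
    buf = []         # lines of the chunk currently being collected

    def flush():
        nonlocal new_chunks
        if buf:
            digest, is_new = _store_chunk(b"".join(buf), chunks_dir)
            order.append(digest)
            new_chunks += is_new
            buf.clear()

    with open(mbox_path, "rb") as f:
        for line in f:                     # binary mode: lines end at b"\n"
            if line.startswith(b"From "):  # a leading 'From ' opens a new chunk
                flush()
            buf.append(line)
    flush()

    (Path(store_dir) / "manifest.txt").write_text("\n".join(order) + "\n")
    return new_chunks


def restore_mbox(store_dir, out_path) -> None:
    """Rebuild the most recently backed-up mbox by concatenating its chunks."""
    store = Path(store_dir)
    digests = (store / "manifest.txt").read_text().split()
    with open(out_path, "wb") as out:
        for digest in digests:
            out.write((store / "chunks" / digest).read_bytes())
```

Running `backup_mbox` on a second export writes only chunks whose hashes are not already in the store, even if Takeout shuffled the order, and `restore_mbox` replays the recorded sequence to get the latest file back byte for byte. One file per chunk is the simplest layout to show; it is also the part the key points flag as a filesystem concern for larger accounts.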

Hottest takes

“have you ever needed an email from even 5 years ago?” — SanjayMehta
“Restic will do hash-based chunking” — yooogurt
“Wouldn’t it be nice if Google just dumped the takeout into a sqlite file?” — tehlike