December 29, 2025
Inbox Hoarders vs Delete Squad
Incremental Backups of Gmail Takeouts
Dev chops 20 years of Gmail into tiny pieces; commenters yell “just delete it”
TLDR: A developer split Gmail's giant export into small pieces so backups add only new emails. Commenters clash: some say backups are vital because Google accounts can get locked, others say just delete old mail, and a third camp argues existing backup tools already solve this. Plus, plenty of people want a simple SQLite option.
A longtime Gmail user with 20 years of messages (just 5.7 GiB!) built a tool that turns Google's chaotic Gmail Takeout file into bite-size chunks so backups only add what's new. The twist: Takeout delivers the whole mailbox as one giant mbox file, and messages don't stay in the same order between exports, so each backup looks like a whole new monster. His fix splits the file at every line starting with "From " (the mbox message separator), names each piece by its MD5 fingerprint, and records the order of the pieces, so you can rebuild the original exactly while saving only the fresh bits.
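For the curious, here is a minimal sketch of the split-and-fingerprint idea in Python. It assumes the export is read as raw bytes and that the chunk store is a plain directory of files named by their MD5 hex digest; the helper names `split_mbox` and `store_chunks` and the flat directory layout are hypothetical, not the author's actual code.

```python
import hashlib
import os
from typing import BinaryIO, Iterable, Iterator, List

# Hypothetical helpers sketching the idea; not the author's actual tool.


def split_mbox(stream: BinaryIO) -> Iterator[bytes]:
    """Yield chunks of an mbox byte stream, starting a new chunk at every
    line that begins with b"From " (the mbox message separator)."""
    current: List[bytes] = []
    for line in stream:  # binary iteration yields lines ending in b"\n"
        if line.startswith(b"From ") and current:
            yield b"".join(current)
            current = []
        current.append(line)
    if current:
        yield b"".join(current)


def store_chunks(chunks: Iterable[bytes], store_dir: str) -> List[str]:
    """Write each chunk to store_dir under its MD5 hex digest, skipping
    chunks already present, and return the digests in file order."""
    os.makedirs(store_dir, exist_ok=True)
    manifest: List[str] = []
    for chunk in chunks:
        digest = hashlib.md5(chunk).hexdigest()
        path = os.path.join(store_dir, digest)
        if not os.path.exists(path):  # identical content is stored only once
            with open(path, "wb") as f:
                f.write(chunk)
        manifest.append(digest)
    return manifest
```

Because `split_mbox` is a generator, only one chunk sits in memory at a time, which matters for a multi-gigabyte export. Saving the returned digest list per export (one per line, say) is what makes exact reconstruction possible, and since existing chunks are skipped, a later export only writes files for messages the store has not seen yet.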
The comments? Pure popcorn. Team Backup shouts “Google can lock you out—protect your inbox!” while Team Marie Kondo waves: “Do you even need emails from five years ago?” One minimalism mic drop: “have you ever needed an email from even 5 years ago?” Fans of the backup app Restic barged in: “Restic will do hash-based chunking,” implying the whole thing might be overengineering, sparking a nerdy skirmish over whether Takeout’s random order ruins dedup magic.
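The reordering argument is easy to see in a toy model. The sketch below compares an old and a new export holding the same four messages in a different order plus one new message. Grouping several messages per chunk stands in for any scheme whose chunks span message boundaries; it is an illustration only, not a model of restic's actual content-defined chunker, and the message contents and function names are made up.

```python
import hashlib
from typing import List, Set


def chunk_digests(messages: List[bytes], per_chunk: int) -> Set[str]:
    """Group consecutive messages into chunks of `per_chunk` messages and
    return the set of MD5 digests of those chunks."""
    digests = set()
    for i in range(0, len(messages), per_chunk):
        chunk = b"".join(messages[i:i + per_chunk])
        digests.add(hashlib.md5(chunk).hexdigest())
    return digests


def new_chunks(old: List[bytes], new: List[bytes], per_chunk: int) -> int:
    """Count chunks of the new export that the old backup does not have."""
    return len(chunk_digests(new, per_chunk) - chunk_digests(old, per_chunk))


msgs = [b"From a\n\none\n", b"From b\n\ntwo\n", b"From c\n\nthree\n", b"From d\n\nfour\n"]
fresh = b"From e\n\nfive\n"
old_export = msgs
new_export = [msgs[1], msgs[0], msgs[3], msgs[2], fresh]  # shuffled + 1 new message

print(new_chunks(old_export, new_export, per_chunk=1))  # prints 1
print(new_chunks(old_export, new_export, per_chunk=2))  # prints 3
```

With one chunk per message, only the new message needs storing; with two messages per chunk, the shuffle makes every chunk look new. In this toy model, that is the trade-off of chunking less often: fewer files, but less resilience to reordering.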
Then the DIY brigade flexed: one commenter keeps an old laptop running Thunderbird that pulls everything down over POP3, explaining they "set it to spew out POP3." And the crowd-pleaser wish list: "Wouldn't it be nice if Google just dumped the takeout into a sqlite file?" The vibe: backup or purge, boomer inbox vs. digital declutter, and a dash of "men will write a chunker instead of unsubscribing."
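The SQLite wish can be roughly approximated locally. This is a sketch that uses only Python's standard `mailbox` and `sqlite3` modules. The table layout and function names are hypothetical choices for illustration, and it loads the whole export in one pass rather than incrementally.

```python
import mailbox
import sqlite3
from typing import Optional

# Hypothetical schema and helper names; a sketch of the commenter's wish,
# not something Takeout provides.


def _header(msg: mailbox.mboxMessage, name: str) -> Optional[str]:
    """Return a header as plain text, or None if it is missing."""
    value = msg[name]
    if value is None:
        return None
    text = str(value)  # also flattens email.header.Header objects
    # Guard against lone surrogates from undecodable header bytes;
    # SQLite text must be valid UTF-8.
    return text.encode("utf-8", "surrogateescape").decode("utf-8", "replace")


def mbox_to_sqlite(mbox_path: str, db_path: str) -> int:
    """Copy every message of an mbox file into a `messages` table, keeping a
    few headers as text and the raw message bytes as a BLOB."""
    box = mailbox.mbox(mbox_path, create=False)
    conn = sqlite3.connect(db_path)
    count = 0
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "CREATE TABLE IF NOT EXISTS messages ("
                "id INTEGER PRIMARY KEY, sender TEXT, subject TEXT, "
                "date TEXT, raw BLOB)"
            )
            for key, msg in box.iteritems():
                conn.execute(
                    "INSERT INTO messages (sender, subject, date, raw) "
                    "VALUES (?, ?, ?, ?)",
                    (_header(msg, "From"), _header(msg, "Subject"),
                     _header(msg, "Date"), box.get_bytes(key)),
                )
                count += 1
    finally:
        conn.close()
        box.close()
    return count
```

Keeping the raw bytes alongside a few parsed headers means the database can still reproduce each message, while the text columns make quick queries like "all mail from this sender" possible.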
Key Points
- Gmail Takeout exports the mailbox as a single text-based mbox file, totaling 5.7 GiB for ~20 years of email.
- Successive Takeout exports are not append-only, making incremental backups with tools like restic inefficient.
- An initial attachment-extraction approach worked but required complex parsing due to email format variability.
- The adopted method splits on leading 'From ' lines, stores each chunk by its MD5 hash, and records the chunk sequence for exact mbox reconstruction.
- The approach produced ~99.8K chunks for ~50.6K threads; for larger accounts, the author suggests chunking less often to ease filesystem concerns over the sheer number of chunk files.
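Reconstruction is just concatenation in manifest order. One consequence worth spelling out: because the original file is only ever cut, never altered, the rebuilt mbox is byte-identical even if a message body contains an unescaped line starting with "From "; such a line merely produces an extra chunk. Below is a minimal sketch, assuming a hypothetical flat directory of chunk files named by their MD5 hex digests and a manifest listing those digests in order. The function name `rebuild_mbox` is illustrative, not the author's.

```python
import hashlib
import os
from typing import BinaryIO, Iterable

# Hypothetical reader for a flat "one file per MD5 digest" store;
# a sketch, not the author's actual tool.


def rebuild_mbox(manifest: Iterable[str], store_dir: str, out: BinaryIO) -> int:
    """Write the stored chunks to `out` in manifest order, re-checking each
    chunk's MD5 against its file name. Returns the number of bytes written."""
    written = 0
    for digest in manifest:
        with open(os.path.join(store_dir, digest), "rb") as f:
            chunk = f.read()
        if hashlib.md5(chunk).hexdigest() != digest:
            raise ValueError(f"chunk {digest} does not match its MD5 name")
        out.write(chunk)
        written += len(chunk)
    return written
```

The hash re-check is optional but cheap, and it turns a silently damaged chunk into a loud error at restore time instead of a subtly corrupted mailbox.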