UK Biobank health data keeps ending up on GitHub

‘You can’t un-leak DNA’ – volunteers’ health data keeps popping up on GitHub and the internet is furious

TLDR: Data from a massive UK health database, which holds DNA and medical details for 500,000 people, keeps leaking onto GitHub, and critics say you can never “un‑leak” such data. Commenters are outraged that volunteers can’t see their own records while the same data pops up online and is allegedly even offered for sale on Alibaba.

UK Biobank, a huge health project holding DNA and medical records for 500,000 Brits, is suddenly the main character on tech forums – and not in a good way. The big reveal: researchers keep accidentally uploading bits of people’s supposedly secret health data to GitHub, a public code-sharing site, and Biobank is racing around the internet firing off copyright takedown notices in a panicked game of whack‑a‑mole.

Commenters are calling the whole setup “naive” at best and “reckless” at worst. One top‑liked take says you “can’t un‑leak medical data,” mocking the idea that a strict contract could ever keep half a million people’s DNA truly safe once 20,000 researchers have access. Another user casually found more exposed data in five minutes, turning the thread into a live “data breach speedrun.”

The real rage ignites when people point out the irony: volunteers can’t even easily see their own data, yet it’s turning up on GitHub and, according to one linked BBC report, even for sale on Alibaba. That twist sent the comments into full conspiracy‑meme mode, with people asking if we’re basically running a bargain‑bin DNA marketplace. Underneath the jokes, there’s a serious mood: if this is how one of the world’s “gold standard” health databases handles privacy, what hope is there for the rest of us?

Key Points

  • UK Biobank data has repeatedly been accidentally uploaded to public GitHub repositories despite strict non-sharing agreements with researchers.
  • A tracker using GitHub’s DMCA archive records 110 takedown notices targeting 197 repositories maintained by 170 developers worldwide.
  • Takedowns often target specific files; nearly half are Jupyter or R notebooks, and about a quarter are genetic/genomic files (PLINK, BOLT-LMM, BGEN).
  • The Guardian demonstrated re-identification of a volunteer from two data points, and a BMJ piece urges UK Biobank to take re-identification risks more seriously.
  • The first notice was filed in July 2025; notices paused in early 2026 and resumed after The Guardian’s investigation.
  • UK Biobank relies on copyright takedowns because there is no privacy-law equivalent of the DMCA takedown process that it could use instead.
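
For readers wondering how a breakdown like "nearly half notebooks, about a quarter genetic files" could be produced, here is a minimal Python sketch. It is not the tracker's actual code: the extension-to-category mapping, the function names, and the sample paths are all assumptions made up for illustration. Only the extensions themselves are standard (.ipynb for Jupyter, .Rmd for R Markdown, .bed/.bim/.fam for PLINK, .bgen for BGEN).

```python
from collections import Counter
from pathlib import PurePosixPath

# Assumed mapping from file extension to category. The real tracker's
# classification rules are not described in the source.
CATEGORY_BY_SUFFIX = {
    ".ipynb": "notebook",  # Jupyter notebook
    ".rmd": "notebook",    # R Markdown notebook
    ".bed": "genetic",     # PLINK binary genotype file
    ".bim": "genetic",     # PLINK variant information file
    ".fam": "genetic",     # PLINK sample information file
    ".bgen": "genetic",    # BGEN genotype file
}


def categorize(path: str) -> str:
    """Return a coarse category for a file path, based only on its extension."""
    suffix = PurePosixPath(path).suffix.lower()
    return CATEGORY_BY_SUFFIX.get(suffix, "other")


def category_shares(paths):
    """Return the fraction of paths that falls into each category."""
    counts = Counter(categorize(p) for p in paths)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {category: n / total for category, n in counts.items()}


# Hypothetical file paths, invented for this example (not real takedown targets).
sample_paths = [
    "analysis/gwas.ipynb",
    "notebooks/cleaning.Rmd",
    "data/chr1.bgen",
    "README.md",
]
shares = category_shares(sample_paths)
```

Classifying by extension alone is crude: a notebook can embed participant data in its saved outputs, and a harmless file can carry a genetic-looking extension, so a real tracker would likely need to inspect the notices themselves.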

Hottest takes

“You can’t un-leak medical data… there’s no getting the toothpaste back in the tube.” — michaelt
“The irony is, they don’t even provide the data to the participants themselves.” — mil22
“All 500,000 participants for sale on Alibaba…” — adwf