Scientific datasets are riddled with copy-paste errors

Landmark Parkinson’s mouse data shows dupes — commenters are split between outrage and “Excel happens”

TLDR: A scan of public datasets flagged duplicated numbers in a widely cited Parkinson’s mouse study, shaking confidence in a headline‑grabbing “it starts in the gut” claim. Commenters are split between calling for retractions and audits, blaming human error and bad QA, and rallying for open data plus automated checks.

Internet sleuths are side‑eyeing a blockbuster Parkinson’s mouse study after a new error‑sniffing tool found suspicious copy‑pasted numbers that had sat in its raw data on Dryad for eight years. The paper claimed Parkinson’s might start in the gut, racked up 3,000+ citations, and rode a media wave. Now the raw numbers show identical sequences where different mice should be, with duplicates making up roughly half of some groups. The lab hasn’t replied since the issue was flagged in January, and that silence is pouring gasoline on the thread.

Cue the comments cage match. One camp is yelling “retract and audit everything!”, seeing this as Exhibit A in the reproducibility crisis. Another camp, led by a measured take from steve_adams_86, says science is messy, lab workflows are bespoke, and honest spreadsheet blunders happen. A third faction cheers the toolmaker as a new folk hero for scanning 600 datasets and flagging 18 eyebrow‑raisers, including an “ostrich‑snake mix‑up” that spawned more memes than a herpetology subreddit.

Memes bloomed: Excel in a lab coat, “Ctrl+C, Ctrl‑Scandal,” and a groaner: “so the gut story was a gut feeling?” Pragmatists argue the finding could still be real, but small sample sizes plus duplicated rows make trust wobble. Consensus, if there is one: open data is good, automated checks are overdue, and peer review needs fewer vibes, more verification. Until the authors speak, the community has receipts, and punchlines.

Key Points

  • A software tool scanned 600 open-access datasets and identified 18 cases with serious concerns, including a highly cited Parkinson’s study.
  • The Parkinson’s dataset on Dryad contained duplicated sequences in mouse motor function data across groups (SPF and ExGF) and within germ-free wild-type data.
  • Duplicated rows made up 50% of SPF and 42% of ExGF samples in the affected measures; because the groups are small, each duplicated row carries outsized weight in the results.
  • The Parkinson’s study claimed gut microbiome involvement in Parkinson’s-like symptoms; anomalies could be from editing error or tampering, but no conclusion is drawn.
  • The issue was reported in January with no author response; another case (the “ostrich-snake mix-up”) also showed duplicated cells in toxin resistance data involving ouabain and Na,K-ATPase.
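
For readers wondering what "duplicated sequences across groups" looks like in practice, here is a minimal Python sketch. It is not the tool's actual method, which the write-up does not describe. It assumes each group is a plain list of measurements and that a copy-paste shows up as a run of exactly equal consecutive values in both lists. The function names, the `min_len` threshold, and the sample numbers are all made up for illustration.

```python
from typing import List, Sequence, Tuple


def shared_runs(
    a: Sequence[float], b: Sequence[float], min_len: int = 3
) -> List[Tuple[int, int, int]]:
    """Return (start_in_a, start_in_b, length) for each maximal run of
    consecutive values that is identical in both sequences."""
    runs = []
    # prev[j] holds the length of the common run ending at a[i-2], b[j-1].
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                curr[j] = prev[j - 1] + 1
                # A run is maximal when the next pair of values differs
                # or either sequence has ended.
                ends_here = i == len(a) or j == len(b) or a[i] != b[j]
                if ends_here and curr[j] >= min_len:
                    runs.append((i - curr[j], j - curr[j], curr[j]))
        prev = curr
    return runs


def duplicated_fraction(a: Sequence[float], runs: List[Tuple[int, int, int]]) -> float:
    """Fraction of positions in `a` covered by at least one shared run."""
    covered = set()
    for start_a, _, length in runs:
        covered.update(range(start_a, start_a + length))
    return len(covered) / len(a) if a else 0.0


# Illustrative, invented values: three consecutive numbers repeat across groups.
group_a = [1.2, 3.4, 5.6, 7.8, 9.0, 2.1]
group_b = [4.4, 3.4, 5.6, 7.8, 6.6, 1.0]
runs = shared_runs(group_a, group_b)
print(runs, duplicated_fraction(group_a, runs))
```

Exact equality is deliberate here: independent measurements rarely agree to every digit several times in a row, so a long identical run is worth a second look. A match only flags a span for review; it cannot say whether the cause was an editing slip or something worse.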

Hottest takes

“legitimately so challenging to avoid” — steve_adams_86