The Effective Sample Size

Turns out your “huge dataset” can secretly shrink to almost nothing

TLDR: The post says reweighting data can fix bias but may slash your real usable sample to almost nothing, making results much shakier than they look. In the comments, the mood is understated but smugly academic, with readers immediately pointing to even deeper stats lore.

A mathy blog post about data weighting somehow turned into a tiny but spicy comment-section moment: the author’s big reveal is that even when you “fix” biased data the correct way, you may still pay a brutal price in variance. In plain English, you can start with a mountain of data, but if a few points get most of the importance, your so-called giant sample behaves more like a sad little handful. That number has a name, the effective sample size: roughly, how many data points are actually pulling their weight.
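To make “pulling their weight” concrete, here is a minimal sketch of Kish’s effective sample size, n_eff = 1 / Σ α_i², where α_i = w_i / Σ w_j are the normalized weights. The function name `kish_ess` and the toy weight vectors are illustrative choices, not taken from the article.

```python
def kish_ess(weights):
    """Kish effective sample size: 1 / sum(alpha_i^2) with alpha_i = w_i / sum(w)."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    alphas = [w / total for w in weights]
    return 1.0 / sum(a * a for a in alphas)


# 1,000 equally weighted rows: every row pulls its weight.
equal = [1.0] * 1000

# 1,000 rows where a single row carries half of the total weight.
skewed = [999.0] + [1.0] * 999

print(round(kish_ess(equal), 3))   # 1000.0
print(round(kish_ess(skewed), 3))  # 3.996
```

One heavy row is enough to make 1,000 rows behave like about four.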

The article’s most gasp-worthy stat is pure tabloid bait: with a mean shift of just two standard deviations, the heaviest 1% of observations carry 37% of the total weight, and the effective usable fraction of the data drops below 2%. Translation: your dataset may be giving “millions of rows” energy while delivering “three guys in a trench coat” reality. The author walks through why this happens and why it matters for things like old training data and reinforcement learning, where stale data can become dangerously overtrusted.
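Both headline numbers can be checked in closed form for the article’s Gaussian setup (source N(0,1), target N(μ,1), weight w(x) = e^{μx − μ²/2}). For μ > 0 the weight increases with x, so the heaviest 1% of source draws are the ones above the source’s 99th percentile, and their share of the expected weight equals the target probability of that region. In the large-sample limit the usable fraction is (E[w])² / E[w²] = e^{−μ²}. This is a back-of-envelope sketch using Python’s `statistics.NormalDist`; the function name is hypothetical and this is not the article’s own code.

```python
import math
from statistics import NormalDist


def top_share_and_ess_fraction(mu, top=0.01):
    """Closed-form weight concentration for source N(0,1) and target N(mu,1)."""
    if mu <= 0:
        raise ValueError("this sketch assumes mu > 0 so weights increase with x")
    source = NormalDist(0.0, 1.0)
    target = NormalDist(mu, 1.0)
    # The heaviest `top` fraction of source draws lies above this cutoff.
    cutoff = source.inv_cdf(1.0 - top)
    # E_source[w * 1{x > c}] = P_target(x > c), and E_source[w] = 1.
    top_share = 1.0 - target.cdf(cutoff)
    # E_source[w] = 1 and E_source[w^2] = exp(mu^2), so n_eff / n -> exp(-mu^2).
    ess_fraction = math.exp(-mu * mu)
    return top_share, ess_fraction


share, frac = top_share_and_ess_fraction(2.0)
print(f"top 1% weight share: {share:.1%}")  # about 37.2%
print(f"usable fraction:     {frac:.1%}")   # about 1.8%
```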

And the community? Minimalist, nerdy, and very on-brand. The main comment coolly drops a Wikipedia link as if to say, ‘nice post, but the rabbit hole goes deeper.’ It’s less flame war, more academic mic-drop. The vibe is classic internet expert culture: no yelling, just a single elegantly deployed link that quietly escalates the whole conversation.

Key Points

  • The article states that correcting covariate shift with exact importance weights can remove bias but increases estimator variance.
  • The article uses a Gaussian example with source \(N(0,1)\) and target \(N(\mu,1)\), where the correct weight is \(e^{\mu x - \mu^2/2}\).
  • At a shift of \(\mu = 2\), the article says the heaviest 1% of observations carry 37% of the total weight and the effective usable fraction of data falls below 2%.
  • The article defines Kish’s effective sample size for normalized weights as \(n_{\mathrm{eff}} = 1 / \sum_i \alpha_i^2\).
  • The article arrives at the same effective sample size from two directions: the variance of a weighted average of Gaussian variables, and Hoeffding concentration bounds for bounded variables.
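For reference, here is a standard textbook sketch of how n_eff falls out of both routes, treating the normalized weights α_i (with Σ α_i = 1) as fixed and the X_i as independent. The symbols m, σ², a, b and t are our notation, and the article’s own derivation may differ in detail.

```latex
\begin{aligned}
\text{Gaussian: } & X_i \overset{\text{iid}}{\sim} N(m,\sigma^2)
  \;\Rightarrow\; \sum_i \alpha_i X_i \sim N\!\Big(m,\ \sigma^2 \sum_i \alpha_i^2\Big)
  = N\!\Big(m,\ \frac{\sigma^2}{n_{\mathrm{eff}}}\Big) \\
\text{Hoeffding: } & X_i \in [a,b] \text{ independent}
  \;\Rightarrow\; \Pr\Big(\Big|\sum_i \alpha_i X_i - \mathbb{E}\Big[\sum_i \alpha_i X_i\Big]\Big| \ge t\Big)
  \le 2\exp\!\Big(-\frac{2t^2}{(b-a)^2 \sum_i \alpha_i^2}\Big)
  = 2\exp\!\Big(-\frac{2\, n_{\mathrm{eff}}\, t^2}{(b-a)^2}\Big)
\end{aligned}
```

In both cases the weighted average behaves exactly like a plain average of n_eff equally weighted samples, which is why the same quantity shows up twice.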
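A quick Monte Carlo sketch ties the Gaussian example and Kish’s formula together: draw from the source N(0,1), weight each draw by e^{μx − μ²/2}, and compare the Kish fraction n_eff / n with its limiting value e^{−μ²}. We use μ = 1 on purpose, as an assumption of this sketch: at μ = 2 the weights are so heavy-tailed (E[w⁴] = e^{6μ²}) that a sample estimate of Σ w² is itself very noisy. The seed, sample size and function names are arbitrary illustrative choices.

```python
import math
import random


def kish_ess(weights):
    # Kish's effective sample size, 1 / sum(alpha_i^2) with alpha_i = w_i / sum(w),
    # written equivalently as (sum w)^2 / sum(w^2).
    total = sum(weights)
    return total * total / sum(w * w for w in weights)


def simulate(mu, n, seed=0):
    """Return n_eff / n for n source draws reweighted toward N(mu, 1)."""
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]        # source N(0, 1)
    ws = [math.exp(mu * x - mu * mu / 2) for x in xs]  # exact density-ratio weights
    return kish_ess(ws) / n


frac = simulate(mu=1.0, n=200_000)
print(f"simulated n_eff/n: {frac:.3f}, limit exp(-mu^2): {math.exp(-1.0):.3f}")
```

With no shift (μ = 0) every weight is 1 and the fraction is exactly 1; any shift pushes it below that.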

Hottest takes

"See also" — esafak
"Regression effective degrees of freedom" — esafak
"the rabbit hole goes deeper" — esafak
Made with <3 by @siedrix and @shesho from CDMX. Powered by Forge&Hive.