June 1, 2026
Rounding errors, comment chaos
Bias Compounds, Variance Washes Out
Tiny math mistakes caused big trouble — and the comments had jokes, side-eyes, and one grammar roast
TLDR: A smarter way of rounding tiny numbers helped a cheaper AI training setup perform almost as well as the expensive version. Commenters turned it into a three-way show of surprise, AI-writing suspicion, and “wait, isn’t this just dithering?” skepticism.
A post with the very serious title Bias Compounds, Variance Washes Out somehow turned into a mini comment-section variety show. The basic idea, in plain English: when a computer keeps rounding tiny numbers the same wrong way, those little mistakes pile up and wreck learning. But if the rounding is a little random, the mistakes can cancel each other out. In the author’s tests, that “random rounding” let a cheaper, lower-precision setup come shockingly close to the much heavier full-precision version.
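For readers who want the idea in symbols, here is a rough sketch of the usual textbook form of stochastic rounding, not a quote from the post. The grid spacing \Delta, the per-step bias b, the per-step spread \sigma, and the step count n are all notation assumed here for illustration, and the square-root result further assumes the per-step errors are independent.

```latex
% Stochastic rounding of x onto a grid with spacing \Delta:
\operatorname{SR}(x) =
\begin{cases}
\lfloor x \rfloor_{\Delta} + \Delta & \text{with probability } p = \dfrac{x - \lfloor x \rfloor_{\Delta}}{\Delta},\\[6pt]
\lfloor x \rfloor_{\Delta} & \text{with probability } 1 - p,
\end{cases}
\qquad
\mathbb{E}\big[\operatorname{SR}(x)\big] = x .

% Same-direction error b at every step (deterministic rounding):
\Big|\sum_{i=1}^{n} b\Big| = n\,|b|

% Zero-mean, independent errors e_i with variance \sigma^2 (stochastic rounding):
\sqrt{\mathbb{E}\Big[\Big(\sum_{i=1}^{n} e_i\Big)^{2}\Big]} = \sigma\sqrt{n}
```

The first line is the "a little random" part: the value rounds up more often the closer it already is to the upper grid point, so the average lands exactly on x. The last two lines are the pile-up versus cancel-out contrast: a consistent error grows in step with n, while a zero-mean one grows only like the square root of n.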
And yes, the community immediately made it weird. One of the funniest reactions came from someone who clicked in expecting a debate about human bias and got a math lesson instead. Another commenter swerved completely off the science and launched a surprisingly spicy critique of the writing style itself, calling out phrases like “bias compounds” and “variance diffuses” as an LLM tell — basically accusing the prose of sounding suspiciously AI-ish because the concepts were written like tiny action heroes doing things on their own. Meanwhile, a third commenter boiled the whole thing down to: isn’t this just dithering? That gave the thread a classic internet flavor: one person impressed, one person nitpicking the vibes, one person asking if the “new” trick is actually an old trick in disguise.
The drama underneath the math is real: the author says fixing this rounding bias can make a lightweight setup perform almost like the expensive one, which matters because training AI is brutally costly. But the comments prove the real law of the internet: even when the numbers are the headline, the reactions are the plot.
Key Points
- The article argues that deterministic rounding bias compounds linearly over repeated small updates, while zero-mean stochastic rounding error grows more slowly as a random walk.
- In a BF16 example, adding 0.001 to 1.0 one thousand times stays at 1.0 under round-to-nearest but reaches 2.0 in expectation under stochastic rounding.
- An experiment on a small MLP using HeavyBall’s AdamW and ECC implementation found BF16 + stochastic rounding nearly matched an fp32 baseline while using 6 bytes per parameter instead of 12 for parameters and optimizer moments.
- Plain BF16 + round-to-nearest was reported as the worst configuration, with optimizer-state bias causing loss to plateau about an order of magnitude above baseline despite fp32 parameters.
- A correction says earlier ECC results were distorted by a torch.compile fusion removing a bf16 round-trip, and the experiment was rerun with corrected code.
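The BF16 example in the key points is easy to try at home. The sketch below is not the author's code and does not use HeavyBall or PyTorch. It emulates BF16 in plain Python by keeping only the top 16 bits of a float32, and every function name in it is made up for illustration. It assumes finite, positive inputs, and the random seed and the 100-run average are arbitrary choices.

```python
import random
import struct


def f32_bits(x: float) -> int:
    """Return the IEEE-754 float32 bit pattern of x as an unsigned int."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bits_to_f32(bits: int) -> float:
    """Interpret a 32-bit unsigned int as a float32 value."""
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFFFFFF))[0]


def bf16_round_nearest(x: float) -> float:
    """Emulated BF16 round-to-nearest, ties to even (hypothetical helper)."""
    bits = f32_bits(x)
    # Adding just under half a BF16 step (plus the tie-break bit) makes the
    # truncation below behave like round-to-nearest-even.
    bits += 0x7FFF + ((bits >> 16) & 1)
    return bits_to_f32(bits & 0xFFFF0000)


def bf16_round_stochastic(x: float, rng: random.Random) -> float:
    """Emulated BF16 stochastic rounding (hypothetical helper)."""
    bits = f32_bits(x)
    # Adding a uniform 16-bit integer carries into the kept bits with
    # probability equal to the discarded fraction, so the result is
    # unbiased relative to the float32 value.
    bits += rng.getrandbits(16)
    return bits_to_f32(bits & 0xFFFF0000)


def accumulate(round_fn, steps: int = 1000, delta: float = 0.001) -> float:
    """Start at 1.0 and add delta `steps` times, rounding after every add."""
    total = 1.0
    for _ in range(steps):
        total = round_fn(total + delta)
    return total


rng = random.Random(0)  # arbitrary seed, only for repeatability
nearest_total = accumulate(bf16_round_nearest)
sr_totals = [accumulate(lambda v: bf16_round_stochastic(v, rng)) for _ in range(100)]
sr_mean = sum(sr_totals) / len(sr_totals)

print("round-to-nearest:", nearest_total)  # stays at 1.0
print("stochastic rounding, mean of 100 runs:", round(sr_mean, 3))  # close to 2.0
```

Round-to-nearest never moves because 0.001 is less than half the gap between neighboring BF16 values near 1.0 (the gap is 2^-7, about 0.0078), so every single add gets rounded away. Any one stochastic run wobbles around 2.0 rather than hitting it exactly, which is the "in expectation" part of the claim.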