April 9, 2026

When math beef meets meme brackets

The Training Example Lie Bracket

Turns out the order you feed data can change your model—cue math flexes and meme fights

TLDR: A new demo shows how swapping the order of two training examples nudges an AI’s behavior, turning “data order” into a measurable effect. Commenters split between hype for practical uses like smarter batching and skepticism that the math only fits toy setups, not real-world, messy training.

Machine-learning math just dropped: a demo computes a “Lie bracket” — basically a way to measure how much an AI changes if you show training example A before B instead of B before A. The team ran it on a celeb-photo classifier and built an interactive slider to visualize how outputs shift when you swap two pictures. Geeky? Yes. But the comments turned it into a circus.
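For the curious, the swap effect is easy to reproduce on a toy problem. Below is a minimal sketch, not the article's code: it assumes hypothetical quadratic per-example losses (so the Hessians are constant and the second-order bracket formula is exact), takes two batch-size-1 SGD steps in each order, and checks the difference against the bracket prediction.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4  # toy parameter dimension

def make_quadratic(rng, d):
    """Hypothetical per-example loss L(th) = 0.5*th@H@th + b@th (stand-in for a real example's loss)."""
    M = rng.normal(size=(d, d))
    H = M @ M.T + np.eye(d)  # symmetric positive-definite Hessian
    b = rng.normal(size=d)
    return H, b

H_A, b_A = make_quadratic(rng, d)  # "example A"
H_B, b_B = make_quadratic(rng, d)  # "example B"
grad_A = lambda th: H_A @ th + b_A
grad_B = lambda th: H_B @ th + b_B

eta = 1e-2                      # learning rate
theta0 = rng.normal(size=d)     # starting parameters

def sgd_step(theta, grad):
    """One batch-size-1 SGD step."""
    return theta - eta * grad(theta)

theta_AB = sgd_step(sgd_step(theta0, grad_A), grad_B)  # show A first, then B
theta_BA = sgd_step(sgd_step(theta0, grad_B), grad_A)  # show B first, then A

# Lie-bracket prediction for the swap effect:
#   theta_AB - theta_BA ~= eta^2 * (H_B @ grad_A - H_A @ grad_B)
# (second order in eta; exact here because the losses are quadratic)
bracket = eta**2 * (H_B @ grad_A(theta0) - H_A @ grad_B(theta0))
print(np.allclose(theta_AB - theta_BA, bracket))
```

Note this is exactly the batch-size-1 setting the skeptics flagged: add momentum or Adam's running averages and the two orderings no longer differ by this neat closed form.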

Practical minds jumped in first: one reader asked if this could power smarter batch filtering, quietly rearranging or skipping examples that make training messy. Then the math crowd arrived, flexing hard — one commenter deadpanned that eventually ML will “discover fiber bundles” (translation: there’s even fancier math behind this). Meanwhile, the jokesters demanded a literal “tournament bracket of best lies,” pun fully intended.

The drama peaked with skepticism: a sharp-eyed reader noted the setup assumes single-example updates, while real training uses mini-batches and optimizer tricks like momentum (think “push from previous steps”), which could break the neat math. Another invoked statistician Andrew Gelman to argue that, in real science, order and context always matter anyway. Bottom line: some see a new tool for designing training curricula; others call it a cool toy until it works on bigger, messier setups.

Key Points

  • The article models each training example as a vector field on parameter space, enabling computation of Lie brackets between examples.
  • The Lie bracket quantifies the difference in final parameters when two training examples are applied in different orders (to second order in learning rate).
  • Because the bracket is bilinear, the effect of swapping two minibatches equals the average of the pairwise brackets between their examples.
  • An MXResNet convnet is trained on CelebA for 5,000 steps with Adam (lr=5e-3, betas=(0.8, 0.999), batch size 32), predicting 40 binary attributes.
  • Per-parameter Lie brackets are computed at multiple checkpoints and visualized to show how logits for a test batch are perturbed by swapping example order.
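For the math crowd: here is a sketch of the formulas behind those key points, written out under the batch-size-1 SGD setup the points describe (this is the standard vector-field bracket calculation, not lifted verbatim from the article). Each example $A$ induces the update field $X_A(\theta) = -\nabla L_A(\theta)$, and expanding two steps of size $\eta$ in each order gives

$$\theta_{AB} - \theta_{BA} = \eta^2\,[X_A, X_B](\theta) + O(\eta^3), \qquad [X_A, X_B] = (DX_B)X_A - (DX_A)X_B = \nabla^2 L_B\,\nabla L_A - \nabla^2 L_A\,\nabla L_B.$$

The minibatch claim then follows from bilinearity: if $X_{\mathcal{A}} = \tfrac{1}{m}\sum_i X_{A_i}$ and $X_{\mathcal{B}} = \tfrac{1}{n}\sum_j X_{B_j}$, then

$$[X_{\mathcal{A}}, X_{\mathcal{B}}] = \frac{1}{mn}\sum_{i,j} [X_{A_i}, X_{B_j}],$$

i.e. the swap effect for two minibatches is the average of the pairwise example brackets.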

Hottest takes

"they define the induced vector field ... in terms of batch-size 1 SGD" — Majromax
"Eventually ML folks will discover fiber bundles." — measurablefunc
"Was hoping for a tournament bracket of best lies" — willrshansen
Made with <3 by @siedrix and @shesho from CDMX. Powered by Forge&Hive.