Better Activation Functions for NNUE

Chess AI ‘Swish’ Sparks Nerd War: Is the Elo bump real?

TLDR: A chess engine swapped in a smoother “Swish” switch and, after penalizing noisy neurons, gained real strength on the board. The community is split: fans credit Swish, skeptics say the regularization trick did the heavy lifting, and memes are roasting the hand‑wavy explanations.

The neural network inside chess engine Viridithas just got a “Swish” glow‑up, and the comments section turned into a tech soap opera. The dev swapped the engine’s middle-layer “on/off switches” (called activation functions) for a smoother, trendy one named Swish, and after a rocky start—more lights turning on than the hardware loved—he “taxed” the noisy neurons to calm them down. Result: a cleaner evaluation scale and a real Elo boost, roughly +14 at blitz and +6 at longer games. Cue chaos.

The hype squad is yelling “Swish is the new meta!”, flexing graphs and the full write-up. Skeptics clap back: “It wasn’t Swish, it was the regularization—penalizing noisy activations did all the work.” Old-school engine fans demand weight clipping (keeping numbers in bounds), while machine learning folks roast the hand-wavy explanation for why the activations suddenly got denser. Performance nerds obsess over “sparsity,” aka keeping most lights off for speed, which dropped from roughly 70% to 50% before the fix.

Memes? Oh yes. “Swish swish, Elo fish,” “Hard-Swish sounds like an energy drink,” and “tax the rich neurons” are everywhere. Someone even launched a mini culture war: CReLU boomers vs Swish zoomers. And with a tease of “SwiGLU next?”, the crowd is already loading popcorn for the sequel.

Key Points

  • The author replaced SCReLU with Swish (Hard-Swish approximation, β=1/6) in L₁ and L₂ of Viridithas 19’s NNUE (the activation functions are sketched after this list).
  • Initial Hard-Swish training reduced block-sparsity in L₁ (from ~70% to ~50%), harming inference due to denser activations.
  • Adding an L1 norm penalty on L₀ outputs restored and slightly improved block-sparsity relative to the unregularized SCReLU baseline (see the training-step sketch after this list).
  • Swish-based networks showed smoother evaluation distributions and achieved Elo gains over SCReLU: +13.77 ± 5.04 (short TC) and +5.90 ± 3.11 (long TC).
  • State-of-the-art elsewhere favored CReLU (L₁) and SCReLU (L₂), and Viridithas’s lack of post-L₁ weight clipping may interact with activation choices.
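
To make the jargon concrete, here is a minimal NumPy sketch of the activation functions in play. This is not Viridithas’s actual code: the hard-swish shown is the common clamp-based form whose “hard sigmoid” has slope 1/6, which is presumably what the β=1/6 refers to, but the exact variant used in the real nets is an assumption here.

    import numpy as np

    def crelu(x):
        # Clipped ReLU: clamp activations to [0, 1]
        return np.clip(x, 0.0, 1.0)

    def screlu(x):
        # Squared clipped ReLU: clamp to [0, 1], then square
        return np.clip(x, 0.0, 1.0) ** 2

    def swish(x, beta=1.0):
        # Smooth gate: x * sigmoid(beta * x)
        return x / (1.0 + np.exp(-beta * x))

    def hard_swish(x):
        # Piecewise-linear stand-in for swish: x * clamp(x/6 + 1/2, 0, 1),
        # i.e. a "hard sigmoid" with slope 1/6 in place of the real sigmoid
        return x * np.clip(x / 6.0 + 0.5, 0.0, 1.0)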
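
And the “tax” on noisy neurons: an L1 penalty on the L₀ outputs added straight into the training loss, with the weight clipping that other engines use shown only for contrast. This is a hedged PyTorch sketch under assumed shapes and constants; the layer sizes, L1_LAMBDA, and CLIP_BOUND are illustrative, not the engine’s actual trainer or values.

    import torch
    import torch.nn.functional as F

    # Toy stand-in for the front of an NNUE-style net; sizes are illustrative.
    l0 = torch.nn.Linear(768, 256)   # "L0": feature transformer
    l1 = torch.nn.Linear(256, 1)     # "L1": next layer, collapsed to one output for brevity
    opt = torch.optim.Adam(list(l0.parameters()) + list(l1.parameters()), lr=1e-3)

    L1_LAMBDA = 1e-4     # strength of the activation "tax" (hypothetical value)
    CLIP_BOUND = 1.98    # weight-clipping bound (hypothetical value)

    def hard_swish(x):
        return x * torch.clamp(x / 6.0 + 0.5, 0.0, 1.0)

    def train_step(features, target_eval):
        opt.zero_grad()
        acts = hard_swish(l0(features))               # L0 outputs, post-activation
        pred = l1(acts)
        loss = F.mse_loss(pred, target_eval)          # main evaluation loss
        loss = loss + L1_LAMBDA * acts.abs().mean()   # L1 penalty: tax dense activations
        loss.backward()
        opt.step()
        with torch.no_grad():
            # Weight clipping (keeping weights in a fixed range); the write-up
            # notes Viridithas does not do this after L1, shown here for contrast.
            l1.weight.clamp_(-CLIP_BOUND, CLIP_BOUND)
        # Per-element sparsity of the L0 outputs (the engine actually exploits
        # block sparsity, i.e. whole groups of neurons being zero at once).
        sparsity = (acts == 0).float().mean().item()
        return loss.item(), sparsity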

Hottest takes

"Swish didn’t win Elo, the sparsity tax did" — quant_karen
"Hand-waving isn’t science—show the math or it’s just vibes" — proof_or_GTFO
"Swish is the new meta; CReLU boomers in shambles" — blitz_bro