February 28, 2026
Swish, Swish, Elo Wish
Better Activation Functions for NNUE
Chess AI ‘Swish’ Sparks Nerd War: Is the Elo bump real?
TLDR: A chess engine swapped in a smoother “Swish” switch and, after penalizing noisy neurons, gained real strength on the board. The community is split: fans credit Swish, skeptics say the regularization trick did the heavy lifting, and memes are roasting the hand‑wavy explanations.
The neural network inside chess engine Viridithas just got a “Swish” glow‑up, and the comments section turned into a tech soap opera. The dev swapped the engine’s middle-layer “on/off switches” (called activation functions) for a smoother, trendy one named Swish, and after a rocky start—more lights turning on than the hardware loved—he “taxed” the noisy neurons to calm them down. Result: a cleaner evaluation scale and a real Elo boost, roughly +14 at blitz and +6 at longer games. Cue chaos.
The hype squad is yelling “Swish is the new meta!”, flexing graphs and the full write-up. Skeptics clap back: “It wasn’t Swish, it was the regularization: penalizing noisy activations did all the work.” Old-school engine fans demand weight clipping (keeping numbers in bounds), while machine learning folks roast the hand-wavy explanation for why dense activations spiked. Performance nerds obsess over “sparsity,” aka keeping most lights off for speed, which dropped from about 70% to 50% before the fix.
Memes? Oh yes. “Swish swish, Elo fish,” “Hard-Swish sounds like an energy drink,” and “tax the rich neurons” are everywhere. Someone even launched a mini culture war: CReLU boomers vs Swish zoomers. And with a tease of “SwiGLU next?”, the crowd is already loading popcorn for the sequel.
Key Points
- The author replaced SCReLU with Swish (a Hard-Swish approximation, β=1/6) in L₁ and L₂ of Viridithas 19’s NNUE (sketched below).
- Initial Hard-Swish training reduced block-sparsity in L₁ (from ~70% to ~50%), slowing inference because of denser activations.
- Adding an L1-norm penalty on the L₀ outputs restored, and slightly improved, block-sparsity relative to the unregularized SCReLU baseline.
- Swish-based networks showed smoother evaluation distributions and achieved Elo gains over SCReLU: +13.77 ± 5.04 (short TC) and +5.90 ± 3.11 (long TC).
- State-of-the-art engines elsewhere favored CReLU (L₁) and SCReLU (L₂), and Viridithas’s lack of post-L₁ weight clipping may interact with the activation choice.
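For anyone who wants the mechanics without the memes, here is a minimal PyTorch-style sketch of the moving parts named in the key points: the Hard-Swish activation, the L1 “tax” on L₀’s outputs, a block-sparsity measure, and the weight clipping skeptics keep invoking. It assumes β=1/6 refers to the slope of the inner hard sigmoid (the common Hard-Swish form); the block width, penalty coefficient, and clipping bound are illustrative guesses, not values from the write-up.

```python
import torch

def hard_swish(x: torch.Tensor) -> torch.Tensor:
    # Piecewise-linear Swish: x * hard_sigmoid(x), with the hard sigmoid
    # taken as clamp(x/6 + 1/2, 0, 1); the 1/6 slope is the "beta = 1/6".
    return x * torch.clamp(x / 6.0 + 0.5, 0.0, 1.0)

def block_sparsity(acts: torch.Tensor, block: int = 4) -> float:
    # Fraction of contiguous `block`-wide groups of activations that are
    # all zero (the groups SIMD inference gets to skip). block=4 is an
    # illustrative width, not necessarily what Viridithas uses.
    groups = acts.reshape(acts.shape[0], -1, block)
    return (groups.abs().sum(dim=-1) == 0).float().mean().item()

def loss_with_l0_penalty(eval_loss: torch.Tensor,
                         l0_out: torch.Tensor,
                         lam: float = 1e-3) -> torch.Tensor:
    # The "tax on noisy neurons": an L1 penalty on the L0 outputs (the
    # activations feeding L1) pushes most of them toward exact zero.
    # lam is a made-up coefficient for illustration.
    return eval_loss + lam * l0_out.abs().mean()

def clip_weights(layer: torch.nn.Linear, bound: float = 1.98) -> None:
    # Post-optimizer-step weight clipping of the kind other engines apply
    # after the L1 layer; the 1.98 bound is illustrative only.
    with torch.no_grad():
        layer.weight.clamp_(-bound, bound)
```

The reason the penalty matters: exact zeros are what let the inference code skip whole blocks of work, which is why sparsity (and speed) recovered once the “tax” was applied.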