February 17, 2026
Checkmates and hot takes collide
Chess engines do weird stuff
AI chess learns by peeking ahead — and the comments are screaming about NSFW links
TLDR: Chess engines are ditching endless self-play to copy what their own search finds, even adjusting on the fly and using a wild random-tweak method (SPSA) to win more. Comments erupted over a "chess is solved" claim, over why SPSA beats fancier-sounding methods, and over surprise NSFW warnings, proving the drama is half the fun.
The nerds say chess engines just got weirder, and the crowd is loving the chaos. Instead of grinding out millions of self-play games (reinforcement learning), devs are copying what their own search finds: the engine looks ahead, sees better moves, and the model learns from those. One commenter dropped a response from the Viridithas author, and suddenly the thread turned into a watch party. Bonus twist: some engines now adjust mid-game, correcting their own evaluation bias on the fly. And there's a wild "shake the weights at random and keep what wins" method (SPSA) that can add about +50 Elo, basically a free upgrade.
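To make "learning by peeking" concrete, here is a minimal sketch in Python. Everything in it is a stand-in assumed for illustration: positions are random feature vectors, the "model" is a linear eval, and `search_value` is a toy oracle playing the role of look-ahead search. The point is the training signal: instead of RL self-play, the model simply regresses toward whatever the search reports.

```python
# Minimal distillation-from-search sketch (illustrative, not lc0 or
# Stockfish code): the "teacher" is the engine's own search, which
# sees further than the raw eval; the "student" is the cheap eval.
import numpy as np

rng = np.random.default_rng(0)

N_FEATURES = 8
TRUE_W = rng.normal(size=N_FEATURES)   # value function the search approximates
w = np.zeros(N_FEATURES)               # weak starting model

def shallow_eval(pos, w):
    """Toy static evaluation: a linear model over position features."""
    return float(w @ pos)

def search_value(pos):
    """Stand-in for look-ahead search: it returns a noisy-but-close
    estimate of the true value, better than the raw eval can manage."""
    return float(TRUE_W @ pos) + rng.normal(0.0, 0.05)

lr = 0.05
for step in range(2000):
    pos = rng.normal(size=N_FEATURES)  # a random "position"
    target = search_value(pos)         # teacher signal comes from search
    pred = shallow_eval(pos, w)
    w += lr * (target - pred) * pos    # least-squares step toward the target

print("gap to the search's implicit eval:", float(np.linalg.norm(w - TRUE_W)))
```

In a real engine the teacher would be the engine's own minimax or MCTS value per position and the student a full evaluation network, but the loop has the same shape: search labels positions, supervised learning absorbs the labels.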
But the real show? Drama in the comments. One user declared that "chess is solved," claiming modern Stockfish is unbeatable, which sparked instant pushback and a thousand eye-rolls. Another wondered why top engines still use SPSA instead of fancier-sounding tools like Bayesian optimization or evolutionary algorithms; cue old-school devs replying: if it wins, it stays. Meanwhile, multiple users shouted "NSFW alert" about the linked homepage, turning a chess thread into a workplace hazard zone.
The vibe: amazed that "learning by peeking" beats expensive training, curious about "live-learning" mid-match, and giggling at the "defiantly NSFW" typo while debating whether chess is actually solved (spoiler: it isn't). It's engine wizardry meets internet circus: check and drama.
Key Points
- Search contributes far more Elo (~1200) than differences between model qualities (~200), enabling effective distillation from weak-model-plus-search into a strong model without extensive RL self-play (see the sketch above).
- lc0's BT4 was trained via distillation and reportedly performed worse when reintroduced into an RL loop, suggesting RL may be unnecessary once an initial strong model exists.
- Stockfish implements a runtime calibration technique (PR #4950) that adjusts neural evaluations based on discrepancies with search, adapting outputs to the current position (a hedged sketch follows this list).
- To directly optimize for winning, lc0 employs SPSA, perturbing weights and selecting the better-performing direction, achieving about +50 Elo on small models at significant computational cost (see the SPSA sketch after this list).
- SPSA also tunes engine heuristics in C++, such as setting a checkmate-detection depth backoff to ~1.09, yielding ~5 Elo gains by optimizing numeric constants.
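The runtime-calibration bullet is easiest to picture in code. What follows is a guess at the flavor of the technique, not the actual logic of Stockfish's PR #4950: keep a running estimate of how the raw network eval disagrees with what search eventually finds, and nudge future evals by that bias.

```python
# Hedged sketch of runtime eval calibration (NOT Stockfish's actual
# PR #4950 implementation): track the running gap between the raw
# network eval and the deeper search value, and correct future evals.
class CalibratedEval:
    def __init__(self, decay: float = 0.99):
        self.bias = 0.0      # running estimate of (search - eval) error
        self.decay = decay   # exponential forgetting of stale positions

    def observe(self, raw_eval: float, search_value: float) -> None:
        """Called after a search finishes on some position."""
        err = search_value - raw_eval
        self.bias = self.decay * self.bias + (1.0 - self.decay) * err

    def evaluate(self, raw_eval: float) -> float:
        """Raw network output, corrected by the in-game bias estimate."""
        return raw_eval + self.bias

# Toy usage: a network that systematically underrates positions by 30
# centipawns; the calibrator learns that offset as the game proceeds.
cal = CalibratedEval()
for _ in range(500):
    cal.observe(raw_eval=90.0, search_value=120.0)  # search reveals the gap
print(round(cal.evaluate(90.0), 1))                 # ~120 once bias is learned
```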
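And here is the SPSA trick from the last two bullets as a self-contained toy. `play_match` is a hypothetical stand-in (a smooth score with a known optimum) for the real, noisy objective of playing thousands of engine games; the parameter names and constants are illustrative, not lc0's.

```python
# Minimal SPSA sketch: perturb ALL parameters at once with random
# +/-delta signs, see which direction scores better, and step that way.
import numpy as np

rng = np.random.default_rng(1)

def play_match(params):
    """Hypothetical stand-in for measuring Elo: higher is better.
    In a real tune this would be a win rate from actual games, so it
    would be noisy; a smooth peak keeps the demo self-contained."""
    target = np.array([1.09, 50.0, -3.0])   # pretend-optimal constants
    return -np.sum((params - target) ** 2)

params = np.array([1.0, 40.0, 0.0])         # e.g. backoff, margins, bonuses
step, delta = 0.02, 0.1

for it in range(500):
    direction = rng.choice([-1.0, 1.0], size=params.shape)  # random signs
    plus = play_match(params + delta * direction)
    minus = play_match(params - delta * direction)
    # Two evaluations estimate the gradient along `direction`;
    # move toward whichever perturbation scored better.
    params += step * (plus - minus) / (2.0 * delta) * direction

print("tuned params:", np.round(params, 3))  # converges near [1.09, 50, -3]
```

That two-evaluations-per-step economy is a big part of why engine devs keep choosing SPSA over Bayesian optimization or evolutionary methods: it scales to many parameters at once, tolerates a noisy objective, and, as the thread put it, if it wins, it stays.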