January 4, 2026
Love at first X-O
Learning to Play Tic-Tac-Toe with JAX
JAX teaches a laptop to master Tic‑Tac‑Toe in 15 seconds; fans cheer, skeptics smirk
TL;DR: A clear JAX tutorial teaches a neural net to play perfect Tic‑Tac‑Toe in roughly 15 seconds, with code on GitHub and a slower Colab. Commenters cheered the step‑by‑step approach while a small crowd called it overkill, sparking playful jokes about speedruns and “team X vs O” confusion.
A tutorial that trains a neural net (software that learns by trial and error) to play Tic‑Tac‑Toe in JAX (Google’s speedy math toolkit) landed with a burst of excitement. It promises perfect play in about 15 seconds on a laptop, with code on GitHub and a slower Colab. PGX, a JAX game library, handles the rules so you can focus on the learning part, and yes, the “Player 0 isn’t always X” twist sparked playful confusion. The vibe? Teaching-first, no gatekeeping, and people loved it. One fan gushed about the clear, fully written walkthrough; another is already plotting to train the bot on other games. But the peanut gallery did show up. The hottest mini-debate: is reinforcement learning (teaching a machine by rewarding good moves) total overkill for a kid’s game? The defenders snapped back: “It’s a perfect starter project, chill.” Jokes flew about the 15‑second speedrun (“JAX to the future!”), while the slower Colab became a meme: “Cloud gym day—no gains.” The usual “JAX vs. anything else” sparks tried to ignite, but most commenters were just happy to have a clean, runnable playbook. Bottom line: fun, fast, and surprisingly dramatic for nine squares.
Key Points
- A neural network is trained to play Tic-Tac-Toe with reinforcement learning in JAX.
- PGX represents game state via a State dataclass with fields for current_player, observation, legal_action_mask, rewards, and terminated/truncated.
- Observations are boolean arrays of shape (3, 3, 2), one plane for the current player’s pieces and one for the opponent’s; the two planes swap each turn as current_player changes.
- Actions are square indices numbered left-to-right, top-to-bottom; PGX’s step function applies an action to transition to the next state.
- Code is available on GitHub and in a Google Colab notebook; training can reach perfect play in about 15 seconds on a laptop.
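The state and observation layout described above can be sketched without PGX installed. Below is a minimal NumPy mock: the field names `current_player`, `observation`, and `legal_action_mask` mirror the PGX `State` fields named in the key points, but the `State` class and `apply_action` helper here are simplified illustrations of the representation, not PGX’s actual API.

```python
import numpy as np
from dataclasses import dataclass, field


@dataclass
class State:
    """Simplified mock of a PGX-style Tic-Tac-Toe state (not PGX itself)."""
    current_player: int = 0
    # Shape (3, 3, 2): plane 0 holds the current player's pieces,
    # plane 1 holds the opponent's pieces.
    observation: np.ndarray = field(
        default_factory=lambda: np.zeros((3, 3, 2), dtype=bool))
    # Flat mask over the 9 squares, numbered left-to-right, top-to-bottom.
    legal_action_mask: np.ndarray = field(
        default_factory=lambda: np.ones(9, dtype=bool))


def apply_action(state: State, action: int) -> State:
    """Place the current player's piece on square `action` (0-8)."""
    assert state.legal_action_mask[action], "illegal move"
    row, col = divmod(action, 3)          # square index -> board coordinates
    obs = state.observation.copy()
    obs[row, col, 0] = True               # mark the mover's piece in plane 0
    obs = obs[:, :, ::-1]                 # swap planes: next player's view
    mask = state.legal_action_mask.copy()
    mask[action] = False                  # the square is now occupied
    return State(current_player=1 - state.current_player,
                 observation=obs,
                 legal_action_mask=mask)


s = State()
s = apply_action(s, 4)  # player 0 takes the center (row 1, col 1)
# From player 1's perspective, the center now sits in the opponent plane.
```

The plane swap on each move is what lets a single network always see “my pieces” in plane 0 regardless of whether it is playing as X or O, which is also why “Player 0 isn’t always X.”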