January 4, 2026
Survival of the snarkiest
Evolution: Training neural networks with genetic selection achieves 81% on MNIST
Survival-of-the-fittest AI scores 81% on handwriting; commenters riot
TLDR: A dev’s “evolve the AI” project scored 81% on a popular handwriting test, skipping traditional training. Comments split between cheering the fresh approach and mocking the low score, with a fiery accusation of ChatGPT‑made code pushing calls for proper benchmarks and transparency.
An indie dev dropped GENREG, an “evolve-your-AI” experiment where the best models reproduce and the worst get cut, with no backprop math required. It hit 81% on the classic MNIST handwriting test in about 40 minutes, and the crowd immediately split. Fans cheered the throwback-to-Darwin approach and liked that, although training uses a graphics card, the finished model runs on low-end CPUs. Skeptics rolled their eyes: modern methods blast past 99% on MNIST, so 81% feels… beginner tier.
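For the curious, “the best reproduce, the worst get cut” is the classic truncation-selection genetic algorithm. Below is a minimal sketch assuming plain truncation selection plus Gaussian “child mutation”. GENREG’s trust-based selection isn’t spelled out in the article, so this is not the dev’s method. The toy 2-D task and the names `evolve`, `fitness`, `mutate`, and `n_samples` are hypothetical, as are all parameter values.

```python
import random

# Hypothetical toy task standing in for MNIST: label 2-D points by
# which side of the line x + y = 1 they fall on.
def make_data(n, rng):
    data = []
    for _ in range(n):
        x, y = rng.random(), rng.random()
        data.append(((x, y), 1 if x + y > 1 else 0))
    return data

def predict(genome, point):
    # Genome = [w1, w2, b] for a single linear unit; no gradients are ever computed.
    w1, w2, b = genome
    return 1 if w1 * point[0] + w2 * point[1] + b > 0 else 0

def fitness(genome, data, n_samples, rng):
    # Accuracy on a random subset: a larger n_samples gives a less noisy score.
    batch = rng.sample(data, n_samples)
    return sum(predict(genome, p) == label for p, label in batch) / n_samples

def mutate(genome, rng, sigma=0.3):
    # Child mutation: add Gaussian noise to every weight of a copied parent.
    return [g + rng.gauss(0.0, sigma) for g in genome]

def evolve(generations=60, pop_size=40, keep=10, n_samples=100, seed=0):
    rng = random.Random(seed)
    data = make_data(1000, rng)
    population = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population,
                        key=lambda g: fitness(g, data, n_samples, rng),
                        reverse=True)
        survivors = ranked[:keep]  # the best reproduce, the rest are cut
        children = [mutate(rng.choice(survivors), rng) for _ in range(pop_size - keep)]
        population = survivors + children
    # Final pick is scored on the full dataset, so there is no sampling noise here.
    best = max(population, key=lambda g: fitness(g, data, len(data), rng))
    return best, fitness(best, data, len(data), rng)

best, acc = evolve()
print(best, acc)
```

Raising `n_samples` makes each fitness score steadier at the cost of more evaluations per generation, which is roughly the trade-off behind the dev’s note about averaging over more samples.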
Then came the flamethrower: one commenter accused the dev of using ChatGPT to write the code and leaving a placeholder username in the repo, demanding proper explanations and citations. Others rushed to the defense: “It’s open-source, let them iterate,” while the hardliners insisted, “Show comparisons against standard training and tougher, real-world data.” The dev’s notes, such as “child mutation” being crucial and averaging over more samples to stabilize results, sparked memes. Cue the jokes: “Swipe right on high-trust genomes,” “No grads, just chads,” and “Darwin meets digits.” The 100% on the rendered alphabet? Dismissed as too easy.
Under the drama, curiosity survived: people are grabbing checkpoints, asking for head‑to‑head benchmarks, and pushing for 95% or bust. Evolution may be slow, but the comment section evolved into a full‑blown ecosystem.
Key Points
- GENREG trains neural networks via evolutionary trust-based selection without gradients or backpropagation.
- On MNIST, a 784→64→10 MLP (50,890 params) achieved 81.47% test accuracy after ~600 generations (~40 minutes).
- Per-digit MNIST accuracy ranges from 70.9% (digit 5) to 94.5% (digit 1), with detailed results provided.
- An alphabet task (10,000→128→26) reached 100% test accuracy in ~1,800 generations on rendered letters A–Z.
- Key findings: stabilizing fitness signals via more samples, child mutation driving exploration, and capacity constraints enabling efficient solutions.
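The 50,890 figure checks out for a fully connected network with one bias per unit. Here is a quick sketch of the arithmetic, assuming plain dense layers with biases. `mlp_param_count` is a hypothetical helper, and the alphabet network’s count is our own calculation under the same assumption, not a number the dev reported.

```python
def mlp_param_count(layer_sizes):
    # Each dense layer has (inputs x outputs) weights plus one bias per output unit.
    return sum(n_in * n_out + n_out for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

print(mlp_param_count([784, 64, 10]))      # 50890
print(mlp_param_count([10_000, 128, 26]))  # 1283482
```

By that count the alphabet model is roughly 25× larger than the MNIST one, almost all of it in the first layer's 10,000 inputs.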