February 20, 2026
It’s-a me, Auto-Mario drama!
Testing Super Mario Using a Behavior Model Autonomously
Mario plays itself while comments brawl over RL, genetics, and “AI supremacy”
TLDR: An open-source bot auto-plays Super Mario by mutating button presses and checking behavior as it goes. Commenters split between calling it reinforcement-learning adjacent, a genetic/fuzzing mashup, or chatbot-sounding hype—while bold claims about “AI always wins in games” fuel the popcorn-worthy debate.
Super Mario just got a new test lead: a bot. Devs dropped an open-source demo that lets Mario play himself by flipping controller inputs, saving the best runs, and inching further right each try—then a behavior model checks if Mario’s doing the right thing on the fly. The code’s up for grabs in their repo, and the community immediately turned the comments into a spectator sport.
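For the curious, here is a minimal sketch of that "save the best run, then push further right" loop. It is not the repo's code: `play` is a made-up toy level standing in for the real game, the pit positions are invented, and fresh random moves stand in for the bot's input flipping. The part that carries over is that a deterministic game lets a saved input list be replayed to land in the same spot every time.

```python
import random

PITS = {5, 12, 20}  # hypothetical pit positions in the toy level


def play(inputs: list[int]) -> tuple[int, bool]:
    """Replay inputs on a toy deterministic level; return (x, alive).

    Moves: 0 = wait, 1 = step right, 2 = step right with a jump.
    Stepping onto a pit without jumping ends the run.
    """
    x = 0
    for move in inputs:
        if move == 0:
            continue
        nxt = x + 1
        if nxt in PITS:
            if move != 2:
                return x, False  # fell in; the run is over
            nxt += 1  # the jump carries Mario over the pit
        x = nxt
    return x, True


def explore(rounds: int = 200, chunk: int = 8, seed: int = 0) -> tuple[list[int], int]:
    """Extend the best surviving run with random moves, keeping any improvement."""
    rng = random.Random(seed)
    best_inputs: list[int] = []
    best_x = 0
    for _ in range(rounds):
        # Replaying best_inputs restores the known state, then new moves follow.
        candidate = best_inputs + [rng.choice((0, 1, 2)) for _ in range(chunk)]
        x, alive = play(candidate)
        if alive and x > best_x:  # fitness: furthest x among surviving runs
            best_inputs, best_x = candidate, x
    return best_inputs, best_x
```

Only surviving runs are saved here, since extending a run that already ended in a pit would never get any further. That filter is a choice made for this sketch, not something the source spells out.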
One camp is hyped that this feels like reinforcement learning—RL, the game-beating technique behind alpha-bots—without the math headache. As one fan summarized, this could jump-start RL by cherry-picking good "episodes" instead of training from zero. Another crew says nah, it smells like genetic algorithms (think evolve-the-best-button-mashes) with a side of classic fuzzing, and they’re side-eyeing any hand-wavy LLM-style explanations. Cue the spicy jab: “starts to sound like LLM.”
Then there’s the big proclamation: “AI beats humans in closed worlds—games and beyond. AlphaGo proved it.” That sent skeptics rolling their eyes—Mario isn’t Go, and testing isn’t the same as mastering. Meanwhile, jokesters dubbed it “Right + Jump: The Sequel,” and imagined speedrunners sweating as QA-bot discovers cursed routes. Whether you see RL cosplay or clever automation, one thing’s clear: Mario’s unionizing for more breaks from all this automated grinding.
Key Points
- The article implements Antithesis’s mutation-based autonomous exploration to play Super Mario Bros.
- An open-source implementation is provided in the TestFlows Examples/SuperMario repository (v2.0).
- Inputs are encoded as bytes; bits represent keys, and bits are randomly flipped (~10%) using XOR to create sequence variations.
- Because the game is deterministic, traveled input sequences are stored and replayed to continue exploration from known states.
- Path selection uses a fitness function favoring greater x-axis progress to move toward level completion.
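The bit-flipping point above can be sketched in a few lines of Python. This is an illustration under assumptions, not the repository's code: the `KEYS` bit layout, the one-byte-per-frame encoding, and the `mutate` and `pressed` helpers are all hypothetical names chosen for the example. Only the idea of XOR-ing a random mask with roughly a 10% chance per bit comes from the summary.

```python
import random

# Hypothetical layout: one byte per frame, one bit per controller key.
KEYS = {"right": 0x01, "left": 0x02, "down": 0x04, "up": 0x08,
        "jump": 0x10, "run": 0x20}


def mutate(inputs: bytes, flip_prob: float = 0.10, rng=None) -> bytes:
    """Return a copy of inputs with each bit flipped with probability flip_prob."""
    rng = rng or random.Random()
    out = bytearray(len(inputs))
    for i, byte in enumerate(inputs):
        mask = 0
        for bit in range(8):
            if rng.random() < flip_prob:
                mask |= 1 << bit  # mark this bit for flipping
        out[i] = byte ^ mask  # XOR toggles exactly the marked bits
    return bytes(out)


def pressed(byte: int) -> list[str]:
    """List the key names whose bits are set; bits with no key are ignored."""
    return [name for name, bit in KEYS.items() if byte & bit]
```

For example, `pressed(0x11)` gives `["right", "jump"]`, and `mutate(frames)` returns a sequence of the same length in which a few keys have been toggled on or off. Passing a seeded `random.Random` as `rng` makes a mutation repeatable, which fits the replay-based exploration described in the list.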