Autonomously Testing Super Mario Using a Behavior Model

Mario plays itself while comments brawl over RL, genetics, and “AI supremacy”

TLDR: An open-source bot auto-plays Super Mario by mutating button presses and checking its behavior as it goes. Commenters are split over whether it's reinforcement-learning adjacent, a genetic/fuzzing mashup, or chatbot-sounding hype, while bold claims that "AI always wins in games" fuel a popcorn-worthy debate.

Super Mario just got a new test lead: a bot. Devs dropped an open-source demo that lets Mario play himself by flipping bits in controller inputs, saving the best runs, and inching further right each try, while a behavior model checks on the fly that Mario is doing the right thing. The code's up for grabs in their repo, and the community immediately turned the comments into a spectator sport.

One camp is hyped that this feels like reinforcement learning (RL, the game-beating technique behind AlphaGo-style bots) without the math headache. As one fan summarized, this could jump-start RL by cherry-picking good "episodes" instead of training from zero. Another crew says nah, it smells like genetic algorithms (think evolve-the-best-button-mashes) with a side of classic fuzzing, and they're side-eyeing any hand-wavy LLM-style explanations. Cue the spicy jab: "starts to sound like LLM."

Then there’s the big proclamation: “AI beats humans in closed worlds—games and beyond. AlphaGo proved it.” That sent skeptics rolling their eyes—Mario isn’t Go, and testing isn’t the same as mastering. Meanwhile, jokesters dubbed it “Right + Jump: The Sequel,” and imagined speedrunners sweating as QA-bot discovers cursed routes. Whether you see RL cosplay or clever automation, one thing’s clear: Mario’s unionizing for more breaks from all this automated grinding.

Key Points

  • The article applies Antithesis's mutation-based autonomous exploration technique to playing Super Mario Bros.
  • An open-source implementation is provided in the TestFlows Examples/SuperMario repository (v2.0).
  • Inputs are encoded as bytes: each bit maps to a controller button, and roughly 10% of the bits are randomly flipped using XOR to create input-sequence variations (see the mutation sketch after this list).
  • Because the game is deterministic, previously explored input sequences are stored and replayed exactly, so exploration can continue from known states.
  • Path selection uses a fitness function that favors greater x-axis progress, steering runs toward level completion (see the exploration-loop sketch below).
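
Here's how the bit-flip mutation could look in practice: a minimal Python sketch, assuming one byte of controller state per frame with one bit per button. The bit layout and the `mutate` helper are illustrative rather than taken from the repo; only the XOR flipping and the ~10% rate come from the article.

```python
import random

MUTATION_RATE = 0.10  # ~10% of bits flipped, per the article

def mutate(sequence: bytes, rate: float = MUTATION_RATE) -> bytes:
    """Return a variant of `sequence` with ~`rate` of its bits flipped.

    Assumed encoding (illustrative): one byte per frame, one bit per
    button, e.g. bit 7 = Right, bit 0 = A.
    """
    mutated = bytearray(sequence)
    for i in range(len(mutated)):
        mask = 0
        for bit in range(8):
            if random.random() < rate:
                mask |= 1 << bit   # mark this bit for flipping
        mutated[i] ^= mask         # XOR flips exactly the marked bits
    return bytes(mutated)

# Example: mutate a 60-frame run of "hold Right" (0x80 each frame).
original = bytes([0x80] * 60)
variant = mutate(original)
```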
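
And a sketch of the surrounding loop, combining deterministic replay with the x-axis fitness function. The `emulator` wrapper and its `reset()`/`run()` methods are hypothetical stand-ins for whatever harness the repo actually uses; only the keep-the-furthest-right-run logic comes from the article.

```python
def explore(emulator, generations: int = 100, variants_per_gen: int = 20):
    """Greedy exploration: keep whichever input sequence gets Mario
    furthest to the right when replayed from a fresh reset.

    `emulator` is a hypothetical deterministic wrapper:
      reset()         -> restore the initial game state
      run(seq: bytes) -> feed one controller byte per frame and
                         return Mario's final x position
    """
    best_seq = bytes([0x80] * 60)   # seed: hold Right for 60 frames
    emulator.reset()
    best_x = emulator.run(best_seq)

    for _ in range(generations):
        for _ in range(variants_per_gen):
            # Extend the current champion, then mutate it
            # (mutate() is the sketch above).
            candidate = mutate(best_seq + bytes([0x80] * 60))
            # Determinism: replaying `candidate` from reset() exactly
            # reproduces the known state before exploring past it.
            emulator.reset()
            x = emulator.run(candidate)
            if x > best_x:          # fitness = rightmost x reached
                best_seq, best_x = candidate, x
    return best_seq, best_x
```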

Hottest takes

“There’s a ton of crossover between your method and RL” — janalsncm
“AI is much more powerful than human in the closed fields” — wa008
“starts to sound like LLM” — DevelopingElk