Why Are Large Language Models So Terrible at Video Games?

AI can write code, but gamers say it still plays like a clueless button-masher

TLDR: Researcher Julian Togelius says today’s chatbot-style AI is still bad at most video games, even while it gets much better at writing code. Commenters weren’t shocked at all: many mocked the idea as obvious, saying text-trained AI was never built for split-second play and screen-based thinking.

The latest reality check for artificial intelligence is deliciously simple: it may be able to write software, but hand it a video game controller and, the internet says, it turns into that one friend who has never touched a console before. In an IEEE Spectrum interview, researcher Julian Togelius flat-out says today’s large language models (the AI behind chatbots) are awful at games, even when they shine at coding. His blunt line that they "absolutely suck" lit up the discussion, but commenters were even blunter.

The strongest reaction was basically: why is anyone surprised? One poster mocked the article’s "it’s super weird" framing and snapped that code is a language, while playing a fast-moving game obviously isn’t. Another piled on with the now-classic common-sense roast: these are language models, so of course they struggle with joystick timing, moving pictures, and spatial awareness. Translation for non-tech readers: AI is good at text because it was trained on mountains of text; games demand quick reactions and an understanding of space on screen, which is a whole different beast.

Still, not everyone cared. One commenter shrugged, asking whether it even matters if machines are bad at games, since games are made for humans anyway. Others got nostalgic, missing the older era of AI research where bots hilariously broke Atari games in bizarre ways. And yes, there was meme energy too: the mysterious "cough JEPA cough" drop landed like a nerdy subtweet, as if to say, somebody thinks they already know the missing piece. The vibe? Equal parts dunking, eye-rolling, and popcorn-worthy "told you so."

Key Points

  • The article says large language models have advanced rapidly on many benchmarks but still perform very poorly at video games.
  • Julian Togelius argues that coding is easier for LLMs because it is highly structured and provides immediate, granular feedback such as compile errors and test results.
  • The article states that general game AI remains unsolved, and even systems like AlphaZero require retraining and reengineering for each game.
  • Togelius says performance differences across games are partly driven by data availability, with titles like Minecraft and Pokémon having far more public gameplay material than lesser-known games.
  • According to Togelius, LLMs adapted to general video-game benchmark frameworks perform worse than simple search algorithms because they were not trained on those games and are weak at spatial reasoning.
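
For readers wondering what a "simple search algorithm" looks like in practice, here is a minimal Python sketch. It is not the method used by any benchmark mentioned in the article; the tiny grid level, the `step` forward model, and the `bfs_plan` helper are all made up for illustration. The idea is that a search agent does not need training data: it tries moves against the game's rules and keeps the first action sequence that reaches the goal.

```python
from collections import deque

# Hypothetical 5x5 level (an assumption for this sketch):
# S = start, G = goal, # = wall, . = open floor.
LEVEL = [
    "S.#..",
    ".##..",
    "...#.",
    ".#...",
    "...#G",
]

# The four actions the agent can take, as (row, column) offsets.
MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def find(level, ch):
    """Return the (row, col) of the first cell containing ch."""
    for r, row in enumerate(level):
        c = row.find(ch)
        if c != -1:
            return (r, c)
    raise ValueError(f"{ch!r} not found in level")


def step(level, pos, action):
    """Forward model: apply an action; a blocked move leaves the agent in place."""
    dr, dc = MOVES[action]
    r, c = pos[0] + dr, pos[1] + dc
    inside = 0 <= r < len(level) and 0 <= c < len(level[0])
    if inside and level[r][c] != "#":
        return (r, c)
    return pos


def bfs_plan(level):
    """Breadth-first search for a shortest action sequence from S to G."""
    start, goal = find(level, "S"), find(level, "G")
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        pos, plan = queue.popleft()
        if pos == goal:
            return plan
        for action in MOVES:
            nxt = step(level, pos, action)
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, plan + [action]))
    return None  # no route to the goal exists


if __name__ == "__main__":
    print(bfs_plan(LEVEL))
```

Nothing here involves language or prior gameplay footage: the agent only needs a way to simulate moves. That is one way to read the claim that search can beat an LLM on games the model never saw in training.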

Hottest takes

"No, it's really not" — danaris
"Its almost like the Large Language Model has trouble with things that arent Language" — voidUpdate
"Video games are made to entertain humans, so does it really matter" — jiehong