ARC-AGI-3: New AI IQ game drops, humans rage-click and ask “what do I even do”

TLDR: ARC-AGI-3 is a new game-like test to see if AI can learn and adapt like humans, scoring efficiency over time instead of one-off answers. The community’s split between calling it a smart way to measure real learning and roasting it as confusing, repetitive benchmarking—complete with “I’m not AGI” memes.

ARC-AGI-3 just landed with a bold promise: a “play-to-learn” gauntlet that measures whether AI can adapt like humans—no hand-holding, no cheat sheets, just figure-it-out-as-you-go. But the internet’s first reaction? Confusion. Loud, funny, meme-powered confusion. One brave soul clicked into the first level and confessed they “couldn’t begin to guess” what the point was. Another joked the only lesson they learned was that they aren’t artificial general intelligence.

The benchmark itself sounds big: human-solvable games, long-term planning, sparse feedback, and scores that track how fast you learn, not just whether you win. It’s designed to be easy for people to pick up, with clear goals and no hidden tricks—plus replayable runs and a dev toolkit. Yet the vibe in the comments is, well, “WTF is the goal?” and “Why does every new benchmark spawn another benchmark?” Cue one cynic predicting we’ll be reading about ARC-AGI-26 in the year 2057.

That’s the drama: Is this a real intelligence test or just another confusing puzzle pack? Fans say measuring adaptation over time is exactly what separates humans from today’s chatty, test-cramming bots. Skeptics want proof it’s not just novelty for novelty’s sake—and, hilariously, some humans can’t clear the “human-easy” levels. The emerging meme: AGI stands for “Actually Getting It,” and right now, neither bots nor a bunch of us meat-sacks are passing with flying colors.

Key Points

  • ARC-AGI-3 is an interactive reasoning benchmark for AI agents operating in novel environments.
  • Agents must learn from experience within each environment without relying on natural-language instructions.
  • A 100% score means agents beat every game as efficiently as humans.
  • The benchmark measures time-based skill acquisition, long-horizon planning with sparse feedback, and multi-step adaptation.
  • ARC-AGI-3 provides replayable runs, a developer toolkit for agent integration, and a UI for transparent evaluation.
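To make the scoring idea concrete, here’s a toy sketch of the loop those points describe: an agent acts in an unfamiliar environment with no instructions, gets only a sparse reward, and is scored by how few actions it needs rather than by a pass/fail answer. Everything here—`ToyGridGame`, the action names, the scoring—is a hypothetical stand-in, not the real ARC-AGI-3 environment interface or developer toolkit.

```python
import random


class ToyGridGame:
    """Hypothetical stand-in for an ARC-AGI-3 environment: the agent sees
    raw observations and a sparse reward, with no natural-language goal."""

    def __init__(self, size=5, seed=0):
        self.size = size
        rng = random.Random(seed)
        self.goal = (rng.randrange(size), rng.randrange(size))  # hidden from agent
        self.pos = (0, 0)
        self.actions_taken = 0

    def step(self, action):
        # action is one of "up"/"down"/"left"/"right"; reward is sparse:
        # 1 only on the step that lands on the (unknown) goal, else 0.
        dx, dy = {"up": (0, -1), "down": (0, 1),
                  "left": (-1, 0), "right": (1, 0)}[action]
        x = min(max(self.pos[0] + dx, 0), self.size - 1)
        y = min(max(self.pos[1] + dy, 0), self.size - 1)
        self.pos = (x, y)
        self.actions_taken += 1
        done = self.pos == self.goal
        return self.pos, (1 if done else 0), done


def run_episode(env, policy, max_steps=500):
    """Efficiency-style scoring: count actions until the sparse reward
    fires. Fewer actions = faster skill acquisition; None = unsolved."""
    for _ in range(max_steps):
        obs, reward, done = env.step(policy())
        if done:
            return env.actions_taken
    return None


# A clueless random agent — roughly the "rage-clicking human" baseline.
rng = random.Random(42)
random_policy = lambda: rng.choice(["up", "down", "left", "right"])
score = run_episode(ToyGridGame(seed=1), random_policy)
```

The point of the sketch is the scoring shape: two agents that both finish a game get different scores if one needed far fewer actions, which is what “efficiency over time instead of one-off answers” means in practice.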

Hottest takes

“couldn’t begin to guess what I was supposed to do” — CamperBob2
“Will ARC-AGI-26 hit the front page of Hacker News in 2057” — tasuki
“I am definitely not AGI” — typs
Made with <3 by @siedrix and @shesho from CDMX. Powered by Forge&Hive.