March 26, 2026

Score or skew? The ARC‑ade drama begins

From 0% to 36% on Day 1 of ARC-AGI-3

36% on Day 1 has fans hyped, skeptics yelling “not official,” and wallet-watchers cheering

TLDR: Symbolica says its Agentica toolkit hit 36% on ARC‑AGI‑3’s public puzzles for about $1,005, far above tiny baselines and at a fraction of their cost. Commenters are split: some dismiss the result as unofficial and run on easier practice levels, while others hail clever “scaffolding” as the real win and ask whether Agentica works beyond demos.

Symbolica just bragged that its Agentica toolkit hit 36% on ARC-AGI-3—think mind-bending puzzle “games” meant to test reasoning—while spending around $1,005. That’s way higher than some step‑by‑step model baselines at 0.2–0.3% that reportedly cost $8,900. Cue confetti? Not so fast—the comments lit up like a scoreboard.

One camp is calling “technicality!” with users noting it “uses a harness,” basically a controller around the model, so it doesn’t qualify for the official leaderboard. The authors say the harness isn’t specific to this test, but the “leaderboard police” are already writing tickets. Another crowd points out this is the public practice set (25 problems) rather than the secret, harder private set used for real scoring. Translation: great warm‑up, not the championship.

Meanwhile, the meme brigade arrived with Goodhart’s Law: when you design to the test, the test stops testing. Others are downright bullish, arguing “scaffolding”—extra planning and tool‑use steps around the model—plus post‑training can push scores past 80%. One commenter even asked the big question: has anyone actually used Agentica in the wild yet?
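For readers wondering what “scaffolding” actually looks like in code, here’s a minimal sketch of the idea: a plain controller loop that wraps a model with planning and acting steps. Everything below is hypothetical illustration—the stub `model`, `environment_step`, and `scaffolded_run` names are invented for this example and are not Agentica’s API.

```python
def model(prompt):
    # Stand-in for an LLM call; a real harness would query a model API here.
    return "press_button" if "plan" in prompt else "done"

def environment_step(action, state):
    # Stand-in for one move in an ARC-AGI-3-style game; here the "game"
    # is trivially solved after three actions.
    state = state + [action]
    solved = len(state) >= 3
    return state, solved

def scaffolded_run(max_steps=10):
    state, history = [], []
    for _ in range(max_steps):
        # 1. Planning step: ask the model for the next move, given history.
        action = model(f"plan next move given {history}")
        # 2. Acting step: apply the chosen action to the environment.
        state, solved = environment_step(action, state)
        history.append(action)
        # 3. Stop as soon as the environment reports success.
        if solved:
            return True, history
    return False, history
```

The point of the sketch is that none of the intelligence has to live in the loop itself: the loop just gives the model repeated chances to observe, plan, and act, which is roughly what commenters mean when they credit scaffolding rather than the raw model for the score.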

So the vibe is equal parts victory lap and vibes check: a flashy, cost‑efficient run that turns heads, with skeptics poking holes in the “official” status and rigor. It’s the ARC‑ade, and everyone’s got quarters—debate included.

Key Points

  • Agentica SDK achieved an unverified 36.08% on ARC-AGI-3, passing 113/182 playable levels and completing 7/25 games.
  • Performance and cost compared to CoT baselines: 0.2% (Opus 4.6 Max) and 0.3% (GPT 5.4 High) vs. Agentica’s 36.08%.
  • Reported cost: Agentica’s 36.08% for $1,005 vs. Opus 4.6’s 0.25% for $8,900.
  • A figure in the post breaks down score and cost per task on the ARC-AGI-3 public evaluation set, using Opus 4.6 (120k) High with Agentica.
  • Code and cost details are available in the GitHub repository symbolica-ai/ARC-AGI-3-Agents.

Hottest takes

"this uses a harness so it doesn’t qualify for the official ARC-AGI-3 leaderboard" — lairv
"the public set is materially easier than the private set" — modeless
"we constantly underestimate the power of inference scaffolding" — bytesandbits
Made with <3 by @siedrix and @shesho from CDMX. Powered by Forge&Hive.