February 27, 2026
Carry drama, not just digits
Smallest transformer that can add two 10-digit numbers
Tiny AI adds big numbers—and commenters feud over what’s “real”
TLDR: New record: a hand‑coded transformer adds two 10‑digit numbers with 100% accuracy using just 36 parameters, while a trained model reaches near‑perfect accuracy with 311. Commenters feud over what counts as real AI, joke that it’s one matrix multiply, and argue about an unverified 28‑parameter claim, showing why ultra‑efficient, transparent models matter.
The internet is speedrunning long addition, and the leaderboard is wild: hand‑coded entries nail a perfect 100% with just 36 parameters (think ultra‑tiny settings), while trained models learn their way down to 311 with near‑perfect accuracy. But the real show is the comments, where the crowd argues over what even counts as “AI.” One skeptic, amelius, lays down the rules of legitimacy: if you can swap in new weights and reuse the same code, it’s a model; if the code itself is the trick, it’s not. Then comes the jab: “Why not just write the code themselves?” Ouch. Meanwhile, medi8r swats the hype with “You can do that in a single matmul,” meaning one big matrix multiply; translation: calm down, it’s math, not magic. Then E‑Reverance tosses a grenade: a 28‑parameter claim that has everyone side‑eyeing the rules and demanding proof. ks2048 poses the practical challenge: can those elegant hand‑coded designs be learned from scratch, or are they just beautifully rigged contraptions? And the meme energy is strong; 1over137 quips, “Now wrap it all in an Electron app!”, imagining a tiny math brain shipped in a hilariously bloated desktop wrapper. The mood is half science fair, half rules‑lawyering: is this genius minimal AI, or just clever arithmetic cosplay?
Key Points
- AdderBoard challenges entrants to build the smallest transformer that adds two 10‑digit numbers with ≥99% accuracy on a held‑out 10K test set.
- The leaderboard tracks two categories: Trained (weights learned with a generic algorithm) and Hand‑coded (weights set analytically).
- The best hand‑coded result achieves 100% accuracy with only 36 parameters, using a compact 2‑layer decoder and ALiBi‑based positional weighting.
- The best trained result reaches 99.999% accuracy with 311 parameters, using rank‑3 factorization, tied/shared projections, RMSNorm, and grokking.
- The challenge originated from “Addition Under Pressure,” where initial tool‑generated baselines were 6,080 parameters (Claude Code) and 1,644 (Codex); community solutions reduced those sizes significantly.
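For readers who want the ground truth the leaderboard is chasing: the task reduces to schoolbook long addition, where each output digit depends not just on the two input digits but on a carry that can ripple across many columns. That carry chain is exactly what makes the problem nontrivial for a tiny model. Here is a plain‑Python reference sketch of that function (illustrative only, not any entrant’s code or the challenge’s actual harness):

```python
def add_digitwise(a: str, b: str) -> str:
    """Schoolbook long addition over digit strings, right to left.

    This is the function the tiny transformers must emulate: each
    emitted digit is (d_a + d_b + carry) mod 10, and the carry
    (always 0 or 1) propagates to the next column.
    """
    # Pad to equal length; zfill never truncates, so this is safe.
    a, b = a.zfill(len(b)), b.zfill(len(a))
    carry, out = 0, []
    for da, db in zip(reversed(a), reversed(b)):
        s = int(da) + int(db) + carry
        out.append(str(s % 10))  # digit for this column
        carry = s // 10          # ripple into the next column
    if carry:
        out.append("1")          # final carry-out, e.g. 999 + 1
    return "".join(reversed(out))
```

Inputs like `9999999999 + 1` show why a per‑column digit lookup alone is not enough: a single carry generated in the rightmost column has to propagate through all ten positions, which is the part a 36‑parameter network must encode.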