TinyLoRA – Learning to Reason in 13 Parameters

TinyLoRA claims “reasoning” with 13 knobs — commenters say the elephant is already in there

TLDR: Researchers say they boosted a big AI’s math reasoning by tweaking just 13 parameters, sparking claims that “the smarts were inside all along.” The community is split between unlock-vs-train, RL-vs-finetune, and jokes about fitting elephants—while pragmatists tout small models plus great data as the real win.

The internet just spit out its coffee: a new paper, Learning to Reason in 13 Parameters, says a giant AI can hit high math scores by tweaking just 13 tiny settings, roughly 26 bytes. Cue chaos.

One camp is yelling, “Reasoning was inside the model all along!” If a teeny change boosts logic, they argue, maybe these models already had the brainpower; TinyLoRA just flips a hidden switch. Another camp fires back that it only works with reinforcement learning (teaching by trial-and-reward), not simple fine-tuning, so the skill still has to be earned, not unlocked.

The nerd humor came fast: one commenter riffed, “with four parameters I can fit an elephant… with five I can make him wiggle his trunk,” turning the 13-parameter flex into a full-blown meme. Meanwhile, the practical crowd is like, keep calm and curate datasets: they claim small models (3–7B) with good reasoning data are already scary good, name-dropping cartesien.io and Salesforce’s WebscaleRL.

The spiciest debate? Whether “reasoning” is real or just clever pattern tweaks. Fans say these results prove efficiency is king; skeptics say it’s cosmetic: impressive scores, but the same old parlor tricks. Either way, 13 parameters just dragged the whole field into a 🔥 fight over how much intelligence is learned versus revealed.

Key Points

  • TinyLoRA is proposed to scale low-rank adapters down to as few as one parameter, addressing limits of conventional LoRA.
  • Using TinyLoRA with RL, Qwen2.5-8B reaches 91% accuracy on GSM8K with only 13 trained parameters in bf16 (26 bytes).
  • Across harder benchmarks (AIME, AMC, MATH500), TinyLoRA recovers about 90% of performance gains while training 1000× fewer parameters.
  • Strong results are achieved only with reinforcement learning; supervised fine-tuning (SFT) needs 100–1000× larger updates to match performance.
  • The study questions the necessity of even rank=1 LoRA for learning reasoning, introducing an alternative parameterization.
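To put the parameter counts in perspective, here is a minimal sketch of a standard rank-1 LoRA update (W' = W + B·A) on a single weight matrix, using generic 4096×4096 shapes and the usual zero-init convention as assumptions; the paper's actual TinyLoRA parameterization, which shrinks far below even this, is not reproduced here.

```python
import numpy as np

def lora_update(W, A, B, alpha=1.0):
    """Apply a low-rank adapter: W' = W + alpha * (B @ A).

    W: (d_out, d_in) frozen base weight
    A: (r, d_in) and B: (d_out, r) are the trained low-rank factors.
    """
    return W + alpha * (B @ A)

d_out, d_in, r = 4096, 4096, 1  # illustrative shapes, not from the paper
rng = np.random.default_rng(0)
W = rng.standard_normal((d_out, d_in)).astype(np.float32)
A = np.zeros((r, d_in), dtype=np.float32)  # common LoRA init: one factor zeroed
B = rng.standard_normal((d_out, r)).astype(np.float32)

# Even rank-1 LoRA on one 4096x4096 matrix trains d_out + d_in parameters.
n_trainable = A.size + B.size
print(n_trainable)  # 8192

# The paper's headline figure: 13 parameters in bf16 at 2 bytes each.
print(13 * 2)  # 26 bytes
```

The contrast is the point: conventional LoRA's floor on one matrix is thousands of parameters, which is why an alternative parameterization is needed to get down to 13.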

Hottest takes

"With four parameters I can fit an elephant… with five I can make him wiggle his trunk" — measurablefunc
"The quality… even with small models (3–7B is sweet spot) is incredible now" — a-t-c-g
"What we call 'reasoning' is latent within the model" — matt123456789
Made with <3 by @siedrix and @shesho from CDMX. Powered by Forge&Hive.