April 15, 2026

Did your dusty laptop just clap ChatGPT?

CPUs Aren't Dead. Gemma 2B Outscored GPT-3.5 Turbo on the Test That Made It Famous

Internet loses it as tiny ‘potato PC’ AI claims to beat ChatGPT on a laptop

TLDR: A tiny free AI model running on a normal laptop is being hailed as matching an older version of ChatGPT, with fans dreaming of powerful offline assistants and no monthly fees. Skeptics fire back that it probably just memorized the test and that the fancy “fixes” are just old‑school tools in disguise.

A new experiment is throwing gasoline on the AI wars: a tiny “Gemma 2B” model, small enough to run on a regular laptop CPU, reportedly matched and even slightly beat the older ChatGPT (GPT‑3.5) on a popular test. The author is shouting, essentially, “CPUs were enough all along – it was just bad software!” and the internet immediately split into camps.

On one side you’ve got the hype squad: people dreaming of the day they can run a smart coding assistant entirely on their own computer, no tracking, no monthly bill. One commenter practically sighed, saying they “yearn for the days” they can just code with a local AI brain humming on their PC, like the 90s but with a robot sidekick.

On the other side, the skeptics came in swinging. The spiciest accusation? That this tiny model just memorized the test – one user sneered that a “tiny model overfit on a benchmark from three years ago,” basically calling the whole thing a glorified exam cheat. Others mocked the dramatic language about “surgical guardrails,” pointing out that these so‑called precision fixes are just boring tools and scripts dressed up like brain surgery.

Meanwhile, meta‑drama kicked off as someone noted the original post “may be LLM‑assisted” and might get flagged, which only fueled the vibe: an AI‑boosted post about AI, defending a tiny AI, being judged by humans…and probably other AIs. Peak 2024 energy.

Key Points

  • Gemma 4 E2B-it (2B parameters) achieved ~8.0 on MT-Bench, matching GPT-3.5 Turbo’s 7.94.
  • The evaluation ran entirely on a CPU-only setup (4 cores, 16 GB RAM) using a simple Python wrapper without fine-tuning or tools.
  • Seven specific failure patterns were identified; six targeted software fixes (~60 lines each) were applied, raising the projected score to ~8.2.
  • All benchmark traces, code, and fixes were released openly; a raw model demo bot is live on Telegram.
  • The model can be run locally (pip install torch transformers accelerate, then python chat.py) and deployed via Cloudflare Containers (~$5/month).
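The "simple Python wrapper" mentioned above can be sketched as a short CPU-only chat loop built on the Hugging Face transformers pipeline. This is a sketch under assumptions, not the author's actual chat.py: the model id google/gemma-2-2b-it, the helper name build_messages, and the generation settings are all illustrative stand-ins.

```python
# chat_sketch.py - minimal CPU-only chat loop, a sketch of the kind of
# wrapper described in the key points. Model id and settings are assumptions.

def build_messages(history, user_msg):
    """Append a new user turn to a chat-template message list."""
    return history + [{"role": "user", "content": user_msg}]

def main():
    # Heavy imports kept inside main() so build_messages stays dependency-free.
    import torch
    from transformers import pipeline

    # "google/gemma-2-2b-it" is an assumed checkpoint; swap in whichever
    # model you actually want to run.
    chat = pipeline(
        "text-generation",
        model="google/gemma-2-2b-it",
        device="cpu",               # force CPU-only inference
        torch_dtype=torch.float32,  # most CPUs lack fast bf16/fp16 paths
    )
    history = []
    while True:
        user_msg = input("you> ")
        messages = build_messages(history, user_msg)
        out = chat(messages, max_new_tokens=256)
        reply = out[0]["generated_text"][-1]["content"]
        print("gemma>", reply)
        history = messages + [{"role": "assistant", "content": reply}]

if __name__ == "__main__":
    main()
```

Keeping the conversation as a plain list of role/content dicts is what lets the pipeline apply the model's chat template automatically; the loop itself is deliberately boring, which is rather the article's point.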

Hottest takes

"Tiny model overfit on benchmark published 3 years prior to its training. News at 10" — 100ms
"Surgical guardrails? Tools, those are just too…" — svnt
"I yearn for the days when I can program on my PC with a programming llm running on the CPU locally" — roschdal
Made with <3 by @siedrix and @shesho from CDMX. Powered by Forge&Hive.