December 9, 2025

From gamer rig to mini AI in 48 hours

LLM from scratch, part 28 – training a base model from scratch on an RTX 3090

TLDR: A builder trained a small GPT-2-style base model in about 48 hours on a single RTX 3090. Comments split three ways: DIY pride, renting cloud GPUs instead, and the need to clean messy web data, with TPU fans chiming in too. The upshot: a home PC can cook real AI without a big-lab budget.

One tinkerer just trained a mini language model from scratch in about 48 hours on a single RTX 3090 graphics card, and the comments went wild. Using the do-it-yourself recipe from Sebastian Raschka’s book and the FineWeb dataset, they came close to matching the classic GPT-2 small. The vibe? DIY pride meets shocked applause. RagnarD cheered the step-by-step breakdown, while DeathArrow called it a “valuable exercise” for anyone who wants to actually understand how large language models (LLMs) work. Meanwhile, name-drops of Andrej Karpathy’s nanochat stirred the pot: sure, he says you can get a strong model for about $100, but that run uses eight high-end H100 chips, aka not your average home rig.
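
For the curious, here is roughly what a single-GPU training step can look like in PyTorch. This is a generic sketch under stated assumptions, not the author’s actual code: the tiny stand-in model, batch shapes, and hyperparameters are placeholders, while bf16 autocast and gradient accumulation are the usual tricks for squeezing training into a 24 GB card like the 3090.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    # Generic single-GPU next-token training step; NOT the author's script.
    # TinyLM is a stand-in so the sketch runs end to end; swap in a real
    # GPT-2-small module for an actual training run.
    VOCAB, CTX, DIM = 50257, 1024, 768

    class TinyLM(nn.Module):
        def __init__(self):
            super().__init__()
            self.emb = nn.Embedding(VOCAB, DIM)
            self.head = nn.Linear(DIM, VOCAB, bias=False)

        def forward(self, x):
            return self.head(self.emb(x))  # (batch, seq, vocab) logits

    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = TinyLM().to(device)
    optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4, weight_decay=0.1)
    accum_steps = 8  # gradient accumulation emulates a larger batch

    model.train()
    for step in range(accum_steps * 2):  # stand-in for a real dataloader
        batch = torch.randint(0, VOCAB, (2, CTX), device=device)
        inputs, targets = batch[:, :-1], batch[:, 1:]  # shift by one token
        # bf16 autocast keeps activations small enough for 24 GB of VRAM
        with torch.autocast(device_type=device, dtype=torch.bfloat16):
            logits = model(inputs)
            loss = F.cross_entropy(logits.flatten(0, 1), targets.flatten())
        (loss / accum_steps).backward()
        if (step + 1) % accum_steps == 0:
            torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
            optimizer.step()
            optimizer.zero_grad()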

Then came the drama. ducktective lit up the thread with the eternal question: buy a gaming card or rent the cloud? The frugal crowd yelled “use what you already own,” while the speed demons flexed their “spin up and scale” swagger. Havoc dropped a spicy reality check, calling web training data “a huge amount of slop and garbage,” sparking talk of smarter filtering before training. And just when the GPU vs cloud brawl peaked, billylo slid in with a TPU link (Google’s how-to), turning it into a brand war: consumer GPUs vs rented cloud vs Google’s special chips. The takeaway: base models aren’t just for big labs anymore, and the community is here for the chaos, the memes, and the home-cooked AI.
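
Havoc’s “slop and garbage” point is a real pipeline concern: FineWeb exists precisely because raw web crawls need heavy cleaning. As a taste of what “smarter filtering before training” can mean, here is a toy Python quality filter; the thresholds and heuristics are illustrative guesses, not FineWeb’s actual pipeline, which layers many more rules and deduplication stages.

    import hashlib

    def keep_document(text: str, seen_hashes: set) -> bool:
        """Toy pre-training quality filter: drop very short pages,
        mostly non-alphabetic text, and exact duplicates. Thresholds
        are illustrative, not taken from any real pipeline."""
        if len(text) < 200:  # too short to teach the model much
            return False
        alpha_ratio = sum(ch.isalpha() for ch in text) / len(text)
        if alpha_ratio < 0.6:  # likely markup, tables, or boilerplate
            return False
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if digest in seen_hashes:  # exact-duplicate page
            return False
        seen_hashes.add(digest)
        return True

    # Usage: filter raw pages before tokenizing them into training data.
    seen = set()
    pages = ["A long enough article about transformers. " * 20, "<td>1</td>"]
    clean = [p for p in pages if keep_document(p, seen)]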

Key Points

  • A GPT-2 small–sized base model (~163M parameters) was trained from scratch on a single RTX 3090.
  • Training used Hugging Face FineWeb-series datasets and achieved near–original GPT-2 small performance.
  • Total training time was just over 48 hours on consumer hardware.
  • Model configuration matched GPT-2 small: vocab_size 50257, context_length 1024, emb_dim 768, 12 attention heads, 12 transformer layers, drop_rate 0.1, qkv_bias False (see the config sketch after this list).
  • Karpathy’s nanochat (PyTorch) provides cost/time benchmarks: d32 (1.9B parameters) ~$800; d20 (561M parameters) ~4 hours on 8× H100 at ~$24/hour, i.e. roughly the ~$100 figure cited above.
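
For readers following along in code, the configuration bullet above translates into a Python dict like the one below, in the style of Raschka’s book. The key names mirror the bullet’s labels but are illustrative; the exact names in the book’s repository may differ.

    # GPT-2 small config from the bullet above; key names are illustrative.
    GPT_CONFIG_SMALL = {
        "vocab_size": 50257,     # GPT-2 BPE tokenizer vocabulary
        "context_length": 1024,  # maximum sequence length
        "emb_dim": 768,          # embedding / hidden width
        "n_heads": 12,           # attention heads per layer
        "n_layers": 12,          # transformer blocks
        "drop_rate": 0.1,        # dropout probability
        "qkv_bias": False,       # no bias on query/key/value projections
    }

    # Rough parameter accounting: the token embedding alone is
    # 50257 * 768 ≈ 38.6M weights. If the input embedding is not tied
    # to the output head, that matrix is counted twice, which is likely
    # why the total lands near ~163M instead of GPT-2 small's canonical 124M.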

Hottest takes

“valuable exercise… to understand how LLMs work” — DeathArrow
“3090 or rent the cloud?” — ducktective
“a huge amount of slop and garbage” — Havoc
Made with <3 by @siedrix and @shesho from CDMX. Powered by Forge&Hive.