April 20, 2026

ELIZA’s grandkids boot from a floppy

Soul Player C64 – A real transformer running on a 1 MHz Commodore 64

Retro breadbox learns to “chat” — and the comments explode with nostalgia and salt

TLDR: A tiny ChatGPT-style model now runs on a classic Commodore 64, slowly but for real, beeping out replies from a floppy. Comments split between delighted nostalgia, skeptical eye-rolls about usefulness, and nerdy jokes, proving the charm is the stunt itself and the retro wizardry behind it.

Someone just taught a 1982 Commodore 64 to mumble like a baby chatbot, and the internet lost its mind. A tiny version of the tech behind ChatGPT — just 25,000 parameters, running in hand-written assembly — spits out words at a glacial pace (about a minute per word) and still fits on a floppy. It even makes cute beeps while it “thinks.” You can train your own tiny “soul” and talk to it, but keep it lowercase or it gets confused. The creators tweaked the math so it actually pays attention — and yes, that’s a flex.

The comments? Pure theater. One user cracked, “Eliza called…” — summoning the OG 1960s chatbot for a family reunion gag. Another confessed, “I hate AI and love the C64, but I’ll allow it,” capturing the split: purists vs. tinkerers. Skeptics rolled in too: if it babbles nonsense, does it even “work” at this size? Meanwhile, deep-cut nerds joked about adding “1541 flash attention” (translation: make the old floppy drive do speed magic), while practical folks urged running it in the VICE emulator with warp mode so you’re not aging a year per reply. Love it or roast it, everyone agrees: this is peak retro-meets-AI chaos — part museum piece, part meme, all vibe.

Key Points

  • A 2-layer decoder-only transformer with ~25K int8 parameters runs on an unmodified 1 MHz Commodore 64, implemented in 6502/6510 assembly.
  • The model includes real multi-head causal self-attention, softmax, and RMSNorm, generating about one token per minute and fitting on a floppy disk.
  • A critical fix involved shifting attention scores by 14 bits (not 17) so a 128-entry exp lookup table yields meaningful attention weights.
  • Training uses quantization-aware methods with a 128-token BPE tokenizer; best checkpoints are chosen by int8 inference quality and exported as soul.bin and tokenizer.json.
  • Scripts and assets enable users to train, build, and run the model in VICE or on real hardware, with tests verifying the full float-to-assembly pipeline.
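To see why the 14-vs-17-bit shift in the key points matters, here is a minimal Python sketch of a softmax built on a 128-entry exp lookup table. Everything in it (the `STEP` and `SCALE` constants, the exact indexing scheme) is an illustrative assumption, not the project's actual 6502 code:

```python
import math

# Illustrative fixed-point softmax with an exp lookup table. The
# constants below (STEP, SCALE, LUT_SIZE) are assumptions for this
# sketch, not values taken from the real C64 implementation.
LUT_SIZE = 128       # 128-entry exp table, as described above
STEP = 0.25          # assumed score units per LUT index after shifting
SCALE = 1 << 15      # fixed-point scale of the LUT output

# EXP_LUT[d] approximates exp(-d * STEP); index 0 holds exp(0).
EXP_LUT = [round(SCALE * math.exp(-d * STEP)) for d in range(LUT_SIZE)]

def lut_softmax(raw_scores, shift=14):
    """Integer-only softmax over wide-integer attention scores.

    Shifting by too many bits (say 17 instead of 14) squashes distinct
    scores onto the same LUT index, so every position gets the same
    weight and attention degenerates into a plain average.
    """
    shifted = [s >> shift for s in raw_scores]
    top = max(shifted)
    # Distance below the max picks the LUT entry; clamp to the table end.
    weights = [EXP_LUT[min(top - s, LUT_SIZE - 1)] for s in shifted]
    total = sum(weights)
    return [w / total for w in weights]  # final divide in float for clarity
```

With a 14-bit shift, scores such as `5 << 14` and `1 << 14` land on different LUT entries and get distinct weights; shift by 17 instead and they all collapse to index 0, so the "attention" weights come out uniform and carry no information.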

Hottest takes

“Eliza called, and asked if we saw her grand kids...” — harel
“i hate ai, and i love the c64, but i'll allow it.” — bighead1
“i'm not sure if it does work at this scale.” — wk_end
Made with <3 by @siedrix and @shesho from CDMX. Powered by Forge&Hive.