Happy Zelda's 40th: first LLM running on N64 hardware (4 MB RAM, 93 MHz)

Yes, the N64 talks now — fans beg for a video

TLDR: A developer squeezed a tiny chatty AI onto a real Nintendo 64, making it talk offline like it’s 1996 with a brain transplant. The crowd’s buzzing for a video, asking for an easy test and the missing AI file, while one commenter roasts the confusing title — nostalgia meets nitpicks.

An indie dev just made an N64 literally talk back using a tiny AI model — no internet, no modern chip, just a 1996 console spitting out words in real time. It’s called Legend of Elya, a homebrew game where a small “LLM” (large language model) runs on the Nintendo 64’s humble 93 MHz brain and 4 MB of memory. Translation: a teeny-tiny chatbot squeezed onto a cartridge. The model is so small it fits in about 232 KB (a small fraction of a typical MP3), and it runs entirely on old-school integer math, with no floating point and no graphics-card magic.
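
For the curious, “old-school integer math” here means fixed-point arithmetic: numbers are stored as plain integers with an implied binary point. Below is a minimal C sketch of the Q8.7 format mentioned in the Key Points; the helper names are ours for illustration, not the project’s actual code.

    #include <stdint.h>

    /* Q8.7 fixed-point: a 16-bit integer whose low 7 bits are the fraction,
     * so 1.0 is stored as 128. Illustrative sketch; names are hypothetical. */
    typedef int16_t q87_t;
    #define Q87_ONE (1 << 7)                      /* 1.0 == 128 */

    /* Multiply two Q8.7 values using only integer ops: widen to 32 bits,
     * round, then shift the extra 7 fraction bits back out. */
    static inline q87_t q87_mul(q87_t a, q87_t b)
    {
        int32_t p = (int32_t)a * (int32_t)b;      /* Q16.14 product   */
        return (q87_t)((p + (1 << 6)) >> 7);      /* round to nearest */
    }

    /* Example: 1.5 * 1.5 = 2.25  ->  q87_mul(192, 192) == 288 (288/128). */

Keeping every multiply and add in 16- and 32-bit integers like this is what lets the whole forward pass stay on the console’s CPU without ever touching a float or double, as the Key Points note.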

The community reaction? Equal parts hype parade and help desk. One camp is chanting “video or it didn’t happen”, with shomp begging for a demo, while great_psy wants a way to test it without dusting off the console. The dev, AutoJanitor, jumps in like a hype man: it runs in an emulator, text glitches are being fixed, and there’s “a surprise coming soon.” Cue the Zelda 40th birthday confetti. Meanwhile, a mini-mutiny brews: mlaux tried to build it but hit a wall because the weights file — the AI’s “brain” — wasn’t included, sparking pleas to add it to the repo. And acuozzo went full grammar cop, calling the title “extremely challenging to parse,” birthing a new meme: “Make titles readable again.”

It’s retro magic meets AI minimalism, with the crowd split between awe, FOMO, and “pls upload the file.” Want to try it? You’ll need an emulator or real hardware, and maybe libdragon if you’re building it yourself, but what everyone really wants is a clean video and a one-click download.

Key Points

  • Legend of Elya is an N64 homebrew game embedding a character-level nano-GPT that runs entirely on the console’s VR4300 CPU with no floating-point math.
  • The model has 2 transformer blocks (embedding dim 128, 4 heads, FFN 512), a 256-entry byte-level vocabulary, a 32-token context, and ~427k parameters in a 232 KB Q4-quantized weight file (rough parameter math is sketched after this list).
  • All inference uses Q8.7 fixed-point arithmetic with integer approximations for layer norm and softmax; no float/double operations are used.
  • Training is done off-console with PyTorch + CUDA (AdamW, cosine LR schedule); about 7 minutes on an RTX 5070 reaches loss 0.3389 (perplexity ≈ 1.40).
  • A roadmap proposes RSP microcode acceleration (DMA to DMEM, VMULF/VMADH vector ops) for an estimated 4–8× speedup; builds run in the ares emulator or on real hardware via an EverDrive (a scalar sketch of the loop it would replace follows this list).
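
To see how those architecture numbers could add up to a 232 KB file, here’s a back-of-the-envelope check in C. The layout assumptions (tied output head, no bias terms) are ours, so the total only needs to land near the reported ~427k parameters rather than match it exactly.

    #include <stdio.h>

    int main(void)
    {
        /* Dimensions from the Key Points above. */
        const long vocab = 256, ctx = 32, d = 128, ffn = 512, blocks = 2;

        long embed  = vocab * d + ctx * d;          /* token + position embeddings */
        long attn   = 4 * d * d;                    /* Wq, Wk, Wv, Wo              */
        long mlp    = d * ffn + ffn * d;            /* up- and down-projection     */
        long norms  = 2 * 2 * d;                    /* two LayerNorms per block    */
        long params = embed + blocks * (attn + mlp + norms) + 2 * d; /* + final LN */

        long q4_kb  = params / 2 / 1024;            /* 4 bits per weight           */

        printf("params ~= %ld\n", params);          /* ~431k vs. the reported ~427k */
        printf("Q4 weights ~= %ld KB before scales\n", q4_kb);  /* ~210 KB          */
        return 0;
    }

Packed at 4 bits per weight that’s roughly 210 KB; per-group quantization scales plausibly account for the rest of the 232 KB file.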
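
And here’s a rough idea of the scalar inner loop the RSP roadmap is aiming at: a fixed-point matrix-vector product that currently runs one multiply at a time on the VR4300, and that RSP microcode would DMA into DMEM and crunch with vector ops like VMULF/VMADH. Again, this is our illustration, not the project’s code.

    #include <stdint.h>

    typedef int16_t q87_t;   /* Q8.7 fixed-point, 1.0 == 128 */

    /* Scalar Q8.7 matrix-vector product (hypothetical helper, for illustration).
     * Assumes values stay small enough that the 32-bit accumulator won't
     * overflow; production code would saturate or widen. */
    void q87_matvec(const q87_t *w, const q87_t *x, q87_t *y, int rows, int cols)
    {
        for (int r = 0; r < rows; r++) {
            const q87_t *row = &w[r * cols];
            int32_t acc = 0;
            for (int c = 0; c < cols; c++)
                acc += (int32_t)row[c] * x[c];      /* Q16.14 partial sums   */
            y[r] = (q87_t)((acc + (1 << 6)) >> 7);  /* back to Q8.7, rounded */
        }
    }

The RSP’s vector unit works on eight 16-bit lanes per instruction, which is presumably where the estimated 4–8× figure comes from.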

Hottest takes

“the surprise is coming soon. Happy 40th Zelda!” — AutoJanitor
“it’s missing the weights.bin file” — mlaux
“this title was extremely challenging to parse” — acuozzo
Made with <3 by @siedrix and @shesho from CDMX. Powered by Forge&Hive.