Show HN: Tiny Diffusion – A character-level text diffusion model from scratch

Tiny Shakespeare bot drops; devs feud over token limits, nerdy tweaks, and terminal vibes

TLDR: A tiny, from-scratch diffusion model writes Shakespeare-style text and runs locally. The crowd debates fixed-length infill quirks, questions a quirky activation choice, and celebrates a terminal UI remix, while rival projects surface—proof that text diffusion is hot and the community is loudly curious.

Tiny Diffusion is a bite-size text generator trained on Tiny Shakespeare, and the demo GIF has devs swooning. It’s only 10.7M parameters and runs locally—cue the “I can try this at home!” crowd. But the real show is the comments: one camp is cheering the minimal, from-scratch charm, while another is poking holes in how it actually fills in the blanks.

User yugretcx storms in asking if these diffusion models force a fixed-size blank. If the best word is longer than the gap, do you just chop it? That sparked a mini-melodrama over whether diffusion is good for “infill” (smart auto-complete) or just pretty noise. Meanwhile, simonw went full retro and turned the matplotlib demo into a terminal-tastic remake using curses—yes, an artsy ASCII UI—with a playful “[because it’s cute]” vibe and a gist to prove it.
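yugretcx’s question boils down to a real constraint: a character-level model infilling a masked span must emit exactly one character per masked slot. A toy sketch (not the repo’s code; `infill_fixed` and the candidate list are made up for illustration) shows why a too-long “best word” gets chopped:

```python
# Toy illustration (NOT the repo's code) of the fixed-size infill
# constraint yugretcx asked about: a masked span of N positions must
# be filled with exactly N characters, so longer candidates are
# truncated (or would have to be skipped).

MASK = "_"

def infill_fixed(template: str, candidates: list[str]) -> str:
    """Fill the masked span with an exact-length candidate if one
    exists; otherwise truncate the first candidate to fit."""
    start = template.index(MASK)
    end = start
    while end < len(template) and template[end] == MASK:
        end += 1
    gap = end - start
    for word in candidates:
        if len(word) == gap:  # exact fit wins
            return template[:start] + word + template[end:]
    # no exact fit: chop the best candidate to the gap size
    return template[:start] + candidates[0][:gap] + template[end:]

print(infill_fixed("to be or not to __", ["be", "become"]))
# -> "to be or not to be"   (exact fit)
print(infill_fixed("to be or not to ____", ["become"]))
# -> "to be or not to beco" (truncated to fit)
```

Real text-diffusion samplers sidestep this per-word framing by denoising every position jointly, but the fixed total length remains.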

Then came the nerd fight: Majromax questioned the unusual ReLU² activation (squared ReLU, the function that decides how strongly a neuron fires), demanding receipts on why it’s used here when nanoGPT doesn’t use it. Others asked if we should skip characters entirely and diffuse in “embeddings”—vector vibes instead of letters. And volodia threw in a rival link, char-mdlm, fueling a “which tiny text demo reigns?” showdown.
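For the curious, the activation in dispute is simple to state: ReLU²(x) = max(0, x)². A minimal sketch of both functions (plain Python, no framework, written here for illustration):

```python
# Sketch of the ReLU^2 ("squared ReLU") activation Majromax asked about:
# relu2(x) = max(0, x) ** 2. Unlike plain ReLU it has a zero gradient
# approaching 0 from both sides and grows quadratically for x > 0.

def relu(x: float) -> float:
    return max(0.0, x)

def relu2(x: float) -> float:
    return relu(x) ** 2

print([relu2(x) for x in (-2.0, -0.5, 0.0, 0.5, 2.0)])
# -> [0.0, 0.0, 0.0, 0.25, 4.0]
```

In a transformer MLP this would replace GELU in the feed-forward layer; why the author prefers it here over nanoGPT’s choice is exactly what the thread is asking.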

Between Shakespeare jokes (“Out, damned token!”) and hardware flexes (4×A100 training, but weights are pre-baked), the mood is equal parts tinkering joy and spicy skepticism. Classic Show HN: tiny tool, big feelings.

Key Points

  • tiny-diffusion is a 10.7M-parameter character-level text diffusion model trained on Tiny Shakespeare.
  • It modifies the nanochat GPT implementation and provides pretrained weights for immediate use.
  • Training with training.py saves weights to weights/diffusion_model.pt; sample and animation scripts load from that file.
  • The model was trained for 20,000 steps in about 30 minutes on 4×A100 GPUs.
  • The repo includes scripts for text generation (current context length 30) and visualization of the diffusion process.
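To make the “diffusion over characters” idea concrete, here is a heavily simplified sampling loop: start from a fully masked sequence of length 30 (the context length the repo states) and commit a few more characters each denoising step. Everything here is a stand-in—`fake_model` replaces the trained network, and none of these names come from the actual repo:

```python
import random

# Toy sketch (an assumption-laden stand-in, NOT the repo's sampler) of
# masked character diffusion: begin fully masked, then over several
# steps let the "model" propose characters and commit a fraction of
# the still-masked positions each step.

VOCAB = "abcdefghijklmnopqrstuvwxyz ,.\n"
MASK = "_"
CONTEXT_LEN = 30  # context length stated in the repo

def fake_model(seq: list[str]) -> list[str]:
    """Stand-in for the trained network: propose one character per
    masked position (a real model would predict a distribution)."""
    return [c if c != MASK else random.choice(VOCAB) for c in seq]

def sample(steps: int = 5, seed: int = 0) -> str:
    random.seed(seed)
    seq = [MASK] * CONTEXT_LEN
    for step in range(steps, 0, -1):
        proposal = fake_model(seq)
        masked = [i for i, c in enumerate(seq) if c == MASK]
        # Commit roughly 1/step of the remaining masks so the last
        # step resolves everything that's left.
        keep = max(1, len(masked) // step)
        for i in random.sample(masked, keep):
            seq[i] = proposal[i]
    return "".join(seq)

print(repr(sample()))  # a fully denoised 30-character string
```

The animation script in the repo visualizes exactly this kind of progressive unmasking; the real model just supplies learned predictions instead of random ones.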

Hottest takes

“does it just abandon the best fit or truncate it to fit?” — yugretcx
“I figured it would be cute if it used a terminal UI instead” — simonw
“Is there some documentation on the choice of this activation function?” — Majromax
Made with <3 by @siedrix and @shesho from CDMX. Powered by Forge&Hive.