February 15, 2026
Tiny brain, big opinions
Show HN: Microgpt is a GPT you can visualize in the browser
Tiny AI you can watch learn in your browser sparks nitpicks, nostalgia, and name-chaos
TLDR: A bite-size AI you can watch learn to spell names in your browser has HN buzzing: fans love the clear visuals, nitpickers want a big “characters ≠ tokens” label, and others ask how long it takes to get good. It’s a teachable, watchable peek into how text generators think.
This Show HN drops a tiny, in-browser text brain that learns to spit out baby-name vibes, and the crowd immediately turns into Comment Olympics. The demo explains how this pint-sized model guesses characters one by one and shows its “thinking” as colorful weight maps and attention doodles. It’s meant to demystify the big stuff, like ChatGPT, by showing a pocket version you can literally watch learn.
But the loudest take? “It’s character-based, not token-based!” One commenter insists the demo should flag the difference, since big models build sentences from chunks of text (tokens), not single letters. To the nitpickers, this detail is crucial. To everyone else, it’s a neat toy that spells names and shows the basics without melting your brain.
Meanwhile, the practical crowd asks: “How many steps ’til it gets good?” They want numbers, not vibes. And then there’s the nostalgia brigade: someone’s hunting for an old page that visualized GPT-2’s inner life in black-and-white—“watching order emerge from chaos”—like a lost indie film. The vibe swings between classroom curiosity and retro AI mixtape.
The verdict: Microgpt is tiny, visual, and surprisingly charming, especially for newcomers. It’s sparking debates about what “real” language models do while making people grin as it conjures weirdo names like it’s auditioning for a baby registry.
Key Points
- Microgpt is a small, browser-visualized, character-level GPT trained on a dataset of names.
- The model uses 16-dimensional representations, 4 attention heads, and an MLP size of 64 to balance speed and capability.
- Training uses next-character prediction with cross-entropy loss and gradient descent via backpropagation (a rough sketch of the objective follows this list).
- Transformer mechanisms explained include Q/K/V attention, 1/√d attention scaling, RMSNorm, and residual connections (see the attention sketch after the list).
- Compared to ChatGPT, the micro model is structurally similar but vastly smaller; ChatGPT uses tokens, human feedback, and many more parameters and layers.
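For readers who want to see what “next-character prediction with cross-entropy loss” means outside the visuals, here is a minimal sketch of that objective. The toy names, the tiny vocabulary, and the bigram-style `model` placeholder are illustrative assumptions, not Microgpt’s actual code; a real run would adjust the weights by backpropagation to push the loss down.

```python
import numpy as np

# Toy setup: the demo is character-level, so the "vocabulary" is individual letters.
names = ["emma", "liam", "noah"]
chars = sorted(set("".join(names)) | {"."})       # "." marks the start/end of a name
stoi = {c: i for i, c in enumerate(chars)}        # character -> integer id
vocab_size = len(chars)

rng = np.random.default_rng(0)
W = rng.normal(size=(vocab_size, vocab_size)) * 0.1   # stand-in "parameters"

def model(context_ids):
    """Placeholder for Microgpt's transformer: a real model attends over the
    whole context, but here a single matrix maps the last character to logits
    over the next one (a bigram-style stand-in)."""
    return W[context_ids[-1]]

def cross_entropy(logits, target_id):
    """Next-character loss: -log p(correct next character)."""
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return -np.log(probs[target_id])

# One pass over the names: predict each character from the ones before it.
total, count = 0.0, 0
for name in names:
    ids = [stoi["."]] + [stoi[c] for c in name] + [stoi["."]]
    for t in range(1, len(ids)):
        total += cross_entropy(model(ids[:t]), ids[t])
        count += 1
print(f"mean loss: {total / count:.3f}")   # training lowers this by adjusting W via backprop
```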
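The attention vocabulary in the key points (Q/K/V, 1/√d scaling, RMSNorm, residual connections) also boils down to a few lines. Below is a single-head sketch (the demo itself uses 4 heads) with random placeholder weights and the 16-dimensional size from the bullets; it illustrates the mechanism, not the demo’s implementation.

```python
import numpy as np

rng = np.random.default_rng(1)
T, d = 5, 16                     # 5 characters of context, 16-dimensional representations
x = rng.normal(size=(T, d))      # one vector per character seen so far

def rmsnorm(v, eps=1e-5):
    """RMSNorm: rescale each vector to unit root-mean-square (no mean subtraction)."""
    return v / np.sqrt((v ** 2).mean(axis=-1, keepdims=True) + eps)

# Random placeholder projections for queries, keys, and values.
Wq, Wk, Wv = (rng.normal(size=(d, d)) * 0.1 for _ in range(3))

def attention_block(x):
    h = rmsnorm(x)                                   # normalize before attending
    Q, K, V = h @ Wq, h @ Wk, h @ Wv                 # queries, keys, values
    scores = Q @ K.T / np.sqrt(d)                    # 1/sqrt(d) keeps scores well-scaled
    mask = np.triu(np.ones((T, T)), k=1)             # causal mask: no peeking ahead
    scores = np.where(mask == 1, -np.inf, scores)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # rows of this are the attention maps
    return x + weights @ V                           # residual connection: add back the input

out = attention_block(x)
print(out.shape)                                     # (5, 16): same shape in and out
```

That `weights` matrix is, roughly, what the demo draws as its attention doodles: each row shows how much one character looks back at the characters before it.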