April 24, 2026
Scrolls, tokens, and 44TB tantrums
Show HN: How LLMs Work – Interactive visual guide based on Karpathy's lecture
Beautiful AI explainer drops, HN nitpicks 44TB and iPhone scrolls
TLDR: A slick, clickable guide explains how chatbots learn from piles of text and predict the next word. The crowd loved it but sparred over a '44TB on one drive' claim, begged for clearer basics like embeddings and inputs, and grumbled about iPhone scroll bugs—proof polish matters.
An eye-candy explainer on how chatbots learn, based on Karpathy's famed lecture, hit Show HN, and the crowd went wild, then got picky. Folks praised the clear walkthrough from "download the internet" to "predict the next word," and the clickable demos. But the line about 44 TB on a single hard drive sparked immediate nitpicks, with hardware realists calling it wishful thinking. Then came the bug brigade: iOS Safari scroll-jumping, overlapping labels, and layout gremlins stole the spotlight. The comments read like a split screen: a teacher doing magic at the chalkboard while the class shouts, "Your projector's flickering!"
Beneath the UI drama, the real crave was more basics in plain English. One curious reader asked, “What does the input side of the ‘neutral’ network look like?”—translation: how the model reads shorter or longer prompts. Another begged for an embeddings section: how numbers that capture meaning get plugged in to guide answers. Others giggled at the idea of “downloading the internet” onto one disk, while fans cheered the tokenizer toy and pointed to tiktokenizer.vercel.app. Verdict: a gorgeous guide that makes AI feel human—if it can fix the scrolls and add a spoonful more basics. Right now, it’s half masterclass, half bug report theater on HN.
Key Points
- The guide traces the LLM pipeline from web-scale data collection to a conversational assistant, inspired by Karpathy's lecture.
- Raw web data (e.g., Common Crawl) is filtered into high-quality datasets like FineWeb, totaling about 44 TB (~15 trillion tokens).
- Tokenization uses subword methods such as BPE; GPT‑4’s tokenizer has a 100,277-token vocabulary enabling efficient multilingual handling (see the tokenizer sketch after this list).
- Transformers are trained via next-token prediction, with iterative updates that reduce loss over billions of steps (toy loss example below).
- Inference is autoregressive and stochastic, controlled by temperature; a pre-trained base model requires post-training to behave like an assistant (sampling sketch below).
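For readers who want to poke at the tokenizer themselves, here is a minimal sketch using the open-source tiktoken library and its cl100k_base encoding (the GPT‑4 one with the 100,277-token vocabulary); the sample sentence is just an illustration.

```python
# Minimal tokenization sketch (pip install tiktoken).
# cl100k_base is the GPT-4 encoding with a 100,277-token vocabulary.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

text = "Download the internet, then predict the next word."
token_ids = enc.encode(text)                    # text -> list of integer token IDs
pieces = [enc.decode([t]) for t in token_ids]   # each ID back to its subword piece

print(len(token_ids), "tokens:", list(zip(pieces, token_ids)))
print("vocabulary size:", enc.n_vocab)          # 100277
```

It is the same mapping that powers the tiktokenizer.vercel.app toy fans pointed to in the thread.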
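The next-token-prediction objective from the training bullet boils down to a cross-entropy loss. The toy example below uses PyTorch with made-up logits and targets purely to show the shape of the computation; it is nowhere near real training scale.

```python
# Toy next-token-prediction loss (PyTorch). Shapes are deliberately tiny;
# real training runs this over trillions of tokens and billions of parameters.
import torch
import torch.nn.functional as F

vocab_size, seq_len = 100_277, 8
logits = torch.randn(seq_len, vocab_size)            # model's score for every token at each position
targets = torch.randint(0, vocab_size, (seq_len,))   # the actual next token at each position

# Cross-entropy between the predicted distribution and the true next token;
# each training update nudges parameters so this number goes down.
loss = F.cross_entropy(logits, targets)
print(loss.item())
```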
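And the temperature knob from the inference bullet is easy to see in a few lines. This sketch uses NumPy and invented logits for a four-token toy vocabulary.

```python
# Temperature-controlled sampling sketch (NumPy only).
import numpy as np

def sample_next_token(logits, temperature=1.0):
    """Scale logits by temperature, softmax into probabilities, sample one token ID."""
    scaled = np.asarray(logits, dtype=np.float64) / max(temperature, 1e-8)
    scaled -= scaled.max()                        # subtract max for numerical stability
    probs = np.exp(scaled) / np.exp(scaled).sum()
    return int(np.random.choice(len(probs), p=probs))

logits = [2.0, 1.0, 0.2, -1.0]                    # made-up scores for a 4-token toy vocabulary
print(sample_next_token(logits, temperature=0.2))   # low temperature: nearly deterministic
print(sample_next_token(logits, temperature=1.5))   # high temperature: more varied picks
```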