How fast is N tokens per second really?

This viral demo finally shows what AI speed feels like — and commenters are obsessed

TLDR: A new demo lets people actually watch how fast AI-generated text appears, turning confusing speed numbers into something you can feel. Commenters loved the clarity and jokes, but one hot take stole the spotlight: speed is nice, yet bad answers are still bad answers.

A tiny interactive demo about how fast an artificial intelligence model "types" has somehow turned into a full-on comment-section mood board. The idea is simple: people keep hearing braggy numbers like 10, 60, or 500 "tokens per second" — basically how quickly an AI spits out chunks of text — but most humans have no clue what that actually looks like. This tool lets you watch it in different styles, from plain writing to code and even fake "thinking," and suddenly those abstract speed claims become very real.

And the crowd? Delighted. Several commenters were instantly won over by the vibe, praising it as the kind of "gut feel" tool the AI world badly needs. One person summed up the whole emotional arc with a line that deserves a trophy: "5 tok/s is still faster than me!" That set the tone: half appreciation, half self-own, fully relatable. The jokes may be light, but there was one clear mini-debate bubbling underneath the praise. While many people loved finally understanding the numbers, one commenter cut through the hype with a reality check: at everyday local-computer speeds, the bigger issue may not be speed at all — it may be whether the answer is any good. Ouch.

So yes, the demo is useful. But the real show is the reaction: part relief, part nerd joy, part "please stop flexing benchmark numbers at me" energy. For once, the comments made performance talk feel human.

Key Points

• The article introduces a visualization that shows what different LLM token-per-second rates look like in real time.
• It provides four output modes (code, text, think, and agent) to illustrate how throughput feels across different content types.
• It recommends trying speeds from 5 tok/s to 800 tok/s, mapping them to example hardware and service tiers.
• The tool uses an approximation of BPE-style tokenization rather than matching vendor-specific tokenizers such as tiktoken or Claude's tokenizer.
• The article states that code is more token-dense than prose and estimates that English prose averages about 1.3 tokens per word, so 30 tok/s is about 23 words per second.
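
For anyone who wants to sanity-check that last conversion, here is a minimal Python sketch of the arithmetic. It is not the demo's own code: the function names are made up for illustration, the 1.3 tokens-per-word figure is the article's estimate for English prose, and the 500-word answer length is just an example. Code is more token-dense, so the same rate would produce fewer "words" of code per second.

```python
# Rough conversion between token throughput and reading-speed terms.
# Assumption (from the article's estimate): English prose averages
# about 1.3 tokens per word. Real tokenizers vary by vendor and text.
TOKENS_PER_WORD = 1.3


def words_per_second(tokens_per_second: float,
                     tokens_per_word: float = TOKENS_PER_WORD) -> float:
    """Convert a tok/s figure into an approximate prose words-per-second rate."""
    if tokens_per_word <= 0:
        raise ValueError("tokens_per_word must be positive")
    return tokens_per_second / tokens_per_word


def seconds_for_words(word_count: int,
                      tokens_per_second: float,
                      tokens_per_word: float = TOKENS_PER_WORD) -> float:
    """Estimate how long an answer of word_count words takes to stream."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return word_count * tokens_per_word / tokens_per_second


if __name__ == "__main__":
    # Rates mentioned in the article; the 500-word answer is a hypothetical example.
    for rate in (5, 30, 60, 500, 800):
        print(f"{rate:>4} tok/s ~ {words_per_second(rate):6.1f} words/s, "
              f"500-word answer in about {seconds_for_words(500, rate):6.1f} s")
```

Running it for 30 tok/s gives roughly 23 words per second, matching the article's figure. Swapping in a different `tokens_per_word` value is a quick way to see how much denser content, such as code, changes the picture.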

Hottest takes

"gut feel calibration utilities" — dario-dentes

"5 tok/s is still faster than me!" — dfollent

"the real issue is quality of output, not tokens per sec" — bjelkeman-again

May 20, 2026

Token drama hits refresh
