June 16, 2026
Chip happens: comment war edition
GateGPT: 56k tokens per second Transformer (KV cache) on FPGA at 80 MHz
Homemade AI chip stuns at first glance, but the comments came for blood
TLDR: A developer built a tiny AI text model directly into custom hardware and showed eye-popping speed numbers without using a normal processor. Commenters immediately argued the demo was misleading, saying the model was so small that a laptop core beat it easily and the benchmark didn’t mean much in practice.
A maker showed off a home-built AI chip running a tiny text generator on an FPGA, a reprogrammable piece of hardware, and bragged about 56,000+ tokens per second at just 80 MHz. On paper, that sounds like sci-fi garage genius energy: no regular processor, no graphics card, just raw digital circuitry spelling out names. But the real action wasn’t in the demo — it was in the comment section, where the applause quickly turned into a courtroom drama.
The harshest reaction came from people saying the headline was doing a lot of heavy lifting. One commenter dropped a link and delivered the brutal counterpunch: a single MacBook CPU core was allegedly 71 times faster on this tiny model. Another went for the jugular by pointing out the system’s memory was only 16 characters, basically accusing the whole “tokens per second” flex of being flashy but not meaningful in the real world. Ouch.
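For a rough sense of scale, the headline figures can be turned into two back-of-envelope numbers. The sketch below uses only values quoted in this post: the 80 MHz clock, the 56,000 tokens-per-second floor, and the commenter's alleged 71x figure. The function names are hypothetical, and the CPU number is an implication of the commenter's claim, not a measurement.

```python
# Back-of-envelope arithmetic using only figures quoted in the post.
# Function names are illustrative; nothing here comes from the project's code.

CLOCK_HZ = 80_000_000      # 80 MHz FPGA clock, from the headline
TOKENS_PER_SEC = 56_000    # headline floor ("56,000+ tokens per second")
CLAIMED_CPU_SPEEDUP = 71   # commenter's alleged single-core advantage (unverified)


def cycles_per_token(clock_hz: float, tokens_per_sec: float) -> float:
    """Average clock cycles the hardware spends producing one token."""
    return clock_hz / tokens_per_sec


def implied_cpu_tokens_per_sec(fpga_tokens_per_sec: float, speedup: float) -> float:
    """Throughput the CPU would need for the quoted speedup to hold."""
    return fpga_tokens_per_sec * speedup


if __name__ == "__main__":
    cpt = cycles_per_token(CLOCK_HZ, TOKENS_PER_SEC)
    cpu = implied_cpu_tokens_per_sec(TOKENS_PER_SEC, CLAIMED_CPU_SPEEDUP)
    print(f"About {cpt:.0f} clock cycles per token")
    print(f"Implied CPU throughput: about {cpu:,.0f} tokens per second")
```

Taken at face value, that works out to roughly 1,429 clock cycles per token on the FPGA, and just under 4 million tokens per second for the laptop core if the 71x claim holds.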
Still, not everyone was in roast mode. Some commenters played the “yes, but…” card, saying this is still genuinely impressive as a proof of concept — especially because bigger AI systems get much harder to run as they grow. That sparked the nerdy dream scenario: could future chips put memory and compute side by side and become monsters at this kind of work? So the mood was split between “cool hack” and “nice stunt, but come back when it matters.” In other words: classic internet tech drama, with a side of meme-worthy skepticism.
Key Points
- The article claims a Transformer with KV cache achieved more than 56,000 tokens per second at 80 MHz.
- The design is described as a custom digital integrated circuit created gate by gate.
- The implementation was prototyped on an FPGA.
- The author says the system uses no CPU and no GPU.
- The demonstration shown in the article runs Andrej Karpathy’s microGPT.