OpenAI ditches Nvidia with unusually fast coding model on plate-sized chips

Blazing-fast code on dinner-plate chips—but commenters smell spin and demand receipts

TLDR: OpenAI launched a super-fast coding model on Cerebras’s giant chips, claiming 1,000+ tokens per second. Commenters cheered the speed but challenged the headline (AMD GPUs were used before), questioned the article’s trustworthiness and its benchmarks, and asked whether Cerebras can be more than a niche inference play. Speedy code meets skeptical crowd.

OpenAI just dropped GPT-5.3-Codex-Spark, a coding model running on Cerebras’s plate-sized chips that reportedly spits out over 1,000 tokens per second—about 15x faster than its predecessor. It’s text-only, tuned for speed, packs a 128,000-token window, and is rolling out to $200/month ChatGPT Pro users via the Codex app, CLI, and VS Code, with API access for select partners. On paper, it’s a rocket. In the comments? It’s a courtroom.

The hottest thread wasn’t about speed; it was about trust. After recent criticism that Ars allegedly used AI-generated quotes, one commenter openly wondered whether this piece is AI-written too, supercharging skepticism. Others hit the brakes on the headline’s claim that this is OpenAI’s first non-Nvidia production run; one reader flatly noted that OpenAI had used AMD GPUs before via Azure, calling out the framing. Meanwhile, a pragmatic crowd side-eyed the benchmarks and asked for independent validation, especially since Ars’s own test last year found Codex lagging behind rivals.

There’s also a bigger debate: Is Cerebras more than a cool side quest? One commenter asks whether the wafer-scale startup has a future beyond being a fast inference box. Meme-watch: plenty of pizza jokes about those ‘plate-sized’ chips, and a running gag that it’s ‘fast tokens, slow trust.’ Another reader linked a previous HN thread, suggesting the news isn’t exactly fresh. Verdict: the model’s fast; the audience’s patience isn’t.

Key Points

  • OpenAI released GPT-5.3-Codex-Spark, a coding model running on Cerebras chips, its first production AI model on non-Nvidia hardware.
  • Spark achieves over 1,000 tokens per second, about 15 times faster than its predecessor, and is optimized for speed over depth.
  • The model is text-only at launch, offers a 128,000-token context window, and is available as a research preview to ChatGPT Pro subscribers, with API access for select partners.
  • Spark builds on the full GPT-5.3-Codex model, which targets heavier, agentic coding tasks and broader functionality.
  • OpenAI reports Spark outperforms GPT-5.1-Codex-mini on SWE-Bench Pro and Terminal-Bench 2.0, though no independent validation is provided; performance is contextualized against Anthropic’s Claude models.

Hottest takes

I have to wonder whether any of these quotes are AI-hallucinated — gortok
They used amd gpus before — Havoc
do they have a future beyond being inference platform — reliabilityguy
Made with <3 by @siedrix and @shesho from CDMX. Powered by Forge&Hive.