Cerebras Code now supports GLM 4.6 at 1000 tokens/sec

Cerebras touts ‘1000 tokens/sec’ coding—commenters ask if it’s real, worth $50, or just vibes

TLDR: Cerebras says its coding AI runs GLM‑4.6 at over 1,000 tokens per second, with plans from free to $200. The comments demand proof, question pricing and hidden tricks, spin up a SWE‑1.5 conspiracy, and meme the launch—asking if speed alone is worth $50 and whether quality keeps up.

Cerebras came sprinting into the chat claiming its code AI now runs GLM‑4.6 at “1,000+ tokens per second,” pitching it as the fastest way to code and pairing it with a Free tier, a $50 Pro plan, and a $200 Max plan. It even flexed fresh funding with a Series G raise. GLM‑4.6 is billed as top-tier—“#1 for tool calling” and comparable to Sonnet 4.5—but the internet’s reaction? Speed hype meets trust issues.

Skeptics immediately poked holes: is 1,000 tokens per second just the rate it spits out text, not how fast it thinks? One user asked whether Cerebras is using “speculative decoding” (a speed trick in which a small draft model guesses ahead and the main model verifies the guesses) or lossy quantization (storing the model’s weights at lower precision) to hit those numbers. Another dragged pricing: at $50/month, this had better be lightning, especially when Claude Pro and ChatGPT Plus cost $20. The Groq comparison came up fast, along with the classic: “We have no way to prove it.”

Then came the detective subplot: a claim that Cognition’s SWE‑1.5 might be a GLM‑4.6 finetune, sending model-spotters into conspiracy mode. And the meme machine revved up with the instant classic: “Vibe Slopping at 1000 tokens per second.” Meanwhile, fans like the bring-your-own editor support (Cline, RooCode, and more) and the idea of “staying in flow.” But the room’s energy? Prove it, price it right, and don’t just go fast—be good.

Key Points

  • Cerebras now runs GLM‑4.6 for code generation, advertising 1,000+ tokens per second.
  • GLM‑4.6 is claimed to be #1 for tool calling on the Berkeley Function Calling Leaderboard and comparable to Sonnet 4.5 for web development.
  • Cerebras Code Pro supports a BYO editor approach, with compatibility listed for Cline, RooCode, OpenCode, and Crush.
  • Pricing tiers: Free ($0, limited usage), Pro ($50/month, up to 24M tokens/day), Max ($200/month, up to 120M tokens/day).
  • The page links to a press release noting Cerebras’ $1.1B Series G at an $8.1B valuation and provides the company’s Sunnyvale, CA address.
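
For a rough sense of what the listed caps mean at the advertised speed, here is a back-of-the-envelope sketch. It assumes a single stream sustained at exactly 1,000 tokens per second, a 30-day month, monthly billing on both paid plans, and that the whole daily cap gets used; none of those assumptions come from the announcement, and the function names are made up for illustration.

```python
# Back-of-the-envelope math on the listed plan caps.
# Assumptions (NOT from the announcement): one stream sustained at exactly
# 1,000 tokens/sec, a 30-day month, and monthly billing for both plans.
TOKENS_PER_SEC = 1_000
DAYS_PER_MONTH = 30

PLANS = {
    "Pro": {"usd_per_month": 50, "tokens_per_day": 24_000_000},
    "Max": {"usd_per_month": 200, "tokens_per_day": 120_000_000},
}


def hours_to_hit_cap(tokens_per_day, tokens_per_sec=TOKENS_PER_SEC):
    """Hours of nonstop generation needed to use up the daily token cap."""
    return tokens_per_day / tokens_per_sec / 3600


def usd_per_million_tokens(usd_per_month, tokens_per_day, days=DAYS_PER_MONTH):
    """Effective price per million tokens if the cap is fully used every day."""
    millions_per_month = tokens_per_day * days / 1_000_000
    return usd_per_month / millions_per_month


for name, plan in PLANS.items():
    hours = hours_to_hit_cap(plan["tokens_per_day"])
    price = usd_per_million_tokens(plan["usd_per_month"], plan["tokens_per_day"])
    print(f"{name}: {hours:.1f} h to hit the daily cap, ${price:.3f} per 1M tokens")
```

Under these assumptions, the Pro cap is reached after about 6.7 hours of continuous generation, while the Max cap would take about 33 hours, which is longer than a day, so a single stream at that rate could not exhaust it. Real usage is bursty, so these figures are upper bounds on throughput rather than predictions.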

Hottest takes

I tend to think the number of tokens per second a model can generate to be relatively low on the list of things I care about — alyxya
their premium $50 (as opposed to $20 on Claude Pro or ChatGPT Plus) should be justified by the speed — behnamoh
Vibe Slopping at 1000 tokens per second — gatienboquet