GPT-5.5 Codex reasoning-token clustering may be leading to degraded performance

Users say GPT-5.5 keeps hitting a weird wall — and the coding results can get alarmingly dumb

TLDR: A large batch of coding replies from GPT-5.5 appears to stop “thinking” at exactly 516 reasoning tokens, and that pattern may be linked to worse answers on hard tasks. Commenters say they’ve felt the drop in quality for months, with some blaming cost-cutting and others ditching the tool entirely.

OpenAI’s coding helper is getting roasted after one user spotted a bizarre pattern: GPT-5.5 keeps stopping its “thinking” at the exact same reasoning-token counts over and over, especially 516, with matching spikes at 1034 and 1552. In plain English, people think the model may be hitting an invisible ceiling mid-problem, and when that happens, the answers can go off the rails. The post stops short of claiming a smoking gun, but the stats are spicy: GPT-5.5 made up only 19.3% of all responses, yet accounted for 82.0% of the exact-516 pileup. That was enough to send the comments into full detective mode.

The loudest reaction? “Yep, I’ve felt this.” One heavy user said they “almost never” use 5.5 for serious reasoning anymore because it’s “not even in the same galaxy” as better options. Another commenter said Codex used to feel brilliantly thorough earlier this year, but now delivers “incredibly stupid implementations” so often they’ve defected to Claude. Ouch. The mood is basically: we’re not shocked, we’re annoyed.

Then came the theories and snark. One camp thinks this is just a boring efficiency trick — batching work in neat chunks like a warehouse packing boxes. The other camp heard “OpenAI cut compute costs” and immediately translated it to: so the bargain-bin brain mode is live now? The meme energy is strong: users are treating 516 like it’s the new cursed number, a kind of digital cliff where smart coding goes to trip over its own shoelaces.

Key Points

• The article analyzes 390,195 Codex response records across 865 sessions and reports that GPT-5.5 disproportionately lands at exactly 516 reasoning tokens.
• GPT-5.5 accounted for 19.3% of all responses but 82.0% of exact-516 events. Its exact-516 / >=516 ratio (responses at exactly 516 tokens divided by responses at 516 or more) was 44.0%, versus 1.3% for non-GPT-5.5 responses.
• Additional fixed-boundary spikes were reported at 1034 and 1552 reasoning tokens, which the author says resemble threshold boundaries.
• Monthly mean and P90 reasoning-token counts decreased from February to May 2026 even as exact-516 clustering increased sharply.
• The article asks the Codex team to investigate whether GPT-5.5 has a reasoning-budget, routing, truncation, fallback, or scheduler behavior causing termination near 516/1034/1552 tokens.
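
For readers who want to check their own logs, here is a rough sketch of how the headline ratios above could be computed. It is not the author's script: the `Response` record, its `model` and `reasoning_tokens` fields, the "gpt-5.5" label, and the tiny sample are all made up for illustration. The second helper just measures the distance between the reported spikes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Response:
    # Hypothetical record shape; real Codex logs may use different field names.
    model: str
    reasoning_tokens: int


def clustering_stats(records, model="gpt-5.5", boundary=516):
    """Return the three ratios quoted in the key points for one model."""
    in_model = [r for r in records if r.model == model]
    exact_all = [r for r in records if r.reasoning_tokens == boundary]
    exact_model = [r for r in in_model if r.reasoning_tokens == boundary]
    at_or_above = [r for r in in_model if r.reasoning_tokens >= boundary]

    def ratio(part, whole):
        # Avoid dividing by zero when a group is empty.
        return len(part) / len(whole) if whole else 0.0

    return {
        # Share of all responses produced by this model.
        "share_of_responses": ratio(in_model, records),
        # Share of exact-boundary events that belong to this model.
        "share_of_exact": ratio(exact_model, exact_all),
        # Exact-boundary responses divided by responses at or above the boundary.
        "exact_over_at_or_above": ratio(exact_model, at_or_above),
    }


def spike_gaps(spikes):
    """Return the differences between consecutive spike positions."""
    ordered = sorted(spikes)
    return [b - a for a, b in zip(ordered, ordered[1:])]


# Made-up sample, only to show the call shape.
sample = (
    [Response("gpt-5.5", 516)] * 2
    + [Response("gpt-5.5", 900), Response("gpt-5.5", 100)]
    + [Response("other", 516)]
    + [Response("other", 200)] * 5
)
print(clustering_stats(sample))
print(spike_gaps([516, 1034, 1552]))  # [518, 518]
```

The three reported spikes sit exactly 518 tokens apart. That even spacing fits the author's description of threshold-like boundaries, though it does not say what causes them.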

Hottest takes

“I almost never use it for reasoning anymore. It’s not even in the same galaxy” — ProofHouse

“I’m seeing incredibly stupid implementations intermittently, and have simply switched to Claude” — zenapollo

“So this is it?” — siva7

July 4, 2026

The Case of the Cursed 516
