Bypassing the Branch Predictor

Dev forum erupts over “always send” hack as AI gets schooled

TLDR: A dev tried to bypass the CPU’s guesswork and force a fast “always send” path, but modern chips don’t honor old hint tricks. Commenters split between crafty hacks, fake-send shenanigans, and calling out an AI’s ARM myth—showing how hard performance tuning is when the CPU thinks it knows better.

A programmer tried to outsmart the CPU’s “guessing brain” (the branch predictor) by forcing a money-transfer function to always take the fast path—basically, “assume we send, never hesitate.” The article reveals that older Intel chips had cryptic hints, but modern ones ignore them; ARM doesn’t have them either, and C++ “likely/unlikely” labels don’t save the day. Cue the community fireworks.

Hackers rushed in with bold, borderline-chaotic ideas. tux3 pitched a “conditional move” trick—no branching, just pick a lane—while warning it could stop the CPU from preparing the right path. IshKebab delivered the vibe check: even perfect hints won’t help if the send code isn’t warmed up in the instruction cache (think: the kitchen’s fast-access shelf). nneonneo wanted to abuse the return-address predictor, a sneaky detour through the CPU’s memory of past returns. Then jcul dropped a prankster’s dream: make “abandon” requests look like fake sends and toss them later—no branches, just vibes.
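
For the curious, here is roughly what the tux3 and jcul ideas could look like in C++. This is a minimal sketch, not code from the thread: Txn, Router, route, drop_fakes, and select_amount are all hypothetical names, and whether the compiler really emits a conditional move for the select depends on the compiler and its flags.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical transaction type, invented for this sketch.
struct Txn {
    std::uint64_t id;
    std::uint64_t amount;
};

// "Fake send" sketch: every transaction runs through the same code path.
// Queue 1 holds real sends; queue 0 holds fakes that get thrown away later.
struct Router {
    std::array<std::vector<Txn>, 2> queues;  // [0] = discard, [1] = real sends

    void route(const Txn& t, bool should_send) {
        // The bool converts to index 0 or 1, so the send/abandon decision is
        // a data dependency here, not an if/else on should_send.
        // (push_back still has its own internal capacity checks.)
        queues[static_cast<std::size_t>(should_send)].push_back(t);
    }

    const std::vector<Txn>& sent() const { return queues[1]; }

    // Toss the fake sends once the hot path is done.
    void drop_fakes() { queues[0].clear(); }
};

// "Conditional move" sketch: a simple select the compiler may lower to
// cmov (x86) or csel (ARM). That lowering is an assumption, not a guarantee.
inline std::uint64_t select_amount(bool should_send, std::uint64_t amount) {
    return should_send ? amount : 0;
}
```

The trade-off matches tux3's own warning: once the decision is data instead of a branch, the CPU can no longer speculate ahead down the right path, so this only pays off when the branch really is badly predicted.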

The spiciest subplot? An AI claimed ARM had “predict taken/not taken” instructions—community clapback: those were hallucinated. x86’s old-school Pentium 4 did have prefixes like 0x3E, but modern chips ghost them. Memes flew: “Always Be Sending,” “No Branch November,” and a nostalgic “P4 was right all along.” Bottom line: speed demons want hacks; pragmatists want maintainability; everyone wants fewer mispredict penalties.

Key Points

  • When a branch almost always evaluates false, the predictor learns to expect false, so the rare true case pays a misprediction penalty of roughly 20 cycles.
  • Rarely executed paths like send() may also be cold in the instruction cache and will not have been fetched ahead by the pipeline, adding latency on top of the mispredict.
  • ARM does not provide explicit branch prediction hint instructions; the BEQP/BEQNP instructions attributed to ARM were incorrect.
  • Older x86 (e.g., Pentium 4) supported branch prediction hints via instruction prefixes (e.g., 0x3E predicts taken), but modern x86 ignores them.
  • C++20 [[likely]]/[[unlikely]] annotations influence code layout, not the CPU predictor; Clang and GCC emit the same assembly in the example.
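
The last point is easier to see in code. Below is a minimal sketch of the C++20 attributes, assuming a compiler run with -std=c++20; send_txn, abandon_txn, and process are hypothetical stand-ins rather than the article's actual functions.

```cpp
#include <cstdint>

// Hypothetical stand-ins for the send and abandon paths.
inline std::uint64_t send_txn(std::uint64_t amount) { return amount; }
inline std::uint64_t abandon_txn() { return 0; }

std::uint64_t process(bool should_send, std::uint64_t amount) {
    // [[likely]] / [[unlikely]] tell the compiler which arm to favor when it
    // lays out the generated code. They do not emit a CPU prediction hint,
    // so the hardware predictor still learns from runtime behavior.
    if (should_send) [[likely]] {
        return send_txn(amount);
    } else [[unlikely]] {
        return abandon_txn();
    }
}
```

The result is the same with or without the attributes; at most the order of the two arms in the binary changes, and the compiler is free to ignore the hint altogether.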

Hottest takes

“My first instinct for a poorly predicted branch would be to use a conditional move.” — tux3
“Why not just make all the abandon transactions into fake discarded transactions…” — jcul
“Those ARM instructions are just hallucinated” — kklisura