April 20, 2026
When memcpy meets James Bond
Writing string.h functions using string instructions in asm x86-64
Nerds Are Fighting Over One Weird CPU Trick That Makes Copy‑Paste Faster
TLDR: A deep‑dive into how computers copy data revealed that modern compilers often skip the usual function and use a single ultra‑fast CPU trick instead. Commenters are split between calling out old, clumsy patterns as slow and laughing that a 2002 James Bond game was already doing this secret performance wizardry.
Developers went in expecting a calm explainer about how computers copy chunks of data, and instead walked into a full‑blown comment‑section bar fight over one obscure instruction and a two‑decade‑old James Bond game. The article itself dissects how the C language’s basic “copy this stuff over there” function, memcpy, can be replaced by the compiler with a single magic CPU spell called rep movsq, which copies data eight bytes at a time like a high‑speed conveyor belt. But the real show starts when one commenter points at a cryptic code snippet and basically says: this is nonsense, why are people still doing this?
User themafia calls out a popular pattern as pointless and slow, accusing it of abusing obscure status flags and even slowing down the processor’s instruction pipeline. Translation: you’re pushing your sports car with the handbrake on. That sparks a whole vibe of “we’ve been doing it wrong for years.” Then jamesfinlayson casually strolls in with a retro bombshell: they remember decompiling Gearbox’s utility library from a 2002 James Bond: Nightfire game and finding these same fancy string tricks already in use. Suddenly the thread turns into a mix of performance nerds arguing about micro‑optimizations and gamers realizing that early‑2000s shooters were secretly running hand‑tuned assembly code. Half the crowd is impressed, half is side‑eyeing old compilers, and everyone’s a little shocked the CPU has been quietly doing this wizardry all along.
Key Points
- Compilers often implement string.h functions as built-ins for performance, avoiding function calls.
- On x86-64, GCC can inline memcpy using the REP MOVSQ instruction instead of calling memcpy.
- x86 provides string instructions (MOVS, CMPS, SCAS, LODS, STOS) with size suffixes and repeat prefixes (REP/REPE/REPNE).
- The example demonstrates compiling with GCC 14.2 and disassembling with objdump to observe inlined code.
- REP MOVSQ copies 8 bytes per iteration from the address in RSI to the address in RDI, advancing both pointers and decrementing RCX until it reaches zero, which matches memcpy for lengths divisible by eight.
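
For readers who want to see the last point in action, here is a minimal sketch, not the article's own code. The function names `copy_rep_movsq` and `copy_model` are hypothetical, and the inline-assembly syntax assumes GCC or Clang targeting x86-64; on any other target the first function simply falls back to memcpy so the sketch still builds. The second function is a plain-C model of what the instruction does step by step.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy n bytes with REP MOVSQ.
   Assumes n is a multiple of 8, the only case this sketch covers. */
static void copy_rep_movsq(void *dst, const void *src, size_t n)
{
    assert(n % 8 == 0);
    size_t qwords = n / 8;              /* RCX counts quadwords, not bytes */
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
    /* "D" = RDI (destination), "S" = RSI (source), "c" = RCX (count).
       All three are read and modified by the instruction, hence "+".
       The usual x86-64 calling conventions leave the direction flag
       clear, so both pointers move upward. */
    __asm__ volatile ("rep movsq"
                      : "+D"(dst), "+S"(src), "+c"(qwords)
                      :
                      : "memory");
#else
    memcpy(dst, src, qwords * 8);       /* fallback on non-x86-64 targets */
#endif
}

/* Plain-C model of the same loop, one line per hardware step. */
static void copy_model(void *dst, const void *src, size_t n)
{
    assert(n % 8 == 0);
    unsigned char *d = dst;
    const unsigned char *s = src;
    size_t count = n / 8;
    while (count != 0) {                /* REP: repeat until RCX is zero */
        memcpy(d, s, 8);                /* MOVSQ: move one quadword */
        d += 8;                         /* RDI advances by 8 */
        s += 8;                         /* RSI advances by 8 */
        count--;                        /* RCX is decremented */
    }
}
```

Calling either function with a 32-byte buffer should leave the destination byte-for-byte equal to the source, the same result memcpy would give. Lengths that are not a multiple of eight would need extra handling for the leftover bytes, which this sketch deliberately leaves out.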