Disassembling terabytes of random data with Zig and Capstone to prove a point

A bar bet over “random code” explodes into stats, repost rage, and AI side-eye

TLDR: A massive Zig-and-Capstone experiment suggests random bytes look like valid instructions more often when disassembled raw than after being decompressed first, with nuance in the stats. Comments split between repost fatigue, AI-disclosure skepticism, and reminders that “disassembling” isn’t “running”: most random code would crash, making this a fun but practical reality check.

A coder ran terabytes of random data through a disassembler to settle a nerdy bet: are random bytes more likely to look like real machine instructions as-is, or after being decompressed first? The experiment (done in Zig, a fast systems language, and Capstone, a code-reading tool) lit up the comments. One reader sighed, “this is the third or fourth posting,” while another fixated on the author’s bold “no AI used” disclosure, asking why it even matters. Cue the crowd calling it a stealth flex and linking to a hot take about large language models. Drama achieved.

Veteran voices jumped in with war stories: kazinator says random data often “looks like code” because instruction encodings are super dense, like mistaking song lyrics for a recipe if you squint long enough. Then the numbers drop: 4.4% of random data disassembles like code, 4.0% decodes as a simpler compression format, and 1.2% both decompresses and disassembles, which is fuel for the “compression might help” camp. But swisniewski throws a reality check: even if it disassembles, it probably crashes when run, because nonsense instructions hit memory they shouldn’t.

The vibe? Entropy bros vs opcode dads, repost fatigue vs fresh math, and lots of eyebrow raises at that AI disclaimer. Want receipts? Check Capstone

Key Points

  • The article investigates whether random bytes more often form valid ARM Thumb instructions directly or after DEFLATE decompression.
  • The experiment is implemented in Zig: it generates random bytes, disassembles them with Capstone, and separately inflates them and disassembles the output.
  • Results are recorded as success/failure and a percentage of disassembled instruction bytes for ARM Thumb mode.
  • The post provides detailed steps to fetch, build (via CMake), and link Capstone (v5.0.6) into a Zig project tested with Zig v0.14.1.2.
  • An AI Usage Disclosure states no Large Language Models were used; a typeset PDF version and a GitHub repository are provided.
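
For a feel of the "inflate" half of the experiment, here is a minimal Python sketch using only the standard library. It is not the author's Zig code: the function names (`try_inflate`, `inflate_success_rate`), the buffer size, and the rule that only a fully terminated raw DEFLATE stream counts as a success are all assumptions made for illustration. The Capstone disassembly step is left out entirely.

```python
import random
import zlib
from typing import Optional


def try_inflate(data: bytes) -> Optional[bytes]:
    """Return the inflated bytes if `data` starts with a complete raw
    DEFLATE stream, otherwise None.

    Assumption: a "success" means the stream reached its final block.
    Trailing bytes after the end of the stream are ignored.
    """
    decomp = zlib.decompressobj(-15)  # negative wbits selects raw DEFLATE (no zlib header)
    try:
        out = decomp.decompress(data)
    except zlib.error:
        return None  # invalid block type, bad Huffman table, bad distance, etc.
    return out if decomp.eof else None  # eof is False when the stream was cut short


def inflate_success_rate(trials: int, size: int, seed: int = 0) -> float:
    """Fraction of random `size`-byte buffers that inflate successfully."""
    rng = random.Random(seed)  # seeded so a run can be repeated
    ok = 0
    for _ in range(trials):
        buf = bytes(rng.getrandbits(8) for _ in range(size))
        if try_inflate(buf) is not None:
            ok += 1
    return ok / trials


if __name__ == "__main__":
    # Hypothetical parameters; the original post works at a far larger scale.
    rate = inflate_success_rate(trials=10_000, size=128)
    print(f"inflated successfully: {rate:.2%}")
```

The rate this prints depends heavily on the buffer size and on what counts as a success, so it should not be expected to match the percentages quoted in the post.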

Hottest takes

"I believe this is the third or fourth posting of this article in the last week" — 0x1ch
"Why the AI disclosure?" — mfcl
"1.2% of the data decompresses and disassembles." — kazinator