Finding Miscompiles for Fun, Not Profit

He spent $10,000 letting AI hunt code mistakes, and commenters want this everywhere

TLDR: An engineer says AI agents found hundreds of plausible bugs in compilers, the software that turns source code into runnable programs, after he spent more than $10,000 letting them search at scale. Commenters are impressed but immediately want the real test: can cheaper AI do the same, and will this become an automatic safety check for every project?

A compiler expert says he dropped more than $10,000 on AI agents and watched them tear through major code tools like a gossip blogger with receipts, surfacing hundreds of plausible bugs and a few genuinely scary ones. The basic idea is simple even if the machinery isn’t: generate small random programs, run them through the compilers that translate code for CPUs and graphics chips, then check whether the compiled result still behaves like the original. According to the author, the shocking part wasn’t that bugs existed; it was how fast the AI found them, and how little human babysitting it needed. The community reaction? Equal parts awe, alarm, and “okay, but can we get the budget version?”
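For readers who want to see the shape of that check, here is a minimal toy sketch in Python. It is not the author's fuzzer and does not touch LLVM or ptxas: the expression format, the `optimize` pass, and the deliberately planted bug are all hypothetical stand-ins, chosen only to show "random program in, compare behavior before and after the transformation."

```python
import random

# Hypothetical toy "program": a nested tuple expression over one 32-bit input x.
OPS = ("add", "sub", "mul", "and", "xor")
MASK = 0xFFFFFFFF

def gen_expr(rng, depth=3):
    """Build a random expression tree of bounded depth."""
    if depth == 0 or rng.random() < 0.3:
        return ("x",) if rng.random() < 0.5 else ("const", rng.randrange(0, 4))
    return (rng.choice(OPS), gen_expr(rng, depth - 1), gen_expr(rng, depth - 1))

def evaluate(e, x):
    """Reference semantics: wraparound 32-bit arithmetic."""
    tag = e[0]
    if tag == "x":
        return x & MASK
    if tag == "const":
        return e[1] & MASK
    a, b = evaluate(e[1], x), evaluate(e[2], x)
    if tag == "add":
        return (a + b) & MASK
    if tag == "sub":
        return (a - b) & MASK
    if tag == "mul":
        return (a * b) & MASK
    if tag == "and":
        return a & b
    return a ^ b  # xor

def optimize(e, buggy=False):
    """Toy peephole pass standing in for the compiler under test."""
    if e[0] in ("x", "const"):
        return e
    op, l, r = e[0], optimize(e[1], buggy), optimize(e[2], buggy)
    if op == "mul" and r == ("const", 1):
        return l  # v * 1 -> v (correct)
    if op == "add" and r == ("const", 0):
        return l  # v + 0 -> v (correct)
    if buggy and op == "sub" and l == ("const", 0):
        return r  # 0 - v -> v (deliberately wrong, planted for the demo)
    return (op, l, r)

def find_miscompiles(seed, trials, buggy):
    """Return (expression, input) pairs where the optimized form disagrees."""
    rng = random.Random(seed)
    found = []
    for _ in range(trials):
        e = gen_expr(rng)
        opt = optimize(e, buggy)
        for x in (0, 1, 2, MASK, rng.randrange(MASK + 1)):
            if evaluate(e, x) != evaluate(opt, x):
                found.append((e, x))
                break
    return found
```

Calling `find_miscompiles(0, 2000, buggy=False)` should come back empty, while the same call with `buggy=True` reports expressions whose "optimized" form computes a different value. A real setup swaps the toy pass for an actual compiler and the toy evaluator for running the code, which is where the expense and the engineering live.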

The loudest take in the comments came from people instantly imagining this becoming a normal safety check on GitHub: every code change automatically swept by cheap AI bug-hunters before anyone ships a disaster. That’s the dream. The drama is in the price tag: if the system burns five figures in tokens, is this the future of software safety or just a very expensive flex? One commenter basically summed up the mood with: run it on cheaper models and let’s see if the magic holds. Meanwhile, the author jumped into the thread with a calm “happy to answer questions,” which only added to the delicious tension: the crowd wants numbers, proof, and maybe a coupon code. The meme-y subtext of the whole discussion is obvious: AI didn’t just write code this time — it snitched on everybody else’s code too.

Key Points

• The author says they spent more than $10,000 in one afternoon running AI agents on compiler code and found hundreds of plausible LLVM bugs, including many miscompiles.
• In January 2026, the author and Codex built an LLVM fuzzer that generated random programs and compared behavior before and after compilation, leading to five fixed bugs in LLVM’s instcombine pass.
• In mid-May 2026, after joining SemiAnalysis as a contractor, the author applied the technique to NVIDIA’s ptxas and found 40 miscompiling programs in three days, increasing to about 80 a week later.
• The author expected fuzzing ptxas to be harder than fuzzing LLVM because ptxas is closed-source, requires end-to-end compilation, and offers fewer practical instrumentation options.
• The article says improved LLM assistance handled repetitive fuzzing tasks such as adapting the fuzzer, minimizing test cases, and selecting PTX instruction sequences, which greatly reduced manual effort.
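"Minimizing test cases" means shrinking a failing program until only the part that triggers the bug is left. The article credits the LLM with this chore; the sketch below instead shows a classic mechanical approach (greedy chunk removal, a simplified cousin of delta debugging) purely as an illustration. The `minimize` helper and the `still_fails` predicate are hypothetical names, and in practice the predicate would recompile and rerun the candidate program.

```python
def minimize(lines, still_fails):
    """Greedily delete chunks of lines while the failure still reproduces."""
    assert still_fails(lines), "input must reproduce the failure"
    chunk = max(1, len(lines) // 2)
    while chunk >= 1:
        i = 0
        shrunk = False
        while i < len(lines):
            # Try dropping lines[i:i+chunk]; keep the cut if the bug survives.
            candidate = lines[:i] + lines[i + chunk:]
            if candidate and still_fails(candidate):
                lines = candidate
                shrunk = True
            else:
                i += chunk
        if not shrunk:
            chunk //= 2  # nothing removable at this size, try smaller cuts
    return lines


# Toy example: the "bug" needs BUG1 to appear somewhere before BUG2.
def demo_predicate(prog):
    return ("BUG1" in prog and "BUG2" in prog
            and prog.index("BUG1") < prog.index("BUG2"))

reduced = minimize(["a", "b", "BUG1", "c", "d", "BUG2", "e"], demo_predicate)
print(reduced)  # ['BUG1', 'BUG2']
```

The result is only locally minimal: removing any single remaining line makes the failure disappear, but a smarter search might find a different, smaller reproducer.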

Hottest takes

"run with cheaper models too" — mNovak

"a full repo sweep like this is a default Github action" — mNovak

"happy to answer questions, take criticism" — jlebar

May 29, 2026

Bug Bounty? More Like Bug Bonanza
