CODA: Rewriting Transformer Blocks as GEMM-Epilogue Programs

A new paper called CODA says it can make training big artificial intelligence models faster by keeping more of the work on the chip itself instead of constantly sending data back and forth to memory. In plain English: the authors are trying to cut down on wasteful shuffling so the chip spends more time actually doing useful math. That sounds impressive, and some commenters were genuinely excited by one line in particular: large language models writing these fast kernels themselves. One reaction basically boiled down to this: if the bots can help build the speedups, progress might suddenly move a lot faster.
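For readers who want to see the general idea rather than take it on faith, here is a minimal Python sketch of a matrix multiply (a "GEMM") with an "epilogue" folded in. It is a generic illustration of the concept in the paper's title, not code from CODA: the function names, the bias-plus-ReLU epilogue, and the tiny matrices are all assumptions made up for this example, and plain Python only mimics the memory behavior of a real GPU kernel.

```python
# Illustrative only: a generic GEMM-plus-epilogue sketch, not code from the CODA paper.
# All names here (matmul_then_epilogue, matmul_with_fused_epilogue) are hypothetical.

def matmul_then_epilogue(a, b, bias):
    """Unfused: store the full product, then make a second pass over it."""
    rows, inner, cols = len(a), len(b), len(b[0])
    # Pass 1: plain matrix multiply, with the whole result written out.
    c = [[sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(cols)]
         for i in range(rows)]
    # Pass 2: re-read every element to add the bias and apply ReLU.
    return [[max(0.0, c[i][j] + bias[j]) for j in range(cols)] for i in range(rows)]


def matmul_with_fused_epilogue(a, b, bias):
    """Fused: finish each output value (bias + ReLU) before storing it once."""
    rows, inner, cols = len(a), len(b), len(b[0])
    out = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            acc = 0.0  # stands in for an on-chip accumulator
            for k in range(inner):
                acc += a[i][k] * b[k][j]
            # The epilogue runs while the value is still "in hand",
            # so the result is stored a single time.
            out[i][j] = max(0.0, acc + bias[j])
    return out


a = [[1.0, -2.0], [3.0, 4.0]]
b = [[0.5, 1.0], [1.5, -1.0]]
bias = [0.25, -0.5]

# Both versions compute the same answer; only the number of passes differs.
assert matmul_then_epilogue(a, b, bias) == matmul_with_fused_epilogue(a, b, bias)
```

Both functions return the same numbers. The difference is that the first one writes out an intermediate matrix and reads it back, which is the kind of back-and-forth the fused version avoids.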

But the real action was in the comment section, where the mood was less “jaw dropped” and more “hang on, isn’t this just an old trick with a fresh label?” The sharpest pushback came from people saying CODA doesn’t unlock some magical new performance boost so much as package existing ideas in a way that’s friendlier for machine-generated code. That turned the conversation into a spicy little identity crisis: is this a breakthrough in speed, or a breakthrough in how we describe the speed trick so an AI can assemble it?

And yes, the nerd humor arrived right on schedule. One commenter joked that people seeing this paper were getting major “second kernel” energy, as if veteran chip programmers were watching a remake and recognizing every plot beat. Another user dropped a chaotic mini-summary so dense it read like the community’s version of a detective board covered in red string. In other words: the paper brought optimization news, but the comments delivered the drama, skepticism, and memes.

May 21, 2026

Fast math, hotter comments

Researchers say they found a clever speed trick, but the comments say “we’ve seen this movie”

TLDR: CODA claims it can speed up AI training by reorganizing work so the chip wastes less time shuffling data to and from memory. Commenters weren’t fully dazzled: some said the trick isn’t new, and the real story is that this setup may be easier for AI tools to write.

Key Points

Hottest takes
