BarraCUDA: Open-source CUDA compiler targeting AMD GPUs

Lone dev builds CUDA-to-AMD tool; fans cheer, skeptics squint, lawyers lurk

TLDR: A solo developer released BarraCUDA, a tool that compiles Nvidia’s CUDA code to run on AMD GPUs with a simple build and no heavy dependencies. Commenters split between cheering the minimalist magic, worrying about C++ support and trademark risks, and wondering if AMD should hire the creator—potentially loosening Nvidia’s grip on developers.

One fearless coder just dropped “BarraCUDA,” a DIY compiler that turns Nvidia-only CUDA code into binaries that run on AMD graphics cards. No LLVM, no HIP translation layer, just about 15,000 lines of plain C and a one-word build: make. The community reaction? A full-on comment-section cage match.

On one side: the hype squad. “Beautiful,” swoons one fan over the no-dependency build. Another beams, hoping AMD hires the dev. The project’s attitude—“LLVM is NOT required… like an adult”—has folks memeing that this is how you storm Nvidia’s walled garden with a pocketknife and a coffee.

On the other side: the eyebrow-raisers. “Doesn’t CUDA mean C++ too?” asks one skeptic, worried that skipping the usual compiler stack could hit limits when real-world, C++-heavy CUDA code enters the chat. Then there’s the spicy legal subplot: the name contains “CUDA,” a registered Nvidia trademark, and several commenters nervously whisper “cease-and-desist incoming.”

But the hottest take? The idea that a handful of enthusiasts might do what a billion-dollar company hasn’t: make running CUDA on AMD feel simple. It’s equal parts folk hero energy and “this is gonna get complicated” vibes. Whether this becomes a movement or a lightning-in-a-bottle moment, the comments are already legendary.

Key Points

  • BarraCUDA is an open-source CUDA compiler that outputs AMD RDNA 3 (GFX11) ELF .hsaco binaries.
  • The project is written in ~15,000 lines of C99 and has zero dependency on LLVM or HIP.
  • Its pipeline includes a custom lexer and parser, an SSA-form intermediate representation (BIR), mem2reg, hand-written instruction selection, register allocation, and ELF emission.
  • Supported CUDA features include shared memory (LDS), __syncthreads (s_barrier), atomics, warp intrinsics/votes, vector types, half precision, __launch_bounds__, and cooperative groups.
  • Building requires only a C99 compiler (e.g., gcc); the tool can emit binaries, IR, or the AST, and can run semantic analysis.
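
For readers wondering what the “mem2reg” stage in that pipeline actually does, here is a minimal sketch in C. It is not BarraCUDA's code: the toy IR, the Inst struct, the opcode names, and mem2reg_block are all invented for illustration, and it only handles a single straight-line block (a real pass also has to deal with branches and phi nodes). The idea is to forward each stored value to later loads of the same stack slot, so the slot, its stores, and its loads can all disappear.

```c
/* Toy mem2reg for ONE straight-line block. Hypothetical IR, not BarraCUDA's BIR.
 * Every instruction's result is named by its index in the array. */

#define MAX_INSTS  64
#define NOT_A_SLOT (-2)  /* this instruction is not an ALLOCA */
#define NO_VALUE   (-1)  /* slot exists, but nothing has been stored yet */

typedef enum { OP_CONST, OP_ALLOCA, OP_STORE, OP_LOAD, OP_ADD, OP_NOP } Op;

typedef struct {
    Op  op;
    int a;  /* CONST: literal; STORE/LOAD: slot (index of its ALLOCA); ADD: lhs id */
    int b;  /* STORE: id of the value being stored; ADD: rhs id */
} Inst;

/* Operands must name an instruction that appears earlier in the block. */
static int is_earlier(int id, int i) { return id >= 0 && id < i; }

/* Rewrites `code` in place: loads are replaced by the value last stored to
 * their slot, and ALLOCA/STORE/LOAD become OP_NOP.
 * Returns the number of instructions turned into NOPs, or -1 if the input is
 * malformed (the block may be partially rewritten in that case). */
int mem2reg_block(Inst *code, int n)
{
    int alias[MAX_INSTS];    /* alias[i]: id that now stands in for result i */
    int slot_val[MAX_INSTS]; /* slot_val[s]: id currently held by slot s */
    int removed = 0;
    int i;

    if (n < 0 || n > MAX_INSTS) return -1;
    for (i = 0; i < n; i++) { alias[i] = i; slot_val[i] = NOT_A_SLOT; }

    for (i = 0; i < n; i++) {
        Inst *in = &code[i];
        switch (in->op) {
        case OP_ALLOCA:
            slot_val[i] = NO_VALUE;
            in->op = OP_NOP; removed++;
            break;
        case OP_STORE:
            if (!is_earlier(in->a, i) || !is_earlier(in->b, i)) return -1;
            if (slot_val[in->a] == NOT_A_SLOT) return -1;
            slot_val[in->a] = alias[in->b];  /* remember the stored value */
            in->op = OP_NOP; removed++;
            break;
        case OP_LOAD:
            /* A real pass would use an "undef" value for load-before-store;
             * this sketch simply rejects it. */
            if (!is_earlier(in->a, i) || slot_val[in->a] < 0) return -1;
            alias[i] = slot_val[in->a];      /* the load becomes that value */
            in->op = OP_NOP; removed++;
            break;
        case OP_ADD:
            if (!is_earlier(in->a, i) || !is_earlier(in->b, i)) return -1;
            in->a = alias[in->a];            /* bypass any removed loads */
            in->b = alias[in->b];
            break;
        default:
            break;
        }
    }
    return removed;
}
```

Feed it the memory-heavy form of x = 2; y = x + 3; x = y; z = x + x and the two adds end up reading the constants and the first add directly, with the slot and all of its stores and loads reduced to NOPs.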

Hottest takes

“Going ‘pure’ C seems limiting” — whizzter
“Brave choice putting a registered trademark in the name” — phoronixrly
“Funny and sad if enthusiasts pulled off what AMD couldn’t” — esafak