Rust Threads on the GPU

Genius leap or turning a racecar into a tractor? The internet is divided

TLDR: VectorWare says you can now run normal Rust threads on a GPU to make coding there feel familiar. Commenters erupted: many fear it wastes GPU strengths and slows things down, while others demand an open-source repo—sparking a heated debate over performance, practicality, and transparency.

VectorWare just dropped a bold claim: they’ve got Rust’s everyday “spawn a thread” trick running on a graphics card. Translation for non-coders: they want GPUs—machines built to run thousands of tiny tasks at once—to feel like regular computers for developers. And the comment section? Absolutely lit. The loudest chorus says this is a “solution looking for a problem,” warning that treating a GPU like a normal computer could make your app slow and messy. One skeptic summed it up as turning a fighter jet into a bus: sure, it moves people, but why would you? Others grilled VectorWare with, “Cool demo, but is this open or locked up?” with folks hunting for a repo and finding crickets.
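For the non-Rustaceans: the "everyday trick" in question is Rust's standard threading API. A minimal CPU-side sketch of that API (the thing VectorWare claims can now target a GPU) looks like this:

```rust
use std::thread;

/// Square each input on its own OS thread and collect results in spawn order.
fn squares_in_threads(n: i32) -> Vec<i32> {
    // thread::spawn is the plain std API -- no GPU-specific kernel syntax.
    let handles: Vec<_> = (0..n)
        .map(|i| thread::spawn(move || i * i))
        .collect();
    // join() blocks until each thread finishes and hands back its value.
    handles.into_iter().map(|h| h.join().unwrap()).collect()
}

fn main() {
    println!("{:?}", squares_in_threads(4)); // prints [0, 1, 4, 9]
}
```

The pitch is that this exact code shape, unchanged, runs on GPU hardware, which is precisely what the skeptics say squanders the GPU's strengths.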

There’s a spicy technical twist too. VectorWare hints that mapping each Rust thread to a GPU “warp” (think: a bundle of lanes moving in lockstep) could avoid “divergence”—when those lanes try to do different things and slow down. Critics fired back: forcing every lane to march in perfect sync throws away what makes GPUs special. The memes flowed—“GPU cosplay as CPU,” “Ferrari with a trailer hitch,” and “threadripper on a toaster.” Love it or hate it, the pitch is clear: make GPU coding feel familiar. The crowd? Still sharpening their pitchforks.

Key Points

  • VectorWare claims it can run Rust’s standard std::thread API on GPU hardware.
  • The article contrasts CPU and GPU execution models: CPUs spawn threads explicitly; GPUs run kernels with many instances in parallel.
  • GPU kernels are written as functions but execute thousands of times, creating a semantic mismatch that complicates safety and indexing.
  • A CUDA C and a Rust (NVPTX) kernel example illustrate parallel indexing using block and thread IDs.
  • Rust GPU kernels currently require unsafe code and raw pointers and are treated like an FFI boundary, because Rust’s ownership model was not designed for massively parallel kernel execution; the article’s stated goal is to extend Rust’s safety guarantees to GPUs without introducing a separate programming model.
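The article's CUDA C and NVPTX kernels aren't reproduced here, but the indexing pattern they illustrate is the canonical one: each kernel instance computes its global index as `blockIdx.x * blockDim.x + threadIdx.x` and bounds-checks it, since the last block may overhang the data. A hypothetical CPU stand-in (assumed names, not the article's actual kernel) makes the formula concrete:

```rust
/// CPU stand-in for a GPU kernel launch: every (block, thread) pair
/// computes one output element using the same global-index formula a
/// CUDA or NVPTX kernel uses: i = block_idx * block_dim + thread_idx.
fn launch_scale(input: &[f32], grid_dim: usize, block_dim: usize) -> Vec<f32> {
    let n = input.len();
    let mut out = vec![0.0; n];
    for block_idx in 0..grid_dim {
        for thread_idx in 0..block_dim {
            let i = block_idx * block_dim + thread_idx;
            // Bounds check: the last block can have more threads than
            // remaining elements, so out-of-range "threads" do nothing.
            if i < n {
                out[i] = input[i] * 2.0;
            }
        }
    }
    out
}

fn main() {
    let input: Vec<f32> = (0..10).map(|x| x as f32).collect();
    // 3 blocks of 4 threads cover 10 elements; the last 2 threads idle.
    println!("{:?}", launch_scale(&input, 3, 4));
}
```

On a real GPU the two loops don't exist: all grid_dim × block_dim instances of the body run concurrently, which is exactly the semantic mismatch the key points describe.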

Hottest takes

"turning a GPU into a slower CPU?" — kevmo314
"hopelessly inefficient programs" — nynx
"This programming model seems like the wrong one" — 20k
Made with <3 by @siedrix and @shesho from CDMX. Powered by Forge&Hive.