December 25, 2025

GPU goes BRRR, internet goes GRRR

TurboDiffusion: 100–200× Acceleration for Video Diffusion Models

5-second clips in ~2s: hype, skeptics, and 'digital heroin' fears

TLDR: TurboDiffusion claims 100–205× faster AI video generation, producing a 5‑second clip in about 2 seconds on one RTX 5090. Commenters are split between awe at near real‑time uses and skepticism about the benchmark methodology and baselines that skip available optimizations, with extra worry about addictive personalized content and calls for an M4 Max version.

TurboDiffusion just slammed the gas: from 166 seconds down to about 1.8 seconds to spit out a 5‑second AI video on a single RTX 5090. That’s like going from a kettle to a microwave. The repo ships finetuned model checkpoints and a “run it now” inference script; training code comes later. The authors claim negligible quality loss, and the demos sit at 480p or 720p, with quantized checkpoints for GPUs like the 5090 so the models fit in memory. Check the GitHub repo if you want to peek under the hood.
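As a quick sanity check on the headline numbers, the speedup implied by the two quoted timings can be worked out directly. This is a minimal sketch: the helper name `speedup` is made up here for illustration, and the only inputs are the 166 s and 1.8 s figures quoted above.

```python
def speedup(baseline_s: float, accelerated_s: float) -> float:
    """Return how many times faster the accelerated run is than the baseline."""
    if accelerated_s <= 0:
        raise ValueError("accelerated time must be positive")
    return baseline_s / accelerated_s


# Timings quoted in the summary above (seconds, single RTX 5090).
ratio = speedup(166.0, 1.8)
print(f"{ratio:.1f}x")  # prints 92.2x
```

That pair works out to roughly 92×, a little under the advertised 100–205× range, so the range presumably covers other models or resolutions than this one example. The summary does not say which.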

The community went full split‑screen. On the hype track, people are stunned: near‑real‑time video on one GPU sounds like sci‑fi becoming Tuesday. On the skeptic track, a top comment argues that the baselines were deliberately weak, that the benchmark times only the core diffusion‑transformer (DiT) steps rather than the full pipeline with its encode/decode stages, and that it skips NVIDIA speed tricks like FA4, CUTLASS and TensorRT (translation: turbo‑buttons). One builder even dropped a Show HN for a one‑GPU video site. Meanwhile, Mac folks begged: please, an M4 Max version, because some say their last 5‑second clip took an hour.
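The skeptic's point about timing only the DiT steps is essentially Amdahl's law: stages that are not accelerated cap the end-to-end gain. The sketch below illustrates that with purely hypothetical numbers; the function name `end_to_end_speedup` and the 160 s / 2 s / 200× figures are assumptions for illustration, not measurements from the repo or the thread.

```python
def end_to_end_speedup(denoise_s: float, other_s: float, denoise_speedup: float) -> float:
    """Overall speedup when only the denoising stage gets faster (Amdahl's law)."""
    before = denoise_s + other_s
    after = denoise_s / denoise_speedup + other_s
    return before / after


# Hypothetical numbers, NOT measurements: 160 s of denoising accelerated 200x,
# plus a fixed 2 s of text encoding and VAE decoding that stays untouched.
print(round(end_to_end_speedup(160.0, 2.0, 200.0), 1))  # prints 57.9
```

Under those made-up inputs, a 200× gain on the denoising loop shrinks to about 58× for the whole pipeline, which is why commenters care whether a benchmark counts every stage.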

Then came the ethics speed bump. A worried voice warned that super‑personalized, instant video could become “digital heroin,” pointing to a NeurIPS paper. Jokes flew — “GPU goes brrr,” “two seconds or it didn’t happen” — while the big drama remains: is this a real leap for everyone, or a flashy demo that hides the fine print?

Key Points

  • TurboDiffusion claims 100–205× end-to-end acceleration for video generation, cutting the time for a 5-second clip from 166 s to 1.8 s on a single RTX 5090.
  • The repository provides finetuned checkpoints and inference code; training code will be released later.
  • Four TurboWan models are available; all support 480p and 720p, and each lists a recommended resolution.
  • Installation requires Python ≥ 3.9 and torch ≥ 2.7.0 (2.8.0 recommended); SageSLA attention is enabled via the SpargeAttn package.
  • Inference involves downloading Wan2.1 VAE and umT5 encoder, selecting quantized or unquantized checkpoints based on GPU memory, and configuring parameters including attention type and sampling steps.
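
The version requirements in the key points can be checked before installing anything else. The following is a minimal sketch using only the Python standard library; the helper names (`parse_version`, `meets_minimum`, `check_environment`) are hypothetical and not part of the TurboDiffusion repo, and the only facts taken from above are the Python ≥ 3.9 and torch ≥ 2.7.0 minimums.

```python
import re
import sys
from importlib import metadata


def parse_version(text: str) -> tuple:
    """Turn '2.8.0+cu128' into (2, 8, 0), ignoring local or pre-release suffixes."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", text)
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    return tuple(int(part or 0) for part in match.groups())


def meets_minimum(installed: str, minimum: str) -> bool:
    """Compare versions numerically, so 2.10.0 counts as newer than 2.7.0."""
    return parse_version(installed) >= parse_version(minimum)


def check_environment() -> list:
    """Return a list of problems; an empty list means the stated minimums are met."""
    problems = []
    if sys.version_info < (3, 9):
        problems.append("Python >= 3.9 is required")
    try:
        torch_version = metadata.version("torch")
    except metadata.PackageNotFoundError:
        problems.append("torch is not installed")
    else:
        if not meets_minimum(torch_version, "2.7.0"):
            problems.append(f"torch {torch_version} is older than 2.7.0")
    return problems


if __name__ == "__main__":
    for line in check_environment() or ["environment meets the stated minimums"]:
        print(line)
```

Reading the installed version through `importlib.metadata` avoids importing torch itself, so the check stays fast and also works on machines where torch is missing.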

Hottest takes

"2s to generate a 5s video on a 5090 … absolutely crazy" — jjcm
"baselines were deliberately worse … only for DIT steps … No actual use of FA4/Cutlass … nor TRT" — villgax
"We are scarily close to realtime personalization of video… may lead to someone inadvertently creating 'digital heroin'" — codingbuddy
Made with <3 by @siedrix and @shesho from CDMX. Powered by Forge&Hive.