May 16, 2026
Lights, camera, comment war
SANA-WM, a 2.6B open-source world model for 1-minute 720p video
This AI video maker wowed people, but the comments instantly turned into a download riot
TLDR: SANA-WM claims it can make a full minute of HD video from one image on a single powerful GPU, which is a big deal for open-source AI tools. But the comment section was less impressed by the hype and more obsessed with one question: where’s the actual download, and can it really compete with the locked-up giants?
A new open-source AI project called SANA-WM just rolled in with a big flex: it can turn a single image plus camera directions into a full one-minute, 720p video. In plain English, that means you give it one picture, tell it how the camera should move, and it tries to build a whole moving scene around that. The creators say it was trained on 64 high-end GPUs but can generate video on just one powerful graphics card. And yes, the community immediately did what the community always does: skipped the victory lap and went straight to “cool, where’s the download?”
That was the first mini-drama. One commenter couldn’t even find the files and noticed the site’s download button was disabled, which instantly changed the vibe from “future of video!” to “show us the goods.” Another person basically asked: if open video AI is the future, why not release the really famous heavy hitters too? That sparked the classic open-source tension: is this a genuine breakthrough, or just a nice demo while the closed-source giants keep the best stuff locked away?
Then came the mood swing. One user argued the real problem is training data: open models may be clever, but closed companies have mountains of video, so they still win when scenes get weird or objects need to behave naturally. And because no internet launch is complete without chaos, the thread also delivered drive-by comedy: “Who wrote your comment?” and the brutally short “Stop posting slop.” In other words, SANA-WM brought the tech, but the comments brought the popcorn.
Key Points
- SANA-WM is a 2.6B-parameter open-source world model that generates 60-second 720p controllable video from a single image and camera trajectory.
- The system is built around Hybrid Linear Attention, Dual-Branch Camera Control, a two-stage generation pipeline with a 17B refiner, and a pose-based annotation pipeline.
- The article states SANA-WM was trained on about 213,000 public video clips with metric-scale pose supervision in 15 days using 64 H100 GPUs (roughly 23,000 H100 GPU-hours).
- At inference, a single H100 can generate a one-minute 720p clip, and a distilled variant can run on a single RTX 5090 with NVFP4 quantization.
- The article claims SANA-WM matches the visual quality of industrial baselines such as LingBot-World and HY-WorldPlay while delivering 36x higher throughput and stronger action-following accuracy than prior open-source baselines.
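Linear attention is plausibly the piece that matters most for fitting minute-long generation on one GPU, so here is a minimal sketch of the generic idea. This is not SANA-WM’s Hybrid Linear Attention, whose details the article does not give; it is plain causal linear attention with the elu(x)+1 feature map from Katharopoulos et al. (2020), and every function name is hypothetical. The quadratic form rescans all earlier tokens at each step, while the recurrent form carries a fixed-size running state, so memory stays constant however many tokens have been generated.

```python
import math
import random


def feature_map(x):
    """elu(x) + 1, which keeps every feature strictly positive."""
    return [v + 1.0 if v > 0 else math.exp(v) for v in x]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def causal_linear_attention_quadratic(qs, ks, vs):
    """Reference form: every step rescans the whole prefix, so work grows as O(N^2)."""
    out = []
    for i, q in enumerate(qs):
        fq = feature_map(q)
        weights = [dot(fq, feature_map(ks[j])) for j in range(i + 1)]
        total = sum(weights)
        out.append([sum(w * vs[j][c] for j, w in enumerate(weights)) / total
                    for c in range(len(vs[0]))])
    return out


def causal_linear_attention_recurrent(qs, ks, vs):
    """Same outputs, but each step only updates a fixed-size state: O(N) work overall."""
    d_k, d_v = len(ks[0]), len(vs[0])
    state = [[0.0] * d_v for _ in range(d_k)]  # running sum of phi(k) v^T
    norm = [0.0] * d_k                          # running sum of phi(k)
    out = []
    for q, k, v in zip(qs, ks, vs):
        fk = feature_map(k)
        for a in range(d_k):
            norm[a] += fk[a]
            for c in range(d_v):
                state[a][c] += fk[a] * v[c]
        fq = feature_map(q)
        denom = dot(fq, norm)
        out.append([sum(fq[a] * state[a][c] for a in range(d_k)) / denom
                    for c in range(d_v)])
    return out


if __name__ == "__main__":
    random.seed(0)
    n, d_k, d_v = 6, 4, 3
    qs = [[random.gauss(0, 1) for _ in range(d_k)] for _ in range(n)]
    ks = [[random.gauss(0, 1) for _ in range(d_k)] for _ in range(n)]
    vs = [[random.gauss(0, 1) for _ in range(d_v)] for _ in range(n)]
    a = causal_linear_attention_quadratic(qs, ks, vs)
    b = causal_linear_attention_recurrent(qs, ks, vs)
    # Largest disagreement between the two forms (floating-point rounding only).
    print(max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb)))
```

Both functions agree up to floating-point rounding; the recurrent one is what lets a long sequence avoid an ever-growing key/value cache. How SANA-WM combines this with other attention types (the “hybrid” part) is not described in the article.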
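The “camera trajectory” input can be pictured as one camera pose per output frame. The sketch below builds such a list as 4x4 camera-to-world matrices with translations in meters, echoing the metric-scale poses mentioned above. The helper name, the frame rate, the y-up axis convention, and the walk-forward-while-turning motion are all hypothetical; the article does not specify SANA-WM’s actual input format.

```python
import math


def make_trajectory(num_frames, fps, speed_m_per_s, yaw_deg_per_s):
    """Hypothetical helper: one 4x4 camera-to-world pose per frame.

    Convention (assumed): y is up, and the camera looks along its local +z axis.
    The camera turns at a constant yaw rate while walking forward at a constant speed.
    """
    dt = 1.0 / fps
    x = z = 0.0  # camera position in meters; height stays at y = 0
    poses = []
    for i in range(num_frames):
        theta = math.radians(yaw_deg_per_s) * i * dt  # heading at this frame
        c, s = math.cos(theta), math.sin(theta)
        poses.append([
            [c, 0.0, s, x],      # rotation about the y axis, plus translation
            [0.0, 1.0, 0.0, 0.0],
            [-s, 0.0, c, z],
            [0.0, 0.0, 0.0, 1.0],
        ])
        # Step forward along the current heading (local +z expressed in world axes).
        x += speed_m_per_s * dt * s
        z += speed_m_per_s * dt * c
    return poses


if __name__ == "__main__":
    # 5 seconds at a hypothetical 16 fps: walk at 1.2 m/s while turning 10 deg/s.
    traj = make_trajectory(num_frames=80, fps=16, speed_m_per_s=1.2, yaw_deg_per_s=10.0)
    last = traj[-1]
    print(len(traj), "poses; final position (m):", round(last[0][3], 2), round(last[2][3], 2))
```

A full one-minute clip would simply mean a longer list (for example, 960 poses at an assumed 16 fps); the article does not state the frame rate SANA-WM actually uses.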