Z-Image: Powerful and highly efficient image generation model with 6B parameters

Open image model stuns: fast, uncensored, and fans say it’s SD1.5’s glow-up

TLDR: Z-Image, a 6B open image model, is wowing users with fast, high-quality results and local-friendly installs. Comments split between celebrating its uncensored freedom and nitpicking text accuracy, with GPU flexing (4090 vs “5090 when?”) fueling the hype—making it a big moment for open image tools.

The internet is buzzing over Z-Image, a new 6B-parameter image generator that fans say finally gives the classic Stable Diffusion a true successor. Hype-heavy commenters crowned it the “SD1.5 glow-up,” while the drama-lovers went straight for the jugular: “Z-Image isn’t censored,” one hot take sneered, dunking on Flux 2’s safety talk and sparking a mini culture war over art freedom vs. model guardrails. Meanwhile, pragmatists cheered that it’s open and can run on regular hardware—yes, the Turbo version fits on 16GB graphics cards—and the vibes are pure “download now, ask questions later.”

Speed flexes dominated the thread. One tester bragged “~3 seconds on my RTX 4090,” while another brought the inevitable “5090 when?” energy. The Turbo variant promises sub-second latency on data-center GPUs and strong photorealism, plus surprisingly accurate English/Chinese text in images, though skeptics cautioned it’s “not always perfect” and one tester admitted that only 2 of their 4 text-rendering attempts passed. In classic GPU-forum fashion, the memes wrote themselves: “Frames per prompt,” “uncensorcore,” and “SD1.5 finally got a stylist.” If you’re itching to try it, you’ll need to install diffusers from source, because yes, the PRs landed and the ecosystem is forming fast. The mood: excited, a little chaotic, and very, very online.

Key Points

  • Z-Image is a 6B-parameter image generation foundation model released in three variants: Turbo, Base, and Edit.
  • Z-Image-Turbo achieves strong quality with only 8 NFEs (number of function evaluations, i.e. denoising forward passes), reaches sub-second inference on H800 GPUs, and fits on consumer devices with 16 GB of VRAM.
  • The architecture uses Scalable Single-Stream DiT (S3-DiT), concatenating text, semantic, and image VAE tokens into a unified input stream for efficiency.
  • Elo-based human preference evaluations on Alibaba AI Arena indicate Z-Image-Turbo is highly competitive overall and state of the art among open-source models.
  • Support for Z-Image has been merged into Hugging Face diffusers via PRs #12703 and #12715; users are advised to install diffusers from source and can use optional Flash Attention, compilation, and CPU offloading.
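
For the curious, here is a rough sketch of what a local Turbo run might look like once diffusers is installed from source. Treat the details as assumptions rather than official usage: the pipeline class name `ZImagePipeline`, the repo id `Tongyi-MAI/Z-Image-Turbo`, the 1024-pixel default size, and the zero guidance scale are not stated in the thread, and the helper names `build_call_kwargs` and `generate` are made up for illustration. The step count is assumed to line up with the 8 NFEs mentioned above; the scheduler's step count and the NFE count may differ by one, so check the model card.

```python
# Assumed install step (run in a shell, not in Python):
#   pip install git+https://github.com/huggingface/diffusers
from typing import Any, Dict

MODEL_ID = "Tongyi-MAI/Z-Image-Turbo"  # assumption: Hugging Face repo id


def build_call_kwargs(prompt: str, steps: int = 8, size: int = 1024) -> Dict[str, Any]:
    """Collect the arguments for one pipeline call (hypothetical helper)."""
    if steps < 1:
        raise ValueError("steps must be at least 1")
    return {
        "prompt": prompt,
        "height": size,
        "width": size,
        "num_inference_steps": steps,  # assumed to match the 8 NFEs
        "guidance_scale": 0.0,  # assumption: the Turbo variant runs without guidance
    }


def generate(prompt: str, out_path: str = "z_image.png", cpu_offload: bool = True) -> None:
    """Load the pipeline and save one image (hypothetical helper)."""
    # Heavy imports stay inside the function so the helper above works without a GPU.
    import torch
    from diffusers import ZImagePipeline  # assumption: class name added by the merged PRs

    pipe = ZImagePipeline.from_pretrained(MODEL_ID, torch_dtype=torch.bfloat16)
    if cpu_offload:
        # Standard diffusers offloading: slower, but lowers peak VRAM use.
        pipe.enable_model_cpu_offload()
    else:
        pipe.to("cuda")
    image = pipe(**build_call_kwargs(prompt)).images[0]
    image.save(out_path)
```

Calling `generate("A neon street sign that reads 'Z-Image'")` would then download the weights and write `z_image.png`. CPU offloading is on by default here because it is the gentler choice for a 16 GB card; switching it off keeps everything on the GPU, which should be faster if the model fits.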

Hottest takes

"safe (read: censored and lobotomized)" — danielbln
"Excitement is high and an ecosystem is forming fast" — xnx
"It’s fast (~3 seconds on my RTX 4090)" — vunderba