TRELLIS.2: state-of-the-art large 3D generative model (4B)

Gorgeous 3D, huge GPU, mixed demo results — commenters go off

TL;DR: TRELLIS.2 promises stunning image-to-3D generation and a new format for complex shapes, but it needs a beefy NVIDIA GPU and the public demo looks weaker than the teaser. Fans praise its open roadmap while others gripe about hardware lock-in and mixed real-world results. Either way, it's a big, buzzy release.

Microsoft dropped TRELLIS.2, a mega 3D generator that turns pictures into detailed, textured 3D objects. The sizzle reel and the project page look jaw‑dropping, with talk of a new 3D format (O‑Voxel) that handles tricky shapes and realistic materials (think glass, metal, fabric). But the comments? Pure popcorn.

The hottest take is the hardware gatekeeping: “Needs 24GB GPU to run,” snaps one user, while others point out that the speed claims were measured on pricey H100 chips. Translation: your gaming laptop is not invited. Meanwhile, the open‑source crowd is buzzing: a veteran notes TRELLIS 1 “changed the game” by releasing data and code, and they’re hyped for this sequel’s roadmap (inference code, checkpoints, and training code promised by the end of 2025).
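Quick napkin math on that 24GB figure: the snippet below estimates how much memory the weights of a 4B-parameter model take at two common precisions. The precision TRELLIS.2 actually runs at is an assumption here, and activations, 3D decoding buffers, and framework overhead are deliberately left out.

```python
# Back-of-envelope VRAM estimate for model weights only.
# Assumption: each parameter is stored as fp32 (4 bytes) or bf16/fp16 (2 bytes).
# Activations, intermediate 3D data, and framework overhead are NOT included.

PARAMS = 4_000_000_000  # "4B" parameters, per the model description


def weight_memory_gib(num_params: int, bytes_per_param: int) -> float:
    """Return the memory needed just to hold the weights, in GiB (2**30 bytes)."""
    return num_params * bytes_per_param / 2**30


if __name__ == "__main__":
    for label, nbytes in [("fp32", 4), ("bf16/fp16", 2)]:
        print(f"{label}: {weight_memory_gib(PARAMS, nbytes):.1f} GiB")
```

Running it prints about 14.9 GiB for fp32 and 7.5 GiB for 16-bit weights. So the weights alone don't explain a 24GB requirement; the rest presumably goes to activations and high-resolution 3D processing, though the post doesn't break that down.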

Then the demo drama hit. A tester tried the Hugging Face demo and said results from random pics weren’t close to the showcase — calling it possibly a “gimped” model. Cue debates: is the public demo cut‑down, or is image‑to‑3D just hard outside carefully chosen examples?

Sprinkled in are memes about “joining the 24GB club,” jokes that Linux + CUDA is the new velvet rope, and wishlists for a local Sparc3D‑level build or a Hunyuan3D‑3 glow‑up. Verdict: huge promise, high hype, and a comment section ready to rumble.

Key Points

  • TRELLIS.2 is a 4B-parameter 3D generative model for high-fidelity image-to-3D, using O-Voxel sparse voxels.
  • It combines vanilla Diffusion Transformers and a Sparse 3D VAE with 16× downsampling for compact latents.
  • Reported generation times on an NVIDIA H100: ~3 s at 512³, ~17 s at 1024³, ~60 s at 1536³, with separate shape and material timings given for each.
  • It supports full PBR materials (base color, roughness, metallic, opacity) and handles complex topologies.
  • Minimal processing enables fast conversions: Textured Mesh→O-Voxel <10s (CPU); O-Voxel→Textured Mesh <100ms (CUDA).
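To make "sparse voxels" and "16× downsampling" concrete, here is a toy sketch. It is a hypothetical illustration, not the real O-Voxel format or TRELLIS.2 API: it assumes a sparse grid stores PBR attributes only for occupied voxels (a dense 1024³ grid would have 1,073,741,824 cells), and it assumes the 16× factor applies per spatial axis. The names `PBR`, `latent_resolution`, and `occupied_latent_cells` are invented for this example.

```python
from dataclasses import dataclass

# Hypothetical illustration only: NOT the actual O-Voxel data structure.
# Assumption: attributes are stored only for occupied voxels, and the VAE's
# 16x downsampling applies independently along each spatial axis.

DOWNSAMPLE = 16

Coord = tuple[int, int, int]


@dataclass(frozen=True)
class PBR:
    """Per-voxel material attributes named in the key points."""
    base_color: tuple[float, float, float]
    roughness: float
    metallic: float
    opacity: float


def latent_resolution(grid_res: int, factor: int = DOWNSAMPLE) -> int:
    """Side length of the coarse grid, assuming per-axis downsampling."""
    return grid_res // factor


def occupied_latent_cells(voxels: dict[Coord, PBR], factor: int = DOWNSAMPLE) -> set[Coord]:
    """Map each occupied fine voxel to its coarse cell; empty cells never appear."""
    return {(x // factor, y // factor, z // factor) for (x, y, z) in voxels}


if __name__ == "__main__":
    for res in (512, 1024, 1536):
        print(res, "->", latent_resolution(res))

    # Glass-like material: low roughness, non-metallic, mostly transparent.
    glass = PBR(base_color=(0.9, 0.95, 1.0), roughness=0.05, metallic=0.0, opacity=0.2)
    # A thin strip of 32 occupied voxels along x at y = z = 0.
    strip = {(x, 0, 0): glass for x in range(32)}
    print(len(strip), "fine voxels ->", len(occupied_latent_cells(strip)), "latent cells")
```

Under these assumptions, 512³, 1024³, and 1536³ grids shrink to 32³, 64³, and 96³, and only cells that contain geometry survive, which is the intuition behind "compact latents." The real model's encoder learns its latent features rather than simply bucketing coordinates like this.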

Hottest takes

“Needs 24GB gpu to run” — NotGMan
“The results from arbitrary pictures are not nearly as good… gimped version” — nice_byte
“TRELLIS 1 had a massive impact… excited for the follow‑ups” — summarity
Made with <3 by @siedrix and @shesho from CDMX. Powered by Forge&Hive.