April 19, 2026

Sketches, sparks, and salty browsers

Show HN: Prompt-to-Excalidraw demo with Gemma 4 E2B in the browser (3.1GB)

AI draws your diagrams in-browser — Chrome wins, Firefox fumes

TLDR: A new browser demo turns text into Excalidraw diagrams using an on‑device AI model, but it only works on Chrome and needs a hefty 3.1 GB download. Comments split between wow and “why,” with Firefox users shut out, people begging for a fast CDN, and others baffled by “Unsupported GPU” messages — impressive but gated.

Type a sentence, get a sketch: that’s the promise of this Prompt → Excalidraw demo, which uses the Gemma 4 E2B model to turn your words into neat diagrams right in your browser. It runs entirely on your machine, squeezing memory so longer chats fit and pushing the GPU to spit out results fast. The catch? It’s desktop-Chrome-only, needs a GPU feature (WebGPU subgroups) that Chrome has and Firefox/Safari don’t yet, and wants around 3 GB of RAM — cue the drama.

The comments lit up like a GPU under load. Firefox faithful showed up with the classic: “no firefox support?”. One Chrome user on version 147 with a GTX 1060 still got slapped with “Unsupported browser/GPU,” sparking theories about hidden hardware requirements and driver gremlins. Meanwhile, the 3.1GB model download had folks groaning — and laughing — with one plea to “just throw some Claude credits” at a fast CDN. People love the idea of instant, private diagrams, but they’re not loving repeated downloads and compatibility roulette.

Beyond the salt, there’s curiosity: someone asked if Qwen performs differently than Gemma, and another admitted the techy bits (like the memory-squeezing “TurboQuant” trick) flew over their head. The vibe? Gorgeous potential, gated by Chrome and gigabytes. If you’ve got the right browser and beefy RAM, it’s magic; if not, it’s a very pretty “Nope.” For the devs, there’s a CPU-side cousin, turboquant-wasm, but the crowd wants plug‑and‑play speed, not homework.
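
The TurboQuant details aren’t spelled out in the thread, but the general idea behind KV-cache quantization is easy to sketch: store low-bit integers plus a scale factor instead of full-precision floats. Below is a generic per-tensor int8 sketch in Python — the names and the 4× ratio are illustrative only; the actual TurboQuant scheme reportedly lands at ~2.4× and differs in its details:

```python
import numpy as np

def quantize_int8(x: np.ndarray) -> tuple[np.ndarray, float]:
    """Per-tensor symmetric int8 quantization: one fp32 scale + int8 payload."""
    scale = max(float(np.abs(x).max()) / 127.0, 1e-12)
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
kv = rng.standard_normal((8, 64)).astype(np.float32)  # stand-in for a KV-cache tile

q, s = quantize_int8(kv)
ratio = kv.nbytes / q.nbytes            # 4.0: fp32 -> int8
err = np.abs(dequantize(q, s) - kv).max()  # small reconstruction error
```

Real KV-cache schemes typically quantize per head or per channel and may keep recent tokens at higher precision; this toy version only shows the storage trade-off that lets longer conversations fit in GPU memory.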

Key Points

  • The demo generates Excalidraw diagrams from text prompts using Gemma 4 E2B entirely in-browser.
  • The model outputs a compact drawing code (~50 tokens) instead of full Excalidraw JSON (~5,000 tokens).
  • The TurboQuant algorithm compresses the KV cache by ~2.4× so longer conversations fit in GPU memory.
  • The implementation uses WGSL compute shaders for GPU inference at 30+ tokens/second; it requires WebGPU subgroups and ~3 GB of RAM.
  • A sibling npm package, turboquant-wasm, offers a WASM+SIMD CPU-side implementation for vector search.

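The compact code the model emits isn’t documented in the post, but the token math is easy to appreciate with a toy stand-in. Here’s a hypothetical line-per-shape mini-language and a decoder that expands it into Excalidraw-style elements — the format and field names are invented for illustration, not the demo’s real scheme:

```python
def decode(compact: str) -> list[dict]:
    """Expand terse 'shape x y w h [label...]' lines into
    Excalidraw-style element dicts. Purely illustrative format."""
    elements = []
    for line in compact.strip().splitlines():
        shape, x, y, w, h, *label = line.split()
        el = {
            "type": shape,                 # e.g. "rectangle", "ellipse"
            "x": int(x), "y": int(y),
            "width": int(w), "height": int(h),
        }
        if label:
            el["label"] = " ".join(label)
        elements.append(el)
    return elements

# Two shapes in a handful of tokens instead of hundreds of JSON tokens:
scene = decode("rectangle 0 0 120 60 API server\nellipse 200 0 80 80 DB")
```

A schema like this is why ~50 generated tokens can stand in for ~5,000 tokens of raw Excalidraw JSON: the boilerplate lives in the decoder, not in the model’s output.
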
Hottest takes

"no firefox support?" — COOLmanYT
"make a cdn ... just throw some claude credits" — hhthrowaway1230
"Unsupported browser/GPU ... Chrome 147 ... 1060" — logicallee
Made with <3 by @siedrix and @shesho from CDMX. Powered by Forge&Hive.