April 5, 2026
Gemma + Claude = Frenemies?
Running Google Gemma 4 Locally with LM Studio's New Headless CLI and Claude Code
Laptop AI goes fast, but users feud over memory claims, setup hacks, and Anthropic backlash
TLDR: LM Studio’s new command-line mode lets people run Google’s Gemma 4 locally and even pipe it into Claude Code. Commenters are split between hype over fast, private laptop AI and snark about memory myths, speed hiccups, and fears Anthropic might curb the turn‑key party.
Local‑AI fans are losing it over LM Studio’s new headless command‑line mode, which lets you run Google’s Gemma 4 on your own machine: no cloud, no fees, no waiting. The star of the show is Gemma 4’s mixture-of-experts design: the 26B version reportedly hits ~51 tokens/sec on a MacBook Pro while feeling close to a much bigger model. But the vibe isn’t just victory laps; there’s drama. Some users say Gemma flies solo, others say it slows down when routed through Claude Code (Anthropic’s coding app), spawning the week’s hot question: are Gemma and Claude competitors, collaborators, or in a situationship?
The thread swings from helpful to chaotic. One hero drops a one‑liner, “ollama launch claude --model gemma4:26b”, and it instantly becomes the meme of the day. Setup guides appear, confusion ensues (“So what exactly talks to what?”), and then the cold water: “MoE doesn’t really save VRAM,” warns a skeptic, meaning every expert still has to sit in memory even though only a few fire per token, so it feels faster without needing less RAM. Meanwhile, a spicy crowd wonders whether Anthropic will clamp down on using Claude Code as a plug‑and‑play frontend for non‑Anthropic models. In short: LM Studio 0.4.0 makes local AI feel mainstream, Gemma 4 looks punchy, and the comments are pure popcorn.
Key Points
- Google’s Gemma 4 uses a mixture-of-experts design that activates ~4B parameters per pass, enabling fast local inference on consumer hardware.
- On a 14" MacBook Pro M4 Pro (48 GB), Gemma 4 26B-A4B runs at ~51 tokens/s and fits comfortably in memory, though it reportedly runs slower when driven through Claude Code.
- The Gemma 4 family includes E2B/E4B (with Per-Layer Embeddings and audio support) and a dense 31B model scoring 85.2% MMLU Pro and 89.2% AIME 2026.
- Gemma 4 26B-A4B (8/129 experts active, ~3.8B params per token) scores 82.6% MMLU Pro and 88.3% AIME 2026, achieving an Elo of ~1441, close to the dense 31B’s ~1451.
- LM Studio 0.4.0 adds a headless architecture with llmster (server) and the lms CLI, enabling command-line use and parallel request processing.
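For readers who want to try the headless workflow, here is a minimal sketch. The `lms server start` and `lms load` commands are real parts of LM Studio's CLI, and its local server speaks an OpenAI-compatible API (default port 1234); the model identifier `gemma-4-26b-a4b` is a guess at how the Gemma 4 MoE build might be named in the catalog, not a confirmed name.

```shell
# Start LM Studio's headless server (llmster) without launching the GUI.
lms server start

# Load a model into the server. NOTE: the identifier below is an
# assumption -- check `lms ls` for the actual catalog name on your machine.
lms load gemma-4-26b-a4b

# The local server exposes an OpenAI-compatible endpoint (default port
# 1234), so plain curl or any standard client library can talk to it.
curl http://localhost:1234/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "gemma-4-26b-a4b",
        "messages": [{"role": "user", "content": "Hello from the CLI"}]
      }'
```

Because the endpoint is OpenAI-compatible, this is also presumably how commenters wire it into Claude Code: point the client at the local server instead of a cloud API.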