Google Gemma 4 Runs Natively on iPhone with Full Offline AI Inference

Crowd cheers, skeptics scoff, and Apple gets side‑eye over AI app rules

TLDR: Google’s Gemma 4 now runs fully offline on iPhones, promising fast, private AI on the device. Comments split between Android parity chants, doubts about answer quality, and claims that Apple blocks local-AI apps. The privacy win is exciting, but real-world usefulness and App Store policy remain the big open questions.

Google just squeezed a brain into your iPhone, and the crowd immediately split into clapping, squinting, and side-eyeing. Gemma 4 runs fully offline via Google’s AI Edge Gallery, with small, phone-friendly variants (E2B/E4B) handling quick replies, image understanding, and voice: no cloud, no waiting. Cue the rivalry: one user begged for iPhone vs Android numbers, another chimed in “It runs on Android too,” and the thread turned into a platform tug-of-war complete with a meme about Siri packing a suitcase.

Skeptics rolled in hard: “Is the output coherent though?” became the refrain, with veterans warning that small local models often talk fast but say little. Then came the pedants, debating whether “edge” means “near the user” or “on the phone”: nerd fight, round one. The real spice? An indie dev claimed Apple is blocking apps that ship local AI models, citing App Store rules, and the comments lit up with “Google’s demo vs Apple’s gate.”

Meanwhile, early testers reported snappy responses thanks to the iPhone GPU, fueling hopes for private, on-site uses like field work and clinics. Verdict from the crowd: huge step, lots of hype, but show us useful answers, cross-platform proof, and an App Store that plays nice.

Key Points

  • Google’s Gemma 4 models now run natively and fully offline on iPhones.
  • Early benchmarks place Gemma 4 (31B) near Qwen 3.5’s 27B model, with trade-offs across tasks.
  • Mobile-focused E2B and E4B variants prioritize efficiency over raw capability; the app recommends E2B by default.
  • Users can install Google AI Edge Gallery from the App Store, pick a model, and run inference entirely on the device, with no cloud connection or API keys (a minimal code sketch follows this list).
  • Inference uses the iPhone’s GPU to achieve low latency, supporting privacy-sensitive and field use cases.
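
For readers who want to poke at this beyond the Gallery app itself, Google’s MediaPipe LLM Inference API is the usual route to the same on-device runtime. Below is a minimal Swift sketch assuming that API’s documented shape; the model file name, token limits, and prompt are illustrative, not official artifacts.

```swift
import MediaPipeTasksGenai  // MediaPipe LLM Inference (pod: MediaPipeTasksGenAI)

// Locate a Gemma model file bundled with the app.
// "gemma-e2b" and the ".bin" extension are illustrative names.
guard let modelPath = Bundle.main.path(forResource: "gemma-e2b", ofType: "bin") else {
    fatalError("Bundle is missing the model file")
}

// Configure fully on-device inference: no network, no API keys.
let options = LlmInference.Options(modelPath: modelPath)
options.maxTokens = 512     // total token budget for prompt + response
options.topk = 40           // sampling breadth
options.temperature = 0.8   // sampling randomness

do {
    let llm = try LlmInference(options: options)
    // Synchronous, blocking call; a real app would run this off the main thread.
    let reply = try llm.generateResponse(
        inputText: "Summarize these field notes in three bullet points: ...")
    print(reply)
} catch {
    print("On-device inference failed: \(error)")
}
```

The runtime handles hardware acceleration itself on supported devices, which is consistent with the snappy GPU-backed responses testers reported above.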

Hottest takes

"Is the output coherent though?" — bossyTeacher
"It runs on Android too" — mistic92
"Apple appears to be blocking the use of these llms" — codybontecou
Made with <3 by @siedrix and @shesho from CDMX. Powered by Forge&Hive.