April 5, 2026

Siri who? HN’s got a new crush

Show HN: Real-time AI (audio/video in, voice out) on an M3 Pro with Gemma E2B

Local AI on your Mac has HN calling it the Siri we were promised

TLDR: A solo dev shipped a real-time, on-device voice-and-vision assistant that runs fast on a Mac M3 Pro with no cloud. Commenters cheered the privacy and speed, roasted Siri for lagging behind, and begged for a one-click Mac/iPhone app while dreaming of truly hands‑free helpers for driving and daily tasks.

Hacker News is buzzing over “Parlor,” a scrappy demo that puts real-time voice-and-vision AI on your own Mac—no cloud, no fees, just talk and it talks back in about 2–3 seconds. The dev says it runs on an M3 Pro using Google’s compact Gemma model and Kokoro for that smooth voice. Translation: a homegrown Siri… that actually listens.

The crowd’s loudest take? Apple fumbled. One top comment cheered the low latency, then roasted Cupertino: this “should be a Siri demo.” Others piled on with the running gag—“You will have to unlock your iPhone first”—as the poster child for why folks want a truly hands‑free helper. Meanwhile, practical dreamers showed up fast: commuters want a dashboard co‑pilot, workshop tinkerers want timers and math without smudging their phones, and small business owners want an assistant that can read pages and manage posts.

Of course, it’s still an early “research preview,” and the DIY setup sparked a mini‑riot of convenience seekers. The meme of the moment: “someone vibe code a Mac app so I don’t have to touch Terminal.” Still, the big mood is hopeful. If this runs locally today on laptops, imagine phones tomorrow—private, fast, and finally useful. Siri, your move.

Key Points

  • Parlor runs a real-time, on-device multimodal assistant that takes audio/video input and outputs voice locally.
  • It uses Gemma 3n E2B via LiteRT-LM for speech and vision understanding and Kokoro TTS (MLX on macOS, ONNX on Linux) for speech synthesis.
  • The browser UI includes Silero VAD for hands-free activation, barge-in so you can interrupt a response mid-speech, and sentence-level TTS streaming to cut latency.
  • On an Apple M3 Pro, end-to-end latency is ~2.5–3.0s with ~83 tokens/sec decode; requires Python 3.12+, Apple Silicon macOS or Linux GPU, and ~3 GB RAM.
  • Quick start uses uv and FastAPI, with models auto-downloading (~2.6 GB for Gemma 3n E2B); the project is released under Apache 2.0 and exposes configuration options.
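The sentence-level TTS streaming mentioned above is the trick that keeps perceived latency low: instead of waiting for the full LLM response, the pipeline speaks each sentence as soon as it's complete. Here's a minimal sketch of that idea (illustrative only, not Parlor's actual code; the tokenizer boundary regex and function names are assumptions):

```python
import re
from typing import Iterator

# A sentence ends at ., !, or ? followed by whitespace.
_SENTENCE_END = re.compile(r"([.!?])\s")

def stream_sentences(tokens: Iterator[str]) -> Iterator[str]:
    """Buffer incoming LLM text tokens and yield each sentence as soon
    as it is complete, so TTS can start before generation finishes."""
    buf = ""
    for tok in tokens:
        buf += tok
        while True:
            m = _SENTENCE_END.search(buf)
            if not m:
                break
            end = m.end(1)          # keep the closing punctuation mark
            yield buf[:end].strip() # hand this sentence to the TTS engine
            buf = buf[end:]
    tail = buf.strip()
    if tail:                        # flush whatever remains at end of stream
        yield tail

if __name__ == "__main__":
    # Simulated token stream from the model
    tokens = ["Hel", "lo there. ", "How can ", "I help? ", "Ask away"]
    for sentence in stream_sentences(iter(tokens)):
        print(sentence)
```

Each yielded sentence would be passed to the TTS engine while the model keeps generating, which is how a ~2.5–3.0s end-to-end figure stays tolerable: the user hears the first sentence long before the full reply exists.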

Hottest takes

"Feels like your demo should be a Siri demo" — dvt
"'You will have to unlock your iPhone first' is kind of a deal-breaker" — jwr
"Can someone quickly vibe code MacOS native app for that" — divan
Made with <3 by @siedrix and @shesho from CDMX. Powered by Forge&Hive.