March 16, 2026

Siri slander. Space-heater smackdowns.

My Journey to a reliable and enjoyable locally hosted voice assistant

Fans yell 'Siri who?' while others say it's a noisy space heater with a sleepy wake word

TL;DR: A maker built a fast, private, at‑home voice assistant, ditching Google and showing real progress. Commenters split between “better than Siri” hype, “wake word and audio stink” gripes, and a privacy‑vs‑power‑bill brawl—plus many asking if talking to gadgets beats just pressing a button.

A Home Assistant tinkerer ditched Google Home for a fully local voice assistant—running on their own hardware instead of the cloud—and claims snappy, private replies using beefy graphics cards. But the real show is the comments under the post: one camp is cheering, the other is roasting.

On the hype side, one fan crowed it’s “already 10x better than Siri,” praising how it knows which light you mean without nagging. The skeptics rolled in fast: another user sighed that the buzzy 2024 OpenAI voice demo still hasn’t materialized: “zero progress.” Then there’s the “do we even want to talk to gadgets?” crowd, arguing it’s faster (and less cringey) to just tap a switch than “talk to empty air.”

The loudest fight? Local vs. Cloud. Privacy lovers hail running everything at home; pragmatists clap back that Google’s Gemini is cheaper and faster than powering a 300-watt “space heater” GPU. Yet both sides agree on the true final boss: wake word detection (the “Hey…” trigger) keeps missing or false-firing. One buyer of Home Assistant’s own speaker says responses sound bad and the basics aren’t there yet—while Amazon’s Alexa catches strays for being stuffed with ads.

Verdict: Local voice is getting real—but between power bills, sleepy wake words, and mic quality, the vibe is “promising… with a side of chaos.”

Key Points

  • The author replaced Google Home (Nest Minis) with a fully local Home Assistant Assist setup using llama.cpp (previously Ollama).
  • Most modern discrete GPUs can run local Assist effectively; performance depends on chosen model size and hardware.
  • GPU benchmarks show 1–2 s responses on RTX 3090/RX 7900 XTX with 20B–30B MoE and 9B dense models; mid-range 16GB GPUs achieve 1.5–4 s; RTX 3050 8GB handles 4B models at ~3 s.
  • All tested models support basic tool calling; GGML GPT-OSS 20B performed best across advanced behaviors, while smaller models struggled with misheard commands and noise.
  • Home Assistant runs on an Unraid NAS (not critical to voice performance); response times reported are after prompt caching.
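For a sense of what “basic tool calling” looks like under the hood: llama.cpp’s bundled server (`llama-server`) exposes an OpenAI-compatible `/v1/chat/completions` endpoint, so a client can hand the model a list of tool schemas and let it pick one. The sketch below is illustrative only, not the author’s actual setup: the host, port, and `light_turn_on` tool name are assumptions.

```python
import json
import urllib.request

# Assumed local endpoint for llama-server's OpenAI-compatible API.
SERVER_URL = "http://localhost:8080/v1/chat/completions"


def build_request(command: str) -> dict:
    """Build a chat-completion payload offering the model one tool."""
    return {
        "messages": [
            {
                "role": "system",
                "content": "You control a smart home. Use tools when asked.",
            },
            {"role": "user", "content": command},
        ],
        "tools": [
            {
                "type": "function",
                "function": {
                    # Hypothetical tool name, for illustration only.
                    "name": "light_turn_on",
                    "description": "Turn on a named light.",
                    "parameters": {
                        "type": "object",
                        "properties": {
                            "name": {
                                "type": "string",
                                "description": "Friendly name of the light",
                            }
                        },
                        "required": ["name"],
                    },
                },
            }
        ],
    }


def send(payload: dict) -> dict:
    """POST the payload to the local server and return the parsed reply."""
    req = urllib.request.Request(
        SERVER_URL,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Calling `send(build_request("turn on the kitchen light"))` against a running server would, if the model cooperates, return a response whose message carries a `tool_calls` entry naming `light_turn_on` with the argument `{"name": "kitchen"}` — which is exactly the behavior the benchmarks above were grading.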

Hottest takes

"already 10x better than Siri" — dewey
"much cheaper than the electricity that would be needed to keep a 3090 awake" — hamdingers
"faster for me to just do it myself… than talking to empty air" — voidUpdate
Made with <3 by @siedrix and @shesho from CDMX. Powered by Forge&Hive.