March 10, 2026
Local goes brr, zsh goes ouch
Show HN: RunAnywhere – Faster AI Inference on Apple Silicon
Macs get a new voice—fans hype speed, others hit crashes and confusion
TLDR: A new Mac app promises super-fast, fully local voice control and document Q&A without the cloud, thrilling privacy fans and tinkerers. But the comments split between excitement over on-device speed, confusion about what this tool really is, and a few crash-and-burn install woes—buzzing, but bumpy.
The devs behind RunAnywhere’s RCLI just dropped a “talk to your Mac” moment: a fully local voice assistant that listens, thinks, and talks back on Apple Silicon. It controls apps with 43 voice commands, searches your own files, and claims lightning speed—sub-200ms from speech-to-reply. No cloud, no API keys, just you and your Mac. Cue the comments section going full rollercoaster.
Privacy hawks immediately asked if any telemetry phones home, while one futurist declared on-device AI is the only way forward. Another user slammed the brakes with the ultimate party foul: “zsh: segmentation fault rcli.” Meanwhile, a confused commenter wondered if this is a simple voice assistant or a full-on “do anything” AI, and why the document Q&A feature (RAG—basically local file search) is glued to voice chat. Fans cheered the slick demos; tinkerers begged for a model picker and better quantization so they can mix-and-match brains. A Homebrew install bug added drama, because of course it did.
Tech aside, the vibe is pure chaos comedy: Siri who? vs zsh said nope. The promise is super fast, totally local AI on your Mac. The question is whether it’s a polished sidekick—or a brilliant, crash-prone science fair project you’ll be debugging this weekend.
Key Points
- RunAnywhere released RCLI, an on-device voice AI for macOS that runs STT, LLM, and TTS locally with no cloud or API keys.
- RCLI supports 43 macOS actions via local tool calling and offers a TUI for push-to-talk, model management, and benchmarks.
- The voice pipeline uses Silero VAD, Zipformer streaming STT with Whisper/Parakeet offline options, LLMs (Qwen3/LFM2/Qwen3.5), and double-buffered TTS.
- Local RAG provides hybrid vector+BM25 retrieval with ~4ms latency over 5K+ chunks, supporting PDF, DOCX, and text.
- MetalRT, a GPU engine for Apple Silicon M3+ by RunAnywhere, delivers up to 550 tok/s and sub-200ms latency; M1/M2 fall back to llama.cpp.
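To make the "hybrid vector+BM25" point concrete: hybrid retrieval scores each chunk twice (keyword match via BM25, semantic match via embedding similarity) and fuses the two. RCLI's actual implementation isn't shown in the post, so this is a toy sketch; the bag-of-words "embeddings", the `alpha` fusion weight, and all function names are illustrative stand-ins for real vectors and a real index.

```python
import math
from collections import Counter

def bm25_scores(query_terms, docs, k1=1.5, b=0.75):
    """Okapi BM25 keyword scores for tokenized docs."""
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N
    df = Counter(t for d in docs for t in set(d))  # document frequency
    scores = []
    for d in docs:
        tf = Counter(d)
        s = 0.0
        for t in query_terms:
            if t not in tf:
                continue
            idf = math.log(1 + (N - df[t] + 0.5) / (df[t] + 0.5))
            s += idf * tf[t] * (k1 + 1) / (
                tf[t] + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(s)
    return scores

def cosine(u, v):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(u[t] * v.get(t, 0) for t in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def hybrid_search(query, chunks, alpha=0.5):
    """Rank chunks by a fused BM25 + 'vector' score (toy bag-of-words)."""
    docs = [c.lower().split() for c in chunks]
    q = query.lower().split()
    bm = bm25_scores(q, docs)
    qv = Counter(q)
    cos = [cosine(qv, Counter(d)) for d in docs]
    mx = max(bm) or 1.0  # normalize BM25 so both signals sit in [0, 1]
    fused = [alpha * (s / mx) + (1 - alpha) * c for s, c in zip(bm, cos)]
    return sorted(range(len(chunks)), key=lambda i: fused[i], reverse=True)
```

A real local RAG stack would swap the bag-of-words vectors for learned embeddings and an ANN index; the fusion step stays essentially the same.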
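The "double-buffered TTS" point is worth unpacking too: the idea is to synthesize the next sentence while the current one is still playing, so the voice never pauses between sentences. The post doesn't show RCLI's code, so here is a minimal sketch, assuming a bounded queue as the buffer; `synthesize` and `play` are hypothetical stubs standing in for a real TTS engine and audio output.

```python
import queue
import threading
import time

def synthesize(sentence):
    # Stub TTS: pretend synthesis takes a little time, return fake audio.
    time.sleep(0.01)
    return f"<audio:{sentence}>"

def play(audio, played):
    # Stub playback: record what was "spoken", in order.
    time.sleep(0.01)
    played.append(audio)

def speak(sentences):
    """Play sentences with synthesis overlapped against playback."""
    buf = queue.Queue(maxsize=2)  # the "double buffer": at most 2 clips ahead
    played = []

    def producer():
        for s in sentences:
            buf.put(synthesize(s))  # blocks if the buffer is full
        buf.put(None)  # sentinel: no more audio

    t = threading.Thread(target=producer)
    t.start()
    while (audio := buf.get()) is not None:
        play(audio, played)
    t.join()
    return played
```

The `maxsize=2` bound is what makes it double buffering: the producer can run at most one clip ahead of playback, overlapping latency without synthesizing the whole reply up front.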