March 10, 2026
Local goes brr, zsh goes ouch
Show HN: RunAnywhere – Faster AI Inference on Apple Silicon
Macs get a new voice—fans hype speed, others hit crashes and confusion
TLDR: A new Mac app promises super-fast, fully local voice control and document Q&A without the cloud, thrilling privacy fans and tinkerers. But the comments split between excitement over on-device speed, confusion about what this tool really is, and a few crash-and-burn install woes—buzzing, but bumpy.
The devs behind RunAnywhere’s RCLI just dropped a “talk to your Mac” moment: a fully local voice assistant that listens, thinks, and talks back on Apple Silicon. It controls apps with 43 voice commands, searches your own files, and claims lightning speed—sub-200ms from speech-to-reply. No cloud, no API keys, just you and your Mac. Cue the comments section going full rollercoaster.
Privacy hawks immediately asked if any telemetry phones home, while one futurist declared on-device AI is the only way forward. Another user slammed the brakes with the ultimate party foul: “zsh: segmentation fault rcli.” Meanwhile, a confused commenter wondered if this is a simple voice assistant or a full-on “do anything” AI, and why the document Q&A feature (RAG—basically local file search) is glued to voice chat. Fans cheered the slick demos; tinkerers begged for a model picker and better quantization so they can mix-and-match brains. A Homebrew install bug added drama, because of course it did.
Tech aside, the vibe is pure chaos comedy: Siri who? vs zsh said nope. The promise is super fast, totally local AI on your Mac. The question is whether it’s a polished sidekick—or a brilliant, crash-prone science fair project you’ll be debugging this weekend.
Key Points
- RunAnywhere released RCLI, an on-device voice AI for macOS that runs STT, LLM, and TTS locally with no cloud or API keys.
- RCLI supports 43 macOS actions via local tool calling and offers a TUI for push-to-talk, model management, and benchmarks.
- The voice pipeline uses Silero VAD, Zipformer streaming STT with Whisper/Parakeet offline options, LLMs (Qwen3/LFM2/Qwen3.5), and double-buffered TTS.
- Local RAG provides hybrid vector+BM25 retrieval with ~4ms latency over 5K+ chunks, supporting PDF, DOCX, and text.
- MetalRT, a GPU engine for Apple Silicon M3+ by RunAnywhere, delivers up to 550 tok/s and sub-200ms latency; M1/M2 fall back to llama.cpp.
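To make the "hybrid vector+BM25" point concrete: hybrid retrieval scores each chunk twice (keyword match via BM25, semantic match via embedding similarity) and fuses the two. RCLI's actual implementation isn't shown in the post, so this is a toy sketch; the bag-of-words "embeddings", the `alpha` fusion weight, and all function names are illustrative stand-ins for real vectors and a real index.

```python
import math
from collections import Counter

def bm25_scores(query_terms, docs, k1=1.5, b=0.75):
    """Okapi BM25 keyword scores for tokenized docs."""
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N
    df = Counter(t for d in docs for t in set(d))  # document frequency
    scores = []
    for d in docs:
        tf = Counter(d)
        s = 0.0
        for t in query_terms:
            if t not in tf:
                continue
            idf = math.log(1 + (N - df[t] + 0.5) / (df[t] + 0.5))
            s += idf * tf[t] * (k1 + 1) / (
                tf[t] + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(s)
    return scores

def cosine(u, v):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(u[t] * v.get(t, 0) for t in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def hybrid_search(query, chunks, alpha=0.5):
    """Rank chunks by a fused BM25 + 'vector' score (toy bag-of-words)."""
    docs = [c.lower().split() for c in chunks]
    q = query.lower().split()
    bm = bm25_scores(q, docs)
    qv = Counter(q)
    cos = [cosine(qv, Counter(d)) for d in docs]
    mx = max(bm) or 1.0  # normalize BM25 so both signals sit in [0, 1]
    fused = [alpha * (s / mx) + (1 - alpha) * c for s, c in zip(bm, cos)]
    return sorted(range(len(chunks)), key=lambda i: fused[i], reverse=True)
```

A real local RAG stack would swap the bag-of-words vectors for learned embeddings and an ANN index; the fusion step stays essentially the same.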
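The "double-buffered TTS" point is worth unpacking too: the idea is to synthesize the next sentence while the current one is still playing, so the voice never pauses between sentences. The post doesn't show RCLI's code, so here is a minimal sketch, assuming a bounded queue as the buffer; `synthesize` and `play` are hypothetical stubs standing in for a real TTS engine and audio output.

```python
import queue
import threading
import time

def synthesize(sentence):
    # Stub TTS: pretend synthesis takes a little time, return fake audio.
    time.sleep(0.01)
    return f"<audio:{sentence}>"

def play(audio, played):
    # Stub playback: record what was "spoken", in order.
    time.sleep(0.01)
    played.append(audio)

def speak(sentences):
    """Play sentences with synthesis overlapped against playback."""
    buf = queue.Queue(maxsize=2)  # the "double buffer": at most 2 clips ahead
    played = []

    def producer():
        for s in sentences:
            buf.put(synthesize(s))  # blocks if the buffer is full
        buf.put(None)  # sentinel: no more audio

    t = threading.Thread(target=producer)
    t.start()
    while (audio := buf.get()) is not None:
        play(audio, played)
    t.join()
    return played
```

The `maxsize=2` bound is what makes it double buffering: the producer can run at most one clip ahead of playback, overlapping latency without synthesizing the whole reply up front.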