March 2, 2026
Latency wars: DIY strikes back
Show HN: I built a sub-500ms latency voice agent from scratch
Built in a day for $100, talks back in under half a second — DIY hero vs platform wars
TLDR: A developer built a voice assistant in one day for $100 that replies in under half a second, claiming it’s faster than a well-known platform. The comments split three ways: fans cheering DIY speed tricks, a vendor touting their own tool, and skeptics asking how this stacks up against existing frameworks — because speed is what makes voice feel human.
A lone dev just bragged they built a talking AI assistant in about a day for roughly $100 that answers in ~400ms — under half a second — and even beat a popular plug‑and‑play service on speed. The thread instantly turned into “DIY wizard vs. big platforms” and people brought popcorn.
Fans cheered the scrappy speed run. One commenter said fixing lag is everything for voice tech, and another compared it to early online gaming: it’s all about orchestration, not bigger models — think stitching the pieces together fast, not just buying fancier parts. That take got love, plus a name‑drop of VR legend John Carmack because, of course.
Then drama arrived. A Soniox rep slid in to say their real‑time speech tool “always works better than VAD” (voice activity detection, a way to detect when someone stops talking), dropping a link. Helpful tip or brand drive‑by? Reactions were… mixed. Some nodded; others rolled eyes at vendor energy.
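For readers new to the term: VAD just means watching the audio signal for silence to decide when the speaker is done. A minimal illustrative sketch (energy-based thresholding only; real endpointing, including whatever Soniox ships, is far more sophisticated, and every name and threshold here is made up for the example):

```python
def rms(frame):
    # Root-mean-square energy of one audio frame (a list of samples).
    return (sum(s * s for s in frame) / len(frame)) ** 0.5

def detect_end_of_speech(frames, threshold=0.1, silence_frames=3):
    """Return the index of the frame where the speaker is judged done:
    the first frame completing `silence_frames` consecutive quiet frames.
    Returns None if speech never ends within the given frames."""
    quiet = 0
    for i, frame in enumerate(frames):
        if rms(frame) < threshold:
            quiet += 1
            if quiet >= silence_frames:
                return i
        else:
            quiet = 0  # any loud frame resets the silence counter
    return None
```

The latency tension the thread is arguing about lives in `silence_frames`: wait too long and the bot feels sluggish, cut too early and it talks over people mid-sentence.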
Meanwhile, the practical crowd asked the spicy question: how does this hand‑vibed Python setup stack up against frameworks like Pipecat or LiveKit? The phrase became a mini‑meme, with jokes about this bot replying “faster than your ex.” Under the laughs, the core debate stayed sharp: co‑locate, stream everything, kill the lag — or just use a platform and sleep at night?
Key Points
- The author built a voice agent orchestration layer from scratch in about one day using roughly $100 in API credits.
- The resulting system achieved ~400ms end-to-end latency, about 2× faster than an equivalent setup using Vapi.
- The build wires STT, an LLM, and TTS into a streaming pipeline, with geography and model selection being major latency drivers.
- Voice agents are harder than text agents due to continuous real-time orchestration and precise turn-taking requirements.
- All-in-one SDKs abstract complexity but hinder visibility; building the core loop improved understanding and control.
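The "stream everything" point is the core of the latency win: each stage starts consuming the previous stage's partial output instead of waiting for it to finish. A toy sketch of that orchestration shape (all three stages are stubs standing in for real STT/LLM/TTS calls; the names and delays are invented for illustration):

```python
import asyncio

async def stt_stream(audio_chunks):
    # Stub STT: emit a transcript fragment per incoming audio chunk.
    for chunk in audio_chunks:
        await asyncio.sleep(0.01)  # simulated network/model delay
        yield f"text({chunk})"

async def llm_stream(text_fragments):
    # Stub LLM: start generating as soon as the first fragment arrives,
    # rather than waiting for the full transcript.
    async for fragment in text_fragments:
        await asyncio.sleep(0.01)
        yield f"reply[{fragment}]"

async def tts_stream(tokens):
    # Stub TTS: synthesize audio token-by-token as tokens arrive.
    audio_out = []
    async for token in tokens:
        await asyncio.sleep(0.01)
        audio_out.append(f"audio<{token}>")
    return audio_out

async def run_pipeline(audio_chunks):
    # Chaining async generators lets all three stages overlap in time,
    # so total latency is closer to the slowest stage than to the sum.
    return await tts_stream(llm_stream(stt_stream(audio_chunks)))

if __name__ == "__main__":
    print(asyncio.run(run_pipeline(["c1", "c2", "c3"])))
```

The remaining latency levers the post credits (co-locating services in one region, picking fast models) don't change this shape; they just shrink each stage's `sleep`.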