I built a sub-500ms latency voice agent from scratch
I built a voice agent from scratch that averages ~400ms end-to-end latency (phone stop → first syllable). That’s with fu...
Our Take
Nick Tikhonov spent six months building agent prototypes for one of the largest CPG companies in the world, and somewhere along the way he got frustrated with the existing voice agent options. So he did what any reasonable engineer would do—he built his own. From scratch. In about a day. For roughly a hundred bucks in API credits.
The result: ~400ms end-to-end latency from phone stop to first syllable. That's full STT → LLM → TTS in the loop, clean barge-ins, no precomputed responses, and it beats Vapi's equivalent setup by 2×. Let that sink in. A solo engineer with a GitHub repo just dunked on a funded platform.
Here's what he figured out that the big players are missing: voice is a turn-taking problem, not a transcription problem. VAD alone fails—you need semantic end-of-turn detection. The whole system reduces to one loop: speaking versus listening. The two transitions—cancel instantly on barge-in, respond instantly on end-of-turn—define the entire experience. STT → LLM → TTS must stream. Sequential pipelines are dead on arrival for natural conversation. TTFT (time to first token) dominates everything. Groq's ~80ms TTFT was the single biggest win. Geography matters more than prompts—colocate everything or lose before you start.
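The speaking-versus-listening loop described above can be sketched as a two-state machine. This is a minimal illustration, not Nick's code: the names TurnTaker, on_end_of_turn, on_barge_in, and fake_reply are hypothetical, and the streamed STT → LLM → TTS reply is replaced by a stand-in coroutine that emits a few words.

```python
import asyncio
from enum import Enum, auto


class State(Enum):
    LISTENING = auto()
    SPEAKING = auto()


class TurnTaker:
    """Two-state loop: the agent is either listening or speaking."""

    def __init__(self, respond):
        self.state = State.LISTENING
        self._respond = respond   # hypothetical coroutine that streams a reply
        self._reply_task = None

    def on_end_of_turn(self, transcript):
        """The caller finished a turn: start replying right away."""
        if self.state is State.LISTENING:
            self.state = State.SPEAKING
            self._reply_task = asyncio.create_task(self._run(transcript))

    async def on_barge_in(self):
        """The caller spoke over the agent: cancel the reply, listen again."""
        if self.state is State.SPEAKING and self._reply_task is not None:
            self._reply_task.cancel()
            # Wait for the cancellation to finish without re-raising it here.
            await asyncio.gather(self._reply_task, return_exceptions=True)
        self.state = State.LISTENING

    async def _run(self, transcript):
        try:
            await self._respond(transcript)
        finally:
            # Whether the reply finished or was cancelled, go back to listening.
            self.state = State.LISTENING


async def demo():
    spoken = []
    second_chunk_out = asyncio.Event()

    async def fake_reply(transcript):
        # Stand-in for streamed LLM -> TTS output, one chunk at a time.
        for word in ["Sure,", "your", "order", "ships", "tomorrow."]:
            spoken.append(word)
            if len(spoken) == 2:
                second_chunk_out.set()
            await asyncio.sleep(0.01)

    agent = TurnTaker(fake_reply)
    agent.on_end_of_turn("where is my order?")
    await second_chunk_out.wait()   # caller interrupts mid-reply
    await agent.on_barge_in()
    return agent.state, spoken


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

In the demo the reply is cut off partway through and the agent returns to listening. In a real system, on_end_of_turn would presumably be driven by a semantic end-of-turn detector rather than VAD alone, and cancelling would also have to flush any audio already queued for playback.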
This is the kind of build that makes you wonder why we're still tolerating laggy voice assistants. Nick put the code on GitHub and wrote it up in a post that hit the Hacker News front page. If you're building AI or voice products and want hands-on help, he does focused consulting. The future of voice isn't waiting for big tech to figure it out—it's people like Nick tearing down the stack and rebuilding it right.
The people behind I built a sub-500ms latency voice agent from scratch
Nick Tikhonov
Author / Software Engineer
Software engineer who spent six months working on a startup building agent prototypes for a major CPG company. Built a sub-500ms latency voice agent from scratch as a technical project. Speaks fluent English, Russian, and some German. Lives between London, Isle of Wight, and Vienna. Consults with teams on product and engineering.
Similar products worth knowing

Cardboard
Cursor for video editing.

Copperlane
Agents for Mortgage Origination

MochaCare
AI-Supercharged Humans for Home Care Agency Growth

Didit v3
The all-in-one Identity platform