Show HN: I built a sub-500ms latency voice agent from scratch
I built a voice agent from scratch that averages ~400ms end-to-end latency (phone stop → first syllable).
About
I built a voice agent from scratch that averages ~400ms end-to-end latency (phone stop → first syllable). That’s with full STT → LLM → TTS in the loop, clean barge-ins, and no precomputed responses.

What moved the needle:

Voice is a turn-taking problem, not a transcription problem. VAD alone fails; you need semantic end-of-turn detection.

The system reduces to one loop: speaking vs listening. The two transitions (cancel instantly on barge-in, respond instantly on end-of-turn) define the experience…
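To make the speaking vs listening loop concrete, here is a minimal sketch of the two transitions as a state machine. It is not the author's implementation: the names `TurnLoop`, `on_user_speech_start`, `on_end_of_turn`, and `on_playback_finished` are hypothetical, and the `actions` list stands in for whatever a real system would do at each transition (cancelling the LLM and TTS streams, or starting generation).

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class State(Enum):
    LISTENING = auto()
    SPEAKING = auto()


@dataclass
class TurnLoop:
    """Hypothetical two-state turn-taking loop (illustrative only)."""
    state: State = State.LISTENING
    actions: list = field(default_factory=list)  # stand-in for real side effects

    def on_user_speech_start(self) -> None:
        # Barge-in: the user talks while the agent is speaking.
        # Cancel output immediately and go back to listening.
        if self.state is State.SPEAKING:
            self.actions.append("cancel_playback")
            self.state = State.LISTENING

    def on_end_of_turn(self, transcript: str) -> None:
        # End-of-turn: assumed to come from a semantic detector, not VAD alone.
        # Start responding immediately.
        if self.state is State.LISTENING:
            self.actions.append(f"respond:{transcript}")
            self.state = State.SPEAKING

    def on_playback_finished(self) -> None:
        # The agent finished its reply without being interrupted.
        if self.state is State.SPEAKING:
            self.state = State.LISTENING


loop = TurnLoop()
loop.on_end_of_turn("what's my balance")  # LISTENING -> SPEAKING
loop.on_user_speech_start()               # barge-in: SPEAKING -> LISTENING
print(loop.state, loop.actions)
```

Keeping every event handler guarded by the current state means a late or duplicate event (for example, a second end-of-turn signal while already speaking) is ignored rather than triggering a second response.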