Voice & Conversation
agents that don't sound like 2018.
Inbound and outbound voice agents with sub-400ms response latency, native turn-taking, and escalation paths to humans in three rings or fewer. We handle the SIP trunking, the evals, the prompts, and the transcripts.
Voice is a different problem than chat.
Latency dominates. Turn-taking is brittle. Barge-in breaks every naive pipeline. We've shipped enough of these to have opinions: realtime APIs first, streaming TTS where possible, and an escalation path that's actually rehearsed — not bolted on at the end.
What we actually build.
Realtime pipeline
Speech-to-speech via realtime APIs when accent, latency, and naturalness matter. Token-streamed TTS with mid-sentence interruption.
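Token-streamed TTS means audio can start before the model has finished its reply. Below is a minimal sketch of one common way to do that. It assumes the LLM yields text tokens and the TTS engine accepts short phrases, so tokens are buffered and flushed at clause punctuation or at a length cap. `speakable_chunks` and the 60-character cap are illustrative choices, not any vendor's API.

```python
import re
from typing import Iterable, Iterator

# Flush on clause or sentence punctuation (an assumed heuristic, not a fixed rule).
_BOUNDARY = re.compile(r"[.!?,;:]$")


def speakable_chunks(tokens: Iterable[str], max_chars: int = 60) -> Iterator[str]:
    """Group streamed LLM tokens into short phrases a streaming TTS engine can start on.

    A chunk is flushed at clause punctuation or once it reaches max_chars, so audio
    can begin before the full reply exists.
    """
    buf = ""
    for tok in tokens:
        buf += tok
        if _BOUNDARY.search(buf.rstrip()) or len(buf) >= max_chars:
            chunk = buf.strip()
            if chunk:
                yield chunk
            buf = ""
    tail = buf.strip()  # whatever is left when the model stops generating
    if tail:
        yield tail


if __name__ == "__main__":
    tokens = ["Sure", ",", " I", " can", " help", ".", " What", " time", "?"]
    print(list(speakable_chunks(tokens)))  # ['Sure,', 'I can help.', 'What time?']
```

Each chunk would go to whatever streaming TTS call the stack uses. Clause boundaries also give natural points to stop cleanly if the caller interrupts.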
Turn-taking
Native VAD with prosody awareness: the agent learns when an 'um' means the caller is holding the floor and when it means they're handing it over. Tunable per persona.
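A learned hold-versus-release model doesn't fit in a few lines, so this sketch substitutes a hand-written heuristic for it. A trailing filler word or a non-falling pitch counts as a hold and stretches the silence the agent waits out before replying. `TurnConfig`, the filler list, the `pitch_slope` feature, and the millisecond values are all hypothetical. They only illustrate what "tunable per persona" can mean.

```python
from dataclasses import dataclass

# Hypothetical filler words that usually mean "I'm still talking".
FILLERS = {"um", "uh", "er", "hmm"}


@dataclass
class TurnConfig:
    """Per-persona knobs; names and values are illustrative, not a real product config."""
    release_silence_ms: int = 500   # silence that ends an ordinary turn
    hold_silence_ms: int = 1200     # extra patience after a filler or non-falling pitch


def turn_is_over(last_word: str, silence_ms: int, pitch_slope: float, cfg: TurnConfig) -> bool:
    """Return True when the caller appears to have yielded the floor.

    A trailing filler or a flat/rising pitch (pitch_slope >= 0) is treated as a hold,
    so the agent waits longer; anything else is treated as a release.
    """
    word = last_word.lower().strip(".,!?")
    holding = word in FILLERS or pitch_slope >= 0.0
    threshold = cfg.hold_silence_ms if holding else cfg.release_silence_ms
    return silence_ms >= threshold


if __name__ == "__main__":
    default = TurnConfig()
    patient = TurnConfig(release_silence_ms=900)  # e.g. a slower-paced persona
    print(turn_is_over("um", 700, -1.0, default))       # False: filler, keep waiting
    print(turn_is_over("Tuesday", 700, -1.0, default))  # True: falling pitch, release
    print(turn_is_over("Tuesday", 700, -1.0, patient))  # False: this persona waits longer
```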
Barge-in
Fast user override. The agent shuts up within 80ms of detecting fresh speech. We've measured the alternatives; this one wins.
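Here is a minimal asyncio sketch of barge-in. It assumes a VAD that signals fresh speech by setting an `asyncio.Event`, and a playback coroutine that can be cancelled between audio writes. `play_tts` is a stand-in that sleeps instead of writing audio, and the names and timings are illustrative.

```python
import asyncio
import time
from typing import List, Tuple


async def play_tts(chunks: List[str], spoken: List[str], chunk_s: float = 0.05) -> None:
    """Stand-in for streaming audio out: 'plays' one chunk every chunk_s seconds."""
    for chunk in chunks:
        spoken.append(chunk)
        await asyncio.sleep(chunk_s)


async def speak_with_barge_in(chunks: List[str], user_speech: asyncio.Event,
                              spoken: List[str]) -> bool:
    """Play chunks, but stop as soon as the VAD sets user_speech. Returns True if interrupted."""
    playback = asyncio.create_task(play_tts(chunks, spoken))
    barge = asyncio.create_task(user_speech.wait())
    done, _ = await asyncio.wait({playback, barge}, return_when=asyncio.FIRST_COMPLETED)
    if playback in done:          # finished talking before the caller spoke
        barge.cancel()
        return False
    playback.cancel()             # caller spoke: cut the current utterance
    try:
        await playback
    except asyncio.CancelledError:
        pass
    return True


async def demo() -> Tuple[bool, List[str], float]:
    spoken: List[str] = []
    speech = asyncio.Event()
    detected_at: List[float] = []

    def vad_fires() -> None:      # simulated VAD detecting fresh speech
        detected_at.append(time.perf_counter())
        speech.set()

    asyncio.get_running_loop().call_later(0.12, vad_fires)
    interrupted = await speak_with_barge_in(["one", "two", "three", "four", "five"], speech, spoken)
    stop_ms = (time.perf_counter() - detected_at[0]) * 1000  # detection -> playback cancelled
    return interrupted, spoken, stop_ms


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

This only measures the software side, from detection to cancelled playback. In a real call, audio already queued downstream toward the carrier also has to be flushed, and that has to fit inside the same 80ms.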
Tool calls
The agent can read your CRM, your booking system, and your knowledge base mid-call. Every tool gets an explicit latency budget.
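An explicit per-tool budget can be as simple as a timeout for each tool plus a fallback the agent can speak. This sketch uses `asyncio.wait_for` and assumes each tool is an async callable. The tool names and budget values are hypothetical.

```python
import asyncio
from typing import Any, Awaitable, Callable, Tuple

# Hypothetical per-tool budgets in seconds; real values would come from measuring each backend.
LATENCY_BUDGET_S = {"crm_lookup": 0.3, "booking_search": 0.8, "kb_search": 0.5}


async def call_tool(name: str, fn: Callable[[], Awaitable[Any]], fallback: Any) -> Tuple[Any, bool]:
    """Run one tool call under its budget; return (result, within_budget).

    On timeout, wait_for cancels the call and we return `fallback`, so the agent can
    say something like "let me check on that" instead of leaving dead air.
    """
    try:
        result = await asyncio.wait_for(fn(), timeout=LATENCY_BUDGET_S[name])
        return result, True
    except asyncio.TimeoutError:
        return fallback, False


async def fast_crm() -> dict:
    await asyncio.sleep(0.05)  # simulated CRM round trip
    return {"name": "Ada", "tier": "gold"}


async def slow_booking() -> list:
    await asyncio.sleep(2.0)  # simulated slow booking system
    return ["10:00", "14:30"]


async def demo():
    crm = await call_tool("crm_lookup", fast_crm, fallback=None)
    booking = await call_tool("booking_search", slow_booking, fallback=[])
    return crm, booking


if __name__ == "__main__":
    print(asyncio.run(demo()))  # (({'name': 'Ada', 'tier': 'gold'}, True), ([], False))
```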
Escalation
Warm transfers to humans, with the transcript and intent summary already in front of them when they pick up. Three rings, max.
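One way to read "three rings, max": ring the whole on-call group at once, hand the call to whoever picks up first within the limit, and attach the context. This is a simulation-only sketch. `Human.answers_on_ring`, the `Handoff` fields, and the no-answer fallback are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

MAX_RINGS = 3  # "Three rings, max."


@dataclass
class Handoff:
    """Context the human gets on pickup (field names are illustrative)."""
    transcript: List[str]
    intent_summary: str


@dataclass
class Human:
    name: str
    answers_on_ring: Optional[int]  # simulation only: ring they pick up on, None = no answer


def warm_transfer(humans: List[Human], transcript: List[str], intent: str,
                  max_rings: int = MAX_RINGS) -> Tuple[Optional[str], Optional[Handoff]]:
    """Ring the whole group at once; the first human to answer within max_rings gets the call."""
    for ring in range(1, max_rings + 1):
        for human in humans:
            if human.answers_on_ring == ring:
                return human.name, Handoff(transcript=list(transcript), intent_summary=intent)
    # Hypothetical fallback: nobody answered in time, so the voice agent keeps the caller.
    return None, None


if __name__ == "__main__":
    team = [Human("Sam", 4), Human("Priya", 2)]
    print(warm_transfer(team, ["Caller: I need to move my appointment."], "reschedule"))
```

Ringing the group in parallel keeps the cap meaningful. Trying humans one after another, three rings each, would quietly turn "three rings" into nine.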
Eval & QA
Recorded calls are redacted, then scored on completion, satisfaction proxies, and policy adherence. Drift surfaces in days, not quarters.
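The sketch below covers the three pieces named above, under loudly stated assumptions. Redaction is regex-based and covers phone numbers and emails only, far short of production PII coverage. The per-call score uses made-up weights and assumes the satisfaction proxy is normalized to [0, 1]. The drift check compares the last few days against a trailing baseline. Every name, weight, and threshold here is illustrative.

```python
import re
from statistics import mean
from typing import List

# Illustrative patterns only; production PII redaction needs far broader coverage.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")
PHONE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def redact(text: str) -> str:
    """Mask emails first, then phone-number-like digit runs."""
    return PHONE.sub("[PHONE]", EMAIL.sub("[EMAIL]", text))


def score_call(completed: bool, satisfaction_proxy: float, policy_ok: bool) -> float:
    """Per-call score in [0, 1] with hypothetical weights; a policy breach zeroes it."""
    if not policy_ok:
        return 0.0
    return 0.6 * float(completed) + 0.4 * satisfaction_proxy


def drift_alert(daily_scores: List[float], baseline_days: int = 14,
                recent_days: int = 3, max_drop: float = 0.05) -> bool:
    """Flag when the recent average falls more than max_drop below the trailing baseline."""
    if len(daily_scores) < baseline_days + recent_days:
        return False  # not enough history yet
    baseline = mean(daily_scores[-(baseline_days + recent_days):-recent_days])
    recent = mean(daily_scores[-recent_days:])
    return baseline - recent > max_drop


if __name__ == "__main__":
    print(redact("Call me at +1 415 555 0100 or ada@example.com"))  # Call me at [PHONE] or [EMAIL]
    print(drift_alert([0.9] * 14 + [0.8] * 3))  # True
```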