Diagnosing Latency
Response latency is the time between when a caller finishes speaking and when the agent starts replying. The agent editor toolbar shows estimated latency for your configuration.
Latency Breakdown
Total latency = Transcription + AI Model + Voice Synthesis + Network

| Component | Typical Range | How to Optimize |
|---|---|---|
| Transcription | 200-700ms | Use Deepgram Nova-3 (~300ms) over Azure Speech (~500-700ms) |
| AI Model | 300-2000ms | Use faster models (Groq, GPT-4.1 Nano) for speed-critical agents |
| Voice Synthesis | 100-500ms | Use low-latency providers (Cartesia is fastest) |
| Network | 50-200ms | Phone calls add more network hops than web calls |
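To see how the components add up, the typical ranges in the table can be summed into a best-case and worst-case estimate. This is a minimal sketch for illustration only: the dictionary keys and function names are hypothetical and are not part of the product.

```python
# Typical ranges copied from the table above, in milliseconds: (low, high).
TYPICAL_RANGES_MS = {
    "transcription": (200, 700),
    "ai_model": (300, 2000),
    "voice_synthesis": (100, 500),
    "network": (50, 200),
}


def total_latency_range(ranges):
    """Sum per-component (low, high) ranges into one total (low, high) range."""
    low = sum(lo for lo, _ in ranges.values())
    high = sum(hi for _, hi in ranges.values())
    return low, high


def slowest_component(ranges):
    """Return the name of the component with the largest worst-case latency."""
    return max(ranges, key=lambda name: ranges[name][1])


best_case, worst_case = total_latency_range(TYPICAL_RANGES_MS)
print(f"Estimated total: {best_case}-{worst_case}ms")  # Estimated total: 650-3400ms
print(f"Largest worst-case contributor: {slowest_component(TYPICAL_RANGES_MS)}")
```

With these typical ranges, the AI model accounts for more than half of the worst-case total, which is why model choice is usually the first thing to check.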
When Latency Is a Problem
- Under 1 second: Excellent — feels like a natural conversation
- 1-2 seconds: Acceptable for most use cases
- 2-3 seconds: Noticeable — consider optimizing or adding thinking sounds
- Over 3 seconds: Poor experience — take action now
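The bands above can be expressed as a small helper for labelling measured latencies, for example when reviewing a batch of test calls. This is a sketch, not a product feature: the function name is hypothetical, and how the exact boundary values (1000, 2000, 3000 ms) are assigned is an assumption, since the list does not specify it.

```python
def rate_latency(latency_ms):
    """Map a response latency in milliseconds to the bands listed above.

    Assumption: exactly 1000 ms counts as "acceptable", exactly 2000 ms as
    "acceptable", and exactly 3000 ms as "noticeable".
    """
    if latency_ms < 1000:
        return "excellent"
    if latency_ms <= 2000:
        return "acceptable"
    if latency_ms <= 3000:
        return "noticeable"
    return "poor"


for sample_ms in (650, 1500, 2500, 3400):
    print(sample_ms, rate_latency(sample_ms))
```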
Quick Fixes for High Latency
- Switch to a faster model — Go to General → Thinking and try Balanced or Fast presets
- Enable thinking sounds — General → Sounds → Thinking Sounds (Expert Mode) fills processing time with keyboard audio
- Enable smart filler — General → Sounds → Smart Filler (Expert Mode) generates contextual filler phrases
- Use a faster voice provider — Check Supported Providers for latency benchmarks
- Simplify your prompt — Shorter prompts process faster
- Adjust VAD turn detection — VAD settings can have a significant impact on perceived response timing
Turn-Taking Issues
Agent Talks Over the Caller
Symptoms: Agent starts speaking while the caller is still talking, or responds too quickly after brief pauses.
Fix:
- Open VAD Turn Detection settings
- Switch to a more Patient response timing preset
- In Expert Mode, increase Silence before responding (e.g., from 300ms to 500ms)
- Enable AI Turn Detection (Expert Mode) for smarter end-of-turn detection
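The effect of raising Silence before responding can be illustrated with a simplified model in which the agent treats any pause at least as long as the threshold as the end of the caller's turn. The sketch below is an assumption about how a silence threshold behaves in general, not a description of the product's actual detector, and the pause values are invented for illustration.

```python
def premature_responses(pauses_ms, silence_before_responding_ms):
    """Count mid-sentence pauses long enough to be mistaken for end of turn.

    pauses_ms: hypothetical pause lengths (ms) occurring inside one caller turn.
    """
    return sum(1 for pause in pauses_ms if pause >= silence_before_responding_ms)


# Invented pauses from a caller who hesitates while reading out a number.
pauses = [120, 350, 280, 450, 90]

for threshold_ms in (300, 500):
    count = premature_responses(pauses, threshold_ms)
    print(f"threshold={threshold_ms}ms -> {count} premature response(s)")
```

In this model, moving from 300ms to 500ms removes both false triggers, at the cost of roughly 200ms of extra wait after the caller really has finished. That trade-off is why the same setting appears in the fix for awkward silences.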
Agent Waits Too Long to Respond
Symptoms: Awkward silences after the caller finishes speaking.
Fix:
- Switch to a more Responsive timing preset
- In Expert Mode, reduce Silence before responding
- Check if AI Turn Detection is causing delays — try toggling it off
- Switch to a faster AI model
Caller Can’t Interrupt the Agent
Symptoms: Caller speaks but the agent continues its response without stopping.
Fix:
- In Expert Mode, enable Allow Interruptions
- Reduce Speech duration to trigger interrupt (how long the caller must speak to interrupt)
- Reduce Minimum words to interrupt threshold
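The three settings above interact. One plausible way to picture this is that an interruption is only honoured when interruptions are allowed and the caller has both spoken long enough and said enough words. The sketch below encodes that assumption; the class, field names, default values, and the "both thresholds must be met" rule are hypothetical and may not match the product's actual logic.

```python
from dataclasses import dataclass


@dataclass
class InterruptSettings:
    """Hypothetical stand-ins for the Expert Mode interruption settings."""
    allow_interruptions: bool = True
    min_speech_ms: int = 500  # "Speech duration to trigger interrupt" (assumed default)
    min_words: int = 2        # "Minimum words to interrupt" (assumed default)


def should_interrupt(speech_ms, word_count, settings):
    """Return True if caller speech would stop the agent under these settings."""
    if not settings.allow_interruptions:
        return False
    return speech_ms >= settings.min_speech_ms and word_count >= settings.min_words


# A short "wait" (300 ms, one word) is ignored with the assumed defaults...
print(should_interrupt(300, 1, InterruptSettings()))
# ...but registers once both thresholds are lowered.
print(should_interrupt(300, 1, InterruptSettings(min_speech_ms=200, min_words=1)))
```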
Audio Quality Issues
Robotic or Unnatural Voice
- Try a different voice — some voices sound better for conversational use
- Adjust voice settings (Expert Mode) — tweak stability, similarity, and style parameters
- Lower the Response Style (temperature) setting — higher values can produce less consistent speech patterns
Echo or Feedback
- This typically occurs on web calls. Check that the caller’s browser has echo cancellation enabled
- Reduce ambient sound volume if using background audio
- Test with headphones to isolate the issue
Muffled or Unclear Speech
- Check your transcriber selection — Deepgram Nova-3 has the best general accuracy
- Add custom pronunciations for frequently misheard terms
- In Expert Mode, add keywords to boost recognition of specific terms
Phone vs Web Quality Differences
Phone calls go through additional compression and network hops, which can affect quality:
- Always test with real phone calls before launching — web simulator results may differ
- Phone networks add 50-200ms of latency that web calls do not have
- Codec compression can affect voice quality — some voice providers handle this better
Silence Handling
Agent Does Not Respond When Caller Goes Silent
Configure Inactivity Timeout:
- Enable Silence Reminders to prompt the caller after a period of silence
- Set Max Call Duration to end calls that run too long
- In Expert Mode, configure reminder timing (delay and max count)
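When choosing the reminder delay and max count, it helps to lay out what would happen during one long stretch of silence. The sketch below assumes that reminders repeat at a fixed interval and that the inactivity timeout ends the call. Both behaviours, along with the function and parameter names, are assumptions made for illustration and should be checked against the actual settings.

```python
def silence_plan(reminder_delay_s, max_reminders, inactivity_timeout_s):
    """List (seconds_of_silence, action) events for one stretch of silence.

    Assumes reminders fire every reminder_delay_s seconds and that the
    inactivity timeout ends the call (hypothetical behaviour).
    """
    events = []
    for i in range(1, max_reminders + 1):
        fire_at = reminder_delay_s * i
        if fire_at >= inactivity_timeout_s:
            break  # the call would already have ended
        events.append((fire_at, "reminder"))
    events.append((inactivity_timeout_s, "end_call"))
    return events


print(silence_plan(reminder_delay_s=10, max_reminders=3, inactivity_timeout_s=45))
print(silence_plan(reminder_delay_s=10, max_reminders=3, inactivity_timeout_s=25))
```

In the second call the timeout arrives before the third reminder, so under these assumptions a max count of 3 would never be reached.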
Agent Hangs Up Too Quickly
- Increase Max Call Duration (default may be too short for your use case)
- Check that Allow AI to Hang Up is not ending calls prematurely
- Review your prompt — remove language that tells the agent to end calls aggressively
Testing Methodology
- Start with web calls — fastest iteration cycle, no phone network variables
- Move to phone calls — test real network conditions, answering machine detection (AMD) behavior, and voice quality
- Test from different environments — quiet office, noisy space, mobile, landline
- Compare models — try the same conversation with different AI models to find the best speed/quality tradeoff
- Review conversation timelines — check tool execution times and knowledge retrieval latency
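When comparing models, it is easier to judge the speed side of the trade-off if the latencies from each test conversation are summarised rather than eyeballed. The sketch below assumes you have noted per-turn latencies by hand from your test calls. The model names and numbers are placeholders, not benchmarks.

```python
from statistics import median


def summarize(latencies_ms):
    """Return the median and worst per-turn latency for one model's test run."""
    ordered = sorted(latencies_ms)
    return {"median_ms": median(ordered), "worst_ms": ordered[-1]}


# Placeholder measurements (ms) from the same scripted conversation.
runs = {
    "model_a": [820, 910, 1340, 780, 990],
    "model_b": [1450, 1620, 2980, 1510, 1700],
}

for model_name, latencies in runs.items():
    print(model_name, summarize(latencies))
```

Looking at the worst turn as well as the median matters because a single slow reply, for example one that triggers a tool call, can dominate how the conversation feels.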
Next Steps
- VAD Turn Detection: Configure response timing and interruptions
- Choose AI Model: Select the right model for speed vs quality
- Thinking Sounds: Fill processing pauses with audio
- Common Issues: Find quick fixes for common problems