Your callers do not experience your transcriber, voice, model, and timing settings separately. They experience one conversation. This guide helps you treat the AI pipeline as one system so you can make better tradeoffs before launch.

Why This Matters

Most voice-quality problems are not caused by one bad setting. They usually come from the interaction between:
  • how accurately the caller is transcribed
  • how quickly the model decides what to say
  • how natural the chosen voice sounds
  • how the system handles pauses, interruptions, and pronunciation
If you only optimize one layer, the conversation can still feel slow, robotic, or error-prone.

The Five Parts Of The AI Pipeline

| Part | What it controls | Main doc |
| --- | --- | --- |
| Transcriber | How caller audio becomes text | Transcriber |
| AI model | How the agent reasons and responds | Choose AI Model |
| Voice | How the response sounds to the caller | Select Voice |
| Voice behavior | Speed, stability, style, and pronunciation | Voice Settings and Custom Pronunciations |
| Timing | Interruptions, pauses, silence, and turn-taking feel | Turn-Taking and Timing |
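
One way to keep the five parts in view together is to hold them in a single configuration object. The sketch below is illustrative only: the class, field names, and placeholder values are hypothetical and do not correspond to a real API or settings schema.

```python
from dataclasses import dataclass, field, asdict


@dataclass
class PipelineConfig:
    """Hypothetical container for the five pipeline parts (not a real API)."""
    transcriber: dict = field(default_factory=dict)     # how caller audio becomes text
    model: dict = field(default_factory=dict)           # how the agent reasons and responds
    voice: dict = field(default_factory=dict)           # how the response sounds to the caller
    voice_behavior: dict = field(default_factory=dict)  # speed, stability, style, pronunciation
    timing: dict = field(default_factory=dict)          # interruptions, pauses, silence


def unconfigured_parts(config: PipelineConfig) -> list:
    """Return the names of parts that still have no settings, in pipeline order."""
    return [name for name, settings in asdict(config).items() if not settings]


# Placeholder values, assumed for illustration only.
config = PipelineConfig(
    transcriber={"language": "en"},
    voice={"voice_id": "example-voice"},
)
print(unconfigured_parts(config))
```

Listing the empty parts before a test call makes it harder to tune one layer and forget that another was never configured.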

Start With The Outcome You Need

Choose the pipeline configuration based on the actual conversation you are deploying.

Fast phone support or triage

Prioritize low latency, clear pronunciation, and interruption handling.

Prioritize warmth, brand fit, and consistent pacing. Start with:
  • a voice that matches your tone and audience
  • stronger voice prompting
  • pronunciation rules for product and company names
  • test calls with realistic objections and interruptions

Prioritize language coverage and locale accuracy. Start with:
  • language support in the transcriber
  • locale-matched voices in Select Voice
  • test scripts for each target language
  • explicit prompt instructions if tone or phrasing changes by region

Prioritize clarity, consent, and predictable behavior. Start with:
  • short, direct voices with minimal embellishment
  • clear announcements
  • explicit privacy controls
  • conservative timing settings so callers can interrupt easily

Configuration Order

Work through the pipeline in this order. Each layer depends on the one before it.

| Step | What to configure | Why in this position |
| --- | --- | --- |
| 1. Transcriber | Language, provider, model | If the caller is misheard, nothing downstream can recover |
| 2. Voice | Provider, voice, cloning | Pick what callers hear once transcription is solid |
| 3. Voice refinements | Settings, pronunciations, ambient sound, thinking sounds | Fine-tune after the core voice is chosen |
| 4. Timing | Turn-taking, silence, interruptions | Tune last; timing sliders can mask deeper problems |

Do not start with timing. If the transcriber, voice, or prompt is already causing friction, adjusting timing hides the real problem instead of fixing it.
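
The ordering rule can be sketched as a small helper that, given the layers where you have observed problems, returns the one to work on first. The layer keys and the function name are hypothetical labels for the four steps in this guide, not product identifiers.

```python
from typing import Optional

# Hypothetical keys for the four configuration steps, in the recommended order.
CONFIG_ORDER = ("transcriber", "voice", "voice_refinements", "timing")


def next_layer_to_tune(layers_with_issues: set) -> Optional[str]:
    """Return the earliest layer in CONFIG_ORDER that has an open issue."""
    unknown = layers_with_issues - set(CONFIG_ORDER)
    if unknown:
        raise ValueError(f"Unknown layers: {sorted(unknown)}")
    for layer in CONFIG_ORDER:
        if layer in layers_with_issues:
            return layer
    return None  # nothing left to tune


# Timing feels off, but the transcriber also has problems: fix the transcriber first.
print(next_layer_to_tune({"timing", "transcriber"}))
```

Timing is only returned once every earlier layer is clean, which mirrors the advice not to use timing settings to cover for upstream friction.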

Common Symptoms And Where To Look First

| Symptom | First place to look | Then check |
| --- | --- | --- |
| Agent mishears names, addresses, or numbers | Transcriber | Custom Pronunciations |
| Voice sounds wrong for the brand | Select Voice | Prompt Engineering Guide |
| Speech sounds robotic or uneven | Voice Settings | Select Voice |
| Agent cuts callers off | Turn-Taking and Timing | Transcriber |
| Agent feels slow after the caller stops talking | Turn-Taking and Timing | Choose AI Model |
| Product names or company names are spoken badly | Custom Pronunciations | Prompt Engineering Guide |
| Cloned voice sounds inconsistent | Voice Cloning | Voice Settings |
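
When a test call shows several symptoms at once, the table can be read as a lookup. The sketch below encodes it as a dictionary and returns a de-duplicated reading list, with the "first place to look" docs ahead of the "then check" docs. The symptom keys are shorthand invented for this example; the doc names come from the table.

```python
# Symptom key (hypothetical shorthand) -> (first place to look, then check)
TRIAGE = {
    "mishears_names_addresses_numbers": ("Transcriber", "Custom Pronunciations"),
    "voice_wrong_for_brand": ("Select Voice", "Prompt Engineering Guide"),
    "robotic_or_uneven_speech": ("Voice Settings", "Select Voice"),
    "cuts_callers_off": ("Turn-Taking and Timing", "Transcriber"),
    "slow_after_caller_stops": ("Turn-Taking and Timing", "Choose AI Model"),
    "names_spoken_badly": ("Custom Pronunciations", "Prompt Engineering Guide"),
    "cloned_voice_inconsistent": ("Voice Cloning", "Voice Settings"),
}


def docs_to_check(symptoms: list) -> list:
    """Return docs to review: all first-look docs, then follow-ups, without repeats."""
    first = [TRIAGE[s][0] for s in symptoms]
    then = [TRIAGE[s][1] for s in symptoms]
    # dict.fromkeys keeps the first occurrence of each doc and preserves order.
    return list(dict.fromkeys(first + then))


print(docs_to_check(["cuts_callers_off", "slow_after_caller_stops"]))
```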

A Practical Rollout Sequence

1. Prove the logic in chat
   Confirm the prompt, tools, and knowledge work before you spend time on voice tuning.

2. Evaluate the voice in Web Call
   Listen for pace, pronunciation, and interruption feel in the browser.

3. Validate the full call on the phone
   Run at least one real phone call. Phone audio and network behavior often change the result.

4. Review the conversation detail
   Check the transcript, timing, tool execution, and any post-call automation before launch.

Common Mistakes

  • A beautiful voice does not help if the caller is transcribed inaccurately. Start with recognition quality, then optimize style.
  • Ambient sound can improve feel, but it does not solve slow model responses, slow tools, or high-latency transcription.
  • Always test with the kinds of callers you actually expect: different accents, speeds, noise levels, and interruption patterns.
  • If you change the transcriber, voice, prompt, and timing together, you will not know what actually improved or broke the conversation.
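
A lightweight guard against changing several layers in one test round is to compare the configuration before and after, then flag any round that touches more than one layer. This is a minimal sketch: the layer names, setting keys, and values are placeholders rather than real settings.

```python
def changed_layers(before: dict, after: dict) -> list:
    """Return the sorted names of layers whose settings differ between two configs."""
    layers = sorted(set(before) | set(after))
    return [layer for layer in layers if before.get(layer) != after.get(layer)]


def is_isolated_change(before: dict, after: dict) -> bool:
    """True when at most one layer changed, so a test result can be attributed to it."""
    return len(changed_layers(before, after)) <= 1


# Placeholder settings, assumed for illustration only.
baseline = {
    "transcriber": {"language": "en"},
    "voice": {"speed": 1.0},
    "prompt": "v1",
    "timing": {"pause_ms": 500},
}
candidate = {
    "transcriber": {"language": "en"},
    "voice": {"speed": 1.1},
    "prompt": "v1",
    "timing": {"pause_ms": 300},
}

print(changed_layers(baseline, candidate))
print(is_isolated_change(baseline, candidate))
```

Here both the voice and the timing changed, so a better or worse test call could not be attributed to either one.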

Next Steps

  • Select Voice: Browse, preview, and choose the voice your callers hear
  • Transcriber: Pick the speech-to-text layer that fits your languages and latency needs
  • Voice Cloning: Create and evaluate custom branded voices
  • Turn-Taking and Timing: Tune pauses, interruptions, and silence handling