Overview
The AI model (LLM) is the brain of your voice agent. It processes what customers say, understands their intent, reasons about the best response, and decides when to take actions. Choosing the right model means balancing performance, latency, cost, and compliance requirements.

Model selection happens under Models > Model in your agent configuration. Changes apply immediately; no separate publish step is required.
Understanding Language Models
Language models are trained on vast amounts of text to understand and generate human language. In voice agents, the LLM interprets customer requests, reasons about the best response based on your instructions and knowledge base, decides when to use actions like transfers or bookings, generates natural conversation responses, and maintains context throughout the conversation. Different models excel at different tasks. Some prioritize speed, others accuracy, and some offer the best balance for conversational AI.

Recommended Models
Based on real-world performance from thousands of voice agents, here are the proven models for different use cases:

Best for Most Use Cases: GPT-4.1 Mini
Our top recommendation for production voice agents.

Why it works:
- Excellent latency (~700-800ms response time)
- 70%+ success rate in function calling (transfers, bookings, actions)
- Strong instruction following
- Affordable cost
Best for:
- Customer support
- Appointment booking
- Order processing
- Most conversational scenarios
For Complex Tasks: GPT-4.1
When you need maximum intelligence and reasoning.

Why it works:
- Best-in-class reasoning and multi-step logic
- Handles complex troubleshooting
- Superior context understanding
Trade-offs:
- Higher latency than GPT-4.1 Mini
- Higher cost per conversation
Best for:
- Technical support with complex diagnostics
- Multi-step sales conversations
- Tasks requiring deep reasoning
Fast & Affordable: Claude Haiku 4.5
Anthropic’s fastest model with strong performance.

Why it works:
- Sub-second response times
- Good balance of speed and intelligence
- Constitutional AI for safer responses
- Lower cost than Sonnet
Best for:
- High-volume call centers
- Speed-critical applications
- Budget-conscious deployments
Ultra-Fast Open Source: Groq Llama 3.1 8B
Fastest option available, powered by Groq’s custom hardware.

Why it works:
- Sub-500ms response times
- Handles hundreds of tokens per second
- Open source model
- Very low cost
Trade-offs:
- Less intelligent than GPT-4.1 or Claude
- Occasional latency spikes under load
- Better for simpler conversations
Best for:
- Simple qualification calls
- IVR and routing
- High-volume, low-complexity scenarios
More Intelligence from Groq: Llama 3.3 70B
Better reasoning while keeping Groq’s speed advantage.

Why it works:
- Step up in quality from 8B model
- Still fast on Groq infrastructure
- Good middle ground
Best for:
- When Llama 3.1 8B quality isn’t enough
- Need speed but more intelligence
Model Selection Interface
Provider Catalog
The model selection interface groups providers with helpful metadata:

Provider Icons
Visual branding for OpenAI, Anthropic, Groq, Azure, and more
EU-Hosted Badge
Indicates models that process data within EU regions
Model Count
Shows how many models are available from each provider
Active Selection
Highlights your currently selected model
Filtering and Search
Click a provider to filter the model table to that vendor only. Use the search box to quickly find specific models by name or capability.

Model Provider Details
OpenAI
OpenAI models deliver the best balance of reliability and function calling for voice agents.

GPT-4.1 Mini ⭐ Recommended
- Real-world performance: ~700-800ms response time, 70%+ function calling success rate
- Best for: Production voice agents - support, booking, sales
- Why it works: Proven reliability, excellent tool use, good latency
GPT-4.1
- Real-world performance: Higher latency than Mini but superior reasoning
- Best for: Complex multi-step conversations, technical support
- Trade-off: Higher cost and latency for more intelligence
GPT-5 Series
- Status: Next-generation models with advanced reasoning
- Considerations: GPT-5 has higher latency (~1s+); GPT-5 Mini offers better balance
- Best for: Tasks where intelligence matters more than speed
Older Models
- Status: Still functional, but consider the GPT-4.1/5 series for new agents
Azure OpenAI (EU-Hosted)
The same OpenAI models, hosted in the EU (Sweden Central region).

Why choose Azure OpenAI:
- EU hosting: Data processed within EU regions
- Enterprise features: Azure security, compliance, SLAs
- Same models: GPT-4.1, GPT-4.1 Mini, GPT-5 Mini/Nano
Anthropic
Claude models excel at safety, instruction following, and complex reasoning.

Claude Haiku 4.5 ⭐ Recommended
- Real-world performance: Sub-second responses, excellent speed-to-intelligence ratio
- Best for: Speed-critical deployments, high-volume use cases
- Why it works: Fast, affordable, strong Constitutional AI safety
Claude Sonnet 4.5
- Real-world performance: Excellent for complex agent workflows and tool use
- Best for: Multi-step reasoning, complex procedures, coding tasks
- Considerations: Can have latency spikes under heavy load; monitor timeouts in production
- Extended thinking: Supports longer reasoning chains for complex problems
Claude models are more conversational and rich in their responses compared to OpenAI models. They naturally provide fuller, more nuanced answers. This makes them excellent for engaging customer interactions, but they may occasionally over-apologize. Test with your specific use case to see if the conversational style fits your needs.
Groq (Ultra-Low Latency)
Open-source models on custom hardware for maximum speed.

Llama 3.1 8B Instant ⭐ Fastest
- Real-world performance: Sub-500ms response times, hundreds of tokens/second
- Best for: Simple qualification, IVR, routing, high-volume scenarios
- Trade-off: Less intelligent than GPT-4.1 or Claude
- Watch for: Occasional latency spikes under heavy load
Llama 3.3 70B
- Real-world performance: Better reasoning than 8B while keeping Groq speed
- Best for: When you need more intelligence than 8B but want Groq’s speed advantage
Open-Weight OpenAI Models
- Real-world performance: The 20B model is very fast on Groq hardware, with speeds similar to Llama
- Status: Open-weight OpenAI models with tool use support
- Best for: Fast open-source alternative with function calling
Model Parameters
Click Model Parameters to access advanced configuration options that control how the model behaves.

Temperature
Controls randomness in responses (range: 0.0 to 2.0).
- 0.0 (Recommended): Deterministic, consistent responses
  - Use for: Most voice agents, tool calling, action execution
  - Maximizes reliability for transfers, bookings, and API calls
  - Ensures consistent behavior and predictable responses
- 0.1 - 0.3: Slightly varied but still highly consistent
  - Use for: Agents that need slight natural variation
  - Still reliable for tool calling
- 0.4 - 0.7: More creative and varied
  - Use for: Personality-driven agents where creativity matters more than consistency
  - Tool calling reliability decreases
- 0.8+: Highly creative, unpredictable
  - Avoid for production voice agents
  - Tool calling becomes unreliable
Default recommendation: Use 0.0 unless your agent needs more human-like creativity. Temperature above 0 reduces tool calling reliability (transfers, bookings, actions).
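The guidance above can be expressed as a small pre-deployment check. This is an illustrative sketch, not part of any product API; the config field names (`model`, `temperature`, `tools`) are assumptions.

```python
def check_temperature(config: dict) -> list[str]:
    """Return warnings for temperature settings that hurt reliability."""
    warnings = []
    temp = config.get("temperature", 0.0)
    uses_tools = bool(config.get("tools"))
    # Any temperature above 0.0 reduces tool-calling reliability
    if uses_tools and temp > 0.0:
        warnings.append(
            f"temperature={temp}: tool calling (transfers, bookings) "
            "is most reliable at 0.0"
        )
    # 0.8+ is too unpredictable for production voice agents
    if temp >= 0.8:
        warnings.append(
            f"temperature={temp}: avoid for production voice agents"
        )
    return warnings

agent = {"model": "gpt-4.1-mini", "temperature": 0.9, "tools": ["transfer_call"]}
print(check_temperature(agent))  # both warnings fire
```

A check like this is most useful in a CI step that lints agent configs before they reach production.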
Choosing the Right Model
Decision Framework
Use this framework to select your model:

1. Start with the Right Default
For most use cases, start here:
- GPT-4.1 Mini → Best balance of speed, reliability, and cost
- Claude Haiku 4.5 → When you need faster responses or lower cost
- GPT-4.1 → Complex multi-step reasoning required
- Claude Sonnet 4.5 → Maximum conversational quality
- Groq Llama 3.1 8B → Sub-500ms speed is critical
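The defaults above can be summarized as a small decision helper. A sketch only: the function and the model ID strings are illustrative, not real platform identifiers.

```python
def pick_model(needs_sub_500ms: bool = False,
               complex_reasoning: bool = False,
               max_conversational_quality: bool = False,
               cost_sensitive: bool = False) -> str:
    """Map the decision framework's questions to a starting model."""
    if needs_sub_500ms:
        return "groq/llama-3.1-8b"      # sub-500ms speed is critical
    if complex_reasoning:
        return "gpt-4.1"                # complex multi-step reasoning
    if max_conversational_quality:
        return "claude-sonnet-4.5"      # maximum conversational quality
    if cost_sensitive:
        return "claude-haiku-4.5"       # faster responses, lower cost
    return "gpt-4.1-mini"               # best balance for most use cases

print(pick_model())                      # gpt-4.1-mini
print(pick_model(complex_reasoning=True))  # gpt-4.1
```

Note the priority order: speed constraints are checked first because they rule out the slower models entirely.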
2. Match to Your Use Case
Simple Routing / FAQ:
- Groq Llama 3.1 8B (fastest)
- Llama 3.3 70B (more intelligent)

Customer Support / Booking:
- GPT-4.1 Mini ⭐ (recommended - best balance)
- Claude Haiku 4.5 (faster, more conversational)

Complex Reasoning:
- GPT-4.1 (when Mini isn’t enough)
- Claude Sonnet 4.5 (maximum quality)

Personality-Driven Conversations:
- Claude Sonnet 4.5 (richest, most conversational)
- GPT-4.1 (when you need reasoning + personality)
3. EU Hosting
Need GDPR-compliant EU hosting?
- Azure OpenAI is the only provider with EU hosting
- All GPT-4.1, GPT-4.1 Mini, and GPT-5 models available
Common Model Combinations
Many customers use different models for different agents.

Testing Model Performance
A/B Testing Models
To compare models scientifically:
- Duplicate your agent in the dashboard
- Change only the model on one version
- Keep all other settings identical (instructions, voice, actions)
- Run identical test scenarios on both
- Compare:
  - Response quality and accuracy
  - Latency and speed
  - Conversation naturalness
  - Action trigger reliability
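The comparison step can be automated with a small aggregation script. The record fields (`latency_ms`, `action_ok`) are hypothetical; adapt them to however your test harness logs calls.

```python
from statistics import mean

def summarize(runs: list[dict]) -> dict:
    """Aggregate latency and action-success metrics for one variant."""
    return {
        "avg_latency_ms": round(mean(r["latency_ms"] for r in runs), 1),
        "action_success_rate": round(
            sum(r["action_ok"] for r in runs) / len(runs), 2),
    }

# Identical test scenarios run against both agent copies
variant_a = [{"latency_ms": 740, "action_ok": True},
             {"latency_ms": 810, "action_ok": True},
             {"latency_ms": 790, "action_ok": False}]
variant_b = [{"latency_ms": 460, "action_ok": True},
             {"latency_ms": 510, "action_ok": False},
             {"latency_ms": 480, "action_ok": False}]

print("A:", summarize(variant_a))  # faster to fail actions, or vice versa?
print("B:", summarize(variant_b))
```

Run the same scenario set against both variants so the only difference in the numbers is the model itself.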
Evaluation Criteria
Rate each model on:

| Criteria | What to Look For |
|---|---|
| Accuracy | Does it understand requests correctly? |
| Instruction Adherence | Does it follow your system prompt rules? |
| Latency | How quickly does it respond? |
| Context Retention | Does it remember earlier conversation? |
| Action Timing | Does it trigger actions at right moments? |
| Error Handling | How does it handle unclear requests? |
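One simple way to apply the table is a scoring sheet: rate each model 1-5 per criterion and compare totals. The criterion keys and the sample ratings below are illustrative, not measured values.

```python
# Criteria from the evaluation table above
CRITERIA = ["accuracy", "instruction_adherence", "latency",
            "context_retention", "action_timing", "error_handling"]

def total_score(ratings: dict) -> int:
    """Sum 1-5 ratings; refuse incomplete scorecards."""
    assert set(ratings) == set(CRITERIA), "rate every criterion"
    return sum(ratings.values())

# Example scorecard (made-up numbers for illustration)
candidate = {"accuracy": 4, "instruction_adherence": 4, "latency": 5,
             "context_retention": 4, "action_timing": 5, "error_handling": 4}
print(total_score(candidate))  # 26
```

If one criterion matters more to you (e.g. action timing for booking agents), weight it before summing instead of treating all six equally.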
Best Practices
Start with GPT-4.1 Mini
For most voice agents, start with:
- Model: GPT-4.1 Mini
- Temperature: 0.0 (or 0.7 for more personality)
Don't Over-Spend on Intelligence
Start small, upgrade only if needed:
- Most use cases work great with GPT-4.1 Mini
- Only upgrade to GPT-4.1 or Claude Sonnet 4.5 if Mini can’t handle your complexity
- Use Groq for simple routing/FAQ where speed matters more than intelligence
Monitor Real-World Performance
Use analytics to track:
- Average response time
- Action success rates
- Transfer rates (high transfers may indicate reasoning issues)
- Customer satisfaction scores
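These metrics can feed a simple health check that flags likely model problems. The thresholds below are illustrative assumptions; tune them to your own baselines.

```python
def health_check(metrics: dict,
                 transfer_rate_limit: float = 0.25,
                 latency_limit_ms: int = 1200) -> list[str]:
    """Flag analytics values that suggest a model-level problem."""
    alerts = []
    # High transfer rates may indicate reasoning issues
    if metrics["transfer_rate"] > transfer_rate_limit:
        alerts.append("high transfer rate - may indicate reasoning issues")
    # Slow responses suggest trying a faster model
    if metrics["avg_response_ms"] > latency_limit_ms:
        alerts.append("slow responses - consider a faster model")
    # Action failures often trace back to temperature or instructions
    if metrics["action_success_rate"] < 0.7:
        alerts.append("action failures - check temperature and instructions")
    return alerts

weekly = {"transfer_rate": 0.31, "avg_response_ms": 760,
          "action_success_rate": 0.72}
print(health_check(weekly))  # flags only the high transfer rate
```

Running a check like this on a daily or weekly schedule catches regressions after model or prompt changes.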
Consider Regional Deployment
If serving global customers:
- Use EU-hosted models for European callers (GDPR)
- Consider regional Azure deployments for enterprise compliance
- Factor in latency from model hosting region to customers
Document Model Changes
When changing models in production:
- Note the date and reason in agent description
- Monitor metrics for 24-48 hours after
- Keep previous model ID documented for rollback
- Test thoroughly before switching high-volume agents
Troubleshooting Model Issues
Agent Responses Are Too Verbose
Solutions:
- Add to instructions: “Keep every response under 25 seconds”
- Use temperature 0.0 for more focused, concise responses
- Consider a faster model that encourages brevity
Agent Misunderstands Requests
Solutions:
- Switch to a higher-capability model (GPT-4.1, Claude Sonnet 4.5)
- Improve instructions with more specific examples
- Add keyword boosting in transcriber settings
- Review transcription accuracy first (may be STT issue, not LLM)
Agent Doesn’t Follow Instructions
Solutions:
- Claude models are typically better at instruction adherence
- Simplify and clarify instructions
- Use bulleted lists instead of paragraphs
- Add explicit examples of correct behavior
- Use temperature 0.0 for maximum consistency
High Latency / Slow Responses
Solutions:
- Switch to a faster model (Groq Llama 3.1 8B, Claude Haiku 4.5)
- Check if issue is model or network latency (test with different providers)
Agent Repeats Same Phrases
Solutions:
- Add the instruction: “Vary your phrasing; avoid repetitive expressions”
- Consider a different model (some have better response diversity)
- Review if instructions inadvertently cause repetition
Model Updates and Versioning
Provider Model Updates
Model providers regularly update their offerings:
- Minor updates often improve performance without breaking changes
- Major version changes (e.g., GPT-4 → GPT-5) may require testing
- itellicoAI notifies customers before automatic version updates
Controlling Model Versions
Some providers let you pin to specific versions:
- Latest: Always use the newest version (default, recommended)
- Pinned: Stay on specific version (use if you’ve heavily optimized for that model)
Deprecation Policy
When providers deprecate models:
- itellicoAI notifies affected customers in advance
- Recommended migration path provided
- Agents automatically moved to successor model if no action taken
- Migration assistance available from support