
Overview

The AI model (LLM) is the brain of your voice agent. It processes what customers say, understands their intent, reasons about the best response, and decides when to take actions. Choosing the right model means balancing performance, latency, cost, and compliance requirements.
Model selection happens under Models > Model in your agent configuration. Changes apply immediately—no separate publish step required.

Understanding Language Models

Language models are trained on vast amounts of text to understand and generate human language. In voice agents, the LLM interprets customer requests, reasons about the best response based on your instructions and knowledge base, decides when to use actions like transfers or bookings, generates natural conversation responses, and maintains context throughout the conversation. Different models excel at different tasks. Some prioritize speed, others accuracy, and some offer the best balance for conversational AI.
Based on real-world performance from thousands of voice agents, here are the proven models for different use cases:

Best for Most Use Cases: GPT-4.1 Mini

Our top recommendation for production voice agents.

Why it works:
  • Excellent latency (~700-800ms response time)
  • 70%+ success rate in function calling (transfers, bookings, actions)
  • Strong instruction following
  • Affordable cost
Use for:
  • Customer support
  • Appointment booking
  • Order processing
  • Most conversational scenarios
Available on: OpenAI, Azure OpenAI (EU-hosted)
GPT-4.1

When you need maximum intelligence and reasoning.

Why it works:
  • Best-in-class reasoning and multi-step logic
  • Handles complex troubleshooting
  • Superior context understanding
Trade-offs:
  • Higher latency than GPT-4.1 Mini
  • Higher cost per conversation
Use for:
  • Technical support with complex diagnostics
  • Multi-step sales conversations
  • Tasks requiring deep reasoning
Available on: OpenAI, Azure OpenAI (EU-hosted)
Claude Haiku 4.5

Anthropic’s fastest model with strong performance.

Why it works:
  • Sub-second response times
  • Good balance of speed and intelligence
  • Constitutional AI for safer responses
  • Lower cost than Sonnet
Use for:
  • High-volume call centers
  • Speed-critical applications
  • Budget-conscious deployments
Available on: Anthropic
Llama 3.1 8B Instant

The fastest option available, powered by Groq’s custom hardware.

Why it works:
  • Sub-500ms response times
  • Handles hundreds of tokens per second
  • Open source model
  • Very low cost
Trade-offs:
  • Less intelligent than GPT-4.1 or Claude
  • Occasional latency spikes under load
  • Better for simpler conversations
Use for:
  • Simple qualification calls
  • IVR and routing
  • High-volume, low-complexity scenarios
Available on: Groq
Llama 3.3 70B Versatile

Better reasoning while keeping Groq’s speed advantage.

Why it works:
  • Step up in quality from 8B model
  • Still fast on Groq infrastructure
  • Good middle ground
Use for:
  • When Llama 3.1 8B quality isn’t enough
  • Need speed but more intelligence
Available on: Groq
Quick decision guide:
  • Start with GPT-4.1 Mini → reliable, fast, great for most use cases
  • Need more reasoning? → GPT-4.1
  • Need faster/cheaper? → Claude Haiku 4.5
  • Need fastest? → Groq Llama 3.1 8B (but less intelligent)
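The quick decision guide above can be sketched as a small lookup. This is an illustrative helper, not part of any real API; the labels and model names come straight from this page:

```python
# Hypothetical helper encoding the quick decision guide. The keys and
# model names mirror this page's recommendations; nothing here is a
# real API identifier.
def pick_model(need: str = "default") -> str:
    """Map a single dominant requirement to the recommended model."""
    return {
        "default":   "GPT-4.1 Mini",       # reliable, fast, most use cases
        "reasoning": "GPT-4.1",            # complex multi-step logic
        "cheaper":   "Claude Haiku 4.5",   # faster responses, lower cost
        "fastest":   "Groq Llama 3.1 8B",  # sub-500ms, but less intelligent
    }[need]

print(pick_model())           # GPT-4.1 Mini
print(pick_model("fastest"))  # Groq Llama 3.1 8B
```

In practice the point of the sketch is the priority order: start from the default and move away from it only for one clearly dominant requirement.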

Model Selection Interface

Provider Catalog

The model selection interface groups providers with helpful metadata:

Provider Icons

Visual branding for OpenAI, Anthropic, Groq, Azure, and more

EU-Hosted Badge

Indicates models that process data within EU regions

Model Count

Shows how many models are available from each provider

Active Selection

Highlights your currently selected model
Click a provider to filter the model table to that vendor only. Use the search box to quickly find specific models by name or capability.

Model Provider Details

OpenAI

OpenAI models deliver the best balance of reliability and function calling for voice agents.

GPT-4.1 Mini (Recommended)
  • Real-world performance: ~700-800ms response time, 70%+ function calling success rate
  • Best for: Production voice agents - support, booking, sales
  • Why it works: Proven reliability, excellent tool use, good latency
GPT-4.1
  • Real-world performance: Higher latency than Mini but superior reasoning
  • Best for: Complex multi-step conversations, technical support
  • Trade-off: Higher cost and latency for more intelligence
GPT-5 Series (Mini, Nano)
  • Status: Next-generation models with advanced reasoning
  • Considerations: GPT-5 has higher latency (~1s+); GPT-5 Mini offers better balance
  • Best for: Tasks where intelligence matters more than speed
Legacy models (GPT-4o, GPT-4o Mini)
  • Status: Still functional but consider GPT-4.1/5 series for new agents

Azure OpenAI (EU-Hosted)

Same OpenAI models, hosted in the EU (Sweden Central region).

Why choose Azure OpenAI:
  • EU hosting: Data processed within EU
  • Enterprise features: Azure security, compliance, SLAs
  • Same models: GPT-4.1, GPT-4.1 Mini, GPT-5 Mini/Nano

Anthropic

Claude models excel at safety, instruction following, and complex reasoning.

Claude Haiku 4.5 (Recommended)
  • Real-world performance: Sub-second responses, excellent speed-to-intelligence ratio
  • Best for: Speed-critical deployments, high-volume use cases
  • Why it works: Fast, affordable, strong Constitutional AI safety
Claude Sonnet 4.5
  • Real-world performance: Excellent for complex agent workflows and tool use
  • Best for: Multi-step reasoning, complex procedures, coding tasks
  • Considerations: Can have latency spikes under heavy load; monitor timeouts in production
  • Extended thinking: Supports longer reasoning chains for complex problems
Claude models are more conversational and rich in their responses compared to OpenAI models. They naturally provide fuller, more nuanced answers. This makes them excellent for engaging customer interactions, but they may occasionally over-apologize. Test with your specific use case to see if the conversational style fits your needs.

Groq (Ultra-Low Latency)

Open-source models on custom hardware for maximum speed.

Llama 3.1 8B Instant (Fastest)
  • Real-world performance: Sub-500ms response times, hundreds of tokens/second
  • Best for: Simple qualification, IVR, routing, high-volume scenarios
  • Trade-off: Less intelligent than GPT-4.1 or Claude
  • Watch for: Occasional latency spikes under heavy load
Llama 3.3 70B Versatile
  • Real-world performance: Better reasoning than 8B while keeping Groq speed
  • Best for: When you need more intelligence than 8B but want Groq’s speed advantage
GPT-OSS Series (20B, 120B)
  • Real-world performance: The 20B model is very fast on Groq hardware, comparable to Llama speeds
  • Status: Open-weight OpenAI models with tool use support
  • Best for: Fast open-source alternative with function calling
Groq is ideal for removing the LLM bottleneck when sub-800ms responses are critical and tasks are straightforward (qualification, routing, data collection).

Model Parameters

Click Model Parameters to access advanced configuration options that control how the model behaves.

Temperature

Controls randomness in responses (range: 0.0 to 2.0)
  • 0.0 (Recommended): Deterministic, consistent responses
    • Use for: Most voice agents, tool calling, action execution
    • Maximizes reliability for transfers, bookings, and API calls
    • Ensures consistent behavior and predictable responses
  • 0.1 - 0.3: Slightly varied but still highly consistent
    • Use for: Agents that need slight natural variation
    • Still reliable for tool calling
  • 0.4 - 0.7: More creative and varied
    • Use for: Personality-driven agents where creativity matters more than consistency
    • Tool calling reliability decreases
  • 0.8+: Highly creative, unpredictable
    • Avoid for production voice agents
    • Tool calling becomes unreliable
Default recommendation: Use 0.0 unless your agent needs more human-like creativity. Temperature above 0 reduces tool calling reliability (transfers, bookings, actions).
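As a sketch, a pre-save guard for this parameter might look like the following. Only the 0.0-2.0 range, the 0.0 default, and the tool-calling caveat come from this page; the config dict shape and function name are hypothetical:

```python
# Hypothetical pre-save check for the temperature parameter. Only the
# 0.0-2.0 range and the 0.0 default come from the documentation above;
# the config payload shape is made up for illustration.
def validate_temperature(value: float) -> float:
    if not 0.0 <= value <= 2.0:
        raise ValueError(f"temperature must be within 0.0-2.0, got {value}")
    if value > 0.3:
        # per the guidance above, higher temperatures reduce
        # tool-calling reliability (transfers, bookings, actions)
        print("warning: temperature above 0.3 may reduce tool-calling reliability")
    return value

agent_config = {
    "model": "GPT-4.1 Mini",
    "temperature": validate_temperature(0.0),  # recommended default
}
```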

Choosing the Right Model

Decision Framework

Use this framework to select your model:

1. Start with the Right Default

For most use cases, start here:
  • GPT-4.1 Mini → Best balance of speed, reliability, and cost
  • Claude Haiku 4.5 → When you need faster responses or lower cost
Only upgrade if you need more intelligence:
  • GPT-4.1 → Complex multi-step reasoning required
  • Claude Sonnet 4.5 → Maximum conversational quality
Go faster/cheaper only if needed:
  • Groq Llama 3.1 8B → Sub-500ms speed is critical
2. Match the Model to Your Scenario

Simple Routing / FAQ:
  • Groq Llama 3.1 8B (fastest)
  • Llama 3.3 70B (more intelligent)
Standard Customer Support (Most Common):
  • GPT-4.1 Mini ⭐ (recommended - best balance)
  • Claude Haiku 4.5 (faster, more conversational)
Complex Reasoning / Technical Support:
  • GPT-4.1 (when Mini isn’t enough)
  • Claude Sonnet 4.5 (maximum quality)
Personality-Critical / Brand-Sensitive:
  • Claude Sonnet 4.5 (richest, most conversational)
  • GPT-4.1 (when you need reasoning + personality)
Need GDPR-compliant EU hosting?
  • Azure OpenAI is the only provider with EU hosting
  • All GPT-4.1, GPT-4.1 Mini, and GPT-5 models available

Common Model Combinations

Many customers use different models for different agents:
Standard Support → GPT-4.1 Mini (best default for most agents)
High-Volume Routing → Groq Llama 3.1 8B (speed-critical, simple tasks)
Appointment Booking → GPT-4.1 Mini or Claude Haiku 4.5 (reliable tool calling)
Complex Troubleshooting → GPT-4.1 (when you need more reasoning)
Brand/Personality-Critical → Claude Sonnet 4.5 (richest conversations)

Testing Model Performance

A/B Testing Models

To compare models scientifically:
  1. Duplicate your agent in the dashboard
  2. Change only the model on one version
  3. Keep all other settings identical (instructions, voice, actions)
  4. Run identical test scenarios on both
  5. Compare:
    • Response quality and accuracy
    • Latency and speed
    • Conversation naturalness
    • Action trigger reliability

Evaluation Criteria

Rate each model on:
  • Accuracy: Does it understand requests correctly?
  • Instruction Adherence: Does it follow your system prompt rules?
  • Latency: How quickly does it respond?
  • Context Retention: Does it remember earlier conversation?
  • Action Timing: Does it trigger actions at the right moments?
  • Error Handling: How does it handle unclear requests?
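The ratings can be rolled into one comparable number per model with a weighted scorecard. The weights and sample ratings below are illustrative; adjust them to your own priorities:

```python
# Hypothetical weighted scorecard for the evaluation criteria above.
# Weights and the sample ratings are illustrative only.
WEIGHTS = {
    "accuracy": 0.25, "instruction_adherence": 0.20, "latency": 0.20,
    "context_retention": 0.15, "action_timing": 0.15, "error_handling": 0.05,
}

def score(ratings: dict) -> float:
    """ratings maps each criterion to a 1-5 rating; returns a weighted average."""
    return sum(WEIGHTS[k] * v for k, v in ratings.items())

sample = {"accuracy": 4, "instruction_adherence": 4, "latency": 5,
          "context_retention": 4, "action_timing": 4, "error_handling": 4}
print(round(score(sample), 2))
```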

Best Practices

For most voice agents, start with:
  • Model: GPT-4.1 Mini
  • Temperature: 0.0 (or 0.4-0.7 if personality matters more than tool-calling reliability)
Only switch if testing shows you need more intelligence or faster speed.
Start small, upgrade only if needed:
  • Most use cases work great with GPT-4.1 Mini
  • Only upgrade to GPT-4.1 or Claude Sonnet 4.5 if Mini can’t handle your complexity
  • Use Groq for simple routing/FAQ where speed matters more than intelligence
Match capability to requirement—don’t pay for intelligence you don’t need.
Use analytics to track:
  • Average response time
  • Action success rates
  • Transfer rates (high transfers may indicate reasoning issues)
  • Customer satisfaction scores
Switch models if metrics degrade.
If serving global customers:
  • Use EU-hosted models for European callers (GDPR)
  • Consider regional Azure deployments for enterprise compliance
  • Factor in latency from model hosting region to customers
When changing models in production:
  • Note the date and reason in agent description
  • Monitor metrics for 24-48 hours after
  • Keep previous model ID documented for rollback
  • Test thoroughly before switching high-volume agents

Troubleshooting Model Issues

Agent Responses Are Too Verbose

Solutions:
  • Add to instructions: “Keep every response under 25 seconds”
  • Use temperature 0.0 for more focused, concise responses
  • Consider a faster model that encourages brevity

Agent Misunderstands Requests

Solutions:
  • Switch to a higher-capability model (GPT-4.1, Claude Sonnet 4.5)
  • Improve instructions with more specific examples
  • Add keyword boosting in transcriber settings
  • Review transcription accuracy first (may be STT issue, not LLM)

Agent Doesn’t Follow Instructions

Solutions:
  • Claude models are typically better at instruction adherence
  • Simplify and clarify instructions
  • Use bulleted lists instead of paragraphs
  • Add explicit examples of correct behavior
  • Use temperature 0.0 for maximum consistency

High Latency / Slow Responses

Solutions:
  • Switch to a faster model (Groq Llama 3.1 8B, Claude Haiku 4.5)
  • Check if issue is model or network latency (test with different providers)

Agent Repeats Same Phrases

Solutions:
  • Add instruction: “Vary your phrasing; avoid repetitive expressions”
  • Consider different model (some have better diversity)
  • Review if instructions inadvertently cause repetition

Model Updates and Versioning

Provider Model Updates

Model providers regularly update their offerings:
  • Minor updates often improve performance without breaking changes
  • Major version changes (e.g., GPT-4 → GPT-5) may require testing
  • itellicoAI notifies customers before automatic version updates

Controlling Model Versions

Some providers let you pin to specific versions:
  • Latest: Always use newest version (default, recommended)
  • Pinned: Stay on specific version (use if you’ve heavily optimized for that model)

Deprecation Policy

When providers deprecate models:
  1. itellicoAI notifies affected customers in advance
  2. Recommended migration path provided
  3. Agents automatically moved to successor model if no action taken
  4. Migration assistance available from support

Next Steps