Diagnostic Approach
When issues arise in production, use this systematic troubleshooting process:
- Identify symptoms - What’s broken? Calls failing, agent behaving wrong, integrations down?
- Check scope - Affecting all calls or specific subset? Started when?
- Review logs - Look for error messages, patterns, stack traces
- Isolate cause - Is it config, integration, platform, or carrier issue?
- Apply fix - Deploy smallest change that resolves issue
- Validate - Test thoroughly before marking resolved
Common Issues & Solutions
No Calls Being Received
Symptom: Phone number isn’t ringing; calls go to dead air or “number not in service”
Check phone number assignment
Diagnostic:
- Go to Telephony → Numbers
- Find your number, check “Assigned Agent” column
- Verify it’s assigned to correct agent (not “Unassigned”)
Fix:
- Click number → Select agent from dropdown → Save
- Wait 30 seconds for routing update to propagate
- Test by calling number again
Verify SIP trunk registration
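Diagnostic: Check the trunk’s registration status and look for "status": "registered". If it shows "unregistered" or "failed", work through the causes below.
Common Causes:
- Wrong SIP credentials (username/password)
- IP not allowlisted with carrier
- Firewall blocking UDP 5060
- Carrier endpoint down
Fix:
- Verify credentials with carrier
- Check carrier’s portal for IP allowlist settings
- Test connectivity to the carrier’s SIP endpoint (for example carrier.com:5060) with a SIP-aware tool such as a SIP OPTIONS probe; curl does not support the sip: scheme, so curl -v sip:carrier.com:5060 fails regardless of connectivity
- Contact carrier if their endpoint is unreachable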
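One way to test reachability of a carrier's SIP endpoint is to send a SIP OPTIONS request over UDP and see whether anything answers. The sketch below is a minimal illustration, not a full SIP client: the function names, the `probe` user, and the header values are assumptions chosen for the example, and some carriers silently drop OPTIONS from unknown sources, so a timeout alone does not prove the endpoint is down.

```python
import socket
import uuid
from typing import Optional


def build_sip_options(host: str, port: int = 5060, local_ip: str = "0.0.0.0") -> bytes:
    """Build a minimal SIP OPTIONS request (RFC 3261) as bytes."""
    # RFC 3261 requires the Via branch to start with the magic cookie "z9hG4bK".
    branch = "z9hG4bK" + uuid.uuid4().hex[:16]
    lines = [
        f"OPTIONS sip:{host}:{port} SIP/2.0",
        f"Via: SIP/2.0/UDP {local_ip};branch={branch};rport",
        "Max-Forwards: 70",
        f"From: <sip:probe@{local_ip}>;tag={uuid.uuid4().hex[:8]}",
        f"To: <sip:{host}>",
        f"Call-ID: {uuid.uuid4().hex}",
        "CSeq: 1 OPTIONS",
        "Content-Length: 0",
        "",  # blank line terminates the headers
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def sip_ping(host: str, port: int = 5060, timeout: float = 3.0) -> Optional[str]:
    """Send OPTIONS over UDP; return the response status line, or None if nothing came back."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(build_sip_options(host, port), (host, port))
            data, _ = sock.recvfrom(4096)
        except OSError:  # covers timeouts, DNS failures, and ICMP unreachable
            return None
    return data.decode("ascii", errors="replace").split("\r\n", 1)[0]
```

Calling `sip_ping("carrier.com")` returns a status line such as a `SIP/2.0 ...` response when the endpoint answers, and `None` when the request times out or the host cannot be reached.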
Check number provisioning
Diagnostic:
- Number status shows “Pending” or “Failed”
- Number was purchased less than 5 minutes ago
Fix:
- Wait up to 10 minutes for provisioning to complete
- If stuck for more than 30 minutes, click “Retry Provisioning”
- If still failing, contact support with number details
Carrier routing issues
Diagnostic:
- Calls work when you call, but not from customer phones
- Specific area codes or carriers can’t reach you
Fix:
- Check carrier’s routing table includes your number range
- Verify CNAM (Caller Name) registration
- Check for spam flagging (use free carrier lookup tools)
- Contact the carrier if the routing itself needs to be fixed
Poor Call Quality
Symptom: Choppy audio, echo, robotic voice, long delays
Audio issues (echo, noise, cutting out)
Diagnostic:
- Open Conversations → Logs → Select affected call
- Check “Call Quality” metrics:
  - Jitter: Should be <30ms
  - Packet Loss: Should be <1%
  - RTT (Round Trip Time): Should be <200ms
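If you export call quality metrics for many calls, the thresholds above can be checked in bulk. The following is a small sketch; the metric key names (`jitter_ms`, `packet_loss_pct`, `rtt_ms`) are hypothetical and would need to match whatever fields your export actually contains.

```python
# Limits taken from the checklist above; the key names are assumptions for this example.
THRESHOLDS = {"jitter_ms": 30.0, "packet_loss_pct": 1.0, "rtt_ms": 200.0}


def quality_violations(metrics: dict) -> list:
    """Return one message for every metric that is at or above its limit."""
    problems = []
    for name, limit in THRESHOLDS.items():
        value = metrics.get(name)
        # Metrics missing from the export are skipped rather than treated as failures.
        if value is not None and value >= limit:
            problems.append(f"{name}={value} (should be <{limit})")
    return problems
```

An empty list means the call met all three targets; otherwise each entry names the metric that needs attention.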
| Symptom | Likely Cause | Fix |
|---|---|---|
| Echo | Acoustic echo (speaker feedback) | Enable AEC in agent settings |
| Choppy/Robotic | Packet loss or jitter | Check network bandwidth, switch codec |
| Cutting Out | Firewall blocking RTP | Open UDP ports 10000-60000 |
| One-Way Audio | NAT traversal issue | Enable TURN relay |
| Background Noise | Noisy environment | Enable noise suppression |
High latency (agent responds slowly)
Diagnostic:
- Check Dashboard → Metrics → Average Response Time
- Target: <500ms
- If >1000ms, investigate:
  - Model Provider Slowdown:
    - Check status.openai.com or status.anthropic.com
    - Look for elevated latencies or outages
    - Switch to fallback model if available
  - Large Context Window:
    - Long instructions or huge knowledge base retrievals
    - Solution: Reduce instruction length, limit KB chunks to 3
  - External API Slowness:
    - Check Analytics → Actions for slow API calls
    - APIs taking >2s will delay agent response
    - Add timeout limits and fallback responses
  - Network Issues:
    - Check from multiple locations
    - If specific region affected, may be ISP routing issue
    - Contact support to investigate
Quick Fixes:
- Switch to faster model (GPT-4 → GPT-3.5 Turbo)
- Reduce the number of knowledge base chunks retrieved
- Tighten API timeouts so the agent doesn’t wait on slow calls
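The advice to add timeout limits and fallback responses for slow external APIs can be illustrated with a small wrapper. This is a generic sketch using Python's standard library, not the platform's own action mechanism; `call_with_fallback` and its parameters are names invented for the example.

```python
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout
from typing import Callable, TypeVar

T = TypeVar("T")

# Shared worker pool; slow calls keep running in the background after a timeout.
_pool = ThreadPoolExecutor(max_workers=4)


def call_with_fallback(fn: Callable[[], T], timeout_s: float, fallback: T) -> T:
    """Run fn, but return fallback if it takes longer than timeout_s or raises."""
    future = _pool.submit(fn)
    try:
        return future.result(timeout=timeout_s)
    except FutureTimeout:
        future.cancel()  # only prevents execution if the call has not started yet
        return fallback
    except Exception:
        return fallback
```

With a 2 second budget, for example, the agent could answer "I'm still looking that up" instead of staying silent while a slow API finishes.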
Transcription errors (wrong words)
Diagnostic:
- Review transcript in Conversations → Logs
- Look for patterns:
  - Technical terms misheard → Add to custom vocabulary
  - Accents misunderstood → Try different transcriber
  - Background noise → Enable noise suppression
Try Different Transcriber:
- Deepgram: Best for accents, noisy environments
- Azure: Best for technical terms
- Whisper: Best for multilingual
Improve Audio Quality:
- Use higher bitrate codec (Opus 64kbps)
- Enable noise suppression
- Test with different phone/mic for web calls
Agent Not Responding Correctly
Symptom: Agent gives wrong answers, ignores instructions, behaves erratically
Agent doesn't follow instructions
Diagnostic:
- Open agent settings → Conversation → Instructions
- Test in simulator with exact scenario
- Check if issue is:
  - Always happening → Instruction problem
  - Intermittent → Context or model issue
Common Causes:
- Instructions Too Long (>500 words):
  - Model loses focus on long prompts
  - Solution: Break into clear sections, use bullet points
- Conflicting Instructions:
  - Example: “Be concise” but also “Provide detailed explanations”
  - Solution: Prioritize one behavior, remove conflict
Testing:
- Run 10 test calls covering edge cases
- Review transcripts for compliance
- Iterate instructions based on failures
Agent hallucinates or makes up facts
Diagnostic:
- Agent says things not in knowledge base or instructions
- Provides wrong product details, prices, or policies
Common Causes:
- Knowledge base not comprehensive enough
- Instructions don’t emphasize “only use provided info”
- Model too creative (temperature too high)
Expand Knowledge Base:
- Review “I don’t know” responses in logs
- Add missing content to KB
- Test retrieval with sample queries
Use Citation Mode (if available):
- Forces agent to cite KB sources
- Makes it obvious when info is not in KB
Agent gets stuck in loops
Diagnostic:
- Agent repeats same question 3+ times
- Conversation goes in circles
- Caller gets frustrated
Common Causes:
- Agent doesn’t recognize caller answered
- Transcription failed (didn’t hear response)
- Instructions don’t handle edge case
Improve Intent Recognition:
- Add examples of edge case responses to instructions
- Use structured responses (DTMF for critical info)
- Enable “I’m not sure” fallback to human
Test Edge Cases:
- Silent caller
- Rambling caller
- Caller with thick accent
- Noisy environment
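The "repeats same question 3+ times" symptom can also be detected automatically when reviewing exported transcripts. The sketch below assumes you have the agent's turns as a list of strings; the function name and the normalization rules are illustrative choices, not a platform feature.

```python
import re


def _normalize(text: str) -> str:
    """Lowercase and strip punctuation so trivial variations compare equal."""
    return re.sub(r"[^a-z0-9 ]+", "", text.lower()).strip()


def is_looping(agent_turns: list, max_repeats: int = 3) -> bool:
    """True if the last max_repeats agent turns are all the same utterance."""
    if len(agent_turns) < max_repeats:
        return False
    recent = [_normalize(turn) for turn in agent_turns[-max_repeats:]]
    # An empty normalized turn (e.g. only punctuation) is not counted as a loop.
    return bool(recent[0]) and all(turn == recent[0] for turn in recent)
```

A check like this could flag calls for review or trigger the fallback to a human once the repeat limit is reached.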
Integration Failures
Symptom: Actions don’t trigger, API calls fail, transfers don’t work
API action fails
Diagnostic:
- Open Conversations → Logs → Select failed call
- Navigate to Actions tab
- Look for error message (e.g., “API timeout”, “401 Unauthorized”)
| Error | Cause | Fix |
|---|---|---|
| 401 Unauthorized | Wrong API key | Regenerate key, update agent config |
| 403 Forbidden | Permissions issue | Grant agent access to resource |
| 404 Not Found | Wrong endpoint URL | Verify URL in action configuration |
| 408 Timeout | API too slow | Increase timeout (default 5s → 10s) |
| 500 Internal Server Error | External API down | Check API status page, add retry logic |
| SSL Certificate Error | HTTPS issue | Verify certificate valid, not expired |
Testing: If a manual curl request works but the agent fails, check:
- IP allowlisting (agent’s IPs may need to be allowlisted)
- Rate limiting (agent may hit limits faster than manual testing)
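The "add retry logic" fix for 500-class errors is usually implemented with exponential backoff on the API side you control. The following sketch uses only the standard library; `fetch_with_retry`, its defaults, and the set of retryable status codes are assumptions for illustration. Client errors such as 401 or 404 are deliberately not retried, since repeating them will not help.

```python
import time
import urllib.error
import urllib.request

# Server-side failures that are often transient; an assumption, adjust to your API.
RETRYABLE = {500, 502, 503, 504}


def fetch_with_retry(url, attempts=3, base_delay_s=0.5, timeout_s=10.0,
                     opener=urllib.request.urlopen, sleep=time.sleep) -> bytes:
    """GET url, retrying transient failures with exponential backoff."""
    for attempt in range(attempts):
        last_attempt = attempt == attempts - 1
        try:
            with opener(url, timeout=timeout_s) as resp:
                return resp.read()
        except urllib.error.HTTPError as err:
            # 4xx errors (bad key, wrong URL) are re-raised immediately.
            if err.code not in RETRYABLE or last_attempt:
                raise
        except urllib.error.URLError:
            # Connection-level failure (DNS, refused, timeout).
            if last_attempt:
                raise
        sleep(base_delay_s * 2 ** attempt)  # 0.5s, 1s, 2s, ...
    raise ValueError("attempts must be at least 1")
```

The `opener` and `sleep` parameters exist so the function can be exercised in tests without real network calls or real delays.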
Transfer fails
Diagnostic:
- Agent says “Transferring…” but call drops or nothing happens
- Check Logs → Actions for transfer status
Common Causes:
- Invalid transfer number:
  - Check number format (must be E.164: +1234567890)
  - Verify number is reachable (call it manually)
- Some carriers don’t support SIP REFER for transfers:
  - Solution: Use “attended transfer” instead of “blind transfer”
  - Or enable “call bridging” mode
- Transfers not enabled on the trunk:
  - Check with carrier if transfers are enabled
  - May need to upgrade SIP trunk to support transfers
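Transfer numbers can be checked against the E.164 format before they are saved in a transfer action. This is a minimal format check (a leading +, a non-zero first digit, and at most 15 digits in total); it does not confirm that the number exists or is reachable, and the function name is just an example.

```python
import re

# E.164: "+" followed by up to 15 digits, the first of which is not 0.
_E164 = re.compile(r"\+[1-9]\d{1,14}")


def is_e164(number: str) -> bool:
    """Return True if number is formatted as E.164 (no spaces, dashes, or parentheses)."""
    return bool(_E164.fullmatch(number))
```

For example, `is_e164("+1234567890")` is true, while `"1234567890"` (no plus sign) and `"+1 (234) 567-890"` (formatting characters) are both rejected.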
Knowledge base not retrieving
Diagnostic:
- Agent says “I don’t know” when answer is in KB
- Test retrieval manually with a sample query
Common Causes:
- KB Not Linked to Agent:
  - Check agent settings → Knowledge → Verify KB assigned
- Content Not Yet Indexed:
  - Recently uploaded content takes 2-5 minutes to index
  - Check KB status shows “Indexed” not “Processing”
- Query Wording Doesn’t Match KB Content:
  - Customer asks “Can I get money back?” but KB says “Refund Policy”
  - Solution: Add synonyms, rewrite KB content to match common phrasings
- Similarity Threshold Too High:
  - Minimum similarity threshold too high (e.g., 0.9)
  - Solution: Lower to 0.7-0.8
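To see why a high similarity threshold hides relevant content, it helps to look at how threshold filtering typically works. The sketch below assumes retrieval is based on cosine similarity between embedding vectors, which is a common design but is not confirmed for this platform; the toy two-dimensional vectors and the function names are purely illustrative.

```python
import math


def cosine_similarity(a, b) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec, chunks: dict, min_similarity: float, top_k: int = 3) -> list:
    """Return up to top_k chunk names scoring at least min_similarity, best first."""
    scored = [(cosine_similarity(query_vec, vec), name) for name, vec in chunks.items()]
    hits = [pair for pair in scored if pair[0] >= min_similarity]
    return [name for _, name in sorted(hits, reverse=True)[:top_k]]


# Toy data: the "refund" chunk is related to the query but not a near-exact match.
chunks = {"refund": [0.8, 0.6], "shipping": [0.0, 1.0]}
query = [1.0, 0.0]
strict = retrieve(query, chunks, min_similarity=0.9)
relaxed = retrieve(query, chunks, min_similarity=0.75)
```

Here the refund chunk scores roughly 0.8 against the query, so it is dropped at a 0.9 threshold and returned at 0.75, which mirrors the "Can I get money back?" versus "Refund Policy" mismatch described above.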
Capacity & Performance
Hitting concurrency limits
Symptom: Busy signal, calls queued for minutes, “All agents busy” message
Diagnostic:
- Check Dashboard → Capacity for current/max concurrency
- Review call volume graph for peak times
Short-Term Fixes:
- Enable Queueing: Let callers wait instead of busy signal
- Add Overflow Agent: Route to backup agent when primary at capacity
- Extend Business Hours: Spread volume over more hours
- Callback Offer: “We’ll call you back in 10 minutes”
Long-Term Solutions:
- Upgrade Plan: Increase concurrent call limit
- Load Balance: Create multiple agents, distribute numbers
- Auto-Scaling: Enable dynamic capacity scaling (Enterprise)
- Off-Peak Incentives: Encourage calls during low-traffic times
High call volume overwhelming system
Symptom: Calls taking longer to connect, queue building, errors spiking
Diagnostic:
- Sudden traffic spike (3x+ normal volume)
- Check for:
  - Marketing campaign launched?
  - Media mention / viral post?
  - System outage causing retry storm?
Immediate Actions:
- Rate Limit: Temporarily reduce max concurrent calls to stabilize
- Queue Aggressively: Hold calls instead of dropping
- Fallback Message: “Unusually high volume. Please try again in 15 minutes.”
- Emergency Scaling: Contact support for temporary capacity increase
Prevention:
- Forecast traffic for known events (product launches, sales)
- Pre-scale capacity before expected spikes
- Set up auto-scaling triggers
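The "3x+ normal volume" rule of thumb can be turned into a simple alert condition. The sketch below compares the current interval's call count against the average of recent intervals; the function name, the choice of a plain mean as the baseline, and the sample numbers are assumptions for illustration.

```python
from statistics import mean


def is_traffic_spike(recent_counts: list, current_count: int, factor: float = 3.0) -> bool:
    """True if current_count is at least factor times the average of recent_counts."""
    if not recent_counts:
        return False  # no baseline yet, so nothing to compare against
    baseline = mean(recent_counts)
    return baseline > 0 and current_count >= factor * baseline
```

With hourly counts of 100, 120, and 110 as the baseline (average 110), 400 calls in the current hour counts as a spike, while 200 does not.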
Costs higher than expected
Diagnostic:
- Check Billing → Usage for breakdown:
  - Model API costs
  - Telephony minutes
  - Storage costs
  - Premium features (ML-AMD, etc.)
Common Causes:
- Long calls: Average call >10 minutes (typical is 3-5 min)
  - Cause: Agent too verbose, loops, doesn’t end calls
  - Fix: Add call duration goals, enable timeout
- Using GPT-4 for simple FAQ calls
  - Fix: Switch to GPT-3.5 Turbo (3x cheaper)
- Not using AMD (answering machine detection), charging full minutes for voicemails
  - Fix: Enable text-based AMD (free) or ML-AMD (+0.50+/VM)
- Using premium transcriber for all calls
  - Fix: Use standard transcriber unless accuracy critical
Cost Optimization:
- Enable AMD for outbound campaigns
- Set max call duration (e.g., 15 minutes)
- Use cheaper model for simple calls
- Compress audio recordings (lower bitrate)
- Auto-delete old recordings after 30 days
Debugging Tools & Techniques
Log Analysis
Accessing Logs:
- Conversations → Logs → Use filters:
  - Status: Failed
  - Date range: Last 24 hours
  - Agent: specific agent
  - Error type: specific error
Live Debugging
Whisper Mode:
- Join live call without customer hearing
- Guide agent by whispering instructions
- See transcript in real-time
How to Use:
- Open Conversations → Live Monitor
- Find active call
- Click Whisper
- Speak instructions (agent hears, customer doesn’t)
Use Cases:
- Agent stuck → “Transfer to billing department”
- Agent about to give wrong info → “Check knowledge base for pricing”
- Edge case → Guide agent through unusual scenario
Testing in Production
Canary Calls:
- Make test calls during business hours
- Use known scenarios
- Compare to expected behavior
- Tag as test for filtering: metadata: {test_call: true}
Synthetic Monitoring:
- Automated test calls every hour
- Verify key flows still work
- Alert if tests start failing
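Comparing a canary call to expected behavior can be partly automated by checking the transcript for phrases the agent should have said. The sketch below works on plain transcript text; the `CanaryResult` type, the function name, and the sample scenario are hypothetical and not part of the platform.

```python
from dataclasses import dataclass


@dataclass
class CanaryResult:
    scenario: str
    passed: bool
    missing: list  # expected phrases that did not appear in the transcript


def check_canary(scenario: str, transcript: str, expected_phrases: list) -> CanaryResult:
    """Pass the canary only if every expected phrase appears (case-insensitive)."""
    text = transcript.lower()
    missing = [phrase for phrase in expected_phrases if phrase.lower() not in text]
    return CanaryResult(scenario=scenario, passed=not missing, missing=missing)
```

Running this after each scheduled test call, and alerting when `passed` is false, gives an early signal that a key flow has regressed.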
When to Contact Support
Contact support@itellico.ai when:
- Platform Issues: Widespread outages, API errors affecting all agents
- Security Incidents: Suspected breach, unusual access patterns
- Billing Disputes: Unexpected charges, usage discrepancies
- Carrier Issues: SIP trunk not registering, number porting problems
- Bug Reports: Clear platform bugs (not configuration issues)
Include in Your Request:
- Agent ID and call IDs showing the issue
- Error messages from logs
- Steps to reproduce
- What you’ve already tried
- Business impact (how many users affected)
Response Times:
- Critical (production down): <1 hour
- High (major feature broken): <4 hours
- Medium (workaround exists): <24 hours
- Low (feature request, question): <48 hours