Voice Calls
How It Works
The MCP server is infrastructure only: it relays text between the caller and your AI agent but never generates AI responses itself. Your agent is the brain; the server is the telephone.
Live Voice Call Flow
Inbound Call
-> Twilio webhook hits /webhooks/:agentId/voice
-> Server returns ConversationRelay TwiML
-> Twilio opens WebSocket to /webhooks/:agentId/voice-ws
-> Human speaks -> Twilio STT -> Text
-> Text sent to your AI agent (via MCP sampling)
-> Agent responds with text
-> Text sent to Twilio -> Twilio TTS -> Human hears
Twilio handles STT and TTS. The server only passes text back and forth.
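The two server-side steps in this flow (returning TwiML, then relaying text over the WebSocket) can be sketched as follows. This is an illustration under stated assumptions, not the server's actual code: the function names and the `ask_agent` callback are hypothetical, and the message fields (`prompt`, `voicePrompt`, `text`, `token`, `last`) follow Twilio's published ConversationRelay message format as best understood here.

```python
import json
import xml.etree.ElementTree as ET
from typing import Callable, Optional


def build_relay_twiml(ws_url: str, greeting: str) -> str:
    """Build TwiML asking Twilio to open a ConversationRelay WebSocket."""
    response = ET.Element("Response")
    connect = ET.SubElement(response, "Connect")
    ET.SubElement(connect, "ConversationRelay", url=ws_url, welcomeGreeting=greeting)
    return ET.tostring(response, encoding="unicode")


def handle_relay_message(raw: str, ask_agent: Callable[[str], str]) -> Optional[str]:
    """Map one inbound WebSocket frame to an outbound frame, or None.

    ask_agent is a hypothetical stand-in for the MCP sampling round trip.
    """
    message = json.loads(raw)
    # Only transcribed caller speech needs an answer; other frame types
    # (setup, interrupt, and so on) are ignored in this sketch.
    if message.get("type") != "prompt":
        return None
    reply_text = ask_agent(message.get("voicePrompt", ""))
    # Twilio converts the returned text to speech for the caller.
    return json.dumps({"type": "text", "token": reply_text, "last": True})
```

Note that neither function produces the reply text itself; it always comes from `ask_agent`, which mirrors the "server is the telephone" design.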
Three Response Paths
| Path | When | What Happens |
|---|---|---|
| Agent Sampling | Agent connected via SSE | Caller's speech goes to agent via MCP |
| Answering Machine | Agent not connected, Anthropic key set | Built-in Claude fallback collects message |
| Hard-coded Fallback | Agent not connected, no key | Plays "unavailable" message |
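The table reduces to two checks, which a minimal sketch can make explicit. The enum and function names here are hypothetical and chosen only for illustration.

```python
from enum import Enum
from typing import Optional


class ResponsePath(Enum):
    AGENT_SAMPLING = "agent_sampling"
    ANSWERING_MACHINE = "answering_machine"
    HARD_CODED_FALLBACK = "hard_coded_fallback"


def select_response_path(agent_connected: bool, anthropic_api_key: Optional[str]) -> ResponsePath:
    """Pick the response path from the two conditions listed in the table."""
    if agent_connected:
        return ResponsePath.AGENT_SAMPLING
    # Agent is offline: the answering machine needs an Anthropic key.
    if anthropic_api_key:
        return ResponsePath.ANSWERING_MACHINE
    return ResponsePath.HARD_CODED_FALLBACK
```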
Making Outbound Calls
{
  "agentId": "my-agent",
  "to": "+15559876543",
  "greeting": "Hi, this is your AI assistant calling about your appointment.",
  "systemPrompt": "You are a friendly appointment reminder assistant."
}
Once answered, a live two-way conversation begins using the same ConversationRelay flow.
Answering Machine
When the AI agent is not connected (8-second timeout), the built-in answering machine:
- Apologizes to the caller on behalf of the agent
- Asks for a message and any callback preferences (e.g., "call me back after 8 AM")
- Stores everything in the dead letter queue
When the agent reconnects, dead letters are automatically dispatched via comms_get_waiting_messages.
ANTHROPIC_API_KEY=sk-ant-... # Required for answering machine
Without the key, callers hear a hard-coded "unavailable" message.
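The timeout-then-store behavior can be sketched with `asyncio.wait_for`. This assumes the 8-second timeout means "wait up to 8 seconds for the agent before falling back"; the function name, the list used as a dead letter queue, and the canned reply are all hypothetical. The real answering machine uses Claude to talk to the caller, which this sketch does not attempt to reproduce.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List

AGENT_TIMEOUT_SECONDS = 8.0  # taken from the timeout described above


async def reply_or_take_message(
    ask_agent: Callable[[str], Awaitable[str]],
    caller_text: str,
    dead_letters: List[Dict[str, str]],
    timeout: float = AGENT_TIMEOUT_SECONDS,
) -> str:
    """Return the agent's reply, or store the caller's text if the agent is silent."""
    try:
        return await asyncio.wait_for(ask_agent(caller_text), timeout=timeout)
    except asyncio.TimeoutError:
        # Agent did not answer in time: keep the message for later delivery.
        dead_letters.append({"text": caller_text})
        return "Sorry, the agent is unavailable right now. Your message has been saved."
```

Passing the queue in as an argument keeps the sketch testable; a real deployment would presumably persist dead letters rather than hold them in memory.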
Voice Messages (TTS)
Voice messages are one-way, pre-recorded calls, not live conversations:
{
  "agentId": "my-agent",
  "to": "+15559876543",
  "text": "Reminder: your appointment is tomorrow at 3 PM."
}
The server generates TTS audio from the text and delivers it as a phone call.
Call Transfer
{
  "agentId": "my-agent",
  "callSid": "CAxxxxxxxx",
  "to": "+15551234567",
  "announcementText": "Connecting you to a human agent now."
}
Voice Configuration
DEFAULT_VOICE_GREETING="Hello, how can I help you today?"
DEFAULT_VOICE_ID=EXAVITQu4vr4xnSDxMaL
DEFAULT_VOICE_LANGUAGE=en-US
All settings can be overridden per call via tool parameters.
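One way to picture the override rule is a merge where per-call values win over environment defaults. This is a sketch only: the function name is hypothetical, and the override keys `voice` and `language` are assumed names (only `greeting` appears as a tool parameter in the examples on this page).

```python
import os
from typing import Dict, Mapping, Optional


def resolve_voice_settings(
    overrides: Optional[Mapping[str, Optional[str]]] = None,
    env: Mapping[str, str] = os.environ,
) -> Dict[str, Optional[str]]:
    """Start from the DEFAULT_VOICE_* variables, then apply per-call overrides."""
    settings: Dict[str, Optional[str]] = {
        "greeting": env.get("DEFAULT_VOICE_GREETING"),
        "voice": env.get("DEFAULT_VOICE_ID"),
        "language": env.get("DEFAULT_VOICE_LANGUAGE"),
    }
    for key, value in (overrides or {}).items():
        # Unknown keys and explicit None values leave the default in place.
        if key in settings and value is not None:
            settings[key] = value
    return settings
```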
Compliance
- TCPA: No calls before 8 AM or after 9 PM local time
- DNC: Do Not Contact list checked before every outbound call
- Recording Consent: Two-party consent state detection
- Content Filter: Greeting text checked before the call starts
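The TCPA rule above is a time-window check in the recipient's local time. A minimal sketch follows, assuming the recipient's timezone is already known (how the server determines it is not described here); the function name is hypothetical, and the window is treated as inclusive of 8:00 AM and 9:00 PM.

```python
from datetime import datetime, time, timedelta, timezone, tzinfo

EARLIEST_CALL = time(8, 0)   # 8 AM local
LATEST_CALL = time(21, 0)    # 9 PM local


def within_calling_hours(now: datetime, recipient_tz: tzinfo) -> bool:
    """Return True if `now` falls inside the 8 AM to 9 PM local window."""
    if now.tzinfo is None:
        raise ValueError("now must be timezone-aware")
    local = now.astimezone(recipient_tz)
    return EARLIEST_CALL <= local.time() <= LATEST_CALL


# Example: 13:00 UTC is 08:00 at a fixed UTC-5 offset, so the call is allowed.
utc_minus_5 = timezone(timedelta(hours=-5))
allowed = within_calling_hours(datetime(2024, 1, 15, 13, 0, tzinfo=timezone.utc), utc_minus_5)
```

A fixed offset keeps the example dependency-free; passing a `zoneinfo.ZoneInfo` instance instead would also account for daylight saving time.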