Voice Calls

How It Works

The MCP server is infrastructure only — it relays text between the caller and your AI client, but never generates AI responses itself. Your client is the brain; the server is the telephone.

Live Voice Call Flow

Inbound Call
-> Twilio webhook hits /webhooks/:agentId/voice
-> Server returns ConversationRelay TwiML
-> Twilio opens WebSocket to /webhooks/:agentId/voice-ws
-> Human speaks -> Twilio STT -> Text
-> Text sent to your AI client (via MCP sampling)
-> Client responds with text
-> Text sent to Twilio -> Twilio TTS -> Human hears

Twilio handles STT and TTS. The server only passes text back and forth.
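
The relay step in the flow above can be sketched as a single pure function. This is a minimal sketch, not the server's actual code: the message shapes (an incoming `prompt` carrying `voicePrompt`, an outgoing `text` carrying `token` and `last`) are assumed from Twilio's ConversationRelay WebSocket format, and `handle_relay_message` and `ask_client` are hypothetical names, with `ask_client` standing in for the MCP sampling request.

```python
import json
from typing import Callable, Optional


def handle_relay_message(raw: str, ask_client: Callable[[str], str]) -> Optional[str]:
    """Relay one ConversationRelay WebSocket message.

    Returns the JSON string to send back to Twilio, or None when the
    message needs no spoken reply.
    """
    message = json.loads(raw)

    # Only transcribed caller speech is forwarded; setup, interrupt and
    # other event types are ignored in this sketch.
    if message.get("type") != "prompt":
        return None

    caller_text = message.get("voicePrompt", "").strip()
    if not caller_text:
        return None

    # The server generates nothing itself: it hands the text to the AI
    # client and relays whatever text comes back.
    reply_text = ask_client(caller_text)

    # Twilio converts this text token to speech for the caller.
    return json.dumps({"type": "text", "token": reply_text, "last": True})
```

Keeping the function free of network code makes the "telephone, not brain" split easy to see: the only place a response is produced is the `ask_client` callback.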

Three Response Paths

Path	When	What Happens
Client Sampling	Client connected via SSE	Caller's speech goes to client via MCP
Answering Machine	Client not connected, Anthropic key set	Built-in Claude fallback collects message
Hard-coded Fallback	Client not connected, no key	Plays "unavailable" message
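
The table above reduces to a two-step decision. The sketch below only illustrates that logic; `ResponsePath` and `choose_response_path` are hypothetical names, not part of the server's API.

```python
from enum import Enum
from typing import Optional


class ResponsePath(Enum):
    CLIENT_SAMPLING = "client_sampling"
    ANSWERING_MACHINE = "answering_machine"
    HARD_CODED_FALLBACK = "hard_coded_fallback"


def choose_response_path(client_connected: bool, anthropic_api_key: Optional[str]) -> ResponsePath:
    """Pick how an inbound call is answered, in priority order."""
    # A connected client always wins: speech is relayed to it via MCP.
    if client_connected:
        return ResponsePath.CLIENT_SAMPLING
    # No client, but a key is configured: the built-in fallback takes a message.
    if anthropic_api_key:
        return ResponsePath.ANSWERING_MACHINE
    # No client and no key: play the fixed "unavailable" message.
    return ResponsePath.HARD_CODED_FALLBACK
```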

Making Outbound Calls

{
  "agentId": "my-client",
  "to": "+15559876543",
  "greeting": "Hi, this is your AI assistant calling about your appointment.",
  "systemPrompt": "You are a friendly appointment reminder assistant."
}

Once the call is answered, a live two-way conversation begins using the same ConversationRelay flow.

Answering Machine

When the AI client is not connected (after an 8-second timeout), the answering machine:

Apologizes to the caller on behalf of the client
Asks for a message and any callback preferences (e.g., "call me back after 8 AM")
Stores everything in the dead letter queue

When the client reconnects, dead letters are automatically dispatched via comms_get_waiting_messages.
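
The store-then-dispatch behavior can be pictured with a small in-memory queue. This is a sketch under assumptions: the `DeadLetter` fields, the `DeadLetterQueue` class, and the choice to clear messages once they are handed over are all hypothetical, and the real server may persist and track them differently.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, Dict, List


@dataclass
class DeadLetter:
    caller: str            # caller's phone number
    message: str           # what the caller asked to pass on
    preferences: str = ""  # e.g. "call me back after 8 AM"


@dataclass
class DeadLetterQueue:
    _letters: Dict[str, Deque[DeadLetter]] = field(default_factory=dict)

    def store(self, agent_id: str, letter: DeadLetter) -> None:
        """Keep a message for an agent whose client is offline."""
        self._letters.setdefault(agent_id, deque()).append(letter)

    def drain(self, agent_id: str) -> List[DeadLetter]:
        """Return and clear everything waiting for one agent, oldest first."""
        return list(self._letters.pop(agent_id, deque()))
```

In this picture, a call to comms_get_waiting_messages after reconnecting would correspond to `drain("my-client")`.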

ANTHROPIC_API_KEY=sk-ant-...    # Required for answering machine

Without the key, callers hear a hard-coded "unavailable" message.

Voice Messages (TTS)

Voice messages are one-way, pre-generated messages, not live conversations:

{
  "agentId": "my-client",
  "to": "+15559876543",
  "text": "Reminder: your appointment is tomorrow at 3 PM."
}

The server generates TTS audio from the text and delivers it as a phone call.

Call Transfer

{
  "agentId": "my-client",
  "callSid": "CAxxxxxxxx",
  "to": "+15551234567",
  "announcementText": "Connecting you to a human agent now."
}

Voice Configuration

DEFAULT_VOICE_GREETING="Hello, how can I help you today?"
DEFAULT_VOICE_ID=EXAVITQu4vr4xnSDxMaL
DEFAULT_VOICE_LANGUAGE=en-US

All settings can be overridden per call via tool parameters.
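
One way to picture the override rule is a lookup that prefers the per-call value and falls back to the environment default. This is only a sketch: `resolve_voice_settings` is a hypothetical helper, and of the parameter names used here only `greeting` appears in the examples above; `voiceId` and `language` are assumed names for illustration.

```python
import os
from typing import Dict, Mapping, Optional

# Per-call parameter name -> environment variable holding its default.
# "voiceId" and "language" are assumed names, not confirmed tool parameters.
ENV_DEFAULTS = {
    "greeting": "DEFAULT_VOICE_GREETING",
    "voiceId": "DEFAULT_VOICE_ID",
    "language": "DEFAULT_VOICE_LANGUAGE",
}


def resolve_voice_settings(
    call_params: Mapping[str, Optional[str]],
    env: Optional[Mapping[str, str]] = None,
) -> Dict[str, Optional[str]]:
    """Merge per-call overrides over the environment defaults."""
    env = os.environ if env is None else env
    settings: Dict[str, Optional[str]] = {}
    for name, env_var in ENV_DEFAULTS.items():
        override = call_params.get(name)
        # A value supplied with the call wins; otherwise use the default.
        settings[name] = override if override is not None else env.get(env_var)
    return settings
```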

Compliance

TCPA: No calls before 8 AM or after 9 PM local time
DNC: Do Not Contact list checked before every outbound call
Recording Consent: Two-party consent state detection
Content Filter: Greeting text checked before the call starts
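
As an illustration of the TCPA rule, the check below converts the current time to the recipient's local time and tests it against the 8 AM to 9 PM window. It is a sketch, not the server's implementation: `is_within_calling_window` is a hypothetical name, treating both boundaries as allowed is an assumption, and how the recipient's time zone is determined is left out.

```python
from datetime import datetime, time, timedelta, timezone, tzinfo

EARLIEST = time(8, 0)   # 8 AM local time
LATEST = time(21, 0)    # 9 PM local time


def is_within_calling_window(now_utc: datetime, recipient_tz: tzinfo) -> bool:
    """True if the recipient's local time is between 8 AM and 9 PM inclusive."""
    if now_utc.tzinfo is None:
        raise ValueError("now_utc must be timezone-aware")
    local_time = now_utc.astimezone(recipient_tz).time()
    return EARLIEST <= local_time <= LATEST


# Fixed UTC-5 offset used purely for illustration.
eastern = timezone(timedelta(hours=-5))
# 15:00 UTC is 10:00 local, which is inside the window.
print(is_within_calling_window(datetime(2024, 1, 15, 15, 0, tzinfo=timezone.utc), eastern))  # True
```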
