Monitoring
Health Checks
Liveness: GET /health
Returns 200 if the server process is running.
{"status":"ok","uptime":123.45,"version":"0.1.0","environment":"development","demoMode":false}
Readiness: GET /health/ready
Checks every provider connection. Returns 503 if any provider is down.
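A deployment script or external probe could consume the two endpoints roughly as follows. This is a minimal sketch using only the Python standard library. The base URL `http://localhost:3000` and the helper names `probe` and `is_ready` are assumptions for illustration and are not part of the server.

```python
import json
import urllib.error
import urllib.request


def probe(base_url: str, path: str, timeout: float = 2.0):
    """Return (HTTP status, parsed JSON body) for a health endpoint."""
    try:
        with urllib.request.urlopen(base_url + path, timeout=timeout) as resp:
            status, raw = resp.status, resp.read()
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx, so a 503 from /health/ready lands here.
        status, raw = err.code, err.read()
    try:
        body = json.loads(raw) if raw else {}
    except json.JSONDecodeError:
        body = {}
    return status, body


def is_ready(status: int) -> bool:
    """Treat only 200 as ready; 503 signals that a provider is down."""
    return status == 200


if __name__ == "__main__":
    base = "http://localhost:3000"  # assumed address; adjust to your deployment
    live_status, live_body = probe(base, "/health")
    ready_status, _ = probe(base, "/health/ready")
    print("live:", live_status == 200, "version:", live_body.get("version"))
    print("ready:", is_ready(ready_status))
```

Keeping liveness and readiness separate lets an orchestrator restart a dead process on a failed `/health` while merely withholding traffic on a failed `/health/ready`.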
Prometheus Metrics
GET /metrics
Counters:
| Metric | Labels | Description |
|---|---|---|
| mcp_messages_sent_total | channel | Messages sent by channel |
| mcp_tool_calls_total | tool | Tool invocations by name |
| mcp_rate_limit_hits_total | limit_type | Rate limit triggers |
| mcp_http_rate_limit_hits_total | (none) | HTTP rate limit triggers |
| mcp_webhook_received_total | type | Inbound webhooks by type |
| mcp_auth_failures_total | (none) | Authentication failures |
Gauges:
| Metric | Description |
|---|---|
| mcp_active_voice_sessions | Current live voice calls |
| mcp_active_agents | Provisioned active agents |
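For ad hoc checks without a Prometheus server, the `/metrics` response can be read directly. The sketch below assumes the endpoint returns the standard Prometheus text exposition format. The sample values in `SAMPLE` are made up for illustration, and `parse_samples` is a hypothetical helper that handles only the common `name{labels} value` form.

```python
SAMPLE = """\
# TYPE mcp_messages_sent_total counter
mcp_messages_sent_total{channel="sms"} 42
mcp_auth_failures_total 3
mcp_active_voice_sessions 2
"""


def parse_samples(text: str) -> dict:
    """Map each sample (metric name plus label set) to its float value."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        if "}" in line:
            # Label values may contain spaces, so split after the closing brace.
            head, _, rest = line.rpartition("}")
            key = head + "}"
        else:
            key, _, rest = line.partition(" ")
        fields = rest.split()
        if fields:
            samples[key] = float(fields[0])  # ignore an optional timestamp
    return samples


if __name__ == "__main__":
    for name, value in parse_samples(SAMPLE).items():
        print(name, value)
```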
Structured Logging
JSON logs to stdout. Compatible with ELK, Datadog, Loki.
{"level":"info","timestamp":"2026-02-15T12:00:00Z","event":"send_message_success","agentId":"agent-001","channel":"sms"}
No PII in logs. Only routing metadata (agent ID, channel, direction).
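Because every log record is a JSON object, filtering does not require regular expressions. The following sketch assumes one JSON object per line on stdout, which the example above suggests but does not guarantee. `iter_events` is a hypothetical helper name.

```python
import json
from typing import Iterable, Iterator


def iter_events(lines: Iterable[str], event: str) -> Iterator[dict]:
    """Yield parsed log records whose "event" field equals `event`."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # tolerate non-JSON noise such as startup banners
        if isinstance(record, dict) and record.get("event") == event:
            yield record


if __name__ == "__main__":
    import sys

    # Example: <server command> | python filter_logs.py
    for rec in iter_events(sys.stdin, "send_message_success"):
        print(rec.get("timestamp"), rec.get("agentId"), rec.get("channel"))
```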
Audit Log
Immutable append-only log with SHA-256 hash chain. Each entry links to the previous via prev_hash.
| Field | Description |
|---|---|
| id | UUID |
| timestamp | ISO 8601 timestamp |
| event_type | PROVISION, DEPROVISION, AUTH_FAILURE, etc. |
| actor | Agent ID or "system" |
| target | What was affected |
| details | JSON context |
| prev_hash | Hash of previous entry |
| row_hash | SHA-256 of current entry |
Any tampering breaks the hash chain.
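Verifying the chain could look like the sketch below. The exact serialization that feeds `row_hash` is not specified here, so the code assumes it is the SHA-256 of a canonical JSON encoding of the other fields, `prev_hash` included. That encoding, the `HASHED_FIELDS` tuple, and both function names are assumptions and must be adapted to the real implementation before use.

```python
import hashlib
import json
from typing import Optional

# Assumed set of fields covered by row_hash (hypothetical; see note above).
HASHED_FIELDS = ("id", "timestamp", "event_type", "actor", "target", "details", "prev_hash")


def compute_row_hash(entry: dict) -> str:
    """SHA-256 hex digest over a canonical JSON encoding of the entry."""
    payload = json.dumps(
        {key: entry.get(key) for key in HASHED_FIELDS},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def verify_chain(entries: list) -> Optional[int]:
    """Return the index of the first inconsistent entry, or None if intact."""
    prev_row_hash = None
    for index, entry in enumerate(entries):
        # The first entry has no predecessor, so its prev_hash is not checked.
        if index > 0 and entry.get("prev_hash") != prev_row_hash:
            return index
        if entry.get("row_hash") != compute_row_hash(entry):
            return index
        prev_row_hash = entry["row_hash"]
    return None
```

Editing any hashed field changes that entry's recomputed `row_hash`, and rewriting the stored `row_hash` to match would then break the next entry's `prev_hash` link, so a single edit cannot go unnoticed unless every later entry is rewritten as well.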
Alert System
| Severity | Routing |
|---|---|
| CRITICAL | WhatsApp to admin + log |
| HIGH | WhatsApp to admin + log |
| MEDIUM | Log only |
| LOW | Log only |
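The routing table can be expressed as a small lookup. This is an illustrative sketch only: the `Severity` enum, the `route` function, and the channel names `"whatsapp"` and `"log"` are hypothetical and do not describe the server's internals.

```python
from enum import Enum


class Severity(Enum):
    CRITICAL = "CRITICAL"
    HIGH = "HIGH"
    MEDIUM = "MEDIUM"
    LOW = "LOW"


# Severities that notify the admin over WhatsApp in addition to logging.
WHATSAPP_SEVERITIES = {Severity.CRITICAL, Severity.HIGH}


def route(severity: Severity) -> list:
    """Return the delivery channels for an alert of the given severity."""
    channels = ["log"]  # every alert is logged
    if severity in WHATSAPP_SEVERITIES:
        channels.insert(0, "whatsapp")
    return channels


if __name__ == "__main__":
    for level in Severity:
        print(level.value, "->", route(level))
```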
Configuration
ADMIN_WHATSAPP_NUMBER=+15551234567
ADMIN_WHATSAPP_SENDER=whatsapp:+14155238886
Alert Triggers
| Trigger | Severity |
|---|---|
| Rate limit exceeded | MEDIUM |
| Auth failure / brute force | HIGH |
| Anomaly detected | MEDIUM to HIGH |
| Spending approaching limit (80%) | MEDIUM |
| Provider error | HIGH |
Dashboard
GET /admin/dashboard
Web UI: system health, provider status, active agents, usage summary, recent alerts. Auto-refreshes every 30 seconds.
GET /admin/api/dashboard
JSON API for dashboard data.