What is SIP Trunking?
Session Initiation Protocol (SIP) is the foundational signaling standard that powers virtually every phone call on the modern internet. It is how two endpoints agree to start, modify, and end a real-time communication session.
For Voice AI developers, SIP is the bridge between the global telephone network (PSTN) and modern application infrastructure. It sets up the calls whose audio then reaches your AI agents as digital media streams.
The Golden Rule of Telephony
SIP is a signaling protocol only. It handles call routing and setup, but carries absolutely zero audio. The actual voice data travels completely separately over RTP (Real-time Transport Protocol). Grasping this split is critical.
How SIP Works
SIP resembles HTTP: it uses text-based requests and responses, and it carries an SDP (Session Description Protocol) body to negotiate settings such as audio codecs. It typically runs over UDP for speed, but it also supports TCP for reliable delivery and TLS for encrypted signaling.
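To make the HTTP comparison concrete, here is a minimal sketch that parses a hand-written INVITE and reads the codec offer out of its SDP body. Every address, tag, and identifier in the message is made up, and the helper names `parse_sip_message` and `offered_audio` are assumptions for illustration, not part of any platform's API. It is not a complete SIP parser (no header folding, compact header forms, or repeated headers).

```python
# Illustrative SDP offer: one audio stream on port 49170, two codecs.
SDP_BODY = "\r\n".join([
    "v=0",
    "o=- 20518 0 IN IP4 203.0.113.10",
    "s=-",
    "c=IN IP4 203.0.113.10",
    "t=0 0",
    "m=audio 49170 RTP/AVP 0 9",
    "a=rtpmap:0 PCMU/8000",   # G.711 mu-law
    "a=rtpmap:9 G722/8000",   # G.722 uses an 8000 Hz RTP clock despite 16 kHz sampling
]) + "\r\n"

# Illustrative INVITE: start line, headers, blank line, then the SDP body.
INVITE = "\r\n".join([
    "INVITE sip:agent@voice.example.com SIP/2.0",
    "Via: SIP/2.0/UDP 203.0.113.10:5060;branch=z9hG4bK776asdhds",
    "Max-Forwards: 70",
    "From: <sip:+15550100@carrier.example.net>;tag=1928301774",
    "To: <sip:agent@voice.example.com>",
    "Call-ID: a84b4c76e66710@203.0.113.10",
    "CSeq: 314159 INVITE",
    "Content-Type: application/sdp",
    f"Content-Length: {len(SDP_BODY)}",
    "",
    SDP_BODY,
])


def parse_sip_message(raw):
    """Split a SIP message into (start_line, headers, body)."""
    head, _, body = raw.partition("\r\n\r\n")
    start_line, *header_lines = head.split("\r\n")
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return start_line, headers, body


def offered_audio(sdp):
    """Return (rtp_port, {payload_type: encoding}) from an SDP body."""
    port = None
    codecs = {}
    for line in sdp.splitlines():
        if line.startswith("m=audio"):
            port = int(line.split()[1])
        elif line.startswith("a=rtpmap:"):
            payload_type, encoding = line[len("a=rtpmap:"):].split(" ", 1)
            codecs[int(payload_type)] = encoding
    return port, codecs


start_line, headers, body = parse_sip_message(INVITE)
port, codecs = offered_audio(body)
print(start_line)    # INVITE sip:agent@voice.example.com SIP/2.0
print(port, codecs)  # 49170 {0: 'PCMU/8000', 9: 'G722/8000'}
```

Note that the SIP message only describes where audio should be sent (the address and port in the SDP). No audio appears anywhere in it.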
SIP: The Signaling Layer
- Runs via UDP, TCP, or TLS
- Text-based messages (like HTTP)
- Negotiates codecs and ports
- Handles ringing, answering, hanging up
- Zero audio bytes ever pass through SIP
RTP: The Media Layer
- Almost always runs over UDP (prioritizes low latency over reliability)
- Binary packets, typically 20 ms audio frames
- Carries G.711 µ-law or G.722 audio
- Direct path between endpoints
- Fully distinct from SIP servers
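The media-layer points above can be illustrated with a minimal sketch that builds and parses one RTP packet. It assumes G.711 µ-law at 8 kHz with static payload type 0 and a 20 ms frame. The function names and the SSRC value are hypothetical, and the parser skips CSRC entries but ignores header extensions.

```python
import struct

SAMPLE_RATE_HZ = 8000   # G.711 samples 8000 times per second
FRAME_MS = 20           # one packet carries 20 ms of audio
SAMPLES_PER_FRAME = SAMPLE_RATE_HZ * FRAME_MS // 1000  # 160 samples, 1 byte each


def build_rtp_packet(payload_type, sequence, timestamp, ssrc, payload):
    """Pack the fixed 12-byte RTP header followed by the payload."""
    first = 2 << 6                 # version 2, no padding, no extension, 0 CSRCs
    second = payload_type & 0x7F   # marker bit left clear
    return struct.pack("!BBHII", first, second, sequence, timestamp, ssrc) + payload


def parse_rtp_header(packet):
    """Decode the fixed RTP header fields and return them with the payload."""
    if len(packet) < 12:
        raise ValueError("an RTP header is at least 12 bytes")
    first, second, sequence, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    csrc_count = first & 0x0F
    return {
        "version": first >> 6,
        "marker": bool(second >> 7),
        "payload_type": second & 0x7F,
        "sequence": sequence,
        "timestamp": timestamp,
        "ssrc": ssrc,
        "payload": packet[12 + 4 * csrc_count:],
    }


# One frame of mu-law silence (0xFF encodes a zero-amplitude sample).
frame = bytes([0xFF]) * SAMPLES_PER_FRAME
packet = build_rtp_packet(0, 1, SAMPLES_PER_FRAME, 0x1234ABCD, frame)
parsed = parse_rtp_header(packet)

print(len(packet))             # 172 (12-byte header + 160-byte payload)
print(parsed["payload_type"])  # 0
```

At 20 ms per packet, each direction of a call sends 50 such packets per second.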
"When you hit the classic 'one-way audio' bug during testing, SIP negotiation succeeded flawlessly. The isolated failure is in the RTP path, likely blocked by symmetric NAT or restrictive UDP firewall rules."
The SIP Handshake Flow
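A successful call usually follows the message order sketched below: an INVITE, optional provisional responses, a 200 OK, an ACK, media, and finally a BYE. The state names and the `walk` helper are a simplified, hypothetical model for illustration. Real SIP dialogs also handle retransmissions, re-INVITEs, CANCEL, and error responses.

```python
# Message order for a basic, successful call.
BASIC_CALL = [
    ("caller -> callee", "INVITE"),
    ("callee -> caller", "100 Trying"),
    ("callee -> caller", "180 Ringing"),
    ("callee -> caller", "200 OK"),
    ("caller -> callee", "ACK"),
    # RTP audio flows in both directions here, outside of SIP.
    ("caller -> callee", "BYE"),
    ("callee -> caller", "200 OK"),
]

# Simplified call states (names are illustrative, not from the SIP standard).
TRANSITIONS = {
    ("idle", "INVITE"): "calling",
    ("calling", "100 Trying"): "calling",
    ("calling", "180 Ringing"): "ringing",
    ("calling", "200 OK"): "answered",
    ("ringing", "200 OK"): "answered",
    ("answered", "ACK"): "established",
    ("established", "BYE"): "terminating",
    ("terminating", "200 OK"): "terminated",
}


def walk(messages, state="idle"):
    """Return the list of states visited while processing the messages."""
    trace = [state]
    for message in messages:
        key = (state, message)
        if key not in TRANSITIONS:
            raise ValueError(f"unexpected {message!r} in state {state!r}")
        state = TRANSITIONS[key]
        trace.append(state)
    return trace


for (direction, message), state in zip(BASIC_CALL, walk([m for _, m in BASIC_CALL])[1:]):
    print(f"{direction:18} {message:12} -> {state}")
```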
SIP Trunking for Voice AI
A SIP trunk acts as a virtual, highly scalable telephone line that links your infrastructure directly to a carrier network such as Vobiz. When external callers dial your phone numbers, Vobiz sends SIP INVITEs to the Voice AI endpoints those numbers are mapped to.
Endpoint Traversal Path
Advantages
Universal Access
SIP is the global standard for telephony interconnection.
Native Transfer Support
Supports the SIP REFER method, which allows live hand-offs to human agents that are seamless for the caller.
Enterprise Integration
Connects directly to large corporate PBX, Microsoft Teams, and Cisco deployments.
Wideband G.722 Audio
Allows 16 kHz wideband audio, which can improve speech-to-text (STT) accuracy.
Friction Points
Setup Latency
The call setup handshake can add a second or more to the initial connection time.
Firewall Traversal
Asymmetric UDP routing requires intricate, often painful firewall port rules.
Protocol Imbalance
SIP concepts do not map cleanly onto AI behaviors such as mid-sentence interruptions or token streams.
Silent Failures
ACL and URI routing bugs often show up only as dropped calls, with no explicit error in the logs.
Platform Compatibility
Runs a dedicated SIP service that accepts trunks and places callers into your WebRTC rooms as standard SIP participants. Retains native encryption, Krisp noise filtering, and fast call transfers.
Exposes unified SIP endpoints that accept bring-your-own (BYO) trunks. Supports custom SIP headers, so values from a CRM lookup can be passed in the INVITE and injected into the AI prompt template.
Offers two paths: bridge Vobiz directly for full network-level control, or use API registration hooks to bind the application flow before the SIP connection is established.
ElevenLabs Conversational AI connects directly to PSTN phone calls via SIP trunking, with no intermediary platform required. Point a Vobiz SIP trunk at ElevenLabs' SIP endpoint and inbound calls are answered directly by an ElevenLabs AI agent. ElevenLabs also provides standalone TTS, STT, and voice cloning services that other SIP-based pipelines (such as LiveKit agents) can consume as a voice synthesis layer.
Developer Pitfalls
A dashboard display name is not the SIP username used for digest authentication. If you put the display name where the SIP credentials belong, every request will be rejected with 401 Unauthorized.
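The reason the exact username matters is that SIP digest authentication hashes it into the response. The sketch below follows the MD5 digest scheme that SIP borrows from HTTP. The usernames, realm, password, and nonce are hypothetical, and `digest_response` is an illustrative helper rather than any carrier's API.

```python
import hashlib


def md5_hex(text):
    """Hex-encoded MD5 of a UTF-8 string."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def digest_response(username, realm, password, method, uri, nonce,
                    qop=None, nc=None, cnonce=None):
    """Compute the 'response' value for a Digest Authorization header."""
    ha1 = md5_hex(f"{username}:{realm}:{password}")
    ha2 = md5_hex(f"{method}:{uri}")
    if qop:
        # With qop (e.g. "auth"), the nonce count and client nonce are mixed in.
        return md5_hex(f"{ha1}:{nonce}:{nc}:{cnonce}:{qop}:{ha2}")
    return md5_hex(f"{ha1}:{nonce}:{ha2}")


# Hypothetical values: the same password, but two different "usernames".
common = dict(realm="sip.example.net", password="s3cret", method="REGISTER",
              uri="sip:sip.example.net", nonce="abc123nonce")
with_sip_username = digest_response("trunk_user_01", **common)
with_display_name = digest_response("My Support Trunk", **common)

print(with_sip_username != with_display_name)  # True
```

Because the server computes the same hash from the credentials it has on file, a response built from the display name can never match, even when the password is correct.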
Trusting a static list of platform IPs in firewall ACLs is fragile. SaaS endpoints send traffic from large, dynamic pools of egress gateways, so calls can fail silently whenever the provider's load balancing changes the source addresses.
If signaling negotiates successfully over TCP but a restrictive firewall blocks the UDP media streams, the call connects and callers hear one-way audio or complete silence.
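Signaling can also succeed while media fails for reasons other than firewall rules, NAT among them. One quick check, sketched below under the assumption that you can capture the SDP your endpoint sends, is whether it advertises a private (RFC 1918) address that a remote carrier cannot route to. The helper names and sample addresses are illustrative, and a clean result does not rule out other causes.

```python
import ipaddress

# Private IPv4 ranges that are not routable across the public internet.
RFC1918_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def sdp_connection_addresses(sdp):
    """Collect the IPv4 addresses from the SDP connection (c=) lines."""
    addresses = []
    for line in sdp.splitlines():
        if line.startswith("c=IN IP4 "):
            addresses.append(line.split()[2])
    return addresses


def private_media_addresses(sdp):
    """Return advertised media addresses that fall in an RFC 1918 range."""
    flagged = []
    for address in sdp_connection_addresses(sdp):
        ip = ipaddress.ip_address(address)
        if any(ip in network for network in RFC1918_NETWORKS):
            flagged.append(address)
    return flagged


behind_nat = "v=0\r\nc=IN IP4 192.168.1.20\r\nm=audio 40000 RTP/AVP 0\r\n"
public = "v=0\r\nc=IN IP4 203.0.113.10\r\nm=audio 40000 RTP/AVP 0\r\n"

print(private_media_addresses(behind_nat))  # ['192.168.1.20']
print(private_media_addresses(public))      # []
```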
Calls to international destinations are commonly rejected with 403 Forbidden unless the relevant geographic permissions have been enabled at the carrier level.
When To Choose SIP
LiveKit Topology
Requires Direct Handoff Transfers
Enterprise PBX Links
Regulated Data Traffic
Managed SaaS Integrations
Immense Scale Routing
Want raw media streams instead?
If you would rather do custom network programming than rely on a managed platform, consider working with raw media streams directly.