Telephony Foundation

What is SIP Trunking?

Session Initiation Protocol (SIP) is the signaling standard behind most voice calls carried over IP networks. It is how two endpoints agree to start, modify, and end a real-time communication session.

For Voice AI developers, SIP is the bridge between the public switched telephone network (PSTN) and modern application infrastructure. Together with RTP, it turns ordinary phone calls into digital audio streams your AI agents can interact with.

The Golden Rule of Telephony

SIP is a signaling protocol only. It handles call setup, routing, and teardown, but carries no audio. The voice data travels separately over RTP (Real-time Transport Protocol). Understanding this split is critical.

How SIP Works

SIP resembles HTTP: it uses text-based requests and responses, and it negotiates media settings such as audio codecs through SDP (Session Description Protocol) bodies carried inside those messages. It typically runs over UDP for speed, but also supports TCP for reliable delivery and TLS for encrypted signaling.
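To make the "text-based, like HTTP" point concrete, here is a minimal sketch that splits a SIP request into its start line, headers, and SDP body, then reads the offered audio port and codecs. The INVITE is hand-written and abbreviated for illustration (mandatory headers such as Max-Forwards and Content-Length are omitted), and every address, tag, and function name is an assumption rather than anything a specific platform emits.

```python
# Hand-written, abbreviated INVITE. All hosts, numbers, and IDs are made up.
RAW_INVITE = (
    "INVITE sip:agent@voice.example.com SIP/2.0\r\n"
    "Via: SIP/2.0/UDP 203.0.113.10:5060;branch=z9hG4bK776asdhds\r\n"
    "From: <sip:+15550011234@carrier.example.net>;tag=1928301774\r\n"
    "To: <sip:agent@voice.example.com>\r\n"
    "Call-ID: a84b4c76e66710@203.0.113.10\r\n"
    "CSeq: 314159 INVITE\r\n"
    "Content-Type: application/sdp\r\n"
    "\r\n"
    "v=0\r\n"
    "o=- 2890844526 2890844526 IN IP4 203.0.113.10\r\n"
    "s=-\r\n"
    "c=IN IP4 203.0.113.10\r\n"
    "t=0 0\r\n"
    "m=audio 49170 RTP/AVP 0 9\r\n"
    "a=rtpmap:0 PCMU/8000\r\n"
    # G.722 is advertised with an 8000 Hz RTP clock even though it samples at 16 kHz.
    "a=rtpmap:9 G722/8000\r\n"
)


def parse_sip_message(raw: str):
    """Split a SIP message into (start_line, headers, body)."""
    head, _, body = raw.partition("\r\n\r\n")
    lines = head.split("\r\n")
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return lines[0], headers, body


def parse_sdp_audio(sdp: str) -> dict:
    """Pull the media address, audio port, and codec map out of an SDP body."""
    info = {"address": None, "port": None, "codecs": {}}
    for line in sdp.splitlines():
        if line.startswith("c="):
            info["address"] = line.split()[-1]
        elif line.startswith("m=audio"):
            info["port"] = int(line.split()[1])
        elif line.startswith("a=rtpmap:"):
            payload_type, encoding = line[len("a=rtpmap:"):].split(" ", 1)
            info["codecs"][int(payload_type)] = encoding
    return info


start_line, headers, body = parse_sip_message(RAW_INVITE)
media = parse_sdp_audio(body)
print(start_line)
print(media)
```

The SDP only says where audio should be sent and in which formats. No audio appears anywhere in the message.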

SIP: The Signaling Layer

  • Runs via UDP, TCP, or TLS
  • Text-based messages (like HTTP)
  • Negotiates codecs and ports
  • Handles ringing, answering, hanging up
  • Zero audio bytes ever pass through SIP

RTP: The Media Layer

  • Typically runs over UDP (prioritizes low latency)
  • Binary packets, commonly 20 ms audio frames
  • Commonly carries G.711 (µ-law or A-law) or G.722 audio
  • Direct path between endpoints
  • Fully distinct from SIP servers
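As a rough illustration of the media layer described above, the sketch below decodes the 12-byte fixed RTP header with `struct` and works out how many payload bytes a 20 ms G.711 frame carries. The function names and the sample packet are assumptions made for this example, and header extensions are ignored for brevity.

```python
import struct


def parse_rtp_header(packet: bytes) -> dict:
    """Decode the fixed RTP header (header extensions are ignored here)."""
    if len(packet) < 12:
        raise ValueError("RTP packets carry at least a 12-byte fixed header")
    b0, b1, sequence, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    csrc_count = b0 & 0x0F
    return {
        "version": b0 >> 6,
        "padding": bool(b0 & 0x20),
        "extension": bool(b0 & 0x10),
        "csrc_count": csrc_count,
        "marker": bool(b1 & 0x80),
        "payload_type": b1 & 0x7F,
        "sequence": sequence,
        "timestamp": timestamp,
        "ssrc": ssrc,
        # Each CSRC entry adds 4 bytes before the payload starts.
        "payload": packet[12 + 4 * csrc_count:],
    }


def g711_frame_bytes(frame_ms: int = 20, sample_rate: int = 8000) -> int:
    """G.711 encodes one byte per sample, so bytes per frame equals samples per frame."""
    return sample_rate * frame_ms // 1000


# Build a fake packet: version 2, payload type 0 (PCMU), one 20 ms frame of filler bytes.
frame = b"\xff" * g711_frame_bytes()
sample_packet = struct.pack("!BBHII", 0x80, 0, 1, 160, 0x12345678) + frame
header = parse_rtp_header(sample_packet)
print(header["version"], header["payload_type"], len(header["payload"]))
```

At 8,000 samples per second, a 20 ms frame holds 160 samples, which is why G.711 packets so often carry 160-byte payloads.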

"When you hit the classic 'one-way audio' bug during testing, SIP negotiation succeeded flawlessly. The isolated failure is in the RTP path, likely blocked by symmetric NAT or restrictive UDP firewall rules."

The SIP Handshake Flow

SIP Packet Timeline
PSTN CALLER                YOUR ENDPOINT
Caller━━INVITE (SDP offer)━━▶Endpoint
Caller◀━━100 Trying━━━━━━━━Endpoint
Caller◀━━180 Ringing━━━━━━━Endpoint
Caller◀━━200 OK (SDP answer)━━Endpoint
Caller━━ACK━━━━━━━━━━━━━━━▶Endpoint
Caller◀══RTP AUDIO STREAM (UDP)══▶Endpoint
Caller━━BYE━━━━━━━━━━━━━━━▶Endpoint
Caller◀━━200 OK━━━━━━━━━━━━Endpoint
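The timeline above can be read as a small state machine. The following sketch replays a list of message labels and rejects any that arrive out of order. The state names and the `replay` helper are illustrative assumptions, and the model is deliberately simplified: it ignores retransmissions, CANCEL, and error responses.

```python
from enum import Enum


class CallState(Enum):
    IDLE = "idle"
    INVITED = "invited"          # INVITE seen, nothing sent back yet
    TRYING = "trying"            # 100 Trying sent
    RINGING = "ringing"          # 180 Ringing sent
    ANSWERED = "answered"        # 200 OK sent, waiting for ACK
    ESTABLISHED = "established"  # ACK seen, RTP can flow
    CLOSING = "closing"          # BYE seen
    ENDED = "ended"              # BYE acknowledged with 200 OK


# (current state, message) -> next state. Provisional responses are optional,
# so 100 and 180 may be skipped.
TRANSITIONS = {
    (CallState.IDLE, "INVITE"): CallState.INVITED,
    (CallState.INVITED, "100"): CallState.TRYING,
    (CallState.INVITED, "180"): CallState.RINGING,
    (CallState.INVITED, "200"): CallState.ANSWERED,
    (CallState.TRYING, "180"): CallState.RINGING,
    (CallState.TRYING, "200"): CallState.ANSWERED,
    (CallState.RINGING, "200"): CallState.ANSWERED,
    (CallState.ANSWERED, "ACK"): CallState.ESTABLISHED,
    (CallState.ESTABLISHED, "BYE"): CallState.CLOSING,
    (CallState.CLOSING, "200"): CallState.ENDED,
}


def replay(messages) -> CallState:
    """Walk the message sequence and return the final call state."""
    state = CallState.IDLE
    for message in messages:
        key = (state, message)
        if key not in TRANSITIONS:
            raise ValueError(f"unexpected {message!r} in state {state.name}")
        state = TRANSITIONS[key]
    return state


final_state = replay(["INVITE", "100", "180", "200", "ACK", "BYE", "200"])
print(final_state.name)
```

Audio only starts once the call reaches the ESTABLISHED state, and it flows over RTP rather than through any of these messages.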

SIP Trunking for Voice AI

A SIP trunk is a virtual, scalable telephone line that links your infrastructure to a carrier network such as Vobiz. When external users dial your phone numbers, Vobiz routes the resulting SIP INVITEs to your Voice AI hosting endpoints.

Endpoint Traversal Path

PSTN
Human dials +1 (555) 001-1234
VOBIZ
Provider captures, authenticates, proxies INVITE
PLATFORM
LiveKit / Vapi answers the call at the signaling layer
AI AGENT
RTP Audio processed through STT → LLM → TTS stream

Advantages

Universal Access

SIP is the de facto global standard for IP telephony.

Native Transfer Support

Includes the SIP REFER method, which allows seamless live hand-offs to human agents.

Enterprise Integration

Connects directly to corporate PBX, Microsoft Teams, and Cisco deployments.

Wideband G.722 Audio

G.722 samples audio at 16 kHz, double the 8 kHz of G.711, which can improve STT accuracy.

Friction Points

Setup Latency

The SIP handshake can add a second or more to call connection time.

Firewall Traversal

Asymmetric UDP routing requires careful, often painful, firewall port configuration.

Protocol Imbalance

SIP's call model does not map cleanly onto AI concepts such as mid-sentence interruptions or token streams.

Silent Failures

ACL and URI routing misconfigurations often surface only as dropped calls, with no explicit error in the logs.

Platform Compatibility

LiveKit
Primary SIP Hub

Runs a dedicated SIP service that accepts trunks and bridges each call into your WebRTC room as a standard SIP participant. Supports encryption, Krisp noise filtering, and call transfers.

VAPI
Unified Telephony

Exposes unified SIP endpoints that accept bring-your-own (BYO) trunks. Supports custom SIP headers, so values from a CRM lookup can travel in the INVITE and be injected into the AI agent's prompt template.
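As a generic illustration of reading custom headers from an INVITE, the sketch below collects every `X-` header and substitutes the values into a prompt template. It does not use any platform's SDK, and the header names (`X-Customer-Id`, `X-Campaign`), the template, and the helper function are all hypothetical.

```python
# Abbreviated, hand-written INVITE. The X- headers are invented for this example.
SAMPLE_INVITE = (
    "INVITE sip:agent@voice.example.com SIP/2.0\r\n"
    "From: <sip:+15550011234@carrier.example.net>;tag=88sja8x\r\n"
    "To: <sip:agent@voice.example.com>\r\n"
    "X-Customer-Id: C-1042\r\n"
    "X-Campaign: renewal\r\n"
    "\r\n"
)


def extract_custom_headers(raw_sip: str) -> dict:
    """Return headers whose names start with 'X-' (case-insensitive)."""
    head = raw_sip.split("\r\n\r\n", 1)[0]
    custom = {}
    for line in head.split("\r\n")[1:]:  # skip the request line
        name, sep, value = line.partition(":")
        if sep and name.strip().lower().startswith("x-"):
            custom[name.strip()] = value.strip()
    return custom


custom_headers = extract_custom_headers(SAMPLE_INVITE)

# Hypothetical prompt template filled from the header values.
PROMPT_TEMPLATE = "You are handling a {campaign} call for customer {customer_id}."
prompt = PROMPT_TEMPLATE.format(
    campaign=custom_headers.get("X-Campaign", "general"),
    customer_id=custom_headers.get("X-Customer-Id", "unknown"),
)
print(prompt)
```

On a managed platform the header values would normally be exposed through its own configuration or API, so consult that platform's documentation for the real mechanism.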

Retell AI
Elastic Integration

Offers two integration paths: elastic SIP trunking that connects Vobiz directly, or API-based call registration that binds each call to your application flow before SIP connects.

ElevenLabs
Native SIP Integration

ElevenLabs Conversational AI connects directly to PSTN phone calls via SIP trunking, with no intermediary platform required. Point a Vobiz SIP trunk at the ElevenLabs SIP endpoint and inbound calls are answered directly by an ElevenLabs AI agent. ElevenLabs also provides standalone TTS, STT, and voice cloning services that other SIP-based pipelines (LiveKit agents, for example) can use as a voice synthesis layer.

Developer Pitfalls

01 Assumed Authentication Labels

Dashboard display names are not SIP authentication usernames. Putting your dashboard profile name in the SIP credentials will typically fail digest authentication with 401 Unauthorized.
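To see why the exact username matters, here is a minimal sketch of the MD5 digest calculation SIP borrows from HTTP (RFC 2617). The username is hashed into the response, so a display name in its place yields a value the server will not accept. The function name and all credentials are hypothetical, and real SIP stacks handle this for you.

```python
import hashlib


def _md5(text: str) -> str:
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def digest_response(username, realm, password, method, uri, nonce,
                    qop=None, nc=None, cnonce=None) -> str:
    """Compute the digest 'response' value for an Authorization header."""
    ha1 = _md5(f"{username}:{realm}:{password}")  # identity and secret
    ha2 = _md5(f"{method}:{uri}")                 # what is being requested
    if qop:
        return _md5(f"{ha1}:{nonce}:{nc}:{cnonce}:{qop}:{ha2}")
    return _md5(f"{ha1}:{nonce}:{ha2}")


# Hypothetical challenge values and credentials, for illustration only.
REALM = "sip.example.com"
NONCE = "4f2a9c01d7"
URI = "sip:+15550011234@sip.example.com"

with_auth_username = digest_response("trunk_user_01", REALM, "s3cret", "INVITE", URI, NONCE)
with_display_name = digest_response("My Voice AI Trunk", REALM, "s3cret", "INVITE", URI, NONCE)

# Same password, different username: the hashes differ, so the server answers 401.
print(with_auth_username != with_display_name)
```

The password can be correct and authentication will still fail, because the server recomputes the hash with the username it has on record.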

02 IP Egress Rotation

Trusting a fixed set of platform IPs in firewall ACLs is fragile. SaaS platforms send traffic from large, dynamic pools of egress gateways, so calls can fail silently whenever their load balancing shifts to an address you have not allowed.
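One possible mitigation, offered here as an assumption rather than a recommendation from any provider, is to match source addresses against whole CIDR ranges that you refresh regularly instead of pinning individual IPs. The sketch below uses Python's `ipaddress` module, and the ranges shown are reserved documentation blocks standing in for whatever your platform actually publishes.

```python
import ipaddress

# Placeholder ranges (reserved documentation blocks), not real platform addresses.
PLATFORM_RANGES = ["198.51.100.0/24", "203.0.113.0/25"]


def load_allowlist(cidrs):
    """Parse CIDR strings once so that lookups stay cheap."""
    return [ipaddress.ip_network(cidr) for cidr in cidrs]


def is_allowed(source_ip: str, allowlist) -> bool:
    """True when the source address falls inside any allowed range."""
    ip = ipaddress.ip_address(source_ip)
    return any(ip in network for network in allowlist)


allowlist = load_allowlist(PLATFORM_RANGES)
for candidate in ("198.51.100.77", "203.0.113.200"):
    print(candidate, is_allowed(candidate, allowlist))
```

The second address falls outside the /25 block, so it would be rejected. A single mistyped mask or stale range produces exactly the kind of silent drop described above.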

03 The "One-Way Audio" Effect

If signaling negotiates smoothly over TCP but a restrictive firewall drops the UDP media stream, the call connects yet one or both parties hear silence.
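Besides firewall drops, a frequent cause of one-way audio is an SDP body that advertises a private address the far end cannot reach. The following sketch flags that case by checking the `c=` line against the RFC 1918 ranges. The helper names are assumptions, and the check is a debugging aid, not a full NAT diagnosis.

```python
import ipaddress

RFC1918_NETWORKS = [
    ipaddress.ip_network(cidr)
    for cidr in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
]


def advertised_media_address(sdp: str):
    """Return the address from the first 'c=' line, or None if there is none."""
    for line in sdp.splitlines():
        parts = line.split()
        if line.startswith("c=") and len(parts) == 3:
            return parts[2]
    return None


def advertises_private_address(sdp: str) -> bool:
    """True when the SDP tells the peer to send RTP to an RFC 1918 address."""
    address = advertised_media_address(sdp)
    if address is None:
        return False
    try:
        ip = ipaddress.ip_address(address)
    except ValueError:
        return False  # hostnames cannot be classified without a DNS lookup
    return any(ip in network for network in RFC1918_NETWORKS)


natted_sdp = "v=0\r\nc=IN IP4 192.168.1.20\r\nm=audio 4000 RTP/AVP 0\r\n"
public_sdp = "v=0\r\nc=IN IP4 203.0.113.10\r\nm=audio 4000 RTP/AVP 0\r\n"
print(advertises_private_address(natted_sdp), advertises_private_address(public_sdp))
```

If the check fires on your own SDP answer, the caller's audio is probably being sent to an address that exists only inside your network.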

04 Geographic Dialing Locks

Calls to international destinations are commonly rejected with 403 Forbidden unless geographic permissions for those destinations have been explicitly enabled at the carrier level.
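When a call does fail with an explicit response, the status code narrows the search quickly. The table below is a small sketch that maps a few standard SIP failure codes to first things to check. The hint wording and the `explain` helper are assumptions meant as starting points, not carrier-specific guidance.

```python
# Standard SIP status codes paired with a first thing to check (hints are illustrative).
SIP_FAILURE_HINTS = {
    401: "Unauthorized: verify the SIP auth username and password, not the display name",
    403: "Forbidden: check carrier-level permissions such as geographic dialing rules",
    404: "Not Found: check the request URI and number-to-endpoint routing",
    408: "Request Timeout: check that signaling packets reach the far end at all",
    488: "Not Acceptable Here: check that both sides share at least one codec",
    503: "Service Unavailable: check upstream health and any rate limits",
}


def explain(status_line: str) -> str:
    """Turn a status line such as 'SIP/2.0 403 Forbidden' into a debugging hint."""
    parts = status_line.split(" ", 2)
    if len(parts) < 2 or not parts[1].isdigit():
        raise ValueError(f"not a SIP status line: {status_line!r}")
    code = int(parts[1])
    return SIP_FAILURE_HINTS.get(code, f"{code}: no hint recorded")


for line in ("SIP/2.0 401 Unauthorized", "SIP/2.0 403 Forbidden", "SIP/2.0 486 Busy Here"):
    print(explain(line))
```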

When To Choose SIP

  • LiveKit-based topologies
  • Direct handoff transfers required
  • Enterprise PBX links
  • Regulated data handling
  • Managed SaaS integrations
  • Large-scale call routing

Want raw data throughput?

If you would rather write custom network code than rely on a managed platform, consider raw direct streams, which are covered in the WebSocket docs.