IVR DTMF Voice Engine
Explore the Implementation
git clone https://github.com/vobiz-ai/Vobiz-Livekit-IVR-DTMF-Example.git
cd Vobiz-Livekit-IVR-DTMF-Example

This blueprint demonstrates a multi-level IVR menu that accepts both DTMF key presses (push-button) and spoken responses, using an explicit IVRState tracker scoped to the session. Beyond receiving input, the agent can also dial out and send DTMF tones upstream as RFC 4733 telephone-events to navigate legacy SIP menu trees automatically.
How It Works
In production, callers are often in noisy environments and prefer pressing keys to speaking into their phones. Agents must handle both input paths reliably, bridging legacy SIP signaling into the AI session lifecycle.
- Inbound DTMF Trapping: The LiveKit worker binds a callback via @ctx.room.on("sip_dtmf_received"). When the caller presses a key on their handset, Vobiz captures it as an RFC 2833/4733 telephone-event and LiveKit invokes the handler automatically.
- State-Machine Management: When a digit arrives, the handler inspects IVRState.menu_level. Based on the tier ("main" vs "support"), it runs isolated branch logic and speaks the response via session.generate_reply(), so routing is deterministic rather than left to the LLM.
- Unified Spoken Inputs: An @llm.function_tool named route_choice(digit: str) is exposed to the LLM, letting it map phrases like "Give me hardware support" to the digit "1" and trigger the same deterministic routing path.
- Outbound DTMF Emulation: Conversational agents often need to dial into external systems ("To speak with a representative, press 1"). Calling local_participant.publish_dtmf(code, digit) sends keypress tones back out onto the trunk using a fixed digit-to-event-code mapping.
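The state tracking and deterministic routing described above can be sketched roughly as follows. Note that `IVRState`, `MENUS`, `route_digit`, and the menu wording are illustrative stand-ins, not the repository's actual definitions; in the agent, `route_digit` would sit behind both the DTMF handler and the `route_choice` function tool so keypresses and spoken phrases share one code path:

```python
from dataclasses import dataclass, field

@dataclass
class IVRState:
    """Tracks which menu tier the caller is currently in."""
    menu_level: str = "main"          # "main" or "support"
    history: list = field(default_factory=list)

# Hypothetical menu tree: digit -> destination per tier.
MENUS = {
    "main": {"1": "support", "2": "billing", "0": "main"},
    "support": {"1": "hardware", "2": "software", "0": "main"},
}

def route_digit(state: IVRState, digit: str) -> str:
    """Deterministic branch logic: the LLM plays no part in choosing the route."""
    target = MENUS.get(state.menu_level, {}).get(digit)
    if target is None:
        return f"Sorry, {digit} is not a valid option."
    state.history.append((state.menu_level, digit))
    # Only descend if the destination is itself a menu tier.
    if target in MENUS:
        state.menu_level = target
    return f"Routing you to {target}."
```

The returned string is what would be handed to `session.generate_reply()` (or spoken directly) in the real agent.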
Implementation Code
By attaching event listeners directly to the raw LiveKit Room, DTMF handling stays decoupled from the agent's speech loop, and a keypress can interrupt speech generation already in progress:
import asyncio
from livekit.rtc import Room
from typing import Callable
def bootstrap_room_listeners(room: Room, get_agent: Callable):
    """Bind the RFC 2833/4733 DTMF listener before subscribing to audio."""

    @room.on("sip_dtmf_received")
    def _on_dtmf(dtmf_payload):
        print(f"User pressed key: {dtmf_payload.digit}")

        # Interrupt any TTS speech still flushing so the caller's
        # keypress takes effect immediately.
        agent = get_agent()
        if agent:
            agent.interrupt()

        # Route deterministically, bypassing LLM inference entirely.
        # (route_tier_sales, route_tier_billing and repeat_main_ivr_menu
        # are defined elsewhere in the example.)
        if dtmf_payload.digit == "1":
            asyncio.create_task(route_tier_sales())
        elif dtmf_payload.digit == "2":
            asyncio.create_task(route_tier_billing())
        elif dtmf_payload.digit == "0":
            asyncio.create_task(repeat_main_ivr_menu())
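The outbound direction can be sketched in a similar spirit. This is a minimal illustration, not the repository's code: `dial_sequence` is a hypothetical helper, and the `publish` callable is assumed to behave like LiveKit's `local_participant.publish_dtmf(code, digit)` coroutine. The event codes follow RFC 4733 (digits 0-9 map to events 0-9, `*` to 10, `#` to 11):

```python
import asyncio
from typing import Awaitable, Callable

# RFC 4733 event codes for the common DTMF digits.
DTMF_CODES = {**{str(d): d for d in range(10)}, "*": 10, "#": 11}

async def dial_sequence(
    publish: Callable[[int, str], Awaitable[None]],
    digits: str,
    inter_digit_gap: float = 0.25,
) -> None:
    """Send each digit upstream, pausing between tones so that
    legacy IVR systems have time to register each keypress."""
    for digit in digits:
        await publish(DTMF_CODES[digit], digit)
        await asyncio.sleep(inter_digit_gap)
```

In the agent this would be invoked as something like `await dial_sequence(room.local_participant.publish_dtmf, "1#")` after the far-end menu prompt finishes playing.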