IVR DTMF Voice Engine
Explore the Implementation
git clone https://github.com/vobiz-ai/Vobiz-Livekit-IVR-DTMF-Example.git
cd Vobiz-Livekit-IVR-DTMF-Example

This blueprint demonstrates a multi-level IVR menu that accepts both DTMF key presses (push-button) and spoken responses, using an explicit IVRState tracker scoped to the session. Beyond receiving input, the agent can also dial out and send DTMF tones upstream as RFC 4733 telephone-events to navigate legacy SIP menu trees automatically.
How It Works
In production, callers are often in noisy environments and prefer pressing keys to speaking into their phones. Agents must handle both input paths reliably, bridging legacy SIP signaling into the AI session lifecycle.
- Inbound DTMF Trapping: The LiveKit worker binds a callback via @ctx.room.on("sip_dtmf_received"). When the caller presses a key on their handset, Vobiz captures it as an RFC 2833/4733 telephone-event and LiveKit invokes the handler automatically.
- State-Machine Management: When a digit arrives, the handler inspects IVRState.menu_level. Based on the tier ("main" vs "support"), it runs isolated branch logic and speaks the response via session.generate_reply(), so routing is deterministic rather than left to the LLM.
- Unified Spoken Inputs: An @llm.function_tool named route_choice(digit: str) is exposed to the LLM, letting it map phrases like "Give me hardware support" to the digit "1" and trigger the same deterministic routing path.
- Outbound DTMF Emulation: Conversational agents often need to dial into external systems ("To speak with a representative, press 1"). Calling local_participant.publish_dtmf(code, digit) sends keypress tones back out onto the trunk using a fixed digit-to-event-code mapping.
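The state tracking and deterministic routing described above can be sketched roughly as follows. Note that `IVRState`, `MENUS`, `route_digit`, and the menu wording are illustrative stand-ins, not the repository's actual definitions; in the agent, `route_digit` would sit behind both the DTMF handler and the `route_choice` function tool so keypresses and spoken phrases share one code path:

```python
from dataclasses import dataclass, field

@dataclass
class IVRState:
    """Tracks which menu tier the caller is currently in."""
    menu_level: str = "main"          # "main" or "support"
    history: list = field(default_factory=list)

# Hypothetical menu tree: digit -> destination per tier.
MENUS = {
    "main": {"1": "support", "2": "billing", "0": "main"},
    "support": {"1": "hardware", "2": "software", "0": "main"},
}

def route_digit(state: IVRState, digit: str) -> str:
    """Deterministic branch logic: the LLM plays no part in choosing the route."""
    target = MENUS.get(state.menu_level, {}).get(digit)
    if target is None:
        return f"Sorry, {digit} is not a valid option."
    state.history.append((state.menu_level, digit))
    # Only descend if the destination is itself a menu tier.
    if target in MENUS:
        state.menu_level = target
    return f"Routing you to {target}."
```

The returned string is what would be handed to `session.generate_reply()` (or spoken directly) in the real agent.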
Implementation Code
By attaching event listeners directly to the raw LiveKit Room, DTMF handling stays decoupled from the agent's speech loop, and a keypress can interrupt speech generation already in progress:
import asyncio
from livekit.rtc import Room
from typing import Callable
def bootstrap_room_listeners(room: Room, get_agent: Callable):
    """Bind the RFC 2833/4733 DTMF listener before subscribing to audio."""

    @room.on("sip_dtmf_received")
    def _on_dtmf(dtmf_payload):
        print(f"User pressed key: {dtmf_payload.digit}")

        # Interrupt any TTS speech still flushing so the caller's
        # keypress takes effect immediately.
        agent = get_agent()
        if agent:
            agent.interrupt()

        # Route deterministically, bypassing LLM inference entirely.
        # (route_tier_sales, route_tier_billing and repeat_main_ivr_menu
        # are defined elsewhere in the example.)
        if dtmf_payload.digit == "1":
            asyncio.create_task(route_tier_sales())
        elif dtmf_payload.digit == "2":
            asyncio.create_task(route_tier_billing())
        elif dtmf_payload.digit == "0":
            asyncio.create_task(repeat_main_ivr_menu())
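The outbound direction can be sketched in a similar spirit. This is a minimal illustration, not the repository's code: `dial_sequence` is a hypothetical helper, and the `publish` callable is assumed to behave like LiveKit's `local_participant.publish_dtmf(code, digit)` coroutine. The event codes follow RFC 4733 (digits 0-9 map to events 0-9, `*` to 10, `#` to 11):

```python
import asyncio
from typing import Awaitable, Callable

# RFC 4733 event codes for the common DTMF digits.
DTMF_CODES = {**{str(d): d for d in range(10)}, "*": 10, "#": 11}

async def dial_sequence(
    publish: Callable[[int, str], Awaitable[None]],
    digits: str,
    inter_digit_gap: float = 0.25,
) -> None:
    """Send each digit upstream, pausing between tones so that
    legacy IVR systems have time to register each keypress."""
    for digit in digits:
        await publish(DTMF_CODES[digit], digit)
        await asyncio.sleep(inter_digit_gap)
```

In the agent this would be invoked as something like `await dial_sequence(room.local_participant.publish_dtmf, "1#")` after the far-end menu prompt finishes playing.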