
Answering Machine Detection (AMD) Agent

Explore the Implementation

```bash
git clone https://github.com/vobiz-ai/Livekit-Vobiz-Machine-Detection-Agent-example.git
cd Livekit-Vobiz-Machine-Detection-Agent-example
```

A pure Python implementation of Answering Machine Detection (AMD) built on LiveKit's native VAD (Voice Activity Detection) events and a lightweight LLM classification step. Instead of routing calls through a dedicated AMD server, this pattern runs a silent initial inference loop that classifies the answering party within the first few seconds of the call.


How It Works

The agent classifies the call by interpreting the first phrase spoken by the answering party, then swaps in the appropriate downstream agent mid-session.

  • The Silent Classifier: The core MachineDetectionAgent joins the room like a standard agent but disables audio playback by setting tts=None. It monitors Silero VAD activity and hooks session.on("user_speech_committed") to capture the first Deepgram transcript into an asyncio.Future.
  • Hard Timeout Fallback: If the line stays silent, an asyncio.wait_for timeout (e.g. 4.0 seconds) fires and the call is conclusively classified as MACHINE.
  • Semantic Inference: The first transcript ("Hello?", "Please leave a message") is passed to a minimal, isolated system prompt that constrains the model to a single-word answer: HUMAN or MACHINE.
  • Hotswapping Agents: As soon as a classification lands, the agent calls self.session.update_agent(), discarding the silent AMD shell and replacing it with either the VoicemailAgent or the full HumanAnswerAgent.
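The capture-and-timeout core of the steps above can be sketched with plain asyncio, independent of LiveKit. Here `first_utterance` and the event queue are hypothetical stand-ins: the queue plays the role of the `user_speech_committed` event stream, and the Future resolves with the first committed transcript or times out on silence.

```python
import asyncio


async def first_utterance(events: "asyncio.Queue[str]", timeout: float = 4.0) -> str:
    """Resolve a Future with the first transcript, or raise TimeoutError on silence."""
    fut: asyncio.Future[str] = asyncio.get_running_loop().create_future()

    async def listen() -> None:
        # Stand-in for the session.on("user_speech_committed") handler
        text = await events.get()
        if not fut.done():
            fut.set_result(text)

    task = asyncio.create_task(listen())
    try:
        # Mirrors the hard 4-second AMD timeout described above
        return await asyncio.wait_for(fut, timeout=timeout)
    finally:
        task.cancel()


async def demo() -> str:
    q: asyncio.Queue[str] = asyncio.Queue()
    q.put_nowait("Hello?")  # simulated first committed utterance
    return await q and await first_utterance(q)
```

In the real agent, the handler registered on the session sets the Future instead of reading from a queue, but the timeout-vs-first-result race is the same.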

Implementation Code

This control flow enforces the lightweight AMD listener phase before dispatching the expensive voice LLM session:

```python
import asyncio

from livekit.agents import JobContext

# wait_for_first_speech, classify_intent, trigger_voicemail_drop and
# trigger_human_agent are project helpers defined elsewhere in the repo.

async def run_amd(ctx: JobContext):
    # Connect silently: no TTS greeting is played during classification
    await ctx.connect(auto_subscribe=True)

    try:
        # VAD sweep: wait for the first committed speech segment
        await asyncio.wait_for(wait_for_first_speech(ctx.room), timeout=4.0)

        # Speech detected: run the fast classifier via GPT-4o-mini
        result = await classify_intent(ctx.room)

        if result == "MACHINE":
            await trigger_voicemail_drop(ctx.room)
        else:
            # Human confirmed: spin up the full, computationally heavy session
            await trigger_human_agent(ctx)

    except asyncio.TimeoutError:
        # No speech within 4 seconds: assume an answering machine
        await trigger_voicemail_drop(ctx.room)
```
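The one-word classifier behind `classify_intent` can be reduced to a constrained system prompt plus a defensive parser for the model's reply. This is a sketch, not the repository's code: `CLASSIFIER_PROMPT` and `parse_classification` are hypothetical names, and the actual LLM call is elided. Defaulting to MACHINE on an unparseable reply matches the conservative timeout behavior above.

```python
# Hypothetical prompt; the repo's exact wording may differ.
CLASSIFIER_PROMPT = (
    "You are an answering-machine detector. Given the first utterance heard "
    "after an outbound call connects, reply with exactly one word: "
    "HUMAN or MACHINE."
)


def parse_classification(raw: str) -> str:
    """Normalize the model's one-word reply; default to MACHINE when unclear."""
    word = raw.strip().upper().rstrip(".")
    return word if word in ("HUMAN", "MACHINE") else "MACHINE"
```

The strict output contract keeps the classification step cheap: a single short completion from a small model, with any deviation from the two allowed tokens treated as a machine answer rather than retried.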