Answering Machine Detection (AMD) Agent
Explore the Implementation
```bash
git clone https://github.com/vobiz-ai/Livekit-Vobiz-Machine-Detection-Agent-example.git
cd Livekit-Vobiz-Machine-Detection-Agent-example
```

A pure Python implementation of Answering Machine Detection (AMD) built on LiveKit's native VAD (Voice Activity Detection) events and a lightweight LLM classification step. Instead of routing calls through a dedicated standalone AMD server, this pattern runs a silent initial inference loop that classifies the answering party from its first utterance, before any expensive voice pipeline is started.
How It Works
The agent listens silently, classifies the semantic signature of the first phrase spoken by the answered endpoint, and then swaps the active agent mid-session based on the result.
- The Silent Classifier: The core `MachineDetectionAgent` joins the room like a standard agent but disables playback by setting `tts=None`. It monitors the Silero VAD stream, hooking into `session.on("user_speech_committed")` to route the first Deepgram transcript into an `asyncio.Future`.
- Hard Timeout Degradation: If the line stays silent, an `asyncio.wait_for` timeout (e.g. `4.0` seconds) trips and the receiver is conclusively categorized as a `MACHINE`.
- Semantic Inference: Typical first phrases ("Hello?", "Please leave a message") are passed to a stripped-down, isolated system prompt locked to a rigid one-word output: `HUMAN` or `MACHINE`.
- Hotswapping Agents: The moment a label is returned, the agent calls `self.session.update_agent()`, discarding the silent AMD shell and handing the session off to either the `VoicemailAgent` or the `HumanAnswerAgent`.
Implementation Code
This control flow enforces the lightweight AMD listening phase before any expensive voice LLM session is dispatched:
```python
import asyncio

from livekit.agents import JobContext

# The helper coroutines used below (wait_for_first_speech, classify_intent,
# trigger_voicemail_drop, trigger_human_agent) are defined elsewhere in the
# example repository.


async def run_amd(ctx: JobContext):
    # Connect silently first, without deploying the standard TTS greeting
    await ctx.connect(auto_subscribe=True)

    try:
        # VAD sweep: wait for the first speech activity
        await asyncio.wait_for(wait_for_first_speech(ctx.room), timeout=4.0)

        # Audio detected: run the fast classifier via GPT-4o-mini
        result = await classify_intent(ctx.room)

        if result == "MACHINE":
            await trigger_voicemail_drop(ctx.room)
        else:
            # Human validated: spin up the real, computationally heavy session
            await trigger_human_agent(ctx)
    except asyncio.TimeoutError:
        # 4 s of absolute silence: conclusively assume machine/voicemail
        await trigger_voicemail_drop(ctx.room)
```
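The one-word classification step inside `classify_intent` can be sketched as a prompt plus a strict output parser. This is an illustrative sketch, not the repo's code: `llm_complete` stands in for a hypothetical async LLM wrapper (the repo mentions GPT-4o-mini), and defaulting unexpected output to `MACHINE` is an assumption chosen here as the safer branch for a dialer.

```python
SYSTEM_PROMPT = (
    "You are an answering-machine detector. Given the first utterance of an "
    "answered phone call, reply with exactly one word: HUMAN or MACHINE."
)


def parse_label(raw: str) -> str:
    # Lock the model output onto the rigid two-label matrix; anything
    # unexpected falls back to MACHINE (assumption: the safer default).
    label = raw.strip().upper()
    return label if label in ("HUMAN", "MACHINE") else "MACHINE"


async def classify_intent_text(utterance: str, llm_complete) -> str:
    # llm_complete(system, user) is a hypothetical async LLM call.
    raw = await llm_complete(SYSTEM_PROMPT, utterance)
    return parse_label(raw)
```

Keeping the parser separate from the LLM call makes the fallback behavior easy to test without a live model.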