
Vobiz All XML Base WebSocket Layer

Explore the Implementation

Bash
git clone https://github.com/vobiz-ai/Vobiz-All-XML.git
cd Vobiz-All-XML

A compact foundational codebase with no third-party SDK dependencies (no LiveKit, no Pipecat). It is a bare-metal architecture for writing custom, low-latency machine-learning loops: the server instructs Vobiz via standard TwiML-style XML <Connect><Stream> payloads to establish a raw, bidirectional WebSocket carrying unmodified 8 kHz/16 kHz uLaw audio frames.


How It Works

When every millisecond counts, removing intermediary media servers such as LiveKit cuts end-to-end latency. The trade-off is that you must handle binary packet sequences, Voice Activity Detection (VAD), and network jitter yourself.
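As a sketch of the VAD piece, a crude energy-threshold detector can run directly on the inbound uLaw frames. The `ulaw_to_pcm` helper and the RMS threshold of 500 below are illustrative assumptions, not part of the repository:

```python
import math

def ulaw_to_pcm(sample: int) -> int:
    """Decode one G.711 uLaw byte to a 16-bit linear PCM sample."""
    sample = ~sample & 0xFF
    sign = sample & 0x80
    exponent = (sample >> 4) & 0x07
    mantissa = sample & 0x0F
    magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84  # 0x84 = bias of 132
    return -magnitude if sign else magnitude

def is_speech(ulaw_chunk: bytes, threshold: float = 500.0) -> bool:
    """Crude energy-based VAD: decode a uLaw frame and compare RMS
    energy against a fixed threshold (tune per deployment)."""
    pcm = [ulaw_to_pcm(b) for b in ulaw_chunk]
    rms = math.sqrt(sum(s * s for s in pcm) / len(pcm))
    return rms > threshold
```

A 20 ms frame at 8 kHz is 160 bytes; uLaw byte 0xFF decodes to silence, so a frame of 0xFF bytes falls below any positive threshold. Production systems typically use a trained VAD model instead, but the threshold version is enough to gate STT traffic.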

  • XML Routing Directives: An inbound Vobiz webhook triggers a standard Python FastAPI endpoint. The server responds not with JSON but with strict XML (<Response><Connect><Stream url="wss://..."/></Connect></Response>) dictating exactly where Vobiz should open the WebSocket.
  • Raw Binary Unpacking: The connection carries base64-encoded audio payloads. The Python server parses these JSON frames ({"event": "media", "media": {"payload": "..."}}) and decodes the G.711 uLaw (PCMU) payloads into raw byte streams.
  • Concurrent AI Processing: Those byte chunks are fed into Deepgram's streaming STT WebSocket. As transcript fragments arrive, they are streamed asynchronously into OpenAI's Chat Completions API.
  • Base64 Response Frames: The generated LLM tokens drive a TTS engine (such as ElevenLabs or OpenAI), which synthesizes audio back into uLaw bytes. These are re-encoded as base64 and sent back down the same Vobiz WebSocket, completing the conversation loop with no external library overhead.
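The unpack and re-encode steps above can be sketched as two small helpers. The function names are illustrative, and the `streamSid` field on the return path mirrors the Twilio-style media-stream schema; treat it as an assumption for Vobiz:

```python
import base64
import json
from typing import Optional

def decode_media_frame(message: str) -> Optional[bytes]:
    """Parse one WebSocket text frame and return raw uLaw bytes,
    or None for non-media events (start, stop, mark, ...)."""
    frame = json.loads(message)
    if frame.get("event") != "media":
        return None
    return base64.b64decode(frame["media"]["payload"])

def encode_media_frame(ulaw_bytes: bytes, stream_sid: str) -> str:
    """Wrap synthesized uLaw audio in the JSON envelope for the
    return path ('streamSid' field name assumed from the
    Twilio-style schema)."""
    return json.dumps({
        "event": "media",
        "streamSid": stream_sid,
        "media": {"payload": base64.b64encode(ulaw_bytes).decode("ascii")},
    })
```

Keeping the codec logic in pure functions like these makes the hot WebSocket loop trivial to unit-test without a live call.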

Implementation Code

At its core, the architecture is a standard Python web framework returning raw Vobiz XML routing instructions that point at an internal WebSocket endpoint:

Python
from fastapi import FastAPI, WebSocket
from fastapi.responses import Response

app = FastAPI()

@app.post("/answer")
async def answer_call():
    """Return the minimal XML payload instructing Vobiz to open a WebSocket stream."""
    return Response(
        content='''<?xml version="1.0" encoding="UTF-8"?>
        <Response>
           <Connect>
               <Stream url="wss://YOUR-SERVER.com/ws/session" />
           </Connect>
        </Response>''',
        media_type="application/xml"
    )

@app.websocket("/ws/session")
async def websocket_endpoint(websocket: WebSocket):
    await websocket.accept()
    # Bidirectional 8 kHz base64 uLaw frames flow through this socket...