Detecting speech inputs

Gather's automatic speech recognition (ASR) feature accepts both structured and unstructured speech input from users. Structured input, in the form of keywords and commands, suits use cases with a finite set of distinct operations for users to choose from, such as interactive voice response (IVR). Adding speech detection to a DTMF-driven IVR menu can improve conversions by giving users an easier way to navigate the menu, as in the first example below.

Examples

Structured Input with DTMF and Speech

XML Response
<Response>
    <Gather inputType="dtmf speech" action="<action url>">
        <Speak>Press 1 or say New Appointment to schedule an appointment. Press 2 or say Cancel Appointment to cancel an existing appointment.</Speak>
    </Gather>
</Response>
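
On the server side, the action URL has to map either kind of input to the same operation. The following is a minimal Python sketch of that mapping, not a definitive implementation. The `MENU` table, `normalize`, and `route_input` are hypothetical names, and how the digits and transcribed speech reach your handler (for example, as request parameters) is an assumption to check against the Gather callback documentation.

```python
import re

# Hypothetical menu: DTMF digit -> (spoken phrase, operation name).
MENU = {
    "1": ("new appointment", "schedule"),
    "2": ("cancel appointment", "cancel"),
}


def normalize(text):
    """Lowercase and drop punctuation so 'New Appointment.' matches 'new appointment'."""
    return re.sub(r"[^a-z0-9 ]+", "", text.lower()).strip()


def route_input(digits=None, speech=None):
    """Return the operation for a DTMF digit or a spoken phrase, or None if unmatched."""
    # A key press is unambiguous, so check it first.
    if digits and digits in MENU:
        return MENU[digits][1]
    # Otherwise look for one of the known phrases inside the transcript.
    if speech:
        spoken = normalize(speech)
        for phrase, operation in MENU.values():
            if phrase in spoken:
                return operation
    return None
```

Matching the phrase as a substring lets a caller say "Cancel appointment, please" and still reach the cancel operation. When `route_input` returns `None`, a reasonable choice is to replay the menu.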

Conversational AI with Speech Input

Real-time transcription of free-form input, such as complete sentences, helps you build conversational AI-driven experiences.

XML Response
<Response>
    <Gather inputType="speech" action="<action url>">
        <Speak>Welcome to Mary's Hair Salon. How can I help you today?</Speak>
    </Gather>
</Response>

An easy way to build conversational AI interfaces is to pass the transcribed speech received through the Gather XML element to an AI chatbot platform, such as Google Dialogflow, for NLP-based intent extraction. To make your bot's responses sound natural, also read about the Speech Synthesis Markup Language (SSML) engine in the Vobiz Speak XML element.
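
The flow described above (transcript in, intent extracted, spoken reply out) can be sketched in Python as follows. This is an illustration under stated assumptions: `keyword_intent` is a hypothetical stand-in for a real NLP service such as Dialogflow, and the `REPLIES` table, `build_reply_xml`, and the intent names are invented for the example. Only the `Response`, `Gather`, and `Speak` elements and the `inputType` and `action` attributes come from the examples on this page.

```python
import xml.etree.ElementTree as ET


def keyword_intent(transcript):
    """Hypothetical stand-in for an NLP service: match intents by keyword."""
    text = transcript.lower()
    if "cancel" in text:
        return "cancel_appointment"
    if "book" in text or "appointment" in text:
        return "book_appointment"
    return "fallback"


# Hypothetical replies, keyed by intent name.
REPLIES = {
    "book_appointment": "Sure. What day works for you?",
    "cancel_appointment": "Okay. Which appointment should I cancel?",
    "fallback": "Sorry, I didn't catch that. How can I help you today?",
}


def build_reply_xml(transcript, action_url, detect_intent=keyword_intent):
    """Build a Gather response that speaks a reply and listens for the next turn."""
    intent = detect_intent(transcript)
    response = ET.Element("Response")
    gather = ET.SubElement(response, "Gather", inputType="speech", action=action_url)
    speak = ET.SubElement(gather, "Speak")
    speak.text = REPLIES.get(intent, REPLIES["fallback"])
    # ElementTree escapes special characters in text and attributes.
    return ET.tostring(response, encoding="unicode")
```

Passing `detect_intent` as a parameter keeps the XML-building code independent of the NLP provider, so the keyword matcher can later be replaced by a function that calls your chatbot platform.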