AI Case Study

Google trials Duplex as a natural conversation agent and assistant to automate scheduling activities

Google is trialling Duplex, a voice assistant program which automates scheduling activities such as booking appointments over the phone. The underlying AI involves a deep neural network method called WaveNet, with Duplex working on voice-to-voice interactions with humans.



Internet Services Consumer

Project Overview

From VentureBeat: "Google today shared a new demo of Duplex, its conversational AI that makes phone calls on behalf of Google Assistant users, and revealed more details about how the AI will work when speaking to businesses and customers. Initial use cases will involve making hair salon appointments and restaurant reservations... Tests of the experimental Duplex service will begin in the coming weeks. Google uses DeepMind’s WaveNet speech synthesizer to generate a human-like voice and speaking rhythm and incorporate the sorts of informal words often found in human speech, like 'um,' 'uhh,' and 'mhmm.'"

However, as The Verge reports, "Duplex can’t handle everything, and so it will be paired with a bank of human operators who can take over a call if it goes sideways... Those operators serve two purposes: they handle calls that Duplex can’t complete and they also mark up the call transcripts for Google’s AI algorithms to learn from. There are a few states where Duplex won’t work... but it should start making calls in the vast majority of the US soon. Duplex only works in English for now, but Google has worked to ensure that it is able to understand lots of dialects and accents."

Reported Results

As reported by VentureBeat, a spokesperson for Google claims that "Duplex has been able to handle 80 percent of calls prior to public rollout without the need for human intervention".


Duplex utilises the WaveNet technology developed by Google DeepMind, which is "a deep neural network for generating raw audio waveforms. The model is fully probabilistic and autoregressive, with the predictive distribution for each audio sample conditioned on all previous ones" according to the arXiv research paper. "Because models with causal convolutions do not have recurrent connections, they are typically faster to train than RNNs, especially when applied to very long sequences. One of the problems of causal convolutions is that they require many layers, or large filters to increase the receptive field. In this paper we use dilated convolutions to increase the receptive field by orders of magnitude, without greatly increasing computational cost... Although WaveNet was designed as a generative model, it can straightforwardly be adapted to discriminative audio tasks such as speech recognition. WaveNets can be conditioned on other inputs in a global (e.g. speaker identity) or local way (e.g. linguistic features). When applied to TTS [text-to-speech] WaveNets produced samples that outperform the current best TTS systems in subjective naturalness. Finally, WaveNets showed very promising results when applied to music audio modeling and speech recognition."
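
To make the point about causal and dilated convolutions concrete, the following is a minimal sketch in plain Python. It is not Duplex or WaveNet code: the function names, the filter size of 2, the ten-layer depth and the all-ones weights are assumptions chosen only for illustration. It shows two things: that a causal convolution lets each output sample depend only on the current and earlier input samples, and that doubling the dilation at each layer grows the receptive field far faster than stacking undilated layers.

```python
def receptive_field(kernel_size, dilations):
    """Number of input samples (current plus past) that can influence one output
    sample after stacking one causal convolution layer per entry in `dilations`."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


def causal_dilated_conv(signal, weights, dilation):
    """1-D causal convolution: output[t] uses signal[t], signal[t - dilation], ...
    and never any future sample. Samples before the start are treated as zero."""
    out = []
    for t in range(len(signal)):
        total = 0.0
        for k, w in enumerate(weights):
            idx = t - k * dilation
            if idx >= 0:  # implicit zero padding on the left
                total += w * signal[idx]
        out.append(total)
    return out


# Ten undilated layers versus ten layers with dilations 1, 2, 4, ..., 512
# (illustrative depth and filter size, not taken from the source).
plain_rf = receptive_field(2, [1] * 10)
dilated_rf = receptive_field(2, [2 ** i for i in range(10)])

# Push a single impulse through two layers (dilations 1 and 2) to see causality:
# nothing before the impulse position is affected.
impulse = [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0]
layer1 = causal_dilated_conv(impulse, [1.0, 1.0], dilation=1)
layer2 = causal_dilated_conv(layer1, [1.0, 1.0], dilation=2)

print(plain_rf, dilated_rf)
print(layer2)
```

Under these assumptions, the undilated stack sees 11 samples while the dilated stack sees 1024, at the same number of layers and weights. The impulse at index 3 spreads to indices 3 through 6 only, which matches the receptive field of 4 for dilations 1 and 2 and leaves all earlier outputs at zero.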


R&D

Product Development


According to The Verge, "Duplex is the AI system designed to make human-sounding voice calls on your behalf so as to automate things like booking restaurant tables and hair appointments."



Raw audio waveform