If you have spent much time on YouTube recently, you might have noticed the influx of videos with computer-synthesized voices. For whatever reason, people (or more likely bots) have been feeding news stories and articles through a text-to-speech engine and turning the output into a video that gets uploaded to YouTube and given a clickbait title. Perhaps the most annoying thing about these videos (aside from their complete lack of original content) is that the synthesized voice sounds completely robotic. Words are often mispronounced, and there is a total lack of any sort of humanlike variation in speech patterns.
Like all things in technology, however, text-to-speech engines improve over time. Today’s text-to-speech engines, in spite of sounding robotic, are vastly superior to the ones that I grew up with in the 1980s. So with that in mind, it seems completely plausible that computer speech could eventually become indistinguishable from authentic human speech. But let’s take things one step further. What if that completely realistic synthesized voice were paired with an AI engine that allowed it to learn to have a conversation with a human? And what if it could do all of this over the phone? This is the near future according to Google.
Of course, it is not exactly unusual for tech companies to give us futuristic predictions of how their technologies will shape the world. This video from Microsoft illustrates how the company once envisioned the future of business travel. In the case of Google, however, having a humanlike virtual assistant make calls on your behalf is not one of those futuristic concepts that may never see the light of day. Rather, it is based on technology that already exists and that will be publicly available in the near future.
The underlying technology that can make all of this happen is something that Google calls Duplex. There are a few different parts to Google Duplex, but the first is a truly natural-sounding speech engine. The speech processor not only uses voice inflection but also inserts things like “ums” and “ahs.” These types of linguistic imperfections go a long way toward making speech sound more natural.
Rumor has it that Google may have initially tested its Google Duplex technology using a more robotic-sounding speech engine but found that people were unwilling to interact with it over the phone. I think that it’s probably safe to say that most of us have a natural aversion to robocalls. Even if a robotic voice is being used to do something as innocuous as booking a restaurant reservation, the robot voice adds an air of illegitimacy to the process. The recipient of the call may be quick to dismiss the call as being fraudulent.
The other big piece of the puzzle was making Duplex able to carry on a conversation with a human. Google had already laid the groundwork for this when it created Google Now. After all, a computer cannot carry on an intelligent conversation with someone unless it is able to understand what the person is saying.
The flip side to this is that the computer has to be able to formulate an intelligent response based on the speech input that it has received. This part of the process would probably be impossible without the use of machine learning.
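To make the three pieces described above a bit more concrete (understanding what was said, formulating a response, and making the reply sound natural), here is a toy Python sketch. It is purely illustrative and does not reflect how Google actually built Duplex: the function names, the keyword rules, and the filler list are all hypothetical, and a real system would rely on machine learning instead of hand-written rules.

```python
import random

# Hypothetical filler words; the article only says Duplex inserts "ums and ahs".
FILLERS = ["um", "uh"]


def understand(utterance: str) -> str:
    """Toy stand-in for speech understanding: map the caller's words to an intent."""
    text = utterance.lower()
    if "how many" in text:
        return "ask_party_size"
    if "what time" in text:
        return "ask_time"
    return "unknown"


def respond(intent: str, request: dict) -> str:
    """Toy stand-in for response generation, using details of the user's request."""
    if intent == "ask_party_size":
        return f"It's for {request['party_size']} people."
    if intent == "ask_time":
        return f"We'd like {request['time']}, please."
    # Fall back to a polite request to repeat the question.
    return "Sorry, could you repeat that, please?"


def add_disfluency(sentence: str, rng: random.Random) -> str:
    """Prefix the sentence with a filler word so it sounds less scripted."""
    filler = rng.choice(FILLERS)
    return f"{filler.capitalize()}, {sentence[0].lower()}{sentence[1:]}"


def reply(utterance: str, request: dict, rng: random.Random) -> str:
    """Run the whole toy pipeline: understand, respond, then add a filler."""
    return add_disfluency(respond(understand(utterance), request), rng)


if __name__ == "__main__":
    booking = {"party_size": 4, "time": "7 pm"}
    print(reply("How many people is it for?", booking, random.Random()))
```

Even in this toy form, the fallback response hints at why the real problem is hard: anything the keyword rules do not anticipate ends up as "unknown," which is presumably where machine learning earns its keep.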
How well does Duplex work?
By now you are probably wondering how well Google Duplex works. I haven’t yet had an opportunity to try out Duplex myself, but I have heard from various sources that it works exceptionally well. I have been told not only that it is really hard to tell that you are talking to a computer, but also that Google Duplex does a good job of recognizing what is being said and responding appropriately.
Surprisingly, Google Duplex even seems to know how to handle a really difficult phone call. In a demo that was recorded earlier this year, Google shows how Duplex goes about booking an appointment at a hair salon and then providing the user with a notification when the booking is complete. As if that demo were not impressive enough, Google demonstrates Duplex trying to make a restaurant reservation when “the call actually goes a bit differently than expected.” Assuming that this demo call was real (and some have doubted its authenticity), Duplex handled the call better than some humans probably would have.
This raises a point that I have yet to hear anyone address: phone manners. When Google Duplex places a phone call on your behalf, it reflects on you. After all, Google Duplex is presumably making an appointment or a reservation in your name. Growing up, my parents taught me to always say please and thank you, and to show respect for whomever I am talking to. I have tried to continue doing this even as an adult. As such, I would be really uncomfortable with Google Duplex placing a call on my behalf if there were a chance that Duplex might treat the recipient of the call rudely. Thankfully, the demos seem to indicate that Duplex is programmed to be polite.
What can Google Duplex do?
If you watched the demo video that I linked to earlier, then you have seen that Google Duplex can be used to schedule a hair appointment or to make a restaurant reservation, but you may be wondering what else Duplex is and is not capable of.
It probably goes without saying, but while Google Duplex can place calls on your behalf, it cannot impersonate your voice. Hence, you won’t be able to use Duplex to call that obnoxious relative and listen to their political rants so that you don’t have to.
So what can Duplex do? From what I have heard, Duplex will initially be able to do things like making a hair appointment, making a restaurant reservation, or inquiring about a business’s operating hours. Because of the complexity of having a computer interact with a human in a natural way, it will take some time before Google Duplex is able to make other types of calls.
An amazing future
The Google Duplex demos that I have seen have been nothing short of amazing, and I think that the technology holds enormous potential. In the future, for example, a Duplex-like technology may be able to call EMS and relay key information such as location and medical history if sensors that have been surgically implanted in the body detect that a heart attack is happening.
Of course, the opposite is also true. Every new technology gets abused. I can just imagine the future YouTube videos in which people use Duplex to troll various businesses. Trolling may seem unlikely since Duplex is the one controlling the call, but I have little doubt that a Duplex SDK will be released eventually, and then it’s game on for the trolls.
As ironic as it may be, one of the most useful things that Google could eventually do with Duplex is to make it so that Duplex figures out how to navigate all those seemingly endless telephone prompts for us so that we can speak to an actual person. Granted, this might not be in Google’s plans, but it would be helpful. Maybe Duplex can even be designed to wait on hold for us so that we don’t have to.