A few months ago, I was driving through a construction zone in heavy traffic, while also towing a large trailer behind my truck. It was one of those high-stress situations in which I really didn’t want to take my attention off of the road. While I was dealing with the chaos, the digital assistant in my truck informed me that I had just received a text message from my wife and asked if I wanted to read it or ignore it. Since that wasn’t exactly a good time to be texting with someone, I verbally responded to the digital assistant by saying the word “Ignore.” Rather than simply ignoring the message as I had instructed it to do, the device took it upon itself to tell my wife that I was ignoring her. Seriously. The personal digital assistant actually texted my wife a message that said “ignore.” I had fun trying to explain that one.
As amazing as personal digital assistants may be, they can also be kind of stupid at times. I’m not just talking about a device that decides to rat me out to my wife. We’ve all probably seen situations in which a personal digital assistant responds to a question with an answer that is incorrect, unhelpful, or way out in left field. But why is this? Why aren’t these AI devices more intelligent?
When it comes to artificial intelligence, conventional wisdom states that a device will get “smarter” as it receives more training. The voice dictation software that I am using right now to write this article, for example, has become more accurate over time because it accumulates more training data every time that I use it.
As important as ongoing training may be, however, it may not be the ultimate answer for making AI more intelligent. Training helps make an AI engine, such as those used by personal digital assistants, better able to predict certain behaviors, but it does not enable the engine to establish a true understanding of a situation. Let me give you an example.
‘Smart’ suggestions that are just plain dumb
When you type a message on your smartphone, the onscreen keyboard offers shortcuts in an attempt to reduce the amount of typing that you have to do. These shortcuts work by attempting to predict which word is most likely to come next, based on what has already been typed. The problem with this is that the shortcuts are based on mathematical probabilities, rather than on an understanding of what you are really trying to say. Because of this, the shortcut suggestions are sometimes very different from what a person might intend to type. Yesterday, for instance, I sent someone a message in which I intended to say that “I cannot commit to that schedule.” When I typed the word “commit,” my phone did not offer the word “to” as a suggestion. Instead, its suggestions included all kinds of lovely words such as suicide, perjury, and murder.
The point is that the AI engine that drives keyboard shortcut suggestions does not actually understand what you are trying to say. Instead, it uses statistical probabilities to try to predict the next word that needs to be typed.
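To see how prediction without understanding plays out, here is a minimal sketch of statistical next-word prediction. The tiny corpus is entirely hypothetical; real keyboards train on vastly more data, but the principle is the same: the model counts which word most often follows the current one and suggests it, with no grasp of meaning.

```python
from collections import Counter, defaultdict

# Hypothetical training text: the model only ever sees word sequences,
# never meanings.
training_corpus = (
    "commit suicide commit perjury commit murder "
    "commit to the plan commit to the schedule"
).split()

# Count how often each word follows each other word.
followers = defaultdict(Counter)
for current, nxt in zip(training_corpus, training_corpus[1:]):
    followers[current][nxt] += 1

def suggest(word, n=3):
    """Return the n statistically most likely next words."""
    return [w for w, _ in followers[word].most_common(n)]

print(suggest("commit"))  # ranked purely by frequency, not by intent
```

Because "to" follows "commit" most often in this toy corpus, it ranks first here; shift the frequencies slightly and the darker suggestions rise to the top, which is exactly the failure mode described above.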
Long before the era of artificial intelligence, I had a conversation with someone at Microsoft in which I pitched the idea of using a very similar algorithm in Microsoft Word, not for generating typing suggestions, but rather as the basis for an integrated fact checking tool.
In a time when most people had not yet heard of the Internet, Microsoft offered a product called Encarta, which was designed to act as an electronic encyclopedia. My idea was that it could be possible to build a fact-checking engine that would find key phrases within a Word document and compare those phrases against the data from Encarta. If, for example, someone were to type the phrase “Samuel Adams was the first president of the United States,” then an Encarta search, based on an algorithm much like the one used for generating typing suggestions, would be able to discover that the statement was false, and that it was George Washington who was the first president.
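The fact-checking idea can be sketched in a few lines. This is a toy illustration, not anything Encarta actually shipped: the lookup table, sentence pattern, and function names are all hypothetical stand-ins for a real encyclopedia and a real phrase extractor.

```python
import re

# Hypothetical stand-in for an encyclopedia: maps a role to its true subject.
facts = {"first president of the united states": "george washington"}

def check(sentence):
    """Return True/False if the claim can be verified, None otherwise."""
    # Extract "<subject> was the <role>" style claims (a deliberately
    # simplistic pattern for illustration).
    m = re.match(r"(.+?) was the (.+?)\.?$", sentence.lower())
    if not m:
        return None
    subject, role = m.group(1), m.group(2)
    truth = facts.get(role)
    if truth is None:
        return None  # nothing in the encyclopedia about this role
    return truth == subject

print(check("Samuel Adams was the first president of the United States"))
print(check("George Washington was the first president of the United States"))
```

The first claim comes back false and the second true, because the extracted subject either does or does not match the encyclopedia entry for that role.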
Ultimately, my fact-checking project never got off the ground. Even so, I find it fascinating that the way in which modern personal digital assistants work really isn’t all that different from the algorithms used to generate typing suggestions or to perform automated fact checking. I’m greatly oversimplifying things, but when someone asks a personal digital assistant a question, the device uses a speech recognition engine to turn the speech into text. From there, it tries to figure out what the person is asking by comparing the text string against queries from other people who have used similar words. That is probably a big part of the reason why such a device will occasionally give a response that has absolutely nothing to do with the question that was asked.
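A toy sketch in the spirit of that description makes the failure mode concrete. The queries, intent names, and similarity measure here are all hypothetical; real assistants are far more sophisticated, but the core idea of matching by word overlap rather than meaning is the same.

```python
# Hypothetical table of previously seen queries and the intents they map to.
known_queries = {
    "what is the weather today": "weather_report",
    "set a timer for ten minutes": "start_timer",
    "who was the first president": "history_lookup",
}

def jaccard(a, b):
    """Word-overlap similarity between two strings (no understanding involved)."""
    sa, sb = set(a.split()), set(b.split())
    return len(sa & sb) / len(sa | sb)

def route(question):
    """Pick the known query whose wording overlaps most with the question."""
    best = max(known_queries, key=lambda q: jaccard(q, question))
    return known_queries[best]

print(route("who is the first president"))   # matches the history query
print(route("what will the weather be today"))
```

A question phrased in an unusual way can overlap most with the wrong query and get routed to an unrelated intent, which is one plausible reason an assistant answers a question nobody asked.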
So with all of that in mind, what would it take for a personal digital assistant and its underlying AI engine to become more intelligent? More training clearly is not the answer, because the ubiquitous nature of these devices means that they receive huge volumes of data every single day, and that data is already being used for training.
In my opinion, the key to making these devices more intelligent is to move beyond predicting word use and focus instead on understanding the relationships between words. Here is an example. Right now there is an energy drink sitting on my desk in front of me. That drink is an object (linguistically, it’s a noun), but there are attributes that could be used to describe it. A few descriptive words that come to mind are blue, cold, full, carbonated, wet, and the list goes on. An AI engine that truly understands the relationship between words would need to do more than recognize that an energy drink could be described as cold. It would need to know what cold really means, and that other things can also be cold. This might include things like ice cubes, the water in a swimming pool, snow, or perhaps even the cold shoulder that I was afraid my wife was going to give me when I ignored her text.
As such an AI engine begins to more firmly grasp the English language, it could begin to make inferences based on its knowledge. It might, for example, realize that an energy drink has similarities to a swimming pool because both are cold and wet, but that you probably don’t want to drink from a swimming pool or go swimming in energy drinks.
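One very simple way to represent that kind of knowledge is as objects with sets of attributes. The objects and attributes below are hypothetical examples drawn from the article; a real engine would need a vastly richer representation, but even this sketch supports the "similar because cold and wet, yet not interchangeable" inference.

```python
# Hypothetical attribute knowledge: each object maps to the properties
# that describe it.
attributes = {
    "energy drink":  {"cold", "wet", "carbonated", "drinkable"},
    "swimming pool": {"cold", "wet", "large", "swimmable"},
    "ice cube":      {"cold", "wet", "small"},
}

def shared(a, b):
    """Attributes two objects have in common -- the basis for 'similar to'."""
    return attributes[a] & attributes[b]

def distinct(a, b):
    """Attributes of a that b lacks -- the basis for 'but not' inferences."""
    return attributes[a] - attributes[b]

print(sorted(shared("energy drink", "swimming pool")))    # ['cold', 'wet']
print(sorted(distinct("swimming pool", "energy drink")))  # ['large', 'swimmable']
```

From the shared set the engine can infer similarity; from the distinct sets it can infer that a swimming pool is swimmable but not drinkable, while an energy drink is the reverse.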
Making smarter personal digital assistants
I am completely convinced that the key to creating smarter personal digital assistants is to create a backend AI engine that truly understands the meaning and context of words. The big question, of course, is how such an AI engine could be created.
While I don’t claim to have all of the answers, I do have an idea. A few years ago, I used a popular language learning application to teach myself French. Early on, the application displayed a word in French, followed by about half a dozen pictures, and my job was to figure out which picture matched the word. As time went on, the application moved away from focusing on single words and began to present simple phrases (a red apple, three children, hot coffee, etc.). Once I had mastered those simple phrases, the application presented me with ever more complex phrases, sentences, and eventually paragraphs.
The point is that this particular application helps you to learn words, and their relationships to one another. There is no reason why an AI engine could not use a similar approach. This would allow the engine to better understand what is being asked of it, and to be more capable of giving good answers to difficult questions.