They Are Listening: How Machines Understand Your Voice

Ask Siri to call your Mom. Ask Alexa to play Footloose at your Kevin Bacon party. Ask Google what ointment you should use for that rash. We are in constant conversations with our machines. Ask and you shall receive. They are listening. But, how?

Siri has been in our lives for less than a decade. Alexa and Google Home only took up residence on living room side tables within the last five years. Still, many do not know how these machines take our voices, requests, incessant questions, and occasionally lonely "Hi, Siri, how are you?" and turn them into the answers we seek.


Artificial Intelligence (AI) is a tricky concept to grasp. (Just ask our AI students.) AI is a machine that demonstrates human characteristics and behaviors, often with varying degrees of autonomy.

To understand how machines use AI to understand our voices, you must understand AI at its basic levels. (Also, it’s 2019, so we should all know how AI works… since it will probably (probably…) take our jobs.) (But, that’s a conversation for a different blog post.)

AI is the “smart” we put in “smartphones.” You wouldn’t call an analog clock “smart” the way you would call an Apple Watch “smart.” Why is that? Don’t they both do a thing that humans do: keep track of time?

Image via HIT Consultant

The Apple Watch can tell you when to go to sleep, all while monitoring your heart rate. It can use your voice to text a friend that you can't make it to their Mommy Mixer tomorrow because you have "a cold," when you're really up watching Friends reruns on Nick at Nite, eating ice cream on an unhealthy binge (and, let's be honest, your Apple Watch probably knows that, too, since it took note of the limited number of steps you took today). The analog clock will keep ticking. It's been ticking the entire time it took you to read that. Guess what: it's still just ticking.

AI is a field that divides into smaller areas of study and implementation, sometimes referred to as "problems" or focuses. These focuses tend to align with the different human characteristics we are trying to make machines possess.

Image via Francesco Marconi (Twitter)

It doesn't take an MIT computer scientist to realize that Siri, Alexa, and Google devices all work using advances in the area of speech. But in actuality, it is a combination of focuses (machine learning, natural language processing, and speech) that helps our machines take our voices and search the internet with the accuracy of a savvy millennial.


Ask your dog to sit. Will he sit? If he is trained to, yes. Training. That’s the secret to dogs and machines.

Does a dog know what it means to sit? Could it abstractly recognize the action of sitting as something that involves bending your legs, putting your butt on something, and staying still, and then explain it in a How To essay for the beagle Rover, too?


But they know the sound of the word "Sit," paired with the behavior of sitting and the reward for sitting; therefore, they know how to "Sit." (That is a hard sentence to follow, but it makes sense. We checked. We're humans. We can understand sentences like that.)

Image via The Straits Times

A dog in Spain, for example, may know how to sit, but for them the word is "Siéntate." They don't understand Spanish. They don't speak Spanish. Through training, they just know the association between the syllables and phonemes (the sounds of words) in "Siéntate" and the action of sitting.
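That kind of training by association is easy to sketch in code. Here is a toy illustration only (the function names, the "training pairs," and the pretend sound tokens below are all invented for this post, and nothing here is a real learning algorithm): the "trained model" is just a table pairing sounds with actions, which is why the same dog can learn both "Sit" and "Siéntate."

```python
# A toy sketch of "training by association," not a real speech system.
# We pair sound tokens (stand-ins for phoneme sequences) with actions,
# the way Rover pairs the sound of "sit" with the act of sitting.

def train(pairs):
    """Build an association table from (sound, action) training pairs."""
    model = {}
    for sound, action in pairs:
        model[sound.lower()] = action
    return model

def respond(model, sound):
    """Return the learned action for a sound, or nothing if untrained."""
    return model.get(sound.lower(), "no response")

# Rover's "training data": two different sounds mapped to one action.
rover = train([("Sit", "sits"), ("Siéntate", "sits"), ("Stay", "stays")])

print(respond(rover, "Sit"))               # sits
print(respond(rover, "Siéntate"))          # same action, different sound
print(respond(rover, "explain sitting"))   # untrained: no response
```

Note that the dog, like the table, stores associations, not meanings: ask it anything outside its training and you get nothing back.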

Does Siri know what "Where is the nearest gas station" means? No. But she does know the sound of those syllables. She knows those syllables form words that she has learned, or been trained, to recognize. She knows how to put those words into text form so she can search the internet. She knows how to read back the responses.
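Those steps, sounds to words, words to a text query, query to a search, results read back, can be sketched as a pipeline. A minimal toy version follows; the dictionaries and function names below are invented for illustration (a real assistant replaces each lookup table with a trained acoustic model, a language model, and an actual search backend).

```python
# A toy pipeline mirroring Siri's steps. Every stage is a stand-in.

# Stage 1: pretend "acoustic model" mapping sound tokens to words.
PHONEMES_TO_WORDS = {
    "w-eh-r": "where", "ih-z": "is", "dh-ax": "the",
    "n-ih-r-ax-s-t": "nearest", "g-ae-s": "gas", "s-t-ey-sh-ax-n": "station",
}

# Stage 2: pretend "search backend" — a canned table, not the internet.
FAKE_SEARCH_RESULTS = {
    "where is the nearest gas station": "Shell, 0.4 miles away",
}

def speech_to_text(phoneme_tokens):
    """Turn recognized sound units into a text query."""
    return " ".join(PHONEMES_TO_WORDS[p] for p in phoneme_tokens)

def search(query):
    """Look the query up (here: the canned table)."""
    return FAKE_SEARCH_RESULTS.get(query, "Sorry, I didn't find anything.")

def assistant(phoneme_tokens):
    """Full loop: sounds -> text -> search -> spoken-style answer."""
    query = speech_to_text(phoneme_tokens)
    return f"I found this: {search(query)}"

heard = ["w-eh-r", "ih-z", "dh-ax", "n-ih-r-ax-s-t",
         "g-ae-s", "s-t-ey-sh-ax-n"]
print(assistant(heard))  # I found this: Shell, 0.4 miles away
```

The point of the sketch is what it lacks: at no stage does anything "understand" gas stations. Each stage just maps trained inputs to trained outputs, like Rover.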

She can know Spanish, too. But, in the same way Spanish puppies do.


There are, of course, complicated aspects of machines. They are not as straightforward as the Dog and Gas Station examples. The mathematics involved in creating speech recognition algorithms is by no means elementary. But as other excellent posts on the topic have explained, once anyone, with a computer science background or not, pulls back the curtain on our smart devices, we see that machines aren't as miraculous as we think they are.

Humans, by contrast, still amaze with their ability to create machines that ultimately help us understand and compare the functionality of our own incredible brains and our natural counterparts (Rover and his animal kingdom friends).

Our Artificial Intelligence course works to provide this blended understanding of the human brain, our cognition, and animal and artificial intelligence. It is one thing to know how to program AI; it is another to know how to live and adapt with it and ourselves.

That’s something you can’t ask Siri how to do.

Image via Attivio.com