Yanny vs. Laurel, and How Tricky Speech Recognition REALLY Is


As most folks probably know by now, last week saw a new version of the infamous dress debate, this time over how to interpret a poor-quality recording of a name.

Thankfully, this problem is a bit easier to solve than the dress was: with the original source file and a small amount of audio post-processing, you can tip the result either way. Run the audio through a high-pass filter, and you’ll tend to hear “Yanny”. Run it through a low-pass filter, and you’ll tend to hear “Laurel”. Other factors were analyzed as well, such as:

  • The inflection used by the speaker
  • The frequency spikes on particular consonants
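To make the filter idea concrete, here is a minimal Python sketch using only the standard library. It does not process the actual recording or recognize any words. Instead, it assumes a synthetic clip made of a 200 Hz tone (standing in for the lower-frequency content that a low-pass filter keeps, the “Laurel” side) and a 4 kHz tone (standing in for the higher-frequency content a high-pass filter keeps, the “Yanny” side). It splits that clip at an assumed 1 kHz cutoff with simple first-order filters, then uses the Goertzel algorithm to measure how much of each tone survives. The sample rate, cutoff, tone frequencies, and function names are all illustrative choices, not values taken from the Yanny/Laurel clip.

```python
import math

SAMPLE_RATE = 16_000  # Hz; assumed rate for the synthetic clip
CUTOFF_HZ = 1_000     # Hz; assumed split between "low" and "high" content


def tone(freq_hz, seconds, rate=SAMPLE_RATE):
    """Pure sine tone standing in for one band of the recording."""
    n = int(seconds * rate)
    return [math.sin(2 * math.pi * freq_hz * i / rate) for i in range(n)]


def low_pass(samples, cutoff_hz=CUTOFF_HZ, rate=SAMPLE_RATE):
    """First-order (RC) low-pass filter: attenuates content above cutoff_hz."""
    rc, dt = 1.0 / (2 * math.pi * cutoff_hz), 1.0 / rate
    alpha = dt / (rc + dt)
    out, y = [], 0.0
    for x in samples:
        y += alpha * (x - y)  # move a fraction of the way toward the input
        out.append(y)
    return out


def high_pass(samples, cutoff_hz=CUTOFF_HZ, rate=SAMPLE_RATE):
    """First-order (RC) high-pass filter: attenuates content below cutoff_hz."""
    rc, dt = 1.0 / (2 * math.pi * cutoff_hz), 1.0 / rate
    alpha = rc / (rc + dt)
    out, y, prev_x = [], 0.0, 0.0
    for x in samples:
        y = alpha * (y + x - prev_x)  # keep only the changes in the input
        prev_x = x
        out.append(y)
    return out


def tone_power(samples, freq_hz, rate=SAMPLE_RATE):
    """Goertzel algorithm: power of the signal at a single frequency."""
    coeff = 2 * math.cos(2 * math.pi * freq_hz / rate)
    s1 = s2 = 0.0
    for x in samples:
        s1, s2 = x + coeff * s1 - s2, s1
    return s1 * s1 + s2 * s2 - coeff * s1 * s2


if __name__ == "__main__":
    # Equal-loudness mix of a 200 Hz tone and a 4 kHz tone, half a second long.
    clip = [a + b for a, b in zip(tone(200, 0.5), tone(4_000, 0.5))]
    for name, filtered in (("low-pass", low_pass(clip)), ("high-pass", high_pass(clip))):
        low, high = tone_power(filtered, 200), tone_power(filtered, 4_000)
        print(f"{name}: 200 Hz power = {low:.3g}, 4 kHz power = {high:.3g}")
```

First-order filters roll off gently (about 6 dB per octave), so each output still contains some of the other tone; real audio tools typically offer much steeper filters, which make the split more dramatic.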

In addition, you must take into account a psychological element: your subconscious interpretation of what you hear!

Everything you perceive, audio included, gets filtered through your brain before you’re consciously aware of it. For example, you can choose which sounds to pay attention to. That is how, at a loud party, you can follow the person talking to you without noticing any other conversations, yet still switch over to listening in on the woman standing behind you.

https://www.popsci.com/yanny-laurel-scientific-evidence#page-3


Engineering Speech Recognition

I found this debate particularly interesting this week, because I have run a home studio for over 10 years and have done both digital and analog recording. I even have the reel-to-reel to prove it! I also work in an industry that has seen a boom in speech recognition algorithms and software. With Alexa, Cortana, Siri, and whatever Android’s version of Siri is (we call her Giri at home), I wondered: do the programmers behind these assistants have to worry about the same kind of confusion?

The short answer is no: as a user, you can usually get these assistants to hear you (and mainly you) just fine. When using a smart home device, try to remember:

  • If the device offers an introductory training program, run it; it helps the device learn your inflection and pronunciation.
  • If the device asks for confirmation, such as “Is this what you wanted?”, answer it. Those answers provide feedback that helps train the algorithm globally.
  • Speak clearly! Don’t expect Alexa to know every nuance of your speech patterns.

In the end, remember that speech recognition and smart devices are still young technologies, and they will continue to improve!

Eric is a Senior Full-Stack Engineer for ClearView Social. He has worked in the web development world for more than 15 years. He’s done time as an author and speaker, and is the organizer of the JavaScript Meetup Group, BuffaloJS.