Voice assistants stumble over regional accents
January 27, 2017
Updated: January 27, 2017 7:23pm
Despite all the current hype about the rise of voice-activated devices powered by assistants like Alexa and Siri, linguistics researcher Rachael Tatman has found people complaining on social media that the technology still doesn’t understand them.
Tatman, a doctoral candidate with the University of Washington’s linguistics department, was one of the speakers at a two-day Virtual Assistant Summit, which wrapped up Friday at the Park Central Hotel. The conference, and an adjacent Deep Learning Summit, drew about 600 people representing companies that are working on advancing artificial intelligence, machine learning and robotics.
Many people already use speech recognition through programs like Apple’s Siri, and the technology is moving beyond mobile phones into smart-home devices like Amazon’s Echo. Other fast-changing fields include artificial intelligence and consumer robotics, especially with self-driving cars on the horizon.
But there’s still work to be done in each of those areas, speakers said. Automatic speech recognition, for example, still falls short of humans, who can learn and adjust to one another’s speech patterns in “as little as two sentences,” Tatman said.
But that’s because humans take into account other factors, such as the gender of the person talking or whether they’ve previously met someone from the same region, she said.
Tatman examined YouTube’s automatic captioning program, which transcribes spoken words into text in several languages. She found that more errors showed up in captions for speakers with a Southern accent than for speakers from California.
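Studies like this typically score captions by word error rate: the minimum number of word substitutions, insertions and deletions needed to turn the automatic transcript into a human reference transcript, divided by the length of the reference. Here is a minimal Python sketch of that metric; the sample transcripts are invented for illustration and are not from Tatman’s data.

def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word error rate: edit distance between word sequences,
    normalized by the number of words in the reference."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    # Classic dynamic-programming edit distance, computed over words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # i deletions to reach an empty hypothesis
    for j in range(len(hyp) + 1):
        d[0][j] = j  # j insertions from an empty reference
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1  # substitution cost
            d[i][j] = min(d[i - 1][j] + 1,       # deletion
                          d[i][j - 1] + 1,       # insertion
                          d[i - 1][j - 1] + cost)
    return d[len(ref)][len(hyp)] / len(ref)

# Invented example: the captioner garbles the end of a five-word utterance.
print(word_error_rate("fixing to head on home", "fixing to head on the phone"))  # 0.4

A system that struggles with Southern speech would show a consistently higher word error rate for those speakers than for, say, Californians reading the same sentences.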
“The South is the largest demographic region in the United States,” she said. “If you’re using a voice-based virtual assistant and you can’t deal with Southern speech, you’re going to have problems reaching this market.”
For businesses trying to serve those markets, speech recognition technology could be crucial to future revenue, said Stephen Scarr, CEO of search services Info.com and eContext.
With 20 percent of all searches already done through voice, “this is really important, this is No. 1 on your radar,” Scarr told the developers.
As an example of the challenge, a recent YouTube video showed Amazon’s Alexa misunderstanding a young boy’s request to play a song and instead offering to play an audio porn channel.
The conference touched on more than just speech technologies. Alonso Martinez, a Pixar Animation Studios technical director, said robot developers could take cues from the ways animators create deep emotional connections with audiences.
“When you’re thinking about a robot, don’t think about it as a generic, faceless thing,” said Martinez, who developed characters in “Up” and “Inside Out,” two of the Emeryville company’s hit movies. “You need to ask what makes them admirable. What are the values that they have that I wish that I had in myself?”
Elena Corina Grigore of Yale University’s Social Robotics Lab said robots now used in manufacturing can work by themselves because they are easily trained to perform specialized, repetitive tasks. But robots are not well-equipped to collaborate with humans, she said.
That’s slowly changing with advances in artificial intelligence. As an example, Grigore played a video of a robot trained to help a person with what can be a complex and maddening task — assembling a chair from Ikea.
Still, Grigore said, “We’re not getting replaced by robots anytime soon. We’re not at a point where the robots have the intelligence or the physical capabilities necessary to perform all of these actions on their own. Anything that is related to common sense or creativity or types of thinking that require on-the-spot flexibility in a dynamic and changing environment is still very hard to achieve for us.”