Speech recognition technology gets smarter

By Yoon Ja-young

Speech recognition technology, which hasn’t seen much use in the decades since its debut, is starting to be applied in various fields. Major improvements in network and computing technology are being coupled with speech recognition to make our lives more convenient.

The technology was first developed in the early 1950s, but it remained almost unused until the early 2000s, when it began to be applied to cell phones and home automation. Even then, it saw only limited use due to poor accuracy.

Choi Eun-jeong, a research fellow at Samsung Economic Research Institute, said in a report that better networks and computing technology have pulled speech recognition technology onto center stage. “In speech recognition technology, it searches for statistically similar speech. Hence a large compilation of voice data enhances accuracy. The network and computing technology, capable of storing and processing a large amount of voice data, is crucial,” Choi said.

Google boasts 95 percent accuracy in this field, attributing it to its huge database of 230 billion English words, categorized by gender, age and even dialect. According to Google, mobile speech input grew sixfold during the past year. The company is using this data to understand how users speak. Hence, Google is enjoying a virtuous circle in which more users lead to more accumulated data, which in turn improves the accuracy of the service. The better service, of course, draws still more users.

Choi also points out that IT products have become more complicated and more diverse. Demand is therefore increasing for voice recognition, which lets people control such products easily with spoken words. “Speech recognition enables users to control devices with various functions without having to study or receive training,” she said. From the simplest form of turning a device on and off, the technology has reached the level where users can search for TV programs by voice.

The researcher said that the technology enables people to input information while moving or working.

“This enhances safety and productivity.”

In cars, people can control multimedia or navigation systems while keeping their hands on the steering wheel, making driving safer.

The technology also suits services that need to be tailored to individuals. “As the voice tells identity, psychology, health and language ability, the technology can be used to provide tailored services for individuals in security, medical areas and education,” Choi said.

It is also useful for real-time processing of information, because data can be input much faster by voice than by typing. At the Shinhan Card call center, customers say which service they want instead of listening to all the menu options. They are then automatically directed to the person they need to speak to.

Choi points out that the technology is broadening its scope beyond IT, being applied in cars and in medical, broadcasting and education services. She cites statistics according to which 150,000 doctors in the United States were using electronic health records with speech recognition capability as of 2010. Another report predicts that over 47 percent of cars released in the global market in 2015 will have speech recognition capabilities.

Choi said businesses are using the technology to boost IT demand, expanding the customer base to those marginalized by advanced devices. Products and services that are easily controlled by voice, especially those targeting children, senior citizens and the disabled, are being developed. Mitsubishi Electronics’ “touchless call” lets users summon an elevator and select the floor they want by voice.

“More than anything else, businesses should invest in speech recognition products for senior citizens to cope with the aging society,” the researcher advised.