Is Voice Recognition the Next Thing in Kiosk Technology?
Kiosks continue to integrate the latest technology to improve the customer experience, reduce time on task, and expand the range of tasks a kiosk can complete. One such technology is Voice Recognition. What are the benefits of Voice Recognition, and can it improve kiosk accessibility?
What is Voice Recognition?
Voice Recognition has become a staple in our homes and in customer service interactions. The term is often used interchangeably with “Speech Recognition,” although the two are different technologies; the differences are explained below. Voice Recognition has been around since the 1960s and is defined by Russell Adams as “the technology by which sounds, words or phrases spoken by humans are converted into electrical signals, and these signals are transformed into coding patterns to which meaning has been assigned.” (Sourcebook of Automatic Identification and Data Collection.) In kiosk applications, Voice Recognition is typically used as a form of Voice Input, automatically translating a person's speech into computer input, or as a way to trigger a computer action or command. In our post-COVID-19 reality, Voice Recognition also offers a way to create a contactless, or touchless, kiosk experience. However, we cannot assume that all users are able or willing to use a touchless solution.
How Voice Recognition Works
From a technical standpoint, Voice Recognition can process speech in one of two ways. It can rely on a preloaded set of commands and responses, or it can connect to the cloud, draw on resources in a larger database, and respond according to learned behaviors and previous user interactions.
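To make the first approach concrete, here is a minimal Python sketch of matching a transcript against a preloaded command list stored on the kiosk. It assumes the speech has already been turned into text by some recognizer (not shown). The command phrases and action names are hypothetical, and the standard library's `difflib.get_close_matches` stands in for whatever matching a real kiosk product uses.

```python
import difflib
from typing import Optional

# Hypothetical command vocabulary stored on the kiosk itself (no cloud lookup).
# Phrases and action names are illustrative, not taken from any real product.
PRELOADED_COMMANDS = {
    "start order": "START_ORDER",
    "add to cart": "ADD_TO_CART",
    "go back": "GO_BACK",
    "check out": "CHECKOUT",
    "cancel": "CANCEL",
    "help": "SHOW_HELP",
}


def match_command(transcript: str, cutoff: float = 0.75) -> Optional[str]:
    """Map a transcribed utterance to a preloaded kiosk action, or None if nothing is close."""
    # Normalize case and collapse repeated whitespace.
    phrase = " ".join(transcript.lower().split())
    if phrase in PRELOADED_COMMANDS:
        return PRELOADED_COMMANDS[phrase]
    # Fall back to fuzzy matching so small transcription slips
    # (for example "checkout" instead of "check out") still resolve.
    close = difflib.get_close_matches(phrase, list(PRELOADED_COMMANDS), n=1, cutoff=cutoff)
    return PRELOADED_COMMANDS[close[0]] if close else None


for utterance in ["Check out", "checkout", "please sing a song"]:
    print(utterance, "->", match_command(utterance))
```

A closed vocabulary like this trades flexibility for predictability: the kiosk responds only to phrases it already knows, but the matching step itself needs no network connection.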
Systems also differ in whether they must be trained on an individual speaker. Speaker-dependent Voice Recognition requires the system to learn the user’s voice as the user reads a series of words and phrases. Speaker-independent Voice Recognition, on the other hand, recognizes most voices without any training.
Speech Recognition Versus Voice Recognition
Some sources describe Voice Recognition as the “learning type” of technology and Speech Recognition as the “word identification type of technology.” (Voice Recognition vs Speech Recognition)
For kiosk deployments, speaker-independent (word identification) technology is the most practical fit: a kiosk serves a large number of users, each for only a short time, so there is no opportunity to train the system on any one person's voice.
Speech Recognition technology has come a long way since its inception. Modern systems such as Google Assistant, Amazon Alexa, Microsoft Cortana, and Apple’s Siri showcase its current capabilities. Accuracy varies across these systems, with reported word accuracy rates as high as 95% (for Google and Cortana). Unfortunately, word accuracy also shows a distinct bias by gender and race, with greater accuracy for male and white users. Accents can lower accuracy as well.
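Word accuracy figures are commonly derived from word error rate (WER): the recognizer's transcript is aligned with what the speaker actually said, and substitutions, deletions, and insertions are counted. The sources above do not state exactly how the 95% figure was computed, so treat this Python sketch as one common definition (word accuracy = 1 - WER), not the method behind those numbers. The example sentences are hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance (substitutions + deletions + insertions) divided by reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # dist[i][j] = fewest edits turning the first i reference words into the first j hypothesis words.
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dist[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,                 # deletion
                dist[i][j - 1] + 1,                 # insertion
                dist[i - 1][j - 1] + substitution,  # match or substitution
            )
    return dist[len(ref)][len(hyp)] / len(ref)


def word_accuracy(reference: str, hypothesis: str) -> float:
    """Word accuracy as 1 - WER (it can drop below zero if the recognizer inserts many extra words)."""
    return 1.0 - word_error_rate(reference, hypothesis)


spoken = "I would like a large coffee"
heard = "I would like large toffee"  # one deletion ("a") and one substitution ("toffee")
print(f"WER: {word_error_rate(spoken, heard):.2f}, word accuracy: {word_accuracy(spoken, heard):.2f}")
# prints: WER: 0.33, word accuracy: 0.67
```

Two errors in a six-word request already cost a third of the words, which is why even a few percentage points of word accuracy matter for short kiosk commands.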
The ability to answer questions correctly is a separate metric, and it depends on how many of the questions asked the system chooses to answer. Reported question-answering accuracy ranges from 70% to 90% depending on this response rate: accuracy improves when the system opts to answer fewer questions.
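The trade-off between response rate and accuracy can be illustrated with a confidence threshold, where the system answers only when it is confident enough. This Python sketch assumes each answer comes with a confidence score, which is an assumption about how such systems behave rather than something the cited figures describe, and it runs on invented toy data, not measurements.

```python
from typing import List, Tuple

# Each entry: (system's confidence in its answer, whether that answer was correct).
# These ten values are made-up toy data for illustration, not results from any assistant.
TOY_RESULTS: List[Tuple[float, bool]] = [
    (0.95, True), (0.90, True), (0.85, True), (0.80, False), (0.70, True),
    (0.60, True), (0.50, False), (0.40, False), (0.30, True), (0.20, False),
]


def coverage_and_accuracy(results: List[Tuple[float, bool]], threshold: float) -> Tuple[float, float]:
    """Answer only questions whose confidence meets the threshold.

    Returns (response rate, accuracy on the questions actually answered).
    """
    answered = [correct for confidence, correct in results if confidence >= threshold]
    if not answered:
        return 0.0, float("nan")  # nothing answered, so accuracy is undefined
    return len(answered) / len(results), sum(answered) / len(answered)


for threshold in (0.0, 0.55, 0.82):
    rate, acc = coverage_and_accuracy(TOY_RESULTS, threshold)
    print(f"threshold {threshold:.2f}: answers {rate:.0%} of questions, {acc:.0%} of them correctly")
```

With this toy data, raising the threshold from 0.0 to 0.82 lowers the response rate from 100% to 30% while accuracy on the answered questions rises from 60% to 100%, the same direction of trade-off described above.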
Accuracy is of critical importance for kiosks using Speech Recognition, so kiosk deployers should pay close attention to the precision of a solution and the application’s ability to respond correctly to various voices, accents, and commands.
Are Users Comfortable with Voice Commands?
Accuracy aside, there is a valid concern that people may not be comfortable using voice commands in public. Voice or speech recognition on a kiosk only works if users are willing to speak their commands aloud. To date, publicly available studies have explored how people feel about using voice commands on a cell phone, but not specifically on a kiosk.
Although it has become more common, some people do not feel comfortable speaking to an inanimate object, such as a cell phone, in public. One study on using voice commands on a phone in public found that some people felt annoyed when others used voice commands around them, and many found it uncomfortable to do so themselves in restaurants, at the gym, and in other public locations. The study found that comfort with this technology varied by age, gender, and race, with lower comfort among women and non-white users.
While the study referenced above did not specifically explore user comfort with voice recognition on kiosks, the results indicate it is worth considering the location and environment around the kiosk before implementing any type of speech or voice recognition technology. The environment should be one in which people feel comfortable speaking aloud and giving voice commands.
Voice Recognition and Privacy
There are also justified privacy concerns about voice recognition in certain kiosk environments. In a healthcare setting, for example, voice input may not offer enough privacy for a user to comfortably speak commands when selecting options or entering data. Banking, Social Security, and other environments that often require users to enter private information may also find voice commands on a kiosk less useful. Conversely, restaurant kiosks may have a lower privacy threshold because the information being submitted is not sensitive. These factors should be weighed when deciding whether voice recognition makes sense for a specific kiosk deployment.
Privacy is also a consideration because of the always-on nature of voice command technology, known as “always on, always ready” or, more nefariously, “always on, always listening.” Users expect privacy in their conversations, even in public spaces, but the presence of voice technology can eliminate that expectation.
To comply with privacy laws in many jurisdictions, kiosks must notify users in some way that a voice listening device (voice recognition technology) is present. The GDPR (General Data Protection Regulation), for instance, requires a lawful basis, such as the user's consent, before personal data can be collected.
For voice recognition on kiosks, users need to be alerted that the technology is present, not only to protect their privacy but also so that they know the feature exists and can take advantage of its convenience. Storm Interface has suggested that kiosks with voice technology display a universally recognized symbol indicating the presence of speech command/voice recognition technology. Additional best practices for the use of voice recognition are identified in Storm Interface’s working document (PDF, 703KB).
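One way to avoid the “always on, always listening” pattern is to capture audio only after the user has been shown a voice indicator and has explicitly chosen voice input. The Python sketch below models that rule. The class and method names are hypothetical, and the design is one possible approach under these assumptions, not a statement of what any specific law or Storm Interface's guidance requires.

```python
from dataclasses import dataclass


@dataclass
class VoiceInputGate:
    """Hypothetical opt-in gate: audio is captured only after notice is shown and the user opts in."""

    indicator_visible: bool = False  # voice symbol is displayed on the kiosk
    user_opted_in: bool = False      # the current user explicitly chose voice input

    def show_indicator(self) -> None:
        self.indicator_visible = True

    def opt_in(self) -> None:
        # Refuse an opt-in that was not preceded by a visible notice.
        if not self.indicator_visible:
            raise RuntimeError("show the voice indicator before asking the user to opt in")
        self.user_opted_in = True

    def may_capture_audio(self) -> bool:
        return self.indicator_visible and self.user_opted_in

    def end_session(self) -> None:
        # The opt-in applies to one user only; the next user must choose voice input again.
        self.user_opted_in = False


gate = VoiceInputGate()
gate.show_indicator()
print(gate.may_capture_audio())  # False: notice shown, but no opt-in yet
gate.opt_in()
print(gate.may_capture_audio())  # True
gate.end_session()
print(gate.may_capture_audio())  # False again for the next user
```

Resetting the opt-in at the end of each session reflects the kiosk setting, where consecutive users are usually different people.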
Background Noise and Voice Directionality
One of the major concerns with voice recognition on kiosks is the effect of ambient noise on recognition accuracy. There are ways to mitigate background noise, however. If the kiosk is in a high-traffic area or a loud room, the microphone’s features are particularly important. Microphones used for speech commands must support noise cancellation and noise suppression, and may also need to direct their voice reception zone toward where the user will stand.
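Kiosk microphones handle this with hardware and firmware for noise cancellation, noise suppression, and directional pickup, none of which is shown here. As a much simpler illustration of the underlying idea (separating the user's voice from a steady background), the Python sketch below implements a basic noise gate, a different and cruder technique: it estimates the background level from the first few frames and silences frames that are not clearly louder. The frame size, the 10-frame noise estimate, the 3x threshold, and the synthetic signal are all assumed values chosen for illustration.

```python
import math
from typing import List, Sequence


def frame_rms(frame: Sequence[float]) -> float:
    """Root-mean-square energy of one frame of audio samples."""
    return math.sqrt(sum(s * s for s in frame) / len(frame)) if frame else 0.0


def noise_gate(samples: Sequence[float], frame_size: int = 160,
               noise_frames: int = 10, factor: float = 3.0) -> List[float]:
    """Silence frames whose energy is not clearly above the estimated background noise.

    Assumes the first `noise_frames` frames contain background noise only
    (160 samples is 10 ms at an assumed 16 kHz sample rate).
    """
    frames = [list(samples[i:i + frame_size]) for i in range(0, len(samples), frame_size)]
    if len(frames) < noise_frames:
        raise ValueError("not enough audio to estimate the noise floor")
    noise_floor = sum(frame_rms(f) for f in frames[:noise_frames]) / noise_frames
    threshold = noise_floor * factor
    gated: List[float] = []
    for frame in frames:
        # Keep frames loud enough to be speech; replace the rest with silence.
        gated.extend(frame if frame_rms(frame) > threshold else [0.0] * len(frame))
    return gated


# Synthetic signal: 10 frames of quiet noise, one loud "speech" frame, then one more noise frame.
quiet = [0.01 if i % 2 == 0 else -0.01 for i in range(1600)]
loud = [0.5 if i % 2 == 0 else -0.5 for i in range(160)]
signal = quiet + loud + quiet[:160]
out = noise_gate(signal)
print(sum(1 for s in out if s != 0.0))  # 160: only the loud frame survives
```

A gate like this only helps when the background is steady and quieter than the speaker; in a busy venue where other people are talking, directional microphones and dedicated suppression are what make the difference.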
The Benefits of Voice Recognition Technology
Voice Recognition technology can be used to issue voice commands. This capability is particularly useful for people with disabilities that affect their vision, mobility, or dexterity. Voice commands give users who are blind or have low vision, users who are paralyzed, and users with limited hand or finger mobility the option of using their voice to navigate the kiosk and enter data. This makes voice recognition, and the voice command capabilities it enables, a valuable addition to kiosks for improving accessibility for users with many types of disabilities.
Voice Recognition can also minimize the amount of contact a user has with the physical kiosk. In a post-COVID-19 reality, limiting the spread of the virus has become a driving force behind kiosk adoption, and adding options for navigating the kiosk application without contact can reduce the chance of spread even further.
Is Voice Recognition a Solution for Everyone?
Voice Recognition is not a one-size-fits-all solution. Consider, for instance, users who are deaf or hard of hearing, deaf-blind, or nonverbal. Many of these users cannot rely on Voice Recognition for data input and will need more traditional navigation options, including refreshable braille displays, to interact with a self-service kiosk.
Alternatives to Voice Recognition Technology
Voice recognition does not eliminate the need for external input devices or screen reader technology such as JAWS.
Blind users and users with low vision may find voice commands helpful for data input but will still need screen reader software to understand the information on the screen. Even when Voice Recognition is used for data input and navigation, text-to-speech technology is needed to confirm that users have done what they intended and that their selections are correct. An alternative input method must also be available for those who are not comfortable with, or capable of, voice input.
Voice Recognition is a useful technology with the potential to improve access and usability for users with different disabilities. However, it must be provided as an option rather than a complete solution, and attention must be paid to accuracy, results, privacy, and microphone quality.