Voice User Experience

With the rise of Amazon’s Alexa, Google Home, Siri, Cortana, and other voice systems, voice is becoming an important part of pretty much every company’s user experience. Even if your website or app isn’t accepting voice input (yet), your visitor might have spoken to Google or another service in order to conduct the search that ultimately brought them to your website. As well, people might be listening to the text on your website instead of reading it themselves.

Chances are good that at some point your website or app will need to accept voice input, speak to visitors, or both. It hasn’t happened yet for most websites, but I suspect it is a question of “when” and not “if” before we see more voice-based interactions in place of tap- or click-based ones.

With that in mind, what questions do we need to be asking? How does this affect our websites or apps? There are some concerning aspects to working with voice-based input but some opportunities as well.

Accepting Vocal Input

Outside of a few quiet locations, many of our interactions happen in noisy environments. Even an average office has background noise, from various machines to co-workers conversing. If a visitor is browsing your website at a grocery store, there may be far too much background noise for your website to hear what the visitor is saying, or for the visitor to hear what your website is saying to them.

This might mean we need to accept simpler vocal input, because short words like “yes” or “no” might be easier to distinguish even when there is a lot of noise in the background. It might also mean our websites need to communicate more clearly, with shorter or simpler phrases in response to users’ questions. Even simple responses may not work if there is too much noise, though. And can everything really be reduced to such simple responses?
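To make the idea of simpler vocal input a bit more concrete, here is a minimal sketch of how a short answer might be interpreted. It assumes your speech service hands back a transcript with a confidence score between 0 and 1. The `Transcript` shape, the word lists, and the 0.6 threshold are all assumptions made for illustration, not part of any particular voice API.

```typescript
// Hypothetical shape of a speech result; real services will differ.
interface Transcript {
  text: string;
  confidence: number; // assumed to range from 0 to 1
}

type Answer = "yes" | "no" | "unclear";

// Illustrative word lists; a real product would tune these for its audience.
const YES_WORDS = ["yes", "yeah", "yep", "sure", "ok", "okay"];
const NO_WORDS = ["no", "nope", "nah"];

function interpretYesNo(t: Transcript, minConfidence: number = 0.6): Answer {
  // Too noisy to trust: ask again or fall back to tapping.
  if (t.confidence < minConfidence) return "unclear";

  // Lowercase, turn punctuation into spaces, and split into words.
  const words = t.text
    .toLowerCase()
    .replace(/[^a-z\s]/g, " ")
    .split(/\s+/)
    .filter((w) => w.length > 0);

  const saidYes = words.some((w) => YES_WORDS.indexOf(w) >= 0);
  const saidNo = words.some((w) => NO_WORDS.indexOf(w) >= 0);

  // Neither word heard, or both heard: do not guess.
  if (saidYes === saidNo) return "unclear";
  return saidYes ? "yes" : "no";
}
```

Returning "unclear" instead of guessing matters here: when the room is too loud, the safer move is to repeat the question or offer a button.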

On the flip side, there are situations where visitors can’t speak to their phone no matter how quiet it is. If your visitor is looking up information on your website while attending a meeting, they won’t be able to talk. If you are on an airplane, crowded in with other passengers, you might not be interested in speaking to your phone or laptop to browse the web. And chances are you won’t like it if the passengers seated next to you are speaking to their phones while you are trying to sleep.

What I think this means is that voice needs to be introduced as one of many options. In some circumstances, tapping, clicking, or typing instead of speaking might be a more effective form of input. But in other circumstances, voice interactions will be preferred. For instance, somebody driving a car could greatly benefit from using voice as a means of navigation so that they can keep their hands on the wheel.

The world is noisy, but not all the time or everywhere. So, as we consider how to integrate voice, an interesting opportunity is how voice and tapping can work together. For example, you could accept voice input on form fields to give your visitors the option of speaking or typing a response. This gives visitors more control based on the situation they find themselves in.
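As a rough sketch of how speaking and typing might coexist on a single form field, the function below picks a value when a field may have been filled in either way. Everything in it is hypothetical: the `FieldInput` shape, the rule that typed text always wins, and the 0.7 confidence threshold are assumptions for illustration only.

```typescript
// Hypothetical result from whatever speech service you use.
interface SpokenValue {
  text: string;
  confidence: number; // assumed to range from 0 to 1
}

// One form field that can be filled by typing, by voice, or both.
interface FieldInput {
  typed: string;
  spoken?: SpokenValue;
}

interface ResolvedValue {
  value: string;
  source: "typed" | "voice" | "none";
}

function resolveFieldValue(input: FieldInput, minConfidence: number = 0.7): ResolvedValue {
  // Assumption: anything the visitor typed is deliberate, so it wins.
  const typed = input.typed.trim();
  if (typed.length > 0) return { value: typed, source: "typed" };

  // Otherwise use the spoken value, but only if it was heard clearly enough.
  const spoken = input.spoken;
  if (spoken !== undefined && spoken.confidence >= minConfidence) {
    const text = spoken.text.trim();
    if (text.length > 0) return { value: text, source: "voice" };
  }

  // Nothing usable: leave the field empty so the visitor can type instead.
  return { value: "", source: "none" };
}
```

The design choice worth noting is that voice never overrides typing. The visitor stays in control, and a noisy room simply means the field stays empty until they tap it.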

Security Pros and Cons

Voice adds some interesting complications regarding security. Namely, when you speak, not only can your phone hear you but so can everybody around you. Do you really want to speak your banking password while strangers are around? I’d rather not. Do you want the bank’s website to read your information back to you in public? No thank you.

As we think about working with voice, it becomes important to consider when we’ll actually allow vocal input. Allowing vocal input on a password field may not be appropriate, and it would make more sense for people to type in that information. At a minimum, even if we make vocal entry of a password a possibility, we may want to warn our visitors about the potential security risk.
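One way to think about this is as a per-field policy: allow voice, allow it with a warning, or block it. The sketch below is only an illustration under stated assumptions. The `FieldDescriptor` shape and the policy names are hypothetical, and the list of sensitive fields (password inputs plus a handful of standard HTML `autocomplete` tokens) is a starting point rather than a complete security review.

```typescript
type VoicePolicy = "allow" | "warn" | "block";

// Hypothetical description of a form field, loosely mirroring HTML attributes.
interface FieldDescriptor {
  inputType: string;      // e.g. "text", "password"
  autocomplete?: string;  // e.g. "current-password", "cc-number"
}

// Standard HTML autocomplete tokens treated as sensitive here (an assumption).
const SENSITIVE_AUTOCOMPLETE = [
  "current-password",
  "new-password",
  "one-time-code",
  "cc-number",
  "cc-csc",
];

function isSensitiveField(field: FieldDescriptor): boolean {
  if (field.inputType.toLowerCase() === "password") return true;
  const token = (field.autocomplete || "").toLowerCase();
  return SENSITIVE_AUTOCOMPLETE.indexOf(token) >= 0;
}

function voicePolicyFor(field: FieldDescriptor, allowWithWarning: boolean = false): VoicePolicy {
  // Ordinary fields can always be spoken.
  if (!isSensitiveField(field)) return "allow";
  // Sensitive fields are blocked unless the site opts in to a warning instead.
  return allowWithWarning ? "warn" : "block";
}
```

Blocking by default and treating the warning as an opt-in keeps the safer behavior as the one you get when nobody has thought about a field yet.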

Of course, voice may also open up new ways to handle passwords and authentication. If passwords are a critical part of your website or app, this may be something to consider in future updates.

Human Considerations

Moving beyond the logistical challenges of working with voice, there are more fundamental human problems to deal with:

First, we tend to remember visual or tactile stimulation better than auditory stimulation. Just listening to something being read back to you will, more often than not, be far less effective than actually seeing it or interacting with it. If the goal of your website or app is to help people research some topic (a product, an event, a concept, or whatever), reading it to them will have reduced long-term impact. This also suggests people will be more likely to remember your company based on your visuals than on something that is read to them. Of course, the opportunity is to mix the two, using visuals and audio together to create an even better experience for your visitors.

The second consideration is that everybody talks differently. There are likely at least a few words in this blog post that I’ll pronounce one way and you’ll pronounce another. A fundamental rule of good user experience is to speak the way your users speak. Generally, this means writing for the background, education, status, or similar aspects of your audience. With voice, it now also means paying attention to how words are enunciated when your website or app speaks to the visitor. It is a challenge, but also a new way of connecting with your visitors.

Final Thoughts

We’re in the early days of voice user experience. Technologically, many of the problems I discussed can and will be overcome: background noise can be filtered out, regional dialects can be accounted for, and security issues can be addressed. As well, with technological changes, new opportunities will present themselves. Ultimately, we need to start figuring out how to integrate voice interactions into our websites and apps to best support our visitors.