At first sight, a yes/no question seems like a simple dialog state for a voice user interface designer to craft. The possible responses are just yes or no, right? Well, no, actually. As with many issues in dialog design, surface simplicity masks a deeper complexity. This article describes the complexity of confirmation states and outlines recommended strategies for improving the caller's experience.

Looking at data from many of our deployed applications, we observe that while a large number of users do respond to confirmation questions with yes, no, or one of their colloquial synonyms (such as yup, that's right, or no it isn't), a significant number don't. For example:

System: Is there anything else I can help you with?

As in many live conversations, the user goes beyond yes to specify what they'd like help with. Frequently, users even omit the yes and specify only the next task. In one of our customer service applications, more than 20% of responses to this question fell into this category.

To provide the best user interface, the voice user interface designer needs to handle this type of behavior using a deliberate combination of grammar coverage, effective prompting, and error handling, plus additional steps to avoid endless loops and improve the time to task completion.

Grammar Coverage
Here, it makes sense to say no and then give an alternate value, but it wouldn't make sense to say something like "Yes. Fulton, California." Once again, the exact design of the grammar is very case-specific and depends on the context.

Effective Prompting

In our experience, no. The problem is that those prompts place too large a cognitive burden on the user. The prompts ask the user to choose between different types of actions: either saying yes, or naming another city and state. In normal conversation, we are ordinarily given choices that are of the same type but represent logical opposites: yes or no, for here or to go, apples or oranges. In one application that used this type of prompt, we found an increased number of no-speech errors (where the user just didn't say anything), fewer users saying 'yes', and other evidence of user confusion. In fact, even when users were presented with the correct city and state, many repeated that city instead of just saying 'yes':

System: Fremont, California. Say 'yes', or name another city and state.

This is definitely not the desired behavior. It increases time to task completion and reduces user satisfaction. So, sometimes simpler is better. We changed the prompt to "Fremont, California. Is that right?" and user confusion was measurably reduced. There was a slight decrease in the number of users who named another city and state (with a corresponding increase in 'no' responses), but naming another city and state is natural enough in that context that many users do it without prompting.

Error Handling

System: Getting movies for Fulton, California. Is that right?

In this example, the user says an out-of-grammar utterance (by not specifying the state), which is correctly rejected, but the error prompt isn't helpful. Instead of asking for a city and state, it asks if she wants Fulton, California. Since users can say yes or no, or name another city and state, what should we do if an utterance is rejected?
The user might have been trying to say yes or no, in which case an appropriate error prompt would be "Sorry, I didn't understand. If you want Fulton, California, say 'yes'. Otherwise, say 'no'." Or the user might have been trying to change the city, in which case an appropriate prompt would be "Sorry, I didn't understand. Please say a city and state, like Boston, Massachusetts."

Fortunately, if an utterance is rejected in this yes/no state, we can make an educated guess about what the user wanted to do. Yes/no grammars have low rejection rates, while city-state grammars are significantly more complex and have higher rejection rates (available data shows rates as much as four times higher). So, if an utterance is rejected, it is likely to be because the user named a city and state. Based on this reasoning, when a reject error occurs, we can assume that the user doesn't want to stay in the yes/no state, and we should ask them for a city and state:

System: Getting movies for Fulton, California. Is that right?

When the user's utterance is rejected, the system compensates by guessing that she is trying to name a city and state. It takes the reject as an implicit 'no' and takes her to the basic city-state question. This kind of thoughtful, data-driven design makes for efficient, effective error handling.

Avoiding Endless Loops

System: What city and state are you in?

If the user is persistent, they could keep saying 'Rockridge, California' and then saying 'no' to the confirmation all day, and never get anywhere. To limit such unproductive behavior, the system should keep track of the number of errors and switch to another strategy after some threshold has been reached. This alternative strategy might consist of capturing the information in another way (such as by zip code), transferring to an agent, or skipping the step if the information is optional.
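The idea of counting errors and switching strategy at a threshold can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class name ErrorTracker, the record method, the event labels, and the threshold of 3 are all hypothetical and do not come from any particular voice platform.

```python
from dataclasses import dataclass, field


@dataclass
class ErrorTracker:
    """Counts trouble events in one dialog step (hypothetical helper)."""

    threshold: int = 3  # assumed value; tune per application
    events: list = field(default_factory=list)

    def record(self, event: str) -> bool:
        """Log one trouble event; return True once the threshold is reached."""
        self.events.append(event)
        return len(self.events) >= self.threshold


def next_action(should_fall_back: bool) -> str:
    """Pick the next dialog move; both labels are illustrative only."""
    return "fallback_zip_code" if should_fall_back else "retry_city_state"


tracker = ErrorTracker(threshold=3)
# Hypothetical event labels: a 'no' in the confirmation, a silence, a rejection.
actions = [next_action(tracker.record(e))
           for e in ("confirm_no", "no_input", "no_match")]
```

With these assumptions, the first two events lead to another attempt at the city-state question, and the third triggers the fallback. Keeping the count in one place makes it easy to decide later which kinds of events should contribute to it.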
When counting the number of errors, it is crucial to include several types of behavior: no match errors, no input errors, saying 'no' in confirmations (as in the example above), and saying 'go back' in confirmations. In some cases, the system should also count other indications of trouble, such as asking for help or requesting to speak with an agent.

By keeping track of the total error count and providing an appropriate fallback mechanism, we can help keep errors from spiraling out of control. With these changes, the city-state dialog would look like this:

System: What city and state are you in?

In this example, there are three total errors: first, the user says 'no' in the confirmation, then has a no input error, then a no match error. The threshold for total errors is set to three here, so after the third error we switch to the fallback strategy of capturing the zip code instead of the city and state.

Confidence-Based Confirmations

If the confidence is very low (below the standard rejection threshold), the system probably did not recognize the utterance correctly, so the system will play a no match error as usual, like "Sorry, I didn't understand. Please say your city and state again."

If the confidence is medium (above the standard rejection threshold but below some other threshold), there is a reasonable chance that the system recognized the utterance correctly, but it's not very certain. So, the system will do an explicit yes/no confirmation to be sure. That yes/no confirmation is the same one we have been describing all along.

However, if the confidence is high (above this other threshold), it is likely that the utterance was recognized correctly. Since we're already pretty sure we got the right result, we don't need to do a yes/no confirmation. Instead, we can save time with an alternative flow, such as just echoing back the recognized item, or not echoing it back at all.
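The three confidence tiers described above can be expressed as a small decision function. The following Python sketch is illustrative only: it assumes the recognizer reports a confidence score between 0.0 and 1.0, and the function name, the returned labels, and the threshold values 0.3 and 0.7 are hypothetical placeholders rather than recommended settings.

```python
def choose_confirmation(confidence: float,
                        reject_below: float = 0.3,
                        confirm_below: float = 0.7) -> str:
    """Map a recognition confidence score to a confirmation strategy.

    Assumes scores in [0.0, 1.0]; both thresholds are made-up examples.
    A score exactly on a threshold is treated as the higher tier.
    """
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0.0 and 1.0")
    if confidence < reject_below:
        # Very low: treat as a no match and re-prompt.
        return "no_match"
    if confidence < confirm_below:
        # Medium: ask an explicit yes/no confirmation question.
        return "explicit_confirm"
    # High: skip the yes/no question (optionally echo the result).
    return "skip_confirm"


low = choose_confirmation(0.1)
medium = choose_confirmation(0.5)
high = choose_confirmation(0.9)
```

In practice, the second threshold would be tuned against recorded call data, since setting it too low skips confirmations that were needed and setting it too high asks questions that were not.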
As an example, let's compare the behavior for a medium-confidence recognition and a high-confidence recognition in the main menu state of an application:

Medium-Confidence Behavior:

High-Confidence Behavior:

In the medium-confidence case, the system uses a yes/no confirmation, while in the high-confidence case, the system skips the confirmation and goes directly to read out the account balance.

Conclusion

About Dr. Lisa Guerra:

About Dr. Ryan Bush:

About BeVocal:
Published: Friday, February 4, 2005