What are the most common applications for speech recognition and speaker verification?
Right now, the most common applications for speech recognition are improved versions of older touch-tone applications, such as dial-by-name directories, telephone banking, and travel reservations. However, applications are rapidly expanding to just about any service where routine calls can be automated—those as disparate as company and school absentee reporting, order tracking, employee benefits enrollment, and off-track horse race betting.
Speaker verification has not seen the same penetration so far as speech recognition. The most common application is for password resets. Security is a high priority area today, so speaker verification will gain much more widespread acceptance, but it will probably take some time before people become familiar with the technology and it's viewed as a proven security tool.
In your opinion, where and how will we be using speech technology in, say, five years' time, and what challenges will we need to overcome to get there?
The major applications of speech technology will continue to be telephone-based services. There will also be more adoption of multi-modal voice-graphical devices such as speech-enabled PDAs and automobile control systems. The challenges are less technical (the technology works pretty well) and more a matter of application designers and users converging on standards and conventions for voice user interfaces. The graphical user interface is a good analogy. Ten or fifteen years ago, user-interface designers were presented with a variety of menus, buttons, dialog boxes, etc. that they could use to create an application. They put them together in some way, and users then had to figure out how to use the application. Today, the Windows UI, for example, is pretty standard, and most people can figure out how to use a new application without too much trouble. The same familiarity and standardization will happen with speech interfaces.
What are the common misconceptions of speech recognition?
The most common misconception about speech recognition among the general public is that the goal of the technology is to allow free-form conversation between people and devices, as has been portrayed in literature and films. That may happen eventually, but for the foreseeable future, speech will be used for much more limited and structured dialogs that will nevertheless prove very convenient and useful.
Technically, how advanced are we today with verification and speaker-independent recognition?
The technology works very well today for limited, structured dialogs. Dial-by-name directories with tens of thousands of names, for example, are pretty successful at recognizing the right name. But two major technical limitations remain. The first is dealing with conversational speech. The challenge here is not so much a problem of recognizing words, where the technology works well, but more a matter of understanding the meaning expressed by those words. Advances in this area will come more from the field of artificial intelligence. The second limitation is the ability to detect words that are outside the recognizer's vocabulary. Speech recognizers work by matching what they hear against the words and phrases contained in their grammars. If the speaker says something that isn't in the grammar, the system tries to make the best match and may come up with something in its grammar that sounds like what the speaker said, but is wrong. Take a name directory: you ask for new employee Jane Allen, who hasn't yet been added to the system, and instead of saying that it doesn't recognize that name, the recognizer comes back with similar-sounding employee John Alden.
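The out-of-vocabulary problem can be illustrated with a small Python sketch. It is a toy model, not how a production recognizer works: text similarity from the standard library's `difflib` stands in for acoustic scoring, and the function name `recognize`, the `reject_below` parameter, and the 0.85 threshold are all hypothetical choices made for illustration.

```python
from difflib import SequenceMatcher


def recognize(heard, grammar, reject_below=0.0):
    """Return the grammar entry closest to `heard`, or None if rejected.

    Assumption: string similarity is a stand-in for an acoustic match
    score. A real recognizer scores audio, not spelled-out text.
    """
    def similarity(entry):
        return SequenceMatcher(None, heard.lower(), entry.lower()).ratio()

    best = max(grammar, key=similarity)
    # With reject_below=0.0 the best match is always returned,
    # even when the caller said something outside the grammar.
    if similarity(best) < reject_below:
        return None
    return best


# Hypothetical directory; Jane Allen has not been added yet.
directory = ["John Alden", "Mary Smith", "Robert Jones"]

print(recognize("Jane Allen", directory))                     # John Alden
print(recognize("Jane Allen", directory, reject_below=0.85))  # None
```

Without a rejection threshold, the matcher is forced to pick something and returns the similar-sounding wrong name. Setting a threshold lets the application say it did not recognize the name, at the cost of occasionally rejecting a correct but poorly matched entry, so the value would need tuning against real calls.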
How will these technologies change the way contact centers function?
When deployed to their best advantage, speech technologies will automate routine calls and collect data for screen pops on non-routine calls. They'll increase efficiency tremendously and make agents' jobs more interesting and rewarding because agents will deal less with routine and more with varied and complex issues. This can also shift the role of the agent more towards personalized service, cross-selling, and up-selling. For example, I'll use the automated system to check my bank balance, but if I'm interested in refinancing my mortgage, I'll want to speak with my personal agent, whom I trust to give me good advice.
What are the biggest mistakes contact center managers sometimes make when choosing or deploying speech technology?
The two biggest mistakes contact center managers make when choosing speech technology are, first, not understanding its capabilities and limitations, and second, not getting personally involved enough in design of the user-interfaces and the "sound and feel" of their applications.
Managers often think of speech technology narrowly as a means to "self-service". In fact, it can also be used to streamline their whole contact center operations by allowing better skills-based call routing and collecting caller data for screen pops. On the other hand, the technology has some limitations, so not all transactions may be good candidates for voice self-service.
The second mistake is treating speech applications like any other software project, and just handing them over to IT. The finished application will determine the impression the company makes on customers when they call. The quality of the voice interface, including its ease of use and even the sound and "personality" of the voice used to record the prompts, can have a huge effect on customers' perceptions of the company and their satisfaction with their contact center experiences. Contact center managers can and should be involved in the design process and make sure the finished application gives the impression they want to convey.
Will the technology replace the role of the agent and why?
The technology will never replace the role of the agent because it's merely the front end for automated systems. Automated systems will never completely replace agents because there'll always be some people who don't want to deal with the automation, and there'll always be interactions that the automated systems aren't programmed to handle.
About Mark Levinson:
Mark Levinson is president of VoxMedia Consulting, where he helps clients analyze their call processes and develop voice automation and CTI requirements. He writes RFPs, designs system architectures, and creates implementation plans. He also designs, tests, and tunes voice applications. He has held Director positions at Speechworks (now ScanSoft), and previously worked at GTE and AT&T Bell Laboratories. He holds SB and Sc.D. degrees from MIT, and an MBA from Boston University.
About VoxMedia:
Through technical and business process consulting services, VoxMedia works with its clients to determine if, where, and how voice technologies can benefit them, and then makes it happen. With an independent outlook and real-world experience (from business case analysis to system architecture to post-deployment tuning, from RFPs to application code), VoxMedia knows where these technologies make sense and where they don't, how to make them work, and where the pitfalls lie.