You're making a call for information on cinema listings. The voice that answers sounds like an automated Jonathan Ross, right down to the trademark lisp. You're intrigued. Surely a machine couldn't lisp? Or could it? The simple answer is yes. In fact, it could answer in any accent, language, style or manner you can think of. Casual or formal, precise or imperfect, the automated voice on the other end of the phone is capable of mimicking a wide range of human pronunciations and styles of speech. Technology has brought us a long way from the Dalek-style delivery typical of early efforts at speech synthesis. While most systems won't fool you for long into thinking you're listening to a real person, they are intelligible and they're getting more and more natural.
With the help of text-to-speech (TTS) technology, which can read any text out loud, it is now possible to access textual information over the telephone. Texts might range from simple messages such as cinema listings or local event information to huge databases. Text-to-speech technology is now moving ahead fast, with human-sounding voices replacing the early robotic delivery of information. We now have a usable and useful technology at our fingertips that's ideal for systems handling large amounts of rapidly changing information. Without doubt, there has been a huge improvement in the quality of synthetic speech over the last few years.

In fact, it is generally expected that within 20 years TTS will be an accepted part of our daily routine, with voice-enabled homes, cars, domestic appliances and computers interacting with us in increasingly natural ways, using artificial intelligence, automatic speech recognition and TTS technologies. Some of this is with us now: we already have voice-controlled PCs, where speech recognition is used to control the desktop instead of the conventional mouse. But we will have to wait before the type of artificial intelligence that allows for failsafe interaction between man and machine becomes a reality, with completely interactive homes and appliances.

For business and commercial purposes, advances in speaker verification and natural language understanding mean that new applications can be realised. For example, in the fields of banking, insurance and share dealing, these new technologies enable over-the-phone transactions to be carried out with appropriate levels of identification and security. But all of these stop short of us being able to have a meaningful conversation with the machine. Automatic speech recognition and artificial intelligence have not yet reached that level. The area where TTS can be used effectively now, however, is the telephone.
We can play information over the phone such as local information, train timetables, stock prices and weather reports. If we need input from the caller, we can get them to push the buttons on their phones, or use connected-word automatic speech recognition (ASR). Instead of just listening to pre-recorded information, callers can select what they want to hear. They can navigate e-mail accounts, select and listen to messages, book and order tickets and listen to web content over the phone rather than reading it off a rolling screen.

TTS makes economic sense too. Any dynamically changing information source can benefit from TTS. There's no need for information to be re-recorded, to hire professional speakers or to pay studio costs. Just update the input text file and the job is done.

Most call centres currently operate through the use of human agents rather than virtual agents. Economic factors, however, are providing the impetus for more automated solutions. For a start, TTS technology allows call centres to save money whilst improving their service to customers. It is ideal in situations where the information to be read out is either frequently updated or too extensive to record or re-record. Costs can also be saved when updating information that customers access over the telephone, as there's no need to employ a professional speaker for recorded prompts or to regularly take up the valuable time of a member of staff to record and re-record the updated information. Information can be readily updated in real time, presenting the customer with the most up-to-date data all of the time. This is ideal for stock quotes, web pages, product information, customer data, spares inventories, goods delivery times and time-limited special offers. It's also suitable for menus for restaurants that can be booked online, weather reports, traffic reports, news bulletins, sports results, cinema showings and betting odds. The list of applications is almost limitless.
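As a rough illustration of why frequently changing information suits TTS, the sketch below turns structured cinema listings into the plain text that a TTS engine would read aloud. It is a minimal Python sketch under stated assumptions, not any vendor's API: the `Showing` type and the functions `spoken_time`, `join_spoken` and `listing_prompt` are hypothetical names invented for this example, and no actual speech synthesis call is made.

```python
from dataclasses import dataclass


@dataclass
class Showing:
    """One film and its start times (hypothetical data shape)."""
    film: str
    times: list[str]  # 24-hour "HH:MM" strings


def spoken_time(hhmm: str) -> str:
    """Convert a 24-hour 'HH:MM' string to a 12-hour form such as '7:05 pm'."""
    hour, minute = (int(part) for part in hhmm.split(":"))
    suffix = "am" if hour < 12 else "pm"
    hour12 = hour % 12 or 12  # 0 and 12 are both spoken as 12
    return f"{hour12}:{minute:02d} {suffix}"


def join_spoken(items: list[str]) -> str:
    """Join items the way a list is usually said aloud: 'a, b and c'."""
    if len(items) <= 1:
        return "".join(items)
    return ", ".join(items[:-1]) + " and " + items[-1]


def listing_prompt(cinema: str, showings: list[Showing]) -> str:
    """Build the text that would be handed to a TTS engine for one cinema."""
    if not showings:
        return f"There are no showings at {cinema} today."
    sentences = [f"Today at {cinema}."]
    for showing in showings:
        times = join_spoken([spoken_time(t) for t in showing.times])
        sentences.append(f"{showing.film} is showing at {times}.")
    return " ".join(sentences)


if __name__ == "__main__":
    # Illustrative data only; a real system would read this from a live source.
    print(listing_prompt("the Odeon", [Showing("Film A", ["14:30", "19:05"])]))
```

When the listings change, only the data changes: the same function produces fresh text for the engine, and nothing has to be re-recorded.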
Bear in mind too that customers often prefer TTS technology, as it allows them to access information without feeling under pressure from sales people. From the call centre perspective, TTS technology offers access to more customers. Because it gives customers another channel through which to contact them, companies become more readily accessible to their customers, so they have the opportunity to sell more and generate more revenue. The technology also enables call centres to increase productivity and efficiency for a relatively low increase in costs. For example, companies could use the technology to respond to (typically) 30% of callers, and those calls will cost 90% less to handle. That doesn't mean that companies are likely to get rid of agents, however. Instead, they're more likely to use existing staff to provide higher levels of service, removing the mundane tasks and increasing morale and especially staff retention, a high priority for call centre managers.

So, TTS sounds like a great application for call centres. But what should you look out for when considering today's product offerings? There are basically two core issues to bear in mind.

Firstly, is the speech quality good? Quality is subjective but, to be honest, if people can't understand what's being said, the TTS is no use at all. This is especially true for more prolonged passages of text. So what matters is consistency of output, without placing undue strain on the listener to comprehend what's being said. Obviously, there are natural limitations of speech over the telephone, which affect real human speech as well as synthesised speech, and good quality TTS needs to be tuned to cope with the narrow telephony bandwidth and noisy listening conditions.

Secondly, channel density. If lots of people can't access the information at the same time, the application's worthless.
Equally, for those seeking to develop and sell applications using speech technology like TTS, channel density is also important in terms of margins and ultimately value for money to their customers. Take into account the licensing fees that most speech recognition and TTS companies charge for their technology, plus the hardware costs of the systems or server platforms required, and there are more parts to this equation than just the TTS software. Don't forget the telephony and speech processing requirements on the hardware side too.

At Aculab, we have recently released new TTS software that features five major languages, including our new addition, Latin American Spanish, as well as the world's first variants in voice styles. Providing several pre-configured versions for each voice, Aculab's TTS technology allows software developers to choose from six or eight stylistic variants. These include formal, casual, and Aculab's world-first "international" English language voice, which has been carefully designed to give clearer and less noticeably regional output, making it wholly acceptable and intelligible to all English-speaking customers.

In fact, Aculab's TTS resources played a vital role in the development of the world's first interactive dialogue system that can interpret and pronounce Maori. Developed by Sydney-based VeCommerce, a partner of both Aculab and Nuance, the VeCab system has been created for New Zealand's leading taxi company, Auckland Co-op Taxis, which has 75 call centre staff handling approximately 250,000 calls per month.

The VeCab system allows customers to place bookings over the phone, without the need for an operator to enter the details into the taxi despatch system. When customers call the VeCab system, their voice is recorded and simultaneously sent to the speech recogniser for processing. The system uses a series of prompts to obtain information, such as the name of the caller and the destination.
If the computer doesn't understand something, it automatically prompts the caller to repeat that part of the message. TTS technology from Aculab is then used to confirm requirements, such as the pick-up location, back to the caller. Because the Maori language uses many pronunciations not found in English, New Zealand place names can be a problem for speech technology systems. VeCommerce were able to use Aculab's LexMan dictionary manager, which allows developers to create, update and extend multiple TTS lexica for custom pronunciations, to provide a unique phonetic vocabulary for the Maori pronunciations and make them available to the application. TTS really comes into its own here, as recording every street, road and suburb name in New Zealand would have been uneconomical. The result is the world's first application combining natural language speech recognition and TTS synthesis that can interpret and pronounce Maori and other unique names correctly. For Auckland Co-op Taxis, the system will reduce call handling costs, increase the ability to handle busy periods effectively, reduce the need for complex rostering and enable the company to expand its booking capabilities without increasing labour costs.

Less than five years ago, TTS conversion was little more than an amusement for most people. Now the systems are intelligible and they are becoming more and more natural. Although you can't expect to hear your local cinema listings read out to you by an exact replica of Jonathan Ross, the system is capable of lisping, and it's becoming more and more human.

Not everyone welcomes new technology, and it's important to provide callers with the option to speak to a real person. Tomorrow's best call centre solutions will be those which can neatly combine the best of all media, and TTS plays a vital role in that solution.
Published: Monday, December 2, 2002