Article: Speech Recognition In The Call Center: The Impact Of Telephony Boards

Call center managers spend millions of dollars annually in training their agents, and millions more are spent on quality management systems in order to promote a high level of customer service. It is taken for granted that a call center's competitive advantage lies in retaining and raising the performance of its agents.

But what about the automated systems employed to speed up call time and make agents more efficient? How does the call center manager raise the performance of the IVR or the self service systems to the level of a human agent? How can these systems be designed to deliver a satisfying user experience while maintaining cost efficiency?

This article describes how to improve the accuracy and performance of automated speech-based services in order to meet the high expectations of today's demanding users.

Today's Automatic Speech Recognition (ASR) technology is extremely proficient at recognition. In fact, many vendors claim an accuracy of 95% or more. However, these claims are largely based on tests run in the lab under controlled conditions. In the real world, automated systems have to contend with poor-quality connections, background noise, echo, and non-speech utterances from the caller.

In order to make an automated speech system as useful and effective as possible, tight integration is needed between the platform handling the incoming audio from the telephony network, the Voice User Interface (VUI) that manages the prompts, grammars and callflow, and the underlying ASR engine itself. Choosing the right audio/telephony platform, along with careful design of the VUI, can compensate for imperfections in the ASR technology and improve the user experience. This article will address the issues relating to audio integration with the ASR system.

A common problem in speech recognition is identifying when a caller begins or ends speaking. Most audio/telephony platforms use a general form of Voice Activity Detection (VAD) based on a simple audio-level threshold, of the kind used for voice-activated recording or supervised call transfer. This method cannot discriminate real speech from loud background noise or non-verbal audio (e.g. coughing). In applications that require interaction with the caller, such as self service, provision must be made to allow callers to interrupt the outgoing prompt, so that more experienced users can move rapidly through the system to the information they want. This feature is known as barge-in. The trick with barge-in is to allow only a spoken voice to interrupt the prompt; otherwise any cough, clank or other extraneous noise will cut off the prompt and leave the caller in limbo. A general-purpose VAD therefore does not work with barge-in, and in many cases it needs to be disabled for interactive applications. For VAD to work effectively, it must distinguish speech from noise, silence and echo.
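The limitation of a level-only detector can be seen in a minimal sketch. The code below is an illustration under stated assumptions, not any vendor's implementation: the function names, the -30 dB threshold, the 20 ms frame size and the synthetic "cough" signal are all hypothetical.

```python
import math

def frame_energy_db(frame):
    """Mean power of one frame of samples (range -1.0..1.0), in dB re full scale."""
    if not frame:
        return -math.inf
    power = sum(s * s for s in frame) / len(frame)
    return 10.0 * math.log10(power) if power > 0 else -math.inf

def level_threshold_vad(frames, threshold_db=-30.0):
    """Flag a frame as active whenever its level exceeds a fixed threshold.

    This is the general-purpose, level-only approach: it looks at loudness
    and nothing else. The threshold value is an assumption for illustration.
    """
    return [frame_energy_db(f) > threshold_db for f in frames]

# Synthetic 20 ms frames at 8 kHz (160 samples each); values are illustrative.
silence = [0.0] * 160
speech = [0.2 * math.sin(2 * math.pi * 200 * n / 8000) for n in range(160)]
cough = [0.5 if n % 2 == 0 else -0.5 for n in range(160)]  # loud non-speech burst

decisions = level_threshold_vad([silence, speech, cough])
print(decisions)  # the loud non-speech frame is flagged exactly like speech
```

Because the decision depends only on level, the non-speech burst would interrupt a prompt just as a spoken word would.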

During a typical IVR session, the caller actually speaks for only a small percentage of the total time. The majority of the call typically consists of prompts being played and pauses while the user considers the options. Systems without VAD will send a continuous audio stream to the ASR engine, which must then process the entire signal, including unnecessary sounds such as silence and noise. In a typical client-server speech system, where the client resides in the telephony server and performs pre-processing of the audio signal before it is sent to the recognition servers, a bottleneck soon develops as the client struggles to dispatch more and more audio samples across the network.

So is there a trade-off between barge-in and scalability? Not if you consider speech-enabled VAD. Speech-enabled VAD filters out most of the background noise and non-speech audio, passing only valid audio data to the ASR client. By reducing the traffic through the client, more simultaneous calls can be processed by the host CPU, resulting in increased system capacity. In a recent test (detailed in Appendix A), the speech-enabled VAD reduced the number of audio samples transferred to the speech server by 84%, which resulted in a 30% reduction in host CPU load. These additional CPU resources can be used to improve recognition accuracy or to run additional applications (such as VoiceXML gateways) on the host system. Also, because speech-enabled VAD algorithms discriminate between valid and non-valid utterances, they assist the ASR client in processing barge-in requests.
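To put the 84% figure in perspective, the arithmetic can be sketched as follows. The 8 kHz sampling rate (the usual narrowband telephony rate) and the 120-channel system size are assumptions chosen for illustration; neither is stated in the test described above.

```python
SAMPLE_RATE_HZ = 8000  # assumed narrowband telephony rate, not taken from the article

def samples_per_second(channels, reduction_percent=0):
    """Audio samples per second sent to the speech server.

    `reduction_percent` is the share of samples the VAD removes; integer
    arithmetic keeps the result exact.
    """
    total = channels * SAMPLE_RATE_HZ
    return total * (100 - reduction_percent) // 100

continuous = samples_per_second(120)                   # no VAD
gated = samples_per_second(120, reduction_percent=84)  # reduction quoted in the article
print(continuous, gated)
```

Under these assumptions, the gated stream carries roughly one sixth of the samples of the continuous stream.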

The next problem for audio processing in speech systems is echo. In contrast to the echo cancellation schemes used in the telephony network and telephone handsets, the purpose of an echo canceller used in ASR systems is to improve barge-in performance. Because of echoes generated in the telephone hybrid circuit, audio data passed to the ASR client includes not only speech audio from the caller, but also an echoed signal of the system prompt. For example, when a speech system plays a prompt ("What city please?") and the user responds ("Austin"), the prompt echo can mix with the user's voice, which can trigger an inaccurate recognition ("Boston" instead of "Austin"). With long-tail echo cancellation, where a replica of the outbound audio data is subtracted from the caller speech data, echo is all but eliminated, resulting in more accurate recognition of user responses.
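The idea of subtracting a replica of the outbound prompt can be sketched with a normalized LMS (NLMS) adaptive filter, a common textbook technique. The article does not say which algorithm any particular board uses, so this is an assumption, and the function names, the simulated echo path and all parameter values are hypothetical. A true long-tail canceller would use far more taps to cover long echo delays.

```python
import random

def nlms_echo_cancel(prompt, mic, taps=8, mu=0.5, eps=1e-8):
    """Subtract an adaptively filtered replica of `prompt` from `mic`.

    Returns the residual signal (what is left after the echo estimate is removed).
    """
    weights = [0.0] * taps
    history = [0.0] * taps  # most recent prompt samples, newest first
    residual = []
    for x, d in zip(prompt, mic):
        history = [x] + history[:-1]
        echo_estimate = sum(w * h for w, h in zip(weights, history))
        e = d - echo_estimate
        norm = sum(h * h for h in history) + eps
        # NLMS update: step in the direction that reduces the residual.
        weights = [w + mu * e * h / norm for w, h in zip(weights, history)]
        residual.append(e)
    return residual

def simulated_echo(signal):
    """Hypothetical echo path: two delayed, attenuated copies of the prompt."""
    return [
        0.6 * (signal[n - 2] if n >= 2 else 0.0)
        + 0.3 * (signal[n - 5] if n >= 5 else 0.0)
        for n in range(len(signal))
    ]

def energy(samples):
    return sum(s * s for s in samples)

rng = random.Random(0)
prompt = [rng.uniform(-1.0, 1.0) for _ in range(2000)]  # stand-in for prompt audio
echo = simulated_echo(prompt)                            # caller is silent here
residual = nlms_echo_cancel(prompt, echo)

# Ratio of residual to echo energy once the filter has adapted.
suppression = energy(residual[-200:]) / energy(echo[-200:])
```

In a real call the incoming signal also contains the caller's speech, which is what should remain in the residual; practical cancellers typically slow or freeze adaptation while the caller is talking.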

By selectively eliminating noise, silence and echo from the audio stream, the speech-enabled VAD and long-tail echo cancellation capabilities of speech-enabled telephony platforms, particularly where these features are embedded with the telephony interface, can clearly improve both recognition accuracy and overall system performance. Does your speech platform recognize the difference?


Improving Client Performance: How Telephony Boards Make a Difference
In a client-server configuration, line termination and speech processing are effectively decoupled, allowing each to be scaled independently. The client typically runs on the system where the calls terminate, sending preprocessed speech data over a network to one or more servers where the actual speech recognition is performed. This architecture requires careful system engineering to avoid potential latency issues that would impact responsiveness during barge-in, and to efficiently manage the distributed system resources. This networked approach simplifies the addition of recognition servers, and the client system ultimately becomes the limiting factor in determining speech system scalability and performance.

In a typical client-server speech system, the client provides connectivity to the telephone network and performs preprocessing of the audio signal before it is sent to the recognition server(s). This preprocessing reduces the echo caused by outgoing prompts, and performs "endpointing" so that only voice signals (not signals containing just residual echo, background noise, or silence) are transmitted over the network for detailed analysis by the expensive server resources. The DSP resources on the telephony board typically perform the echo cancellation. Although it is critical that the ASR client software performs endpointing to ensure recognition accuracy, a speech-enabled VAD algorithm acting as a first-pass endpointer on the board will significantly reduce both host and bus loading, improving client performance, throughput and, thereby, scalability.

A speech-compatible VAD filters out most non-speech audio from each channel. It eliminates the vast majority of audio segments consisting only of noise or silence, and because the VAD can be integrated with the onboard echo cancellation, residual echo can be eliminated as well. Because the amount of audio data processed by the client is no longer a function of the number of active channels, maximum system channel density can be increased (Figure below).


Designing a Speech-Enabled VAD
In order to provide compatibility with speech systems, several specific features must be incorporated into the design. These are needed to provide flexible operating modes, ensure unaffected operation of the client endpointer, compensate for the limitations of the onboard VAD algorithm, and allow for varying levels of background noise.

Multimodal Operation
To provide the functionality needed to address a variety of applications, the operating modes of the VAD should be configurable in real time. For example:

  • Mode 0: VAD is disabled; a continuous audio signal stream is sent to the host

  • Mode 1: VAD is enabled; voice start/finish events and a continuous audio signal stream are sent to the host

  • Mode 2: VAD is enabled; voice start/finish events and a windowed audio signal stream are sent to the host

  • Mode 3: VAD is enabled; voice start/finish events and a windowed audio signal stream are sent to the host; the board stops the prompt immediately when barge-in is detected by the onboard VAD (for faster prompt cutoff)

  • Mode 4: VAD is enabled; voice start/finish events and a windowed audio signal stream are sent to the host; the VAD threshold is adjustable by the application
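The five modes listed above could be represented in application code roughly as follows. This is a sketch only: the enum, the dataclass and every field name are hypothetical and do not correspond to any real board API.

```python
from dataclasses import dataclass
from enum import IntEnum

class VadMode(IntEnum):
    """Hypothetical names for the example modes 0-4 listed above."""
    DISABLED = 0
    EVENTS_CONTINUOUS = 1
    EVENTS_WINDOWED = 2
    EVENTS_WINDOWED_FAST_CUTOFF = 3
    EVENTS_WINDOWED_APP_THRESHOLD = 4

@dataclass(frozen=True)
class VadBehaviour:
    vad_enabled: bool
    sends_events: bool        # voice start/finish events reported to the host
    windowed_audio: bool      # False means a continuous audio stream
    board_stops_prompt: bool  # prompt cut off on the board at barge-in
    app_sets_threshold: bool  # detection threshold adjustable by the application

def behaviour(mode):
    """Map a mode number to the behaviour described in the list above."""
    mode = VadMode(mode)
    enabled = mode != VadMode.DISABLED
    return VadBehaviour(
        vad_enabled=enabled,
        sends_events=enabled,
        windowed_audio=mode >= VadMode.EVENTS_WINDOWED,
        board_stops_prompt=mode == VadMode.EVENTS_WINDOWED_FAST_CUTOFF,
        app_sets_threshold=mode == VadMode.EVENTS_WINDOWED_APP_THRESHOLD,
    )
```

Keeping the mode a plain integer on the wire while exposing named behaviour flags to the application is one way to allow the mode to be switched in real time without scattering mode-number checks through the call-handling code.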


At the start of each call, the client endpointer will typically need to set an initial noise threshold based on an estimate of the background noise. The onboard VAD must pass audio continuously for approximately one second at the start of the call to allow the software to adapt its noise threshold properly.
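A minimal sketch of this start-of-call behaviour, assuming 20 ms frames and a one-second warm-up window (the function name and both parameter values are illustrative assumptions):

```python
def gate_with_warmup(vad_flags, frame_ms=20, warmup_ms=1000):
    """Decide, frame by frame, whether audio is passed to the host.

    Every frame in the warm-up window at the start of the call is passed
    regardless of the VAD decision, so the client endpointer can estimate
    the background noise. After that, only frames the VAD flags are passed.
    """
    warmup_frames = warmup_ms // frame_ms
    return [i < warmup_frames or flag for i, flag in enumerate(vad_flags)]

# 60 frames (1.2 s) during which the onboard VAD detects no speech at all.
passed = gate_with_warmup([False] * 60)
```

Here the first 50 frames (one second) reach the host even though no speech was detected, and the remaining frames are withheld.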


An energy-based VAD will generally be effective at detecting spoken vowels, but may often miss the beginning or end of utterances that begin or end with short, soft, relatively high-frequency consonants, resulting in misrecognition ("Austin" vs. "Boston"). A back-up buffer, typically containing audio samples for the 100-300 ms prior to the detection of speech by the VAD, ensures that the weak onset of speech is preserved. A similar hold buffer, approximately 1 second of audio samples following the end of detected speech, allows the client endpointer to adapt to changing levels of background noise.
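The back-up and hold buffers can be sketched as a simple windowing function. The sketch assumes 20 ms frames, so 10 back-up frames correspond to 200 ms and 50 hold frames to about 1 second; the function name and these frame counts are illustrative assumptions rather than a documented design.

```python
from collections import deque

def window_audio(frames, vad_flags, backup_frames=10, hold_frames=50):
    """Return the frames that would be forwarded to the host.

    backup_frames: audio kept from just before speech is detected, so a
                   weak speech onset is not lost.
    hold_frames:   audio forwarded after speech ends, so the client
                   endpointer can track the background noise.
    """
    backup = deque(maxlen=backup_frames)  # rolling pre-speech buffer
    hold_left = 0
    forwarded = []
    for frame, active in zip(frames, vad_flags):
        if active:
            forwarded.extend(backup)  # flush the audio preceding detection
            backup.clear()
            forwarded.append(frame)
            hold_left = hold_frames
        elif hold_left > 0:
            forwarded.append(frame)   # still inside the hold period
            hold_left -= 1
        else:
            backup.append(frame)      # idle: remember only the latest frames
    return forwarded

# 100 numbered frames; the VAD flags frames 30-39 as speech.
frames = list(range(100))
flags = [30 <= i < 40 for i in frames]
sent = window_audio(frames, flags)
```

In this example frames 20 through 89 are forwarded: ten frames of lead-in, the ten detected speech frames, and fifty frames of hold.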


Adjustable Detection Threshold
Because of the wide variability in the levels of background noise, it's important to allow application control over the VAD detection threshold. In general, the speech client's endpointer will be more effective at discriminating between speech and noise, especially as background noise levels increase.


Telephony Boards Do Matter
Choosing the right telephony/audio platform for automated speech-based services is non-trivial. In order to get the accuracy and performance needed to meet the high expectations of today's demanding users, only platforms that support speech-enabled voice activity detection and long-tail echo cancellation should be considered as the foundation for your speech-based solutions.

About the Authors

Paul Jackson is senior market manager for Brooktrout's enterprise voice and speech market where he is responsible for the development and introduction of enterprise voice and speech products. Keith Byerly is currently senior market development manager for Brooktrout Technology where he is responsible for market development of Brooktrout's speech business segments.

About the Company
Brooktrout Technology is a supplier of media processing, network interface, call control and signal processing products that enable the development of applications, systems and services for both the New Network (packet-based) and the traditional telephone (TDM) network. Their strategy is to partner with customers and collaborate closely with them to help accelerate their delivery of new applications and services, increase their existing business, and expand into new markets.

Published: Friday, September 12, 2003
