Squeezing The Tube Harder – Speech Recognition

Earlier this summer I stood on the roof of Lernout and Hauspie's headquarters in Ieper, together with Jo Lernout, surveying the Flanders Language Valley. It led me to ponder how far the industry has come in the twenty or so years I have been involved in it.

The Flanders Language Valley contains not only a large HQ for L&H, but also several smaller buildings for start-up ventures, clustered around a training centre. This visionary enterprise symbolises a number of stages of a maturing industry. The size of the HQ for one of the first-generation speech and language companies illustrates the maturing of a fledgling industry into the mainstream of Information Technology and Telecommunications. As is often the case, a maturing industry spawns other companies tackling specific market and technological opportunities, represented here by the cluster of small buildings in the Flanders Language Valley, many yet to be completed and occupied. One of the first to be occupied will be the e-commerce joint venture between L&H and Intel.

The training centre initiative is a bold one and recognises the specific skills needed to make speech and language technology a success. The industry has reached a stage where the need now is not so much for research staff as for application engineers. The need is for people who understand the particular requirements of good interface design, and for knowledge engineers who can extract and encode the vast knowledge bases typical of most advanced speech and language applications. This activity, of course, has to be replicated in the language of each target market. So concerned are Jo Lernout and Pol Hauspie about the lack of expertise available that they are investing their own money in the establishment of a number of training and education centres around the world.

The maturing of the speech technology industry has in part been brought about by dramatic advances in computing power and memory capacity and by decreasing costs, together with perhaps less dramatic incremental improvements in the algorithms. In fact, looking back over several decades, it is possible to chart a number of step-function improvements in speech recognition performance, almost entirely due to the availability of adequate memory for storing speech training data and of processor power to cope with statistically based algorithms.

In one of my earlier columns I commented on how some researchers regarded text-to-speech synthesis (TTS) development as like squeezing toothpaste out of a tube, and recognition development as more like putting it back again. This was a reflection on how difficult it seemed to create large-vocabulary continuous speech recognition systems compared with the achievements of unlimited-vocabulary TTS. At the time I challenged this view on the basis of the lack of progress made in improving the naturalness of speech synthesis systems or, putting it another way, in achieving that last 10% needed to create really usable TTS that members of the public would be happy to listen to. The lack of naturalness has been largely responsible for the limited application of TTS to date. I believe, however, that a step-function improvement has recently been achieved in speech synthesis and that, as with speech recognition, it is almost entirely due to the availability of cheap memory.

Speech synthesis was originally based on the pioneering work of researchers such as Gunnar Fant and Dennis Klatt. This relied on a model of human speech production in which a very small number of parameters could be used to drive an electronic synthesiser that mimicked the characteristics of the human vocal tract. Rules are required to translate ordinarily spelt sentences into a string of these parameters. More recent techniques have used small speech segments, so-called diphones, derived from real speech recordings. A diphone is a piece of sound that spans the second half of one basic unit of speech (a phoneme) and the first half of the next. By including the sound which occurs at the boundary between two sounds, such as two different vowels, the problems of synthesising these transitions are overcome. Because there are many different transitional sounds, depending on the context, a large inventory of diphones is needed. These diphone segments are concatenated, using pronunciation rules, to produce a word or sentence. This approach requires much more memory than the original synthesis techniques, since encoded real speech segments have to be stored.
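The diphone idea can be illustrated with a minimal sketch. It assumes the pronunciation rules have already produced a phoneme string, and that each diphone recording is available as a list of samples. The names `phonemes_to_diphones`, `concatenate`, the `_` silence marker and the toy inventory are hypothetical, chosen only for illustration; a real synthesiser would also smooth the joins and adjust pitch and duration.

```python
from typing import Dict, List

SILENCE = "_"  # hypothetical marker for the silence before and after an utterance


def phonemes_to_diphones(phonemes: List[str]) -> List[str]:
    """Turn a phoneme string into the diphones that span each boundary."""
    padded = [SILENCE] + phonemes + [SILENCE]
    # Each diphone runs from the middle of one phoneme to the middle of the next.
    return [f"{a}-{b}" for a, b in zip(padded, padded[1:])]


def concatenate(diphones: List[str], inventory: Dict[str, List[float]]) -> List[float]:
    """Join the stored recordings end to end (no smoothing in this sketch)."""
    samples: List[float] = []
    for name in diphones:
        if name not in inventory:
            raise KeyError(f"no recording for diphone {name!r}")
        samples.extend(inventory[name])
    return samples


# Toy usage: the word "cat" as three phonemes needs four diphones.
names = phonemes_to_diphones(["k", "ae", "t"])
toy_inventory = {name: [float(i)] * 2 for i, name in enumerate(names)}
waveform = concatenate(names, toy_inventory)
```

Even this toy shows why memory matters: every phoneme-to-phoneme transition that can occur in the language needs its own stored recording.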

The availability of large, low-cost memory has pushed this approach a step further, and a new generation of synthesiser is now being launched that significantly improves naturalness. Segments drawn from a much larger inventory than the diphone approach uses are joined together to produce the required word or sentence. The use of larger, as well as different, real speech segments has resulted in much more natural-sounding TTS than the widely adopted diphone approach. L&H have released the first commercial product that I am aware of using this approach, and the improvement in speech quality is impressive. The initial release is for American English, with other languages to follow. AT&T has an interactive demonstration on its web site of its own technology, which uses a similar approach.
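One way to picture the benefit of a larger inventory is the sketch below, which covers a target phoneme string with the longest recorded segments available, so that fewer joins are needed. The column does not say how the L&H or AT&T systems choose their segments; the greedy longest-match rule, the function name `select_units` and the toy inventory are assumptions made purely for illustration.

```python
from typing import List, Set, Tuple

Unit = Tuple[str, ...]  # a recorded segment, identified by the phonemes it covers


def select_units(target: List[str], inventory: Set[Unit]) -> List[Unit]:
    """Cover the target with the longest available units, left to right."""
    max_len = max((len(unit) for unit in inventory), default=0)
    chosen: List[Unit] = []
    i = 0
    while i < len(target):
        # Try the longest candidate first, then progressively shorter ones.
        for length in range(min(max_len, len(target) - i), 0, -1):
            candidate = tuple(target[i:i + length])
            if candidate in inventory:
                chosen.append(candidate)
                i += length
                break
        else:
            raise ValueError(f"no unit covers phoneme {target[i]!r}")
    return chosen


# Toy inventory: single phonemes plus a few longer recorded stretches.
toy_units: Set[Unit] = {
    ("h",), ("e",), ("l",), ("ou",), ("w",), ("er",), ("d",),
    ("h", "e", "l", "ou"), ("w", "er"), ("l", "d"),
}
plan = select_units(["h", "e", "l", "ou", "w", "er", "l", "d"], toy_units)
```

Here eight phonemes are covered by three segments, so only two joins are audible instead of seven. The price is the much larger store of recordings that cheap memory makes affordable.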

Although some work is still required on the pronunciation rules, particularly for names, the improvement in quality that this approach delivers should be an important step forward for the public deployment of TTS in both PC and network applications. A growing number of companies now offer over-the-phone e-mail reading products and services, as well as information from the internet. With the large number of subscription-free ISPs in the UK market, all seeking to differentiate their products, improved TTS quality could be a significant factor in customer acceptance for those wishing to offer such services.

Jeremy Peckham has over 20 years' experience in the voice processing industry as a scientist and consultant and latterly as a businessman and entrepreneur. He began his career with the Royal Aircraft Establishment, spent 12 years with Logica and then founded and ran the UK speech specialist Vocalis, floating the company on the London Stock Exchange in 1996. He is currently managing director of Strategis Consulting Ltd and chairman of The Speech Recognition Company Ltd.

Published: Monday, December 2, 2002
