How Does a Computer Make Voices? - Brett Clawson - Blog

How Does a Computer Make Voices?

Computers that people can have a conversation with have been a staple of science fiction stories for decades. Modern computers have not yet reached the stage of being able to hold a conversation; however, they can produce speech. How do they do it?

Speech Synthesis

The process of producing speech on a computer is called speech synthesis. Many modern computer applications, from automated GPS navigators to business VoIP services, take advantage of this ability. Speech synthesis is a type of computer output. The computer or other device producing the speech takes text entered into the system and reads it aloud to the user.

Speech Synthesis Step One: Pre-processing

Speech synthesis is a three-step process. The first step is called pre-processing or normalization. In this stage, the computer analyzes the different ways the given text could be read and determines the correct one for the context. Numbers, times, dates, abbreviations, special characters and acronyms are turned into words. Because computers don't have the same ability as humans to decide how to pronounce something based on context, neural networks or statistical probability techniques are used. For example, if a computer is trying to determine whether a number represents a year or a quantity, it may look for clues in the text, such as the word "year." 
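The year-versus-quantity decision described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how any particular synthesizer works: the abbreviation table, the list of clue words and every function name are hypothetical, and a simple rule stands in for the neural or statistical techniques mentioned in the text.

```python
import re

# Hypothetical, deliberately tiny tables; a real system would need far larger ones.
ABBREVIATIONS = {"Dr.": "Doctor", "St.": "Street", "approx.": "approximately"}
YEAR_CLUES = {"in", "year", "since", "during"}

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
        "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
        "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]


def two_digits(n):
    """Spell out a number from 0 to 99."""
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] if ones == 0 else f"{TENS[tens]}-{ONES[ones]}"


def as_quantity(n):
    """Spell out a number from 0 to 9999 as a quantity."""
    parts = []
    thousands, rest = divmod(n, 1000)
    hundreds, rest = divmod(rest, 100)
    if thousands:
        parts.append(f"{ONES[thousands]} thousand")
    if hundreds:
        parts.append(f"{ONES[hundreds]} hundred")
    if rest or not parts:
        parts.append(two_digits(rest))
    return " ".join(parts)


def as_year(n):
    """Spell out a four-digit year in pairs, e.g. 1984 -> nineteen eighty-four."""
    first, last = divmod(n, 100)
    if first % 10 == 0:          # 2005 is read like a quantity: two thousand five
        return as_quantity(n)
    if last == 0:
        return f"{two_digits(first)} hundred"
    if last < 10:
        return f"{two_digits(first)} oh {ONES[last]}"
    return f"{two_digits(first)} {two_digits(last)}"


def normalize(text):
    """Expand abbreviations and numbers, using the previous word as a context clue."""
    tokens = text.split()
    words = []
    for i, token in enumerate(tokens):
        if token in ABBREVIATIONS:
            words.append(ABBREVIATIONS[token])
            continue
        match = re.fullmatch(r"(\d{1,4})([.,!?]?)", token)
        if not match:
            words.append(token)
            continue
        digits, punctuation = match.group(1), match.group(2)
        number = int(digits)
        previous = tokens[i - 1].lower() if i > 0 else ""
        looks_like_year = len(digits) == 4 and number >= 1000 and previous in YEAR_CLUES
        spoken = as_year(number) if looks_like_year else as_quantity(number)
        words.append(spoken + punctuation)
    return " ".join(words)


print(normalize("In 1984 Dr. Smith sold 1984 books."))
```

The same digits come out as "nineteen eighty-four" after the clue word "In" and as "one thousand nine hundred eighty-four" after "sold", which is the kind of context-dependent choice the pre-processing stage has to make.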

Additionally, the computer must attempt to determine the correct pronunciation for homographs, which are words that look the same but are pronounced differently depending on what they mean. To accomplish this, the computer looks for context clues, such as whether a sentence is written in the present or past tense.
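As a rough sketch of the tense clue idea, the Python snippet below picks a pronunciation for the homograph "read". The clue words, the function name and the ARPAbet-style phoneme strings are illustrative assumptions, and a crude rule like this will misfire on many real sentences.

```python
# Hypothetical clue words that suggest the sentence is about the past.
PAST_CLUES = {"yesterday", "had", "has", "already", "ago", "last"}


def pronounce_read(sentence):
    """Return phonemes for "read": like "red" in the past, like "reed" otherwise."""
    words = {word.strip(".,!?").lower() for word in sentence.split()}
    return "R EH D" if words & PAST_CLUES else "R IY D"


print(pronounce_read("She had read the book."))     # past: sounds like "red"
print(pronounce_read("I will read it tomorrow."))   # present/future: sounds like "reed"
```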

Speech Synthesis Step Two: Phonemes

In this step, the speech synthesizer determines which sounds make up the words that need to be spoken. These sounds are called phonemes. A basic approach is to give the computer a dictionary of words and their accompanying phonemes; however, this method does not produce very natural-sounding speech, because when humans speak full sentences, the same phoneme can sound different depending on factors such as rhythm, stress and intonation. These patterns are known as prosody.
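The dictionary approach can be sketched as a simple lookup. The three-word lexicon and the function name below are hypothetical, and the phoneme symbols follow the ARPAbet style used by some pronunciation dictionaries.

```python
# Hypothetical miniature pronunciation dictionary (ARPAbet-style symbols).
LEXICON = {
    "the": ["DH", "AH"],
    "cat": ["K", "AE", "T"],
    "sat": ["S", "AE", "T"],
}


def lookup_phonemes(sentence):
    """Return (phonemes, unknown_words) for a sentence using dictionary lookup only."""
    phonemes, unknown = [], []
    for word in sentence.lower().split():
        word = word.strip(".,!?")
        if word in LEXICON:
            phonemes.extend(LEXICON[word])
        else:
            unknown.append(word)   # the dictionary has nothing to say about this word
    return phonemes, unknown


print(lookup_phonemes("The cat sat."))
print(lookup_phonemes("The blorp sat."))
```

Every word gets the same fixed phonemes no matter where it falls in the sentence, and a word missing from the dictionary cannot be spoken at all.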

An alternative method is to divide words into graphemes, which are the individual letters or letter groups that represent a sound, and produce phonemes based on a simple ruleset for each grapheme. This has the advantage of making it possible for the computer to read any word, including made-up words, words in foreign languages, proper names and technical terms. The main drawback is that some languages, such as English, have many words that are pronounced differently from how they are written.
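A minimal sketch of the rule-based approach follows. The ruleset is a made-up simplification with one phoneme (or phoneme pair) per grapheme, tried longest match first; real rulesets are far larger and take neighboring letters into account.

```python
# Hypothetical ruleset: two-letter graphemes are tried before single letters.
RULES = {
    "sh": "SH", "ch": "CH", "th": "TH", "ee": "IY", "oo": "UW",
    "a": "AE", "b": "B", "c": "K", "d": "D", "e": "EH", "f": "F", "g": "G",
    "h": "HH", "i": "IH", "j": "JH", "k": "K", "l": "L", "m": "M", "n": "N",
    "o": "AA", "p": "P", "q": "K", "r": "R", "s": "S", "t": "T", "u": "AH",
    "v": "V", "w": "W", "x": "K S", "y": "Y", "z": "Z",
}


def letters_to_phonemes(word):
    """Convert a word to phonemes by greedy longest-match over the ruleset."""
    word = word.lower()
    phonemes = []
    i = 0
    while i < len(word):
        for size in (2, 1):
            chunk = word[i:i + size]
            if chunk in RULES:
                phonemes.extend(RULES[chunk].split())
                i += len(chunk)
                break
        else:
            i += 1   # no rule for this character (e.g. a hyphen), so skip it
    return phonemes


print(letters_to_phonemes("sheep"))    # a regular word
print(letters_to_phonemes("blorp"))    # a made-up word still gets a pronunciation
print(letters_to_phonemes("colonel"))  # spelled very differently from how it is said
```

The made-up word is handled without any dictionary, while "colonel" shows the drawback: the rules faithfully read out the spelling, which is not how the word is actually pronounced.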

Speech Synthesis Step Three: Sound

Computers produce speech sounds in three main ways. The first is to use recordings of a human speaking the phonemes. In the second, the computer generates the phonemes from basic sound frequencies. In the third, the computer models the human vocal apparatus itself.

Speech synthesizers that use recordings of the human voice are preloaded with short clips of human sounds that the computer arranges to form words. This is called concatenative speech synthesis. It is currently the most natural-sounding type of speech synthesis; however, each library of clips is limited to a single voice and language.
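The arranging of clips can be sketched as follows. The clip library here is purely hypothetical: the short lists of numbers are placeholders standing in for recorded audio samples, and real systems also smooth the joins between clips.

```python
# Hypothetical clip library: phoneme -> audio samples (placeholder values, not real audio).
CLIPS = {
    "K": [0.1, 0.3, 0.1],
    "AE": [0.5, 0.7, 0.5, 0.2],
    "T": [0.2, 0.0],
}


def concatenate(phonemes, clips=CLIPS, gap=1):
    """Join the stored clip for each phoneme, separated by `gap` silent samples."""
    samples = []
    for index, phoneme in enumerate(phonemes):
        if phoneme not in clips:
            raise KeyError(f"no recorded clip for phoneme {phoneme!r}")
        if index > 0:
            samples.extend([0.0] * gap)
        samples.extend(clips[phoneme])
    return samples


print(concatenate(["K", "AE", "T"]))
```

A phoneme without a recorded clip raises an error, which mirrors the limitation to whatever voice and sounds were preloaded.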

Formant speech synthesizers generate speech based on the three to five key resonant frequencies, called formants, of the human voice. These synthesizers can say anything because they are not limited to a preloaded library of sounds.
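To give a feel for generating sound from a handful of frequencies, the sketch below builds a vowel-like tone by adding three sine waves. This is a heavy simplification, not a full formant synthesizer: real ones shape a source signal with resonant filters. The frequency values are approximate textbook averages for an adult male voice, and the function name and parameters are assumptions.

```python
import math

# Approximate first three formant frequencies in Hz (textbook averages, adult male).
FORMANTS_HZ = {
    "AA": (730, 1090, 2440),   # as in "father"
    "IY": (270, 2290, 3010),   # as in "see"
    "UW": (300, 870, 2240),    # as in "boot"
}


def synthesize_vowel(vowel, duration_s=0.2, sample_rate=16000):
    """Return samples in [-1, 1] made by summing one sine wave per formant."""
    frequencies = FORMANTS_HZ[vowel]
    total = round(duration_s * sample_rate)
    samples = []
    for i in range(total):
        t = i / sample_rate
        value = sum(math.sin(2 * math.pi * f * t) for f in frequencies)
        samples.append(value / len(frequencies))   # scale to stay within [-1, 1]
    return samples


tone = synthesize_vowel("AA")
print(len(tone), "samples generated")
```

Because the sound is computed rather than played back from a library, changing the numbers in the table is enough to produce a different vowel.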

Articulatory synthesizers model the human voice. This is the most complex method and should be capable of producing the most natural-sounding speech; however, computer technology has not yet reached the level where machines can model the human vocal apparatus well enough to produce natural-sounding speech.

New uses for speech synthesizers are being invented all the time, and as the technology improves, talking computers are likely to become a more common part of everyday life. Today, even the most natural-sounding computer-generated speech is usually distinguishable from the real thing, but someday that will likely no longer be the case.

Publish Date: November 30, 2021 2:10 PM
