Speaker verification and identification (SVI) used to seem like the stuff of science fiction, but it is fast becoming science fact. SVI is part of the growing field of biometrics: the measurement and analysis of biological characteristics such as fingerprints and retinal patterns.
When a user enrols with the SVI system, a 'voiceprint' is created that is unique to that person. This is subsequently used to identify people or, more commonly, to verify that a person is who they claim to be. It's worth pointing out at this stage that SVI should not be confused with speech recognition, where it is the words spoken that are recognised and the identity of the speaker is irrelevant.
However, there will be cases where the two technologies are used together. For example, a name and password can be recognised while, at the same time, the person speaking the words has their identity verified. But we will come back to this later.
SVI – how accurate is it?
To start with, SVI performance is indicated by an 'equal error rate' (or EER). This is the error rate of a system tuned so that the rate of false acceptances (passing an impostor) equals the rate of false rejections (failing a legitimate user). The lower the EER, the more accurate the system.
In reality the system will be skewed towards giving fewer false acceptances or fewer false rejections depending on how it is being used. A military application may demand that the system is tuned towards minimising false acceptances, and this will naturally have the effect of increasing the number of false rejections.
Conversely, a telephone bank may want to reduce the number of false rejections (because they are annoying to customers) and in doing so will have to tolerate a higher proportion of false acceptances.
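The trade-off described above can be illustrated with a short Python sketch. The match scores are made up for illustration, and the function names `error_rates` and `equal_error_rate` are hypothetical; a real SVI product would use its own scoring scale and tuning tools.

```python
def error_rates(genuine, impostor, threshold):
    """Return (false acceptance rate, false rejection rate).

    A score at or above the threshold counts as an acceptance.
    """
    far = sum(s >= threshold for s in impostor) / len(impostor)
    frr = sum(s < threshold for s in genuine) / len(genuine)
    return far, frr


def equal_error_rate(genuine, impostor):
    """Scan the observed scores as candidate thresholds and return
    (threshold, FAR, FRR) at the point where FAR and FRR are closest."""
    def gap(t):
        far, frr = error_rates(genuine, impostor, t)
        return abs(far - frr)

    best = min(sorted(set(genuine) | set(impostor)), key=gap)
    return (best, *error_rates(genuine, impostor, best))


# Made-up match scores: higher means closer to the claimed voiceprint.
genuine_scores = [0.91, 0.85, 0.78, 0.66, 0.95, 0.72, 0.88, 0.59, 0.81, 0.93]
impostor_scores = [0.30, 0.45, 0.62, 0.28, 0.51, 0.70, 0.39, 0.55, 0.20, 0.48]

threshold, far, frr = equal_error_rate(genuine_scores, impostor_scores)
print(f"EER point: threshold={threshold}, FAR={far:.0%}, FRR={frr:.0%}")

# A strict threshold (military style) versus a lenient one (telephone bank).
strict = error_rates(genuine_scores, impostor_scores, 0.75)
lenient = error_rates(genuine_scores, impostor_scores, 0.50)
print(f"strict:  FAR={strict[0]:.0%}, FRR={strict[1]:.0%}")
print(f"lenient: FAR={lenient[0]:.0%}, FRR={lenient[1]:.0%}")
```

With this toy data, raising the threshold removes the false acceptances at the cost of more false rejections, and lowering it does the reverse.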
SVI – how secure is it?
What happens if someone records the speaker with a high quality audio device and then plays it back to the system? The solution here is to combine speaker verification with automatic speech recognition in a dynamic security process. A speaker might be asked to repeat a randomly chosen set of utterances, and these would have to be recognised by the speech recognition system before being passed on to the speaker verifier. In this way the impostor with the tape recorder is beaten.
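As a rough sketch of that two-stage check in Python: the speech recogniser and the speaker verifier are not implemented here. Their outputs (a transcript and a match score) are simply passed in as arguments, and the names `make_challenge`, `accept_caller` and the 0.7 threshold are assumptions made for illustration.

```python
import secrets


def make_challenge(n_digits=5):
    """Pick a fresh, unpredictable digit sequence for the caller to speak."""
    return [secrets.randbelow(10) for _ in range(n_digits)]


def digits_in(transcript):
    """Reduce a recogniser transcript such as '4 9 1 7 2' to a list of digits."""
    return [int(ch) for ch in transcript if ch.isdigit()]


def accept_caller(challenge, transcript, speaker_score, threshold=0.7):
    """Accept only if the right words were spoken AND the voice matches.

    transcript    -- text returned by a speech recogniser (assumed input)
    speaker_score -- match score from a speaker verifier (assumed input)
    """
    # Stage 1: the recognised words must match the random prompt.
    if digits_in(transcript) != challenge:
        return False
    # Stage 2: only then is the speaker verification score considered.
    return speaker_score >= threshold


# Fixed challenge here so the example is repeatable; use make_challenge() live.
challenge = [4, 9, 1, 7, 2]

# Live caller speaks the prompted digits in their own voice.
print(accept_caller(challenge, "4 9 1 7 2", speaker_score=0.85))

# Replayed recording: the voice matches well, but the digits are from an old session.
print(accept_caller(challenge, "3 3 8 0 6", speaker_score=0.95))
```

The replayed recording fails at the first stage however good its voice match is, because the recorded words do not match the freshly chosen prompt.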
SVI – how does it work?
An SVI system performs three basic operations: enrol, recognise and modify. Let's look at them in a little more detail.
Enrol – this is when a new speaker is introduced to the system. The speaker is given a unique identifier (such as a PIN) and some information is collected that can be used to aid identification in the future. This information is stored in a database.
The words and phrases spoken during this procedure are recorded, and this speech is used to create an initial voiceprint. The speaker can then go through an iterative training process with the objective of converging the voiceprint on the speaker's unique voice characteristics. During this process the speaker will speak a number of times, and after each utterance the voiceprint is updated. Enrolment should take no more than a few minutes.
Recognise – after enrolment the speaker is 'known to the system' and may be identified or verified by it. To use the system for verification purposes, the speaker will make an identity claim. Once the system knows who the speaker is claiming to be, it will compare their speech with the voiceprint associated with that identity. If the match is close enough the speaker will pass. Identification is different to verification in that the speaker does not make an identity claim. The system compares their speech with all the voiceprints in the database and returns the identity of the closest match.
Modify – the voiceprint can be tuned incrementally and with regular use of the system an existing speaker's voiceprint will be continually updated with new speech data. This will prevent the voiceprint from becoming outdated as the speaker ages, since it will adapt to any gradual changes in the speaker's voice characteristics.
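The three operations can be sketched in Python as follows. This is a toy model under stated assumptions, not how any particular SVI product works: each utterance is assumed to have already been reduced to a fixed-length list of numbers, the voiceprint is simply their average, matching uses cosine similarity, and the class name `VoiceprintStore`, the threshold and the adaptation rate are all hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two feature vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class VoiceprintStore:
    def __init__(self, threshold=0.9, adapt_rate=0.1):
        self.prints = {}              # speaker identifier -> voiceprint
        self.threshold = threshold    # how close a match must be to pass
        self.adapt_rate = adapt_rate  # weight given to new speech in modify()

    def enrol(self, speaker_id, utterances):
        """Enrol: build a voiceprint as the mean of the enrolment utterances."""
        n = len(utterances)
        self.prints[speaker_id] = [sum(col) / n for col in zip(*utterances)]

    def verify(self, claimed_id, features):
        """Verification: compare only with the claimed identity's voiceprint."""
        voiceprint = self.prints.get(claimed_id)
        if voiceprint is None:
            return False
        return cosine(voiceprint, features) >= self.threshold

    def identify(self, features):
        """Identification: compare with every voiceprint, return the closest."""
        return max(self.prints, key=lambda sid: cosine(self.prints[sid], features))

    def modify(self, speaker_id, features):
        """Modify: nudge the voiceprint towards new speech (moving average)."""
        r = self.adapt_rate
        old = self.prints[speaker_id]
        self.prints[speaker_id] = [(1 - r) * o + r * f for o, f in zip(old, features)]


store = VoiceprintStore()
store.enrol("1234", [[1.0, 0.1, 0.0], [0.9, 0.2, 0.1], [1.1, 0.0, 0.1]])
store.enrol("5678", [[0.1, 1.0, 0.2], [0.0, 0.9, 0.3]])

sample = [0.95, 0.15, 0.05]            # new speech from the first speaker
print(store.verify("1234", sample))    # claim matches the voice
print(store.verify("5678", sample))    # claim does not match the voice
print(store.identify(sample))          # no claim made: closest voiceprint wins
store.modify("1234", sample)           # keep the voiceprint up to date
```

Verification is a one-to-one comparison and so stays cheap as the database grows, whereas identification has to compare against every enrolled voiceprint.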
SVI – who will use it?
The financial sector is an obvious market, with customers checking account balances and making transfers, for example, but SVI can also be used for a host of other applications. It is ideal for companies that need to give employees secure access to intranets, extranets and corporate applications. It also has great potential in central government, where it can be used to give certain staff access to sensitive information, and for parolee tracking.
Under some conditions of parole, an individual may be required to call in to an operator to confirm their whereabouts. The offender would then be asked to speak a randomly selected series of digits, which are matched against the existing voiceprint.
Using randomly selected digits prevents the individual from recording the password sequence and playing it back. This also removes the requirement for human interaction – i.e., a live agent confirming the parolee's identity.
SVI – can it save money?
Recent research suggests that an organisation handling one million calls a month could be incurring as much as £2.2m in user authentication costs annually. By reducing the time that staff spend taking callers through the identification process and confirming that they are who they say they are, companies could make substantial savings. Staff can then be redeployed to revenue-generating work.
No wonder that speaker verification is taking off. After all, it's relatively easy to steal a PIN. It's practically impossible to steal a voice.
Published: Tuesday, September 9, 2003