Natural Language Processing (NLP) is the branch of Artificial Intelligence that studies how machines and people can communicate through natural languages such as Spanish, English or Chinese.
In principle, computers can process any human language. In practice, economic and practical constraints mean that only the languages most widely spoken or most used in the digital world have working applications.
Consider how many languages Siri (20) or Google Assistant (8) speak. English, Spanish, German, French, Portuguese, Chinese, Arabic and Japanese (not necessarily in that order) are the languages with the most applications that understand them. Google Translate covers the most languages, more than a hundred, but there are between 5,000 and 7,000 languages in the world.
Human languages can be expressed in writing (text), orally (speech) and also through signs. Naturally, NLP is most advanced in the processing of text, where far more data exists and is easier to obtain in electronic form.
Audio, even when it is already in digital format, must first be transcribed into letters or characters; only then can the system interpret the question. Producing the answer is the reverse process: first the sentence is composed and then the speech is synthesized.
Incidentally, synthetic voices sound more human all the time, with tonal and prosodic inflections that imitate human speech.
Models for natural language processing
Treating a language computationally requires a mathematical model of it. Computers only work with bytes and digits, and with programs written in programming languages such as C, Python or Java.
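As a small illustration of what "bytes and digits" means here, the following Python sketch shows the same sentence as raw bytes and as a list of integer word identifiers. The example sentence and the numbering scheme (identifiers assigned in order of first appearance) are assumptions chosen for the illustration, not a standard.

```python
# Toy illustration: one sentence seen as raw bytes and as integer word ids.
sentence = "the cat sees the dog"

# 1) Raw bytes, as the text would be stored on disk (UTF-8 encoding).
raw_bytes = sentence.encode("utf-8")

# 2) Word ids: each distinct word gets an integer, in order of first appearance.
#    This numbering scheme is an assumption made for the example.
vocabulary = {}
word_ids = []
for word in sentence.split():
    if word not in vocabulary:
        vocabulary[word] = len(vocabulary)
    word_ids.append(vocabulary[word])

print(list(raw_bytes[:3]))  # byte values of the first word, "the"
print(word_ids)             # repeated words share the same id
```

Once words are numbers, they can be counted, compared and fed into the models described in the rest of this article.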
Computational linguists are responsible for "preparing" the linguistic model so that computer engineers can implement it as efficient, working code. Broadly, there are two general approaches to linguistic modeling:
Logical models: grammars
Linguists write rules that recognize structural patterns, using a specific grammatical formalism. These rules, combined with the information stored in computer dictionaries, define the patterns that must be recognized to solve the task (searching for information, translating, etc.).
These logical models aim to reflect the logical structure of language and grew out of Noam Chomsky's theories of the 1950s.
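To make the idea of rules plus a dictionary concrete, here is a minimal Python sketch of a rule-based recognizer. The lexicon, the three rewrite rules and the function names (`parse`, `is_sentence`) are all invented for this illustration; real grammatical formalisms are far richer.

```python
# Hypothetical computer dictionary: word -> syntactic category.
LEXICON = {
    "the": "Det", "a": "Det",
    "dog": "N", "cat": "N",
    "sees": "V", "sleeps": "V",
}

# Hypothetical grammar: each symbol rewrites as one of several sequences.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "N"]],
    "VP": [["V", "NP"], ["V"]],
}

def parse(symbol, words, start):
    """Return every position where `symbol` can end if it starts at `start`."""
    if symbol not in GRAMMAR:  # lexical category: look the word up
        if start < len(words) and LEXICON.get(words[start]) == symbol:
            return {start + 1}
        return set()
    ends = set()
    for expansion in GRAMMAR[symbol]:
        positions = {start}
        for part in expansion:
            positions = {e for p in positions for e in parse(part, words, p)}
        ends |= positions
    return ends

def is_sentence(text):
    """True if the whole text is recognized as a sentence (symbol S)."""
    words = text.lower().split()
    return len(words) in parse("S", words, 0)

print(is_sentence("the dog sees a cat"))
print(is_sentence("dog the sees"))
```

The recognizer accepts only word sequences that the rules license, which is the strength of this approach (precision) and also its weakness (anything the linguist did not anticipate is rejected).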
Probabilistic models of natural language: based on data
This approach works the other way around: linguists gather collections of examples and data (corpora), and from them the frequencies of different linguistic units (letters, words, sentences) and their probability of appearing in a given context are calculated. With these probabilities, one can predict the next unit in a given context without resorting to explicit grammatical rules.
This is the "machine learning" paradigm that has come to dominate Artificial Intelligence in recent decades: the algorithms infer possible answers from data previously observed in the corpus.
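One of the simplest instances of this idea is a bigram model, which estimates the probability of a word given only the word before it. The Python sketch below assumes a tiny invented corpus of four sentences and hypothetical helper names; it is meant to show the counting step, not a production language model.

```python
from collections import Counter, defaultdict

# Tiny invented corpus; a real one would contain millions of sentences.
corpus = [
    "the dog sees the cat",
    "the dog sleeps",
    "the cat sleeps",
    "the dog sees the dog",
]

# Count how often each word follows each other word.
following = defaultdict(Counter)
for sentence in corpus:
    words = sentence.split()
    for current, nxt in zip(words, words[1:]):
        following[current][nxt] += 1

def next_word_probabilities(word):
    """Relative frequency of each word observed right after `word`."""
    counts = following.get(word, Counter())
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()} if total else {}

def predict_next(word):
    """Most probable next word, or None if `word` was never followed by anything."""
    probs = next_word_probabilities(word)
    return max(probs, key=probs.get) if probs else None

print(next_word_probabilities("the"))
print(predict_next("dog"))
```

In this corpus "the" is followed by "dog" four times and by "cat" twice, so the model assigns "dog" a probability of 4/6 without any grammatical rule having been written.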
Components of natural language processing
Below are some of the components of natural language processing. Not every analysis described here is applied in every NLP task; which ones are used depends on the objective of the application.
Morphological or lexical analysis. It consists of the internal analysis of the words that make up sentences in order to extract lemmas, inflectional features and compound lexical units. It is essential for obtaining basic information: syntactic category and lexical meaning.
Syntactic analysis. It consists of the analysis of the structure of the sentences according to the grammatical model used (logical or statistical).
Semantic analysis. It provides the interpretation of sentences once morphosyntactic ambiguities have been resolved.
Pragmatic analysis. It incorporates the context of use into the final interpretation. This includes the treatment of figurative language (metaphor and irony) as well as the specific world knowledge needed to understand a specialized text.
A morphological, syntactic, semantic or pragmatic analysis will be applied depending on the purpose of the application. For example, a text-to-speech converter does not need semantic or pragmatic analysis.
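As an illustration of the first component in the list, the Python sketch below performs a toy morphological analysis: it maps a word form to its possible lemmas, syntactic categories and inflectional features. The mini-lexicon, the suffix table and the `analyze` function are hypothetical and cover only a handful of English forms.

```python
# Hypothetical mini-lexicon: lemma -> set of possible syntactic categories.
LEMMAS = {
    "dog": {"noun"},
    "cat": {"noun"},
    "walk": {"noun", "verb"},
    "see": {"verb"},
}

# Hypothetical inflectional suffixes: (suffix, category, features).
SUFFIXES = [
    ("ing", "verb", {"form": "gerund"}),
    ("ed", "verb", {"tense": "past"}),
    ("s", "noun", {"number": "plural"}),
    ("s", "verb", {"person": "3", "number": "singular"}),
]

def analyze(word):
    """Return all (lemma, category, features) readings for a word form."""
    word = word.lower()
    readings = []
    # Uninflected form: the word is itself a lemma.
    for category in sorted(LEMMAS.get(word, set())):
        readings.append((word, category, {}))
    # Inflected form: strip a known suffix and check the remaining stem.
    for suffix, category, features in SUFFIXES:
        if word.endswith(suffix):
            stem = word[: -len(suffix)]
            if category in LEMMAS.get(stem, set()):
                readings.append((stem, category, features))
    return readings

for form in ["dogs", "walks", "seeing"]:
    print(form, analyze(form))
```

Note that "walks" receives two readings, a plural noun and a third-person verb. This is the kind of morphosyntactic ambiguity that the later analyses have to resolve using the surrounding sentence.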
Publish Date: June 19, 2018 1:28 PM