Natural language generation (NLG), also referred to as text generation, is a subfield of natural language processing (NLP), which here is taken to include computational linguistics. For those not familiar with these areas, this page provides a brief overview of what NLG is (and is not), and of some of the benefits offered by the kind of applications that CoGenTex develops.
Natural language processing is a technology which involves converting spoken or written human language into a form which can be processed by computers, and vice versa. Some of the better-known applications of NLP include:
Some of the greatest challenges in NLP center on the analysis of speech or text, both of which are inherently ambiguous. For example, the phrases "night rate" and "nitrate" sound nearly identical, so a computer cannot easily tell them apart from the audio alone, and a written word such as "bank" has different meanings (a financial institution, or the edge of a river) that can be difficult to choose between even with contextual clues.
The quality of results obtained in NLP applications often depends on the richness of the representation that the system builds for the language it is processing. For example, most commonly available text-to-speech synthesizers produce spoken output that can be difficult to follow, because it does not mimic the varied intonations that humans use when speaking. Higher-quality systems will perform a syntactic and/or semantic analysis of the text, in order to generate intonations as humans would, according to the "information structure" of the text.
Natural language generation is, in a sense, the opposite of NLP applications such as voice recognition and grammar checking, since it involves converting some form of computerized data into natural language, rather than the other way around.
NLG is to be distinguished from superficially similar techniques, usually referred to by names such as "report generation", "document generation", or "mail merging". These techniques simply plug a fixed data structure, such as a table of numbers or a list of names, into a template in order to produce complete documents. Because of their limited flexibility, they tend to produce rigid text, often containing grammatical errors (for example, "you have one selections remaining").
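The limitation of template filling can be seen in a minimal Python sketch. The template string, the `NUMBER_WORDS` table, and the `fill_template` function are hypothetical names invented for this illustration; they do not come from any particular mail-merge product.

```python
# Hypothetical mail-merge style template: the wording is fixed and only
# the slot value changes. Nothing here knows about grammatical number.
TEMPLATE = "You have {count} selections remaining."

# Small illustrative lookup table; other values fall back to digits.
NUMBER_WORDS = {1: "one", 2: "two", 3: "three"}


def fill_template(count: int) -> str:
    """Plug a value into the fixed template with no linguistic knowledge."""
    return TEMPLATE.format(count=NUMBER_WORDS.get(count, str(count)))


print(fill_template(3))  # You have three selections remaining.
print(fill_template(1))  # You have one selections remaining.
```

The second call reproduces the kind of agreement error quoted above: the noun stays plural because the template has no representation of the grammar behind the sentence.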
NLG, on the other hand, uses some level of underlying linguistic representation of the text, in order to ensure that it is grammatically correct and fluent. Most NLG systems include a syntactic realizer (our RealPro product is an example), which ensures that grammatical rules such as subject-verb agreement are obeyed; and a text planner (such as one created in our Exemplars framework), which decides how to arrange sentences, paragraphs, and other components of a text coherently. Practically any variety or style of text can be produced, ranging from a few words to an entire document, and various output formats can be generated, such as RTF text or HTML hypertext. Text generators can even produce equivalent texts in multiple languages simultaneously, making them an excellent alternative to automatic translation in many domains. A text generator can also produce synthesized or phrase-concatenated speech with a "concept-to-speech" approach, which uses semantic information to generate correct intonation, unlike most current text-to-speech technology.
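The division of labor between a text planner and a syntactic realizer can be illustrated with a toy Python sketch. This is not how RealPro or the Exemplars framework work internally; the `Message` class and the functions `inflect_noun`, `inflect_verb`, `realize`, and `plan_text` are assumptions made up for this example, and the inflection rules cover only regular English nouns and verbs.

```python
from dataclasses import dataclass

# Small illustrative lookup table; other values fall back to digits.
NUMBER_WORDS = {1: "one", 2: "two", 3: "three"}


@dataclass
class Message:
    """Toy abstract representation: lemmas plus a count, not finished words."""
    noun: str   # noun lemma, e.g. "selection"
    count: int  # how many of the noun there are
    verb: str   # verb lemma, e.g. "remain"


def inflect_noun(lemma: str, count: int) -> str:
    """Regular plurals only (assumption): add 's' unless the count is one."""
    return lemma if count == 1 else lemma + "s"


def inflect_verb(lemma: str, count: int) -> str:
    """Third-person present of regular verbs: add 's' for a singular subject."""
    return lemma + "s" if count == 1 else lemma


def realize(msg: Message) -> str:
    """Toy realizer: make the noun and the verb agree with the count."""
    quantity = NUMBER_WORDS.get(msg.count, str(msg.count))
    words = [quantity,
             inflect_noun(msg.noun, msg.count),
             inflect_verb(msg.verb, msg.count)]
    sentence = " ".join(words)
    return sentence[0].upper() + sentence[1:] + "."


def plan_text(messages: list[Message]) -> str:
    """Toy planner: mention the scarcest items first, then realize each one."""
    ordered = sorted(messages, key=lambda m: m.count)
    return " ".join(realize(m) for m in ordered)


print(realize(Message("selection", 1, "remain")))  # One selection remains.
print(realize(Message("selection", 3, "remain")))  # Three selections remain.
print(plan_text([Message("selection", 3, "remain"),
                 Message("credit", 1, "expire")]))
# One credit expires. Three selections remain.
```

Because the input is an abstract message rather than a finished string, the same data yields grammatical output for any count, and the ordering decision is kept separate from the wording decision.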
Perhaps the first use of NLG was in machine translation systems, which analyze a text from a source language into a grammatical or conceptual representation, then use that to generate a corresponding text in the target language. Another early application was in expert systems, where the formal representations of rules and facts could be used to generate texts which explained the system's reasoning.
CoGenTex was founded on the realization that NLG could be the basis of a somewhat wider range of applications than these traditional ones. We have identified and pursued applications in several domains, which involve generation from diverse types of data, but which usually possess most of these key characteristics:
Not all potential applications have these characteristics. For example, certain labor-intensive documents are probably best written by hand if they need to be produced only in small numbers and never need to be updated. Alternatives to text, such as graphical displays, can also be more suitable for some types of information and some audiences.
As with all types of automation, NLG should be applied so that humans and computers complement each other, each doing what it does best. Just as a spell-checking program can excel at spotting routine misspellings, yet be annoying to use with documents containing proper names or obscure vocabulary, an NLG system can be more of a hindrance than a help if it tries to solve all of the problems all of the time. This is why CoGenTex's methodology for building text generators is based on these fundamental guidelines:
As computer technology reaches ever more users in ever more domains, we anticipate many more interesting and practical applications for NLG. The advent of the World Wide Web, in particular, presents a wealth of opportunities, as more people can go to more places for more kinds of information than ever before. NLG technology will allow the most up-to-date information to be communicated to the broadest audience, with the highest accuracy and quality of presentation.