Natural language generation (NLG) is a subfield of artificial intelligence and computational linguistics. Because NLG technology makes it possible to automate the creation of routine documents, it is an essential part of the Immersive Automation (IA) project. Mark Granroth-Wilding is a research associate at the Department of Computer Science at the University of Helsinki and one of the experts on the IA team. He specialises in artificial intelligence, in particular natural language processing, and in this blog post he explains the basics of NLG.
“NLG consists of techniques to automatically produce human-intelligible language, most commonly starting from data in a database. It can be thought of as a process of turning a symbolic representation of data into human language,” Mark Granroth-Wilding explains.
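To make the idea of turning a symbolic representation into language concrete, here is a minimal sketch in Python. The record, its field names, and the `realise` function are illustrative assumptions, not part of the Immersive Automation system.

```python
# Hypothetical database row describing a match result.
# The field names are assumptions made for this illustration.
record = {
    "team_home": "Helsinki",
    "team_away": "Tampere",
    "goals_home": 3,
    "goals_away": 1,
}


def realise(record):
    """Turn a symbolic match record into one English sentence."""
    home, away = record["team_home"], record["team_away"]
    gh, ga = record["goals_home"], record["goals_away"]
    if gh > ga:
        return f"{home} beat {away} {gh}-{ga}."
    if gh < ga:
        return f"{away} beat {home} {ga}-{gh}."
    return f"{home} and {away} drew {gh}-{ga}."


print(realise(record))  # prints "Helsinki beat Tampere 3-1."
```

Even this toy version has to decide what to say (who won) and how to say it (word order and score format), which is the kind of decision an NLG system makes at much larger scale.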
The essential idea of the Immersive Automation research project is to develop ways of producing news at a scale that humans cannot match, for example hundreds or thousands of articles at once. NLG provides the tools to produce text in such volume.
“You could of course just supply data to audiences in a raw format – without NLG – but we want to present information in an easier, more understandable format.”
In recent years, the use of statistical methods and machine learning has grown massively, including in NLG. However, Granroth-Wilding points out that this growth has not yet reached many of the practical applications of NLG.
“This is what makes NLG a hot topic, and this is also the reason why we are looking into this in the IA-project.”
While some forms of news automation have been introduced into newsrooms around the world, the systems have so far been language-dependent and template-based. This means that they rely heavily on human contribution and focus mainly on languages with many speakers. One of the most widely used systems is Wordsmith, developed by Automated Insights in Durham, North Carolina. The Associated Press, among others, uses the system.
Improving the state of the art
“Wordsmith would probably be the most prominent example of NLG in automated text production. However, what we want to do here is something even more sophisticated. There are no such examples currently where a system is capable of independently producing highly variable news texts.”
Currently, automatically produced news is also limited to areas with large amounts of numeric data, such as sports news and earnings reports. Numeric data is easy to combine with text templates. The purpose of the Immersive Automation project, however, is to take NLG and news automation further.
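A rough sketch of what combining numeric data with text templates can look like is shown below, in Python. The templates, the company name, and the `earnings_sentence` function are hypothetical and are not taken from Wordsmith or any other real system.

```python
# Hand-written templates, one per situation. A human has to author each of
# these, and a separate set would be needed for every output language.
TEMPLATES = {
    "up": "{company} reported revenue of {revenue:.1f} million euros, "
          "up {change:.1f}% from the previous quarter.",
    "down": "{company} reported revenue of {revenue:.1f} million euros, "
            "down {change:.1f}% from the previous quarter.",
    "flat": "{company} reported revenue of {revenue:.1f} million euros, "
            "unchanged from the previous quarter.",
}


def earnings_sentence(company, revenue, previous):
    """Pick a template from the direction of change and fill in the numbers."""
    change = (revenue - previous) / previous * 100
    if change > 0:
        key = "up"
    elif change < 0:
        key = "down"
    else:
        key = "flat"
    # str.format ignores keyword arguments a template does not use.
    return TEMPLATES[key].format(
        company=company, revenue=revenue, change=abs(change)
    )


print(earnings_sentence("Example Oy", 12.0, 10.0))
```

The sketch also shows the limits described above: every sentence pattern is fixed in advance, so the output varies only in the numbers and names that are slotted in.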
“Our focus now is to work out how state-of-the-art statistical NLG methods can be incorporated into real journalistic processes. Working out how these techniques can be made intelligible to newsrooms, as well as reliable in accurately conveying their source data, is the big challenge that we’re undertaking in this project.”