A summary of the past year

The project is moving forward and our team has been busy with their research. As the academic year is ending, we want to provide you with some examples of where our work has been presented during the past year.

Presentations during the past year

Last fall, we gave presentations at several conferences, including Verkkoviestijöiden päivät in Vantaa on 21–22 September 2017 and the IPTC seminar on news automation in Barcelona on 8 November 2017. We also attended the Computation + Journalism conference at Northwestern University in Chicago in October.

In September 2017, two peer-reviewed academic papers were presented on the technical aspects of automated news production. The first of these papers, “Data-Driven News Generation for Automated Journalism”, was presented on September 4–7 at the 10th International Natural Language Generation conference (INLG) in Santiago de Compostela, Spain. INLG is the conference of the Association for Computational Linguistics (ACL) Special Interest Group on Natural Language Generation (SIGGEN). The second paper, “Finding and Expressing News from Structured Data”, was presented on September 20–21 at the 21st International Academic Mindtrek Conference in Tampere, Finland.

The election news generation system was also presented to an audience of Nordic news media industry personnel at the 5th NxtMedia Conference in Trondheim, Norway on November 15th. It was further demonstrated to the domestic press as well as industry and academic personnel at the AI Day organized jointly by the University of Helsinki and Aalto University.

On 28 February 2018, Carl-Gustav Lindén gave a lecture on news automation to around 60 students and staff at the Asian College of Journalism in Chennai, India. The active discussion afterwards revealed that news automation is known as a concept in India, but no applications are in use at the moment. There was also the usual discussion about journalists losing jobs because of automation. Carl-Gustav Lindén also moderated a panel about news automation at the Nordic Data Journalism Conference NODA2018. The four participants came from Norway, Sweden and Finland, and the common conclusion was that news automation is only at the beginning of its development.

Carl-Gustav Lindén presented the paper “Creating value in the age of algorithms – a Finnish perspective on newsroom strategies” at the World Media Economics and Management Conference 2018 (WMEMC2018) in Cape Town, South Africa in May 2018. The paper, with Stefanie Sirén-Heikel as the main author, was one of the few dealing with algorithms at the conference, which attracted around 250 researchers from around the world.

What’s going on right now?

In 2018, we have been investigating methods for improving data analysis and the language generation process. The results will be demonstrated in Valtteri 2.0, a generation system for news on crime statistics. The system will offer improved language quality, improved analytical capabilities and the generation of more complex news articles from significantly richer data. This application domain also further highlights the computational power of our system compared with human journalists. Researchers Asta Bäck and Magnus Melin have been occupied with testing Valtteri 2.0. The last submission day was 13 May, and an analysis of the results is ongoing.

Melin gave a presentation about Valtteri at VTT’s internal Friday madness and has presented his work at a VTT business area meeting, where we gained the insight that the topic has good potential and that there are comparatively few players on the market.

Sensor data and ambient news

If we are to believe data journalist and data consultant Marco Maas, the future of journalism lies in our surroundings.

Marco Maas and xMinutes believe that ambient news will play a key role in the future of news.

Most of us would probably not spend the profits from a few successful Bitcoin sales on equipping our homes with various sensors, but that is in fact what German data journalist Marco Maas did some time ago. In order to develop a groundbreaking new way of providing news services, he also gave up most of his privacy and allowed his colleagues to follow the data collected by the sensors in his home.

What Maas and his colleagues at the Hamburg-based company xMinutes hope to achieve is a news service that draws on different data sets to provide a selection of the most interesting news for its users.

“Our aim is to implement a context API, which can be used by ourselves but also publishers”, Maas says.

Currently, xMinutes is working with the news agency Deutsche Presse-Agentur (DPA), analysing news produced by the agency and building an understanding of the news cycle.

“We are at the beginning of the process, but we can already tell that a lot of content can be left out. We are far away from our machines understanding the real topics, but this data helps us identify the content which can be left out.”

Another feature xMinutes is considering developing is a filter that places various news outlets and journalists on a political spectrum.

“Although actually we have the most promising results from the simplest algorithmic solutions. So if a user is interested in politics, we just give them more news about politics in general.”

“I don’t think publishers with a regional concentration will gain success by simply launching an app. Their business model will have to include getting their content to other apps, and places where the attention already is”, Maas says.

He recommends making the content as portable as possible, publishing it on as many platforms as possible, and trying to be as relevant as possible to the users. Maas also predicts that the Google Assistant will become the most powerful platform.

“We just don’t see a point in trying to compete against Google or Facebook. You can certainly try, but I think we should try to focus on other topics, and work for a long-term solution, which can then be incorporated into those big platforms as well.”

One of the insights that he has gained during the process is the importance of metadata.

“If you have a lot of metadata in your articles, even Facebook and such can understand your content better, and give it greater visibility. At least in Germany this is a problem, not having compatible metadata.”

xMinutes is also confident that so-called ambient news will play a key role in the future of news. By ambient news, Maas means incorporating news or other preferred content into our surroundings.

“It can be speakers, displays, or something else”, Maas explains.

A key finding after meeting with a test group of fourteen people was that the audience does not want to be disturbed.

“They only wanted information when it is important for them, such as traffic information on their way to work.”

The test group also revealed that the audience is willing to give out their personal information to a news service if they receive something beneficial in return.

“If our system could give personalised suggestions based on what the user had been reading throughout the day, they were fine with giving their location and other information to us. The discussion within journalism however seems to be that the readers are not willing to give out their data.”

Maas encourages news organisations to engage in a dialogue with their audience and users in order to truly understand their needs.

“Google and Apple are looking at the context, but journalists are not. We should look at new situations where our content can be interesting, say bathroom reading in the morning and short news breaks during the day.”

He says there are a thousand places to reach an audience – we just have not thought about them yet.

Text: Laura Klingberg

“Robot Journalism”: The damage done by a metaphor

Carl-Gustav Lindén

I am at yet another conference, hearing yet another presentation on “robot journalism”. I realise that my campaign against this metaphor for news automation was lost long ago. With the use of pictures depicting robots writing on keyboards, sometimes wearing a hat that says “Press”, graphic designers, editors, and researchers have managed to establish the most damaging mental picture. By framing news automation as robots coming to take the jobs of journalists, we have managed to maybe destroy, or at least delay, our move towards a future of augmented journalism, where smart machines are helping reporters do their jobs better. (This is what we call a robot.)

In this article, I’ll talk about augmented intelligence – not artificial intelligence, a divide that was described wonderfully by my friend John Markoff. I’ll draw upon my recent scientific articles, “Decades of automation: Why are there still so many jobs in journalism?”, published in Digital Journalism, and “Algorithms for journalism: The future of news work”, published in the Journal of Media Innovations, where I claim that media and journalists should embrace computational thinking to be able to reap the fruits of new technology.

You might think that there is a lot going on, at least based on the massive publicity about the cooperation between the legacy news agency Associated Press in New York and the software company Automated Insights in Durham, North Carolina. Together, they have created a system that automatically generates earnings reports for AP’s customers. It is simple and works beautifully, relieving financial reporters of the boring and tedious work of digging through financial reports and getting totally bogged down by the “earnings seasons” that last for several weeks four times a year.

Francesco Marconi at AP has written a nice guide to automation for those who want to follow progress in this area, and there are experiments going on at many news agencies where automation actually saves considerable resources.

Marconi also states something that should be obvious to us all: not all journalistic work should be automated, but we would be stupid not to explore the new opportunities that advances in natural language generation give us.

However – and this is a view based on dozens of research interviews I have done across Europe and the United States, as well as discussions at conferences such as GEN Summit, Computation + Journalism, NICAR, ICA, and WAN-IFRA Digital Media Europe – there is not much going on in this field. AP is one of the few cases. Another is United Robots in Sweden, which provides newspaper companies with automated soccer coverage. Narrative Science has also been doing this for several years now. But besides these showcases, not much.

My talks with representatives of news automation service providers who have negotiated with media companies paint a depressing picture of the mental state of media companies. They are the worst possible customers. They are unable to make decisions, have no financial resources, are not prepared to invest in new technology, are always looking for the low-hanging fruit, and are content with the fact that their peers are not making any investments either.

It’s no wonder software companies are looking for better customers in the financial industry or e-commerce. At least media customers bring them a lot of free publicity, which is good for marketing towards other potential customers. And looking at the financial statements of media companies gives you a grim picture of how little is invested in digital transformation. “We have given up on media customers”, “never again work without getting paid”, and so forth, are things I frequently hear. Media companies are way behind in the race towards augmented intelligence and, in a situation where they really need to invest resources, they are holding back.

As for journalists, they are more than happy to dismiss the potential of news automation. This is somewhat strange, considering how many functions in any newsroom are automated already, beginning with word processing and photo editing. Walk into a television studio and be amazed by the level of automation. In reality, the latest development should be regarded as just another step in the newsroom’s human–computer advancement. There are no signs that automation has taken away any journalists’ jobs; instead, journalists are performing tasks that were previously assigned to non-editorial specialists, such as typesetters, telephone operators and darkroom assistants, who have now all but disappeared from editorial offices.

Automation or computer anxiety is certainly not a new thing, either in knowledge work generally or in journalism in particular. Aristotle, Queen Elizabeth I, the Luddites, James Joyce and John Maynard Keynes were all concerned with the impact of technology on employment.

Automation anxiety should not be equated with worries about what smart machines will be able to perform in terms of surveillance, or coercion and the balance of power. These concerns are justified, as the potential threats are real. However, they should be kept separate from the “jobs lost” debate. Here, the damage done by the metaphor “robot journalism” cannot be overstated. Please stop using it.

Carl-Gustav Lindén

The text was originally published on the website Data Driven Journalism.

Improved News Generation Bot, Valtteri 2.0 – Soon Reporting Both Crime and Election News

In early summer 2018, the Immersive Automation project will debut Valtteri 2.0. This will be an improved version of the election news generation bot that we released in April 2017 (see http://www.vaalibotti.fi). In addition to language improvements, Valtteri 2.0 will showcase its ability to generate news articles in a new domain, namely crime statistics, and to automatically generate visualizations to go with each article. For example, Valtteri 2.0 will be able to write an article on the current state of, and interesting trends in, motor vehicle theft statistics in any municipality in Finland.

Crime has always interested the public. On a typical day, crime and justice stories make up 15% of the reported news (Katz, 1987).

Often, however, crime news reported in newspapers might give readers misleading, exaggerated, or biased notions of crime. In other words, news does not always present crimes in the proportions in which they are actually committed (Graber, 1979), whereas looking at data in context may give a completely different picture.

Using data to paint an accurate picture motivated this second version of Valtteri. The prototype is currently being implemented and is planned to be ready in early summer 2018. Like Valtteri 1.0, version 2.0 will take in structured data, this time extracted from Statistics Finland, analyze the data, and generate hundreds of thousands of news articles as a result – an impossible feat for human journalists. In addition, users will still be able to select the news they would like to read and interact with the included visualizations.

Mockup of Valtteri 2.0 (Source of text and graphic is Statistics Finland)
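To make the idea of going from structured data to a sentence more concrete, here is a minimal sketch in Python. It is not Valtteri's actual code: the `CrimeStat` record, the `percent_change` and `describe_trend` helpers, the municipality "Exampleville" and all the figures are invented for illustration, and a real system would involve far more analysis and far richer language generation.

```python
from dataclasses import dataclass


@dataclass
class CrimeStat:
    """One row of structured data (hypothetical schema, not Statistics Finland's)."""
    municipality: str
    category: str
    year: int
    count: int


def percent_change(previous: int, current: int) -> float:
    """Relative change from previous to current, in percent."""
    if previous == 0:
        raise ValueError("previous count must be non-zero")
    return (current - previous) / previous * 100


def describe_trend(previous: CrimeStat, current: CrimeStat) -> str:
    """Analyze two yearly figures and fill a fixed sentence template."""
    change = percent_change(previous.count, current.count)
    if change > 0:
        verb = f"rose by {change:.1f} percent"
    elif change < 0:
        verb = f"fell by {abs(change):.1f} percent"
    else:
        verb = "stayed unchanged"
    return (
        f"In {current.municipality}, reported cases of {current.category} "
        f"{verb} between {previous.year} and {current.year}, "
        f"from {previous.count} to {current.count}."
    )


# Invented figures for a fictional municipality, for illustration only.
stats_2016 = CrimeStat("Exampleville", "motor vehicle theft", 2016, 40)
stats_2017 = CrimeStat("Exampleville", "motor vehicle theft", 2017, 50)
print(describe_trend(stats_2016, stats_2017))
```

Running the same function over every municipality, category and year is what would let a system of this kind produce articles in numbers no newsroom could match by hand.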

Valtteri 1.0 showed us the possibilities and gave us experience in automatically generating natural language news articles from structured data. However, we know of no systems that automatically generate news from criminal offence statistics, let alone in multiple languages. Stay tuned for the release of Valtteri 2.0!

References:

  • Katz, J. (1987). What makes crime ‘news’? Media, Culture & Society, 9(1), 47-75.
  • Graber, D. A. (1979). Is crime news coverage excessive? Journal of Communication, 29(3), 81-92.

Will robots write the news a hundred years from now?

The original text for this blogpost is an article that post-doc researcher Lauri Haapanen wrote for The Institute for the Languages of Finland.

The automation of news has been in the pipeline for decades. However, it is still humans who are writing the news. Why is that so?

The most apparent challenge for automation is language. Algorithms are already able to conjugate words successfully. However, the subtle nuances of human language cannot be captured in conditional rules of the form “if A, then B”. This limitation makes the language stiff and, in the long run, rather monotonous. An even bigger problem is content. Algorithms can work with numeric and highly structured results data from companies, sports and elections. This enables news about these subjects to be successfully automated, including in Finland and in Finnish. However, a real scoop is all about new, unexpected, and hard-earned information. A pre-coded algorithm cannot get to grips with such issues.

Lauri Haapanen

Thirdly, the hesitation of media companies and software developers is hindering development. “One would imagine that there is a lot going on in the industry,” says news automation researcher Carl-Gustav Lindén, “but with a few exceptions, there really isn’t”. Technology itself is nothing new in editorial work, as newsrooms are full of it. However, the talk of “robotic journalists” has frightened human publishers, although “there is no sign that development in automation would have reduced journalists’ work,” Lindén points out. “We should rather see this development as a step forward in the co-operation between journalists and technology.”

It is certain that “robots” will not be writing analytical and engaging stories in a matter of years or even decades. It is also certain that the collaboration between people and software will develop further. A few Finnish newsrooms are already locating potential news topics from public protocols with the help of algorithms. Why could the same software not also compile background material, reveal hidden correlations between distant variables, produce copy and make different versions of finished texts for diverse distribution channels?

Given the speed at which technology is developing, predicting a hundred years into the future feels quite ridiculous. Still, we are going to need to select and communicate news a hundred years from now. Perhaps we’ll be transmitting news to human consciousness directly, without the use of verbal language, which we will think of as a useless bottleneck in the process.