In March, our industry report on news automation for WAN-IFRA (the World Association of Newspapers and News Publishers) was published and immediately picked up by Harvard’s Nieman Journalism Lab. As researchers, we consider this an exciting opportunity to reach a much larger audience. The report is written for members of WAN-IFRA, but will be published as open access in mid-June here. Stay tuned for that!
There will also be a WAN-IFRA webinar on news automation with Li L’Estrade from Mittmedia on April 24, 2019 here.
Researchers at the University of Helsinki are developing news automation and information retrieval from text masses in cooperation with five other universities and the Finnish News Agency STT.
How can the essential content of news be found automatically in various languages? How might a computer produce news smoothly, and does the technology adapt to small linguistic areas such as Finland?
These are among the challenges to be tackled by EMBEDDIA, a research project launching in 2019 with EU funding, with the University of Helsinki participating. The three-year project will be developing methods for automated text analysis and generation. This means that the Immersive Automation project will continue in another context, with a new name.
One of the goals of the project is to simplify searching for information from online news, regardless of its language.
“Combining news written in several languages widens the perspective on the subject at hand, while making it possible to find out what is written on the item in different languages and in different media. The goal is to improve people’s access to information,” says Professor of Computer Science Hannu Toivonen, whose research group is taking part in the project.
The University of Helsinki’s Swedish School of Social Science is also a project participant, focused on investigating the needs of media companies.
“This project opens fascinating avenues into developing entirely new solutions for media to utilise. Ensuring a genuine demand for them is also important,” says Docent Carl-Gustav Lindén, a researcher of media and journalism.
Computers have the capacity to report on every single game
Many media companies are already employing automated news for reporting on sports and elections. Using structured data, computers are able to write news articles. For example, ice hockey games are a comfortably regular phenomenon from a computational viewpoint: they consist of three periods, resulting in an unambiguous number of goals.
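To make the idea of writing from structured data concrete, here is a minimal sketch of template-based generation for a hockey result. It is not the project's system: the GameResult schema, the team names, the scores and the wording rules are all invented for illustration, and it assumes a game of exactly three periods.

```python
from dataclasses import dataclass

ORDINALS = ["first", "second", "third"]  # assumption: exactly three periods


@dataclass
class GameResult:
    """Structured record of one game (hypothetical schema)."""
    home: str
    away: str
    period_scores: list  # one (home goals, away goals) pair per period


def write_report(game: GameResult) -> str:
    """Turn one structured game record into a short text using fixed templates."""
    home_goals = sum(h for h, _ in game.period_scores)
    away_goals = sum(a for _, a in game.period_scores)

    if home_goals == away_goals:
        lead = (f"{game.home} and {game.away} were level at "
                f"{home_goals}-{away_goals} after three periods.")
    else:
        if home_goals > away_goals:
            winner, loser = game.home, game.away
        else:
            winner, loser = game.away, game.home
        high, low = max(home_goals, away_goals), min(home_goals, away_goals)
        # The verb depends on the margin: a simple data-driven word choice.
        verb = "edged" if high - low == 1 else "beat"
        lead = f"{winner} {verb} {loser} {high}-{low}."

    # Add one detail sentence about the period with the most goals.
    totals = [h + a for h, a in game.period_scores]
    most = max(totals)
    if most == 0:
        return lead
    busiest = ORDINALS[totals.index(most)]
    noun = "goal" if most == 1 else "goals"
    return f"{lead} The {busiest} period was the busiest, with {most} {noun}."


# Made-up example data.
print(write_report(GameResult("Lions", "Bears", [(1, 0), (0, 2), (3, 0)])))
```

Because every game arrives in the same shape, the same function can be run over every game in a league, which is what makes coverage of even the smallest local matches cheap.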
According to Toivonen, news automation is useful because it enables the production of a great amount of news from consistent data. Computers can write articles on local hockey games even for a handful of readers.
“In such cases, the audience of a single piece of news may be small, but when the number of articles is great, media businesses both achieve extensive coverage and respond to specific needs,” Toivonen explains.
For now, automated news consists of election and sports coverage, and the like, which is generated in a structured manner from structured data. In-depth profiles and news analyses produced by computers are still some way off in the future, since computers are not yet able to handle the linguistic and content variation of these text types.
“Reporters are still needed. The nature of the profession may evolve, and meta-editorial elements will be involved. For example, journalists may instruct computers on reporting various subject matter,” Toivonen says.
“Such developments don’t necessarily apply to all journalists, but everyone must understand the direction the world of media is taking and the possibilities generated by new technologies,” Lindén adds.
Increasingly creative content through metaphors
In the EMBEDDIA project, Toivonen’s group is focusing on how to make computers able to automatically produce news as efficiently as possible and in several languages. This is a continuation of the group’s earlier research on news automation (in Finnish and Swedish only).
Modern technology provides computers with the ability to create relatively smooth content on election results, but they are not yet good at writing vividly. A creative touch is now being sought for both text structures and word choice.
“Metaphors employ structures that can be taught to computers, at least to a degree. This is how we hope to put a little colour into the language,” says Toivonen.
Background: A partnership of universities and media businesses
The University of Helsinki is participating in EMBEDDIA, a research project to be launched in 2019, developing news automation across language boundaries.
In addition to the University of Helsinki, five other European universities are taking part in EMBEDDIA, as are the Finnish News Agency STT and three other media businesses.
The name of the project derives from machine learning technologies known as word embeddings, which learn relations between words based on the contexts of their occurrences. The multilingual word-embedding models to be developed in the project will help computers find connections between texts written in different languages.
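As a rough illustration of how a shared multilingual embedding space can connect texts across languages, the sketch below compares word vectors with cosine similarity. The three-dimensional vectors are invented for the example (real models learn vectors with hundreds of dimensions from large text collections), and the function names are hypothetical rather than anything from the EMBEDDIA project.

```python
import math

# Toy vectors, invented for illustration. In a multilingual model, words with
# similar meanings end up close together regardless of language.
EMBEDDINGS = {
    "election": [0.90, 0.10, 0.00],
    "vaalit": [0.85, 0.15, 0.05],      # Finnish: "election"
    "val": [0.88, 0.12, 0.02],         # Swedish: "election"
    "hockey": [0.05, 0.90, 0.10],
    "jääkiekko": [0.10, 0.88, 0.08],   # Finnish: "ice hockey"
}


def cosine(u, v):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def text_vector(words):
    """Represent a text as the average of the vectors of its known words."""
    known = [EMBEDDINGS[w] for w in words if w in EMBEDDINGS]
    if not known:
        raise ValueError("no known words in text")
    dims = len(known[0])
    return [sum(vec[i] for vec in known) / len(known) for i in range(dims)]


def most_similar(query_words, candidates):
    """Return the name of the candidate text closest to the query."""
    query = text_vector(query_words)
    return max(candidates, key=lambda name: cosine(query, text_vector(candidates[name])))


english_texts = {"politics": ["election"], "sports": ["hockey"]}
print(most_similar(["vaalit"], english_texts))
```

Averaging word vectors is only one simple way to represent a whole text; it is used here because it keeps the example short.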
After two years of hard work and excellent results, the Immersive Automation research project received a worthy final seminar with guests from all over Europe. One of the seminar’s most anticipated guests was David Caswell, executive product manager of BBC News Labs and founder of Structured Stories.
BBC News Labs was founded in 2012 and in the past six years it has grown from a few part-time staff to about 20 team members including journalists, developers, scientists, developer-journalists and broadcast craft experts. According to Caswell, one of its – as well as the media future’s – main goals is to restore a privileged position for newsrooms.
In his presentation, Caswell said that the media field needs to find new artefacts for news that can restore and maintain a privileged position in a ‘many-to-many’ communications environment.
“One-to-many news artefacts, such as articles and programmes, cannot maintain a privileged position in a many-to-many communication environment because anyone can create them. We have to find and create something that is not easy to copy”, Caswell said at the seminar.
As a solution, Caswell presented several options. One of the most common yet interesting is personalisation. According to Caswell, it is an efficient use of attention because it reduces cognitive friction.
Caswell also talked about how authority is moving from authorship to evidence and the need to shift from a ‘trust me’ attitude to a ‘see for yourself’ approach. In addition, newsrooms have to provide context on top of content.
“Content is abundant and cheap. Context is rare and valuable, and it can be assembled from networks of information such as connected data and integrated automation. It enables journalists to do things that ordinary people cannot do”, Caswell explained.
At BBC News Labs, these ideas have been put into use by providing many different kinds of news artefacts of the same story. The audience can, for example, choose to read a short or a long version of a particular story, or watch a video instead.
In addition to Caswell, two Nordic pioneers – editor Magnus Aabech from the Norwegian News Agency NTB and CEO Sören Karlsson from United Robots in Sweden – spoke at the seminar. In his presentation, Aabech echoed Caswell’s view that one size does not fit all, illustrating this with the following picture.
In turn, Sören Karlsson from United Robots talked about what news automation means for journalists’ work in practice. He presented the following picture, where an automatic chatbot asks the team leader for comments after the match.
“Using these kinds of tools gives us reliable, relevant and high-quality data. It also increases the quality of the texts, gives a continuous flow of news and personalized distribution”, Karlsson said.
Besides Caswell, Aabech and Karlsson, business developer Maija Paikkala from the Finnish news agency STT and head of Yle News Lab Jukka Niva from the Finnish public broadcasting company also talked about how they are experimenting with new forms of structured journalism.
The seminar ended with a presentation by Professor Hannu Toivonen, who presented Embeddia, the news project of the University of Helsinki Department of Computer Science, followed by a panel discussion moderated by PhD student Stefanie Sirén-Heikel. During the seminar, the Immersive Automation research project’s WAN-IFRA report was also presented by Docent Carl-Gustav Lindén and PhD student Hanna Tuulonen.
The Immersive Automation research project’s final seminar was held in Helsinki at the Swedish School of Social Science, University of Helsinki on the 28th of November 2018.
This is the latest addition to the research literature from our IA project, an article for IEEE – “No Landslide for the Human Journalist: An Empirical Study of Computer-Generated Election News in Finland” with Magnus Melin as the first author. https://ieeexplore.ieee.org/document/8424161/authors#authors
“In an age of struggling news media, automated generation of news via natural language generation (NLG) methods could be of great help, especially in areas where the amount of raw input data is big, and the structure of the data is known in advance. One such news automation system is the Valtteri NLG system, which generates news articles about the Finnish municipal elections of 2017. To evaluate the quality of Valtteri-produced articles and to identify aspects to improve, $n=152$ users were asked to evaluate the output of Valtteri. Each evaluator rated six preselected computer-generated articles, four control articles written by journalists, and four computer-generated articles of their own choice. All the articles were evaluated along four dimensions: credibility, liking, quality, and representativeness. As expected, the texts written by Valtteri received lower ratings than those written by journalists, but overall the ratings were satisfactory (average 2.9 versus 4.0 for journalists on a five-point scale). Valtteri’s best rating (3.6) was for credibility. The computer-written articles that the evaluators could freely select got slightly better ratings than the preselected computer-written articles. When looking at the results by demographic groups, males aged 55 or more liked the automatic articles best and females aged 34 or less liked them the least. Evaluators mistook 21% of the computer-written articles as written by humans and 10% of the human-written articles as computer-written. The share of users making these mistakes grew with the age. Overall, the male evaluators made less writer-identification mistakes than female evaluators did.”
What is the future of News Automation? This is the main question we will discuss at the final event of the Immersive Automation project (2016-2018) where we will sum up what we learned. Link to video feed here 8pm CET https://connect.funet.fi/autofuture/
Our keynote speaker is David Caswell, Executive Product Manager of BBC News Labs and founder of Structured Stories. David works at the intersection between technology (especially data-driven technology), knowledge engineering and digital media.
We also have other excellent experts talking about the future of news automation, among them two pioneers from the Nordic countries, Magnus Aabech, Editor, editorial development at the Norwegian News Agency, NTB, and Sören Karlsson, CEO of United Robots from Sweden. From the Finnish news agency STT, we have Maija Paikkala, Business Developer and Key Account Manager and from the Finnish public broadcasting company Jukka Niva, Head of Yle News Lab. They are both experimenting with new forms of structured journalism.
At this event, we will also present the industry report “News Automation – A WAN-IFRA guide to the field” that we have produced for the World Association of Newspapers and News Publishers (WAN-IFRA).
In the report, we present five examples of how news automation has been implemented in newsrooms around the world: Mittmedia and United Robots (Sweden), Radar (UK), Washington Post (US), Valtteri (Finland), and Xinhua and Caixin (China).
This is a public event.
When: November 28, 2018, 9am to 12 noon
Where: Swedish School of Social Science, University of Helsinki, Snellmaninkatu 12.
9:00 Introduction: Carl-Gustav Lindén, Docent, Swedish School of Social Science at University of Helsinki
9:05 Keynote: The future of news automation. David Caswell, Executive Product Manager of BBC News Labs and founder of Structured Stories.
09:45 Presentation: News Automation: A WAN-IFRA guide to the field, Carl-Gustav Lindén and Hanna Tuulonen, PhD student, University of Helsinki
Magnus Aabech, Editor, editorial development, Norwegian News Agency, NTB
Sören Karlsson, CEO, United Robots
Maija Paikkala, Business Developer and Key Account Manager, Finnish News Agency, STT.
Jukka Niva, Head of Yle News Lab
11:00 Presentation: Embeddia, Professor Hannu Toivonen, University of Helsinki, Department of Computer Science
11:15 Panel discussion: David Caswell, Magnus Aabech, Maija Paikkala, Sören Karlsson, Jukka Niva. Moderator: Stefanie Sirén-Heikel, PhD student.
David Caswell is the Executive Product Manager of BBC News Labs. He has previously led product management for data and machine learning at Tribune Publishing, and was the Director of Product Management for Content Understanding at Yahoo!. Caswell is also the developer of the Structured Stories platform, which demonstrated the representation of journalistic events and narratives as structured data under editorial control, and has written extensively on structured and automated journalism. He has also researched structured journalism as a Fellow at the Reynolds Journalism Institute at the Missouri School of Journalism, and co-organizes an annual workshop on event and storyline representation for news.
The project is moving forward and our team has been busy with their research. As the academic year is ending, we want to provide you with some examples of where our work has been presented during the past year.
Presentations during the past year
Last fall, we gave presentations at several conferences, including Verkkoviestijöiden päivät 21.–22.9.2017 in Vantaa and the IPTC seminar on news automation in Barcelona 8.11.2017. We also attended the conference Computation + Journalism at Northwestern University in Chicago in October.
In September 2017, two peer-reviewed academic papers were presented on the technical aspects of automated news production. The first of these papers, Data-Driven News Generation for Automated Journalism, was presented on September 4–7 at the 10th International Natural Language Generation conference (INLG) in Santiago de Compostela, Spain. INLG is the conference of the Association for Computational Linguistics (ACL) Special Interest Group on Natural Language Generation (SIGGEN). The second paper, Finding and Expressing News from Structured Data, was presented on September 20–21 at the 21st International Academic Mindtrek Conference in Tampere, Finland.
The election news generation system was also presented to an audience of Nordic news media industry personnel at the 5th NxtMedia Conference in Trondheim, Norway on November 15th. It was further demonstrated to domestic press as well as industry and academic personnel on the AI Day organized jointly by University of Helsinki and Aalto University.
On 28 February 2018, Carl-Gustav Lindén gave a lecture on news automation to around 60 students and staff at the Asian College of Journalism in Chennai, India. The active discussion afterwards revealed that news automation is known as a concept in India, but there are no applications in use at the moment. There was also the usual discussion about journalists losing jobs because of automation. Carl-Gustav Lindén also moderated a panel about news automation at the Nordic Data Journalism Conference NODA2018. The four participants came from Norway, Sweden and Finland, and the common conclusion was that news automation is only at the beginning of its development.
Carl-Gustav Lindén presented the paper Creating value in the age of algorithms – a Finnish perspective on newsroom strategies at the World Media Economics and Management Conference 2018 (WMEMC2018) in Cape Town, South Africa in May 2018. The paper, with Stefanie Sirén-Heikel as the main author, was one of few dealing with algorithms at the conference, which attracted around 250 researchers from around the world.
What’s going on right now?
In 2018, we have been investigating methods for improving data analysis and the language generation process. The results of these methods will be demonstrated in Valtteri 2.0, a generation system for news on crime statistics. This system will exhibit an improved quality of language, improved analytical capabilities and the generation of more complex news articles from significantly richer data. This application domain also further highlights the computational power of our system as contrasted to human journalists. Researchers Asta Bäck and Magnus Melin have been occupied with testing Valtteri 2.0. The last submission day was 13 May and an analysis of the results is ongoing.
Melin held a presentation about Valtteri at VTT’s internal Friday madness and has presented his work at a VTT business area meeting, where we gained the insight that the topic has good potential, and comparatively few players on the market.
If we are to believe data journalist and data consultant Marco Maas, the future of journalism lies in our surroundings.
Most of us would probably not spend the profits from a few successful Bitcoin sales on equipping our homes with various sensors, but that is in fact what German data journalist Marco Maas did some time ago. In order to develop a groundbreaking new way of providing news services, he also gave up most of his privacy, and allowed his colleagues to follow the data collected by the sensors in his home.
What Maas and his colleagues at the Hamburg-based company xMinutes hope to achieve is a news service that, based on different data sets, provides a selection of the most interesting news for its users.
“Our aim is to implement a context API, which can be used by ourselves but also publishers”, Maas says.
Currently xMinutes works together with the news agency Deutsche Presse-Agentur DPA analysing news produced by the agency and building an understanding of the news cycle.
“We are in the beginning of the process, but we can already tell that a lot of content can be left out. We are far away from our machines understanding the real topics, but this data helps us identify the content which can be left out.”
Another feature xMinutes is considering developing is an added filter, which places various news outlets and journalists on a political spectrum.
“Although actually we have the most promising results from the simplest algorithmic solutions. So if a user is interested in politics, we just give them more news about politics in general.”
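One way to picture such a simple solution is a ranker that orders new articles by how often the user has read each topic before. This is only a sketch under stated assumptions: the article format, the topic labels and the function names are invented here and say nothing about how xMinutes actually implements its service.

```python
from collections import Counter


def topic_interest(read_history):
    """Share of the user's past reads that fall under each topic."""
    counts = Counter(read_history)
    total = sum(counts.values())
    return {topic: n / total for topic, n in counts.items()} if total else {}


def rank_articles(articles, read_history):
    """Order articles so topics the user reads most often come first.

    Articles on topics the user has never read get a score of 0. Python's
    sort is stable, so ties keep their original order.
    """
    interest = topic_interest(read_history)
    return sorted(articles, key=lambda a: interest.get(a["topic"], 0.0), reverse=True)


# Made-up example data (hypothetical article format).
history = ["politics", "politics", "sports"]
articles = [
    {"title": "Gallery opens", "topic": "culture"},
    {"title": "Cup final tonight", "topic": "sports"},
    {"title": "Budget vote", "topic": "politics"},
]
for article in rank_articles(articles, history):
    print(article["title"])
```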
“I don’t think publishers with a regional concentration will gain success by simply launching an app. Their business model will have to include getting their content to other apps, and places where the attention already is”, Maas says.
He recommends making the content as portable as possible, publishing the content on as many platforms as possible, and trying to be as relevant as possible to the users. Maas also predicts that the Google Assistant will become the most powerful platform.
“We just don’t see a point in trying to compete against Google or Facebook. You can certainly try, but I think we should try to focus on other topics, and work for a long term solution, which can then be incorporated to those big platforms as well.”
One of the insights that he has gained during the process is the importance of metadata.
“If you have a lot of metadata in your articles, even Facebook and such can understand your content better, and give it greater visibility. At least in Germany this is a problem, not having compatible metadata.”
xMinutes is also confident that so-called ambient news will have a key role in the future of news. By ambient news Maas refers to the possibility of incorporating news or other preferred content in our surroundings.
“It can be speakers, displays, or something else”, Maas explains.
A key finding after meeting with a test group of fourteen people was that the audience does not want to be disturbed.
“They only wanted information when it is important for them, such as traffic information on their way to work.”
The test group also revealed that the audience is willing to give out their personal information to a news service if they receive something beneficial in return.
“If our system could give personalised suggestions based on what the user had been reading throughout the day, they were fine with giving their location and other information to us. The discussion within journalism however seems to be that the readers are not willing to give out their data.”
Maas encourages news organisations to engage in a dialogue with the audience and users in order to truly understand their needs.
“Google and Apple are looking at the context, but journalists are not. We should look at new situations where our content can be interesting, say bathroom reading in the morning and short news breaks during the day.”
He says there are a thousand places to reach an audience – we just have not thought about them yet.
I am at yet another conference, hearing yet another presentation of “robot journalism”. I realise that my campaign against this metaphor for news automation was lost long ago. With the use of pictures depicting robots writing on keyboards, sometimes wearing a hat where it says, “Press”, graphic designers, editors, and researchers have managed to establish the most damaging mental picture. By framing news automation as robots coming to take the jobs of journalists, we have managed to maybe destroy, or at least delay, our move towards a future of augmented journalism, where smart machines are helping reporters do their jobs better. (This is what we call a robot.)
You might think that there is a lot going on, at least based on the massive publicity about the cooperation between the legacy news agency Associated Press in New York and the software company Automated Insights in Durham, North Carolina. Together, they have created a system that automatically generates earnings reports for AP’s customers. It is simple and works beautifully, relieving financial reporters from the boring and tedious work of digging through financial reports, and getting totally bogged down by the “earnings seasons” that last for several weeks four times a year.
Francesco Marconi at AP has written a nice guide to automation for those who want to follow progress in this area and there are experiments going on at many news agencies where this actually saves many resources.
Marconi also states something that should be obvious to us all: not all journalistic work should be automated, but we would be stupid not to explore the new opportunities that advances in natural language generation give us.
However – and this is a view based on dozens of research interviews I have done across Europe and the United States, as well as discussions at conferences such as GEN Summit, Computation + Journalism, Nicar, ICA, and WAN-IFRA Digital Media Europe – there is not much going on in this field. AP is one of the few cases. Another is United Robots in Sweden, which provides newspaper companies with automated soccer coverage (have a look here). Narrative Science has also been doing this for several years now. But besides these showcases, not much.
My talks with representatives from service providers of news automation tools, who have negotiated with media companies, paint a depressing picture of the mental state of media companies. They are the worst possible customers. They are unable to make decisions, have no financial resources, are not prepared to invest in new technology, are always looking for the low-hanging fruit, and are content with the fact that their peers are not doing any investments either.
It’s no wonder software companies are looking for better customers in the financial industry or ecommerce. At least they get a lot of free publicity, which is good for marketing towards other potential customers. And looking at the financial statements of media companies gives you a grave picture of how little investment is put into digital transformation. “We have given up on media customers”, “never again work without getting paid”, and so forth, are things I frequently hear. Media companies are way behind in the race towards augmented intelligence and, in a situation where they really need to invest resources, they are holding back.
If we take journalists, they are more than happy to dismiss the potential of news automation. This is somewhat strange, considering how many functions in any newsroom are automated already, beginning with word processing and photo editing. Walk into a television studio and be amazed by the level of automation. In reality, the latest development should be regarded as just another step in the newsroom’s human–computer advancement. There are no signs that automation has taken away any journalists’ jobs; instead, journalists are performing tasks that previously were assigned to non-editorial specialists, such as typesetters, telephone operators and darkroom assistants, who have now all but disappeared from the editorial offices.
Automation or computer anxiety is certainly not a new thing in either knowledge work generally, or journalism in particular. Aristotle, Queen Elizabeth I, the Luddites, James Joyce and John Maynard Keynes were all concerned with the impact of technology on employment (for more information on automation anxiety, click here).
Automation anxiety should not be equated with worries about what smart machines will be able to perform in terms of surveillance, or coercion and the balance of power. These concerns are justified as the potential threats are real. However, that should be kept separate from the “jobs lost” debate. Here, the damage done by the metaphor “robot journalism” cannot be overstated. Please stop using it.
During the early summer of 2018, the Immersive Automation project will debut Valtteri 2.0. This will be an improved version of the election news generation bot that we released in April 2017 (see http://www.vaalibotti.fi). In addition to language improvements, Valtteri 2.0 will showcase its ability to generate news articles in a new domain, i.e., crime statistics, as well as be able to automatically generate visualizations to go with the article. For example, Valtteri 2.0 will be able to write an article on the current state and on interesting trends of motor vehicle theft statistics in any municipality in Finland.
Crime has always interested the public. On a typical day, crime and justice stories make up 15% of the reported news (Katz, 1987).
Often, however, crime news reported in newspapers might give readers misleading, exaggerated, or biased notions of crime. In other words, news does not always present crimes in the proportions in which they are actually committed (Graber, 1979), whereas looking at data in context may give a completely different picture.
Using data to paint an accurate picture motivated this second version of Valtteri. The prototype is currently being implemented and is planned to be ready in the early summer of 2018. Similarly to Valtteri 1.0, version 2.0 will take in structured data, this time extracted from Statistics Finland, analyze the data, and generate hundreds of thousands of news articles as a result – an impossible feat for human journalists. In addition, users will still be able to select the news they would like to read and be able to interact with the included visualizations.
Valtteri 1.0 showed us the possibilities and gave us experience in automatically generating natural language news articles from structured data. However, no systems that we know of automatically generate news from criminal offence statistics, let alone in multiple languages. Stay tuned for the release of Valtteri 2.0!
Katz, J. (1987). What makes crime ‘news’? Media, Culture & Society, 9(1), 47-75.
Graber, D. A. (1979). Is crime news coverage excessive? Journal of Communication, 29(3), 81-92.
The automation of news has been in the pipeline for decades. However, it is still humans that are writing the news. Why is that so?
The most apparent challenge for automation is language. Algorithms are already able to conjugate words successfully. However, the subtle nuances of human language do not conjugate in conditional sentences of “if A, then B”. This limitation makes the language stiff and in the long run rather monotonous. An even bigger problem is content. Algorithms are producing numeric and highly structured result data from companies, sports and elections. This enables news about these subjects to be successfully automated, including in Finland and in Finnish. However, a real scoop is all about new, unexpected, and hard-earned information. A pre-coded algorithm cannot get a grip on such issues.
Thirdly, the hesitation of media companies and software developers is hindering the development. “One would imagine that there is a lot going on in the industry,” says news automation researcher Carl-Gustav Lindén, “but with a few exceptions, there really isn’t”. Technology itself is not a foreign issue in the field of editing, as the newsrooms are full of it. However, the talk of “robotic journalists” has frightened human publishers, although “there is no sign that development in automation would have reduced journalists’ work,” Lindén recalls. “We should rather see this development as a step forward in the co-operation between journalists and technology.”
It is certain that “robots” will not be writing analytical and engaging stories in a matter of years or even decades. It is also certain that the collaboration between people and software will be developed. A few Finnish editorial offices are already locating potential news topics from public protocols with the help of algorithms. Why could the same software not also compile background material, reveal hidden correlations between distant variables, produce copies and make different versions of ready texts for diverse distribution channels?
Given the speed at which technology is developing, predicting a hundred years in the future feels quite ridiculous. We are still going to need to select and communicate news a hundred years from now. Perhaps we’ll be transmitting news to human consciousness directly, without the use of verbal language, which we will think of as a useless bottleneck in the process.