What is the future of News Automation? This is the main question we will discuss at the final event of the Immersive Automation project (2016-2018) where we will sum up what we learned.
Our keynote speaker is David Caswell, Executive Product Manager of BBC News Labs and founder of Structured Stories. David works at the intersection between technology (especially data-driven technology), knowledge engineering and digital media.
At this event, we will present the industry report “News Automation – A WAN-IFRA guide to the field” that we have produced for the World Association of Newspapers and News Publishers (WAN-IFRA).
In the report, we present five examples of how news automation has been implemented in newsrooms around the world: Mittmedia and United Robots (Sweden), Radar (UK), Washington Post (US), Valtteri (Finland), and Xinhua and Caixin (China).
This is a public event.
When: November 28, 2018, 9 am to 12 noon
Where: Swedish School of Social Science, University of Helsinki, Snellmaninkatu 12.
Contact person: Adjunct Professor Carl-Gustav Linden, firstname.lastname@example.org
David Caswell bio
David Caswell is the Executive Product Manager of BBC News Labs. He has previously led product management for data and machine learning at Tribune Publishing, and was the Director of Product Management for Content Understanding at Yahoo!. Caswell is also the developer of the Structured Stories platform, which demonstrated the representation of journalistic events and narratives as structured data under editorial control, and has written extensively on structured and automated journalism. He has also researched structured journalism as a Fellow at the Reynolds Journalism Institute at the Missouri School of Journalism, and co-organizes an annual workshop on event and storyline representation for news.
The project is moving forward and our team has been busy with their research. As the academic year is ending, we want to provide you with some examples of where our work has been presented during the past year.
Presentations during the past year
Last fall, we gave presentations at several conferences, including Verkkoviestijöiden päivät (21–22 September 2017) in Vantaa and the IPTC seminar on news automation in Barcelona (8 November 2017). We also attended the Computation + Journalism conference at Northwestern University in Chicago in October.
In September 2017, two peer-reviewed academic papers were presented on the technical aspects of automated news production. The first of these papers, Data-Driven News Generation for Automated Journalism, was presented on September 4–7 at the 10th International Natural Language Generation conference (INLG) in Santiago de Compostela, Spain. INLG is the conference of the Association for Computational Linguistics (ACL) Special Interest Group on Natural Language Generation (SIGGEN). The second paper, Finding and Expressing News from Structured Data, was presented on September 20–21 at the 21st International Academic Mindtrek Conference in Tampere, Finland.
The election news generation system was also presented to an audience of Nordic news media industry personnel at the 5th NxtMedia Conference in Trondheim, Norway on November 15. It was further demonstrated to the domestic press as well as industry and academic personnel at the AI Day organized jointly by the University of Helsinki and Aalto University.
On 28 February 2018, Carl-Gustav Lindén gave a lecture on news automation to around 60 students and staff at the Asian College of Journalism in Chennai, India. The active discussion afterwards revealed that news automation is known as a concept in India, but no applications are in use at the moment. There was also the usual discussion about journalists losing jobs because of automation. Carl-Gustav Lindén also moderated a panel on news automation at the Nordic Data Journalism Conference NODA2018. The four participants came from Norway, Sweden, and Finland, and the common conclusion was that news automation is still in the early stages of development.
Carl-Gustav Lindén presented the paper Creating value in the age of algorithms – a Finnish perspective on newsroom strategies at the World Media Economics and Management Conference 2018 (WMEMC2018) in Cape Town, South Africa in May 2018. The paper, with Stefanie Sirén-Heikel as the main author, was one of the few dealing with algorithms at the conference, which attracted around 250 researchers from around the world.
What’s going on right now?
In 2018, we have been investigating methods for improving data analysis and the language generation process. The results will be demonstrated in Valtteri 2.0, a generation system for news on crime statistics that exhibits improved language quality, stronger analytical capabilities, and more complex news articles generated from significantly richer data. This application domain further highlights the computational power of our system as contrasted with human journalists. Researchers Asta Bäck and Magnus Melin have been busy testing Valtteri 2.0. The last submission day was 13 May, and an analysis of the results is ongoing.
Melin gave a presentation about Valtteri at VTT’s internal Friday madness session and has presented his work at a VTT business area meeting, where we learned that the topic has good potential and comparatively few players in the market.
If we are to believe data journalist and data consultant Marco Maas, the future of journalism lies in our surroundings.
Most of us would probably not spend the profits from a few successful Bitcoin sales on equipping our homes with various sensors, but that is in fact what German data journalist Marco Maas did some time ago. In order to develop a new, groundbreaking way of providing news services, he also gave up most of his privacy and allowed his colleagues to follow the data collected by the sensors in his home.
What Maas and his colleagues at the Hamburg-based company xMinutes hope to achieve is a news service that, based on different data sets, provides a selection of the most interesting news for its users.
“Our aim is to implement a context API, which can be used by ourselves but also publishers”, Maas says.
Currently, xMinutes works together with the news agency Deutsche Presse-Agentur (DPA), analysing news produced by the agency and building an understanding of the news cycle.
“We are in the beginning of the process, but we can already tell that a lot of content can be left out. We are far away from our machines understanding the real topics, but this data helps us identify the content which can be left out.”
Another feature xMinutes is considering is an additional filter that places various news outlets and journalists on a political spectrum.
“Although actually we have the most promising results from the simplest algorithmic solutions. So if a user is interested in politics, we just give them more news about politics in general.”
“I don’t think publishers with a regional concentration will gain success by simply launching an app. Their business model will have to include getting their content to other apps, and places where the attention already is”, Maas says.
He recommends making the content as portable as possible, publishing the content on as many platforms as possible, and trying to be as relevant as possible to the users. Maas also predicts that the Google assistant will become the most powerful platform.
“We just don’t see a point in trying to compete against Google or Facebook. You can certainly try, but I think we should try to focus on other topics, and work for a long term solution, which can then be incorporated to those big platforms as well.”
One of the insights he has gained during the process is the importance of metadata.
“If you have a lot of meta data in your articles, even Facebook and such can understand your content better, and give it greater visibility. At least in Germany this is a problem, not having compatible meta data.”
xMinutes is also confident that so-called ambient news will play a key role in the future of news. By ambient news, Maas refers to the possibility of incorporating news or other preferred content into our surroundings.
“It can be speakers, displays, or something else”, Maas explains.
A key finding after meeting with a test group of fourteen people was that the audience does not want to be disturbed.
“They only wanted information when it is important for them, such as traffic information on their way to work.”
The test group also revealed that the audience is willing to give out their personal information to a news service if they receive something beneficial in return.
“If our system could give personalised suggestions based on what the user had been reading throughout the day, they were fine with giving their location and other information to us. The discussion within journalism however seems to be that the readers are not willing to give out their data.”
Maas encourages news organisations to engage in dialogue with their audiences and users in order to truly understand their needs.
“Google and Apple are looking at the context, but journalists are not. We should look at new situations where our content can be interesting, say bathroom reading in the morning and short news breaks during the day.”
He says there are a thousand places to reach an audience – we just have not thought about them yet.
I am at yet another conference, hearing yet another presentation on “robot journalism”. I realise that my campaign against this metaphor for news automation was lost long ago. With pictures depicting robots typing on keyboards, sometimes wearing a hat that says “Press”, graphic designers, editors, and researchers have managed to establish a most damaging mental picture. By framing news automation as robots coming to take the jobs of journalists, we may have destroyed, or at least delayed, our move towards a future of augmented journalism, where smart machines help reporters do their jobs better.
You might think that there is a lot going on, at least based on the massive publicity about the cooperation between the legacy news agency Associated Press in New York and the software company Automated Insights in Durham, North Carolina. Together, they have created a system that automatically generates earnings reports for AP’s customers. It is simple and works beautifully, relieving financial reporters from the boring and tedious work of digging through financial reports and getting totally bogged down by the “earnings seasons” that last for several weeks four times a year.
Francesco Marconi at AP has written a useful guide to automation for those who want to follow progress in this area, and there are experiments going on at many news agencies where this actually saves significant resources.
Marconi also states something that should be obvious to us all: not all journalistic work should be automated, but we would be stupid not to explore the new opportunities that advances in natural language generation give us.
However – and this is a view based on dozens of research interviews I have done across Europe and the United States, as well as discussions at conferences such as GEN Summit, Computation + Journalism, NICAR, ICA, and WAN-IFRA Digital Media Europe – there is not much going on in this field. AP is one of the few cases. Another is United Robots in Sweden, which provides newspaper companies with automated soccer coverage. Narrative Science has also been doing this for several years now. But besides these showcases, not much.
My talks with representatives from service providers of news automation tools, who have negotiated with media companies, paint a depressing picture of the mental state of media companies. They are the worst possible customers. They are unable to make decisions, have no financial resources, are not prepared to invest in new technology, are always looking for the low-hanging fruit, and are content with the fact that their peers are not doing any investments either.
It’s no wonder software companies are looking for better customers in the financial industry or ecommerce. At least they get a lot of free publicity, which is good for marketing towards other potential customers. And looking at the financial statements of media companies gives you a grave picture of how little investment is put into digital transformation. “We have given up on media customers”, “never again work without getting paid”, and so forth, are things I frequently hear. Media companies are way behind in the race towards augmented intelligence and, in a situation where they really need to invest resources, they are holding back.
If we take journalists, they are more than happy to dismiss the potential of news automation. This is somewhat strange, considering how many functions in any newsroom are already automated, beginning with word processing and photo editing. Walk into a television studio and be amazed by the level of automation. In reality, the latest development should be regarded as just another step in the newsroom’s human–computer advancement. There are no signs that automation has taken away any journalists’ jobs; instead, journalists are performing tasks that were previously assigned to non-editorial specialists, such as typesetters, telephone operators, and darkroom assistants, who have now all but disappeared from editorial offices.
Automation or computer anxiety is certainly not a new thing, in either knowledge work generally or journalism in particular. Aristotle, Queen Elizabeth I, the Luddites, James Joyce and John Maynard Keynes were all concerned with the impact of technology on employment.
Automation anxiety should not be equated with worries about what smart machines will be able to do in terms of surveillance, coercion, and the balance of power. These concerns are justified, as the potential threats are real. However, they should be kept separate from the “jobs lost” debate. Here, the damage done by the metaphor “robot journalism” cannot be overstated. Please stop using it.
During early summer 2018, the Immersive Automation project will debut Valtteri 2.0, an improved version of the election news generation bot that we released in April 2017 (see http://www.vaalibotti.fi). In addition to language improvements, Valtteri 2.0 will showcase its ability to generate news articles in a new domain, i.e., crime statistics, as well as automatically generate visualizations to go with the articles. For example, Valtteri 2.0 will be able to write an article on the current state of, and interesting trends in, motor vehicle theft statistics for any municipality in Finland.
Crime has always interested the public. On a typical day, crime and justice stories make up 15% of the reported news (Katz, 1987).
Often, however, crime news reported in newspapers can give readers misleading, exaggerated, or biased notions of crime. In other words, news does not always present crimes in the proportions in which they are actually committed (Graber, 1979), whereas looking at data in context may give a completely different picture.
Using data to paint an accurate picture motivated this second version of Valtteri. The prototype is currently being implemented and is planned to be ready in early summer 2018. Similarly to Valtteri 1.0, version 2.0 will take in structured data, this time extracted from Statistics Finland, analyze the data, and generate hundreds of thousands of news articles as a result – an impossible feat for human journalists. In addition, users will still be able to select the news they would like to read and interact with the included visualizations.
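The analyse-then-generate step described above can be sketched as follows. This is a minimal illustration, not the actual Valtteri code; the function names, field names, and theft counts are invented for demonstration.

```python
# Hypothetical sketch of data-to-text news generation from crime statistics:
# analyse a yearly count series, then realise the result as a sentence.

def analyse(series):
    """Return simple newsworthiness facts from a yearly count series."""
    years = sorted(series)
    latest, previous = series[years[-1]], series[years[-2]]
    change = latest - previous
    pct = 100 * change / previous if previous else 0.0
    return {"latest_year": years[-1], "latest": latest,
            "change": change, "pct": round(pct, 1)}

def realise(municipality, offence, facts):
    """Fill a template with the analysed facts (template-based generation)."""
    direction = "rose" if facts["change"] > 0 else "fell"
    return (f"In {facts['latest_year']}, reported {offence} in {municipality} "
            f"{direction} by {abs(facts['pct'])}% to {facts['latest']} cases.")

thefts = {2014: 210, 2015: 198, 2016: 180, 2017: 162}  # hypothetical counts
facts = analyse(thefts)
print(realise("Espoo", "motor vehicle thefts", facts))
# In 2017, reported motor vehicle thefts in Espoo fell by 10.0% to 162 cases.
```

Because the analysis is separated from the wording, the same facts could be realised in several languages, which is one reason structured input data scales to hundreds of thousands of articles.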
Valtteri 1.0 showed us the possibilities and gave us experience in automatically generating natural language news articles from structured data. However, no system that we know of automatically generates news from criminal offence statistics, let alone in multiple languages. Stay tuned for the release of Valtteri 2.0!
Katz, J. (1987). What makes crime ‘news’? Media, Culture & Society, 9(1), 47–75.
Graber, D. A. (1979). Is crime news coverage excessive? Journal of Communication, 29(3), 81–92.
The automation of news has been in the pipeline for decades. However, it is still humans who write the news. Why is that so?
The most apparent challenge for automation is language. Algorithms are already able to conjugate words successfully. However, the subtle nuances of human language cannot be captured by conditional rules of the form “if A, then B”. This limitation makes the language stiff and, in the long run, rather monotonous. An even bigger problem is content. Algorithms produce news from numeric and highly structured result data about companies, sports, and elections. This enables news about these subjects to be successfully automated, including in Finland and in Finnish. However, a real scoop is all about new, unexpected, and hard-earned information. A pre-coded algorithm cannot get a grip on such issues.
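The “if A, then B” limitation can be made concrete with a small sketch (hypothetical rules, not any production system): every input follows the same template path, so the phrasing never varies.

```python
# Hypothetical rule-based generation: the conditional picks a verb,
# but every match is rendered with the exact same sentence shape,
# which is what makes the output stiff and monotonous over time.

def describe_result(team, goals_for, goals_against):
    if goals_for > goals_against:        # if A ...
        verb = "won"                     # ... then B
    elif goals_for < goals_against:
        verb = "lost"
    else:
        verb = "drew"
    return f"{team} {verb} {goals_for}-{goals_against}."

matches = [("HJK", 3, 1), ("Inter Turku", 0, 2), ("KuPS", 1, 1)]
for team, gf, ga in matches:
    print(describe_result(team, gf, ga))
# HJK won 3-1.
# Inter Turku lost 0-2.
# KuPS drew 1-1.
```

Real NLG systems add variation and aggregation on top of such rules, but the underlying conditional structure is the same, which is why surprise and nuance remain out of reach.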
Thirdly, the hesitation of media companies and software developers is hindering development. “One would imagine that there is a lot going on in the industry,” says news automation researcher Carl-Gustav Lindén, “but with a few exceptions, there really isn’t.” Technology itself is not foreign to newsrooms, which are full of it. However, the talk of “robot journalists” has frightened publishers, although “there is no sign that developments in automation have reduced journalists’ work,” Lindén notes. “We should rather see this development as a step forward in the co-operation between journalists and technology.”
It is certain that “robots” will not be writing analytical and engaging stories in a matter of years, or even decades. It is also certain that collaboration between people and software will develop further. A few Finnish editorial offices are already locating potential news topics in public records with the help of algorithms. Why could the same software not also compile background material, reveal hidden correlations between distant variables, produce copy, and make different versions of finished texts for diverse distribution channels?
Given the speed at which technology is developing, predicting a hundred years in the future feels quite ridiculous. We are still going to need to select and communicate news a hundred years from now. Perhaps we’ll be transmitting news to human consciousness directly, without the use of verbal language, which we will think of as a useless bottleneck in the process.
Nearhood was a platform for hyperlocal journalism. While some scholars and analysts say media consumers are increasingly interested in hyperlocal news, why did Nearhood not become a success story?
During the Immersive Automation project’s second workshop for media partners, senior lecturer John Grönvall from Arcada University of Applied Sciences used a case study as a means to explain why good ideas do not always lead to successful results.
“Nearhood was a research project on the sharing economy in Helsinki, and it combined interesting open data with a multiplatform web service,” Grönvall explains.
By the sharing economy, Grönvall refers to underutilised resources and peer-to-peer networks enabled by different platforms. While Uber and Airbnb represent global services built on the idea of a sharing economy, sites such as huuto.fi, events like Saunapäivä, and Facebook groups such as Haaga kiertoon are also examples of sharing economies.
“The idea is to facilitate peer-to-peer interaction and transaction, and to shift the communication flow from one-to-many towards many-to-many. The role and power of the middleman is reduced through technology.”
Nearhood aimed to combine hyperlocal social media and news aggregation for neighbourhoods in Helsinki. The application was supposed to work as a bulletin or notice board containing information from local businesses, residents, and municipalities.
“So, when I found an abandoned sofa close to my house, I could post a photo on Nearhood to let people in my area know that there was a piece of furniture up for grabs. While hyperlocal content was still largely unorganised in individual blogs or Facebook groups, everyone agreed that it had a huge potential. Now the metropolitan cities have begun to open up this data,” Grönvall says and mentions the Helsinki Region Infoshare (HRI) as an example.
HRI is a web service for open data sources in the cities of Helsinki, Espoo, Vantaa, and Kauniainen. The data can be used in research and development activities, decision-making, visualization, data journalism, as well as in the development of apps. Citizens, businesses, research facilities, and other actors can freely use the data at no cost.
Despite the promising starting point, Nearhood did not become a huge success.
“The platform never succeeded in attracting a critical mass, which meant that there were no network effects. It also failed to engage developers due to the mediocre user experiences, and the complexity of the system.”
As for many other similar platforms and apps, the dominance of Facebook proved too great to compete against.
“The megaplatform, in this case Facebook, was too dominant to compete against.”
The code and the database still exist, but the platform is no-longer available to the public.
Grönvall conducted research interviews with the Nearhood executives. Although Nearhood never became a great success story, it still provided some valuable insights.
“One of the executives urged the developers to focus on one feature and do that well. That simplicity is the key thing. They also recommended focusing on developing an app instead of launching a platform on the web.”
As Finland moves towards more human-centric data management, the importance of dynamic consent increases.
The Immersive Automation project arranged its second workshop for media partners and journalists in September. Senior Adviser Taru Rastas from the Ministry of Transport and Communications gave a presentation of her take on data in media, and how the Ministry is working to increase citizens’ right to decide and monitor their personal information.
“Promoting digital business is a part of the government agenda. There are five key government projects, and big data and MyData are one of those,” Rastas explains.
The project covers, for example, data-sharing practices and data protection in digital business. MyData, in turn, refers to a new approach to personal data management, where the aim is to provide individuals with practical means to access, obtain, and use datasets containing their personal information. This personal information might be, for example, medical records, financial information, or traffic data derived from various online services. The point is to encourage organizations holding personal data to give individuals control over this data, extending beyond their minimum legal requirements to do so.
“In human-centric data management, the individual is seen as both a connector and a controller. This means that data sets concerning the individual can be connected, and the individual decides who can use this data and how.”
In other words, individuals are empowered actors in the management of their personal lives both online and offline.
An essential requirement for more human-centric data management is that access to personal data must be easy. This is why the MyData approach incorporates the Open Data movement’s philosophy that providing access to information in a free and transparent format increases its usefulness and value. Open Data is technically and legally free for anyone to use, reuse, and distribute. Similarly, data collected about a person meets the criteria of MyData if it is technically and legally available for the individual to use, reuse, and distribute as they wish.
“MyData can as such be used to create new services which help individuals to manage their lives. The providers of these services can then create new business models and economic growth for society.”
The goal is to build trust in personal data services through transparency, interchangeability, public governance, respectable companies, public awareness, and secure technology. This is why the idea of dynamic consent is so important.
“Consent management is the primary mechanism for permitting and enforcing the legal use of data. Via MyData accounts individuals can instruct the services to fetch and process data in accordance with consents that the individual has granted to data services.”
In May, the Immersive Automation project met up with Magnus Aabech from the Norwegian news agency NTB. Aabech is involved in NTB’s election bot project and talked to us about the experiences he has had so far with news automation. After the parliamentary elections in Norway, Aabech also shared some of NTB’s experiences with automated election reporting.
The Norwegian news agency NTB has previously focused on automating sports news, mainly football, but during the parliamentary elections in September 2017, the agency also used a simple bot to produce partially automated election news. The agency had a channel on Slack, which alerted reporters when changes occurred in the results. The bot would also produce a news text about the changes, and the responsible reporter would then determine whether the changes were worth publishing.
“Just like the Immersive Automation project, we also work with text templates. However, our system is not really using NLG,” Aabech explains.
“This could surely have been done in a shorter format, but my colleague and I are not that experienced in writing code,” Aabech says and laughs a little.
The election bot was created in-house, and the work involved three or four employees. One of the reporters worked full time on the bot for two months, while the others were involved part-time in addition to their regular tasks.
Aabech describes the experience as a positive one.
“The election bot provided us with plenty of valuable information, and it also illustrated to a lot of the NTB’s employees what can actually be done by news automation. Although the bot was a success in a lot of ways, we still experienced some technical difficulties in the beginning. Fortunately, we managed to solve them.”
For NTB, the most important thing was to develop the skills of its employees as well as prepare for the Norwegian municipal elections in 2019.
“And, of course, we were happy about the quality of the texts and how well the bot worked. In addition, we have received a lot of attention within the industry, which is always a plus.”
Developing systems for news automation is expensive, and as such Aabech was interested in knowing more about the ways in which our project has attempted to make the Valtteri election bot reusable.
“Projects like these can be expensive. Automation is also something that I work with on top of my regular tasks, so the progress is quite slow.”
We compared the structures. What makes Valtteri special is that instead of one big black box, the IA project has created a chain of smaller black boxes, so each individual box can be altered without building a completely new system from scratch.
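The “chain of smaller black boxes” idea can be sketched roughly as follows. The stage names and data structures here are assumptions for illustration, not the actual IA codebase.

```python
# A modular generation pipeline: each stage is a separate, swappable
# component, so one stage can be replaced (e.g. a better analyser)
# without rebuilding the whole system.
from typing import Callable, List

Stage = Callable[[dict], dict]

def analyse(doc: dict) -> dict:
    doc["facts"] = [("turnout", 68.5)]   # derive messages from the raw data
    return doc

def order(doc: dict) -> dict:
    doc["facts"].sort(key=lambda f: f[1], reverse=True)  # rank by importance
    return doc

def realise(doc: dict) -> dict:
    doc["text"] = " ".join(f"The {k} was {v}." for k, v in doc["facts"])
    return doc

def run_pipeline(stages: List[Stage], data: dict) -> dict:
    for stage in stages:                 # each box feeds the next one
        data = stage(data)
    return data

article = run_pipeline([analyse, order, realise], {"raw": "..."})
print(article["text"])  # The turnout was 68.5.
```

Swapping in a new `realise` stage (say, for another language) leaves the rest of the chain untouched, which is the design advantage over a single monolithic system.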
While the IA project aims to produce news that can be published directly for the audience to read, NTB decided to proofread the automatically produced election texts before they went out on its newswire.
“This was a type of pilot project, but in the future we will send out the automatically produced texts directly. That is also what we do with our football texts,” Aabech says.
EDIT: In the original text we wrote that the NTB proof reads all of its automatically produced texts. This was only the case with the election reporting.
The spread of fake news and hateful content is one of the most debated topics right now. As machine learning techniques become more and more sophisticated, numerous fields have begun to utilise these techniques. In her PhD, text and data analyst Myriam Munezero has studied machine learning models that can detect antisocial behaviours. In this blogpost she explains the possibilities of natural language processing in violence prevention.
More than a billion people use Facebook daily, and the social media platform has become one of the most influential news businesses, with an incredible ability to mobilise people. Despite community standards and encouragement to tackle hateful content more efficiently, racist and hateful material still exists on the platform.
“The words we use, as well as our writing styles, can reveal information about our preferences, thoughts, emotions, and behaviours,” Myriam Munezero says.
In her research conducted at the University of Eastern Finland, she and her research team developed machine learning models that can detect antisocial behaviours, such as hate speech and indications of violence, from texts. Historically, most attempts to address antisocial behaviour have been done from educational, social and psychological points of view. This new study has, however, demonstrated the potential of using natural language processing techniques to develop state-of-the-art solutions to combat antisocial behaviour in written communication.
“Natural language processing techniques have been shown to be useful in identifying similar harmful behaviors, such as cyberbullying, harassment, extremism, and terrorism in text, all with varying levels of accuracy. However, little research addresses the broader antisocial behavior, which is characterized by covert and overt hostility and intentional aggression toward others,” Munezero explains.
Munezero and her fellow researchers have created solutions that can be integrated into web forums or social media websites to automatically or semi-automatically detect potential incidences of antisocial behaviour. The high accuracy of these solutions allows for fast and reliable warnings and interventions before possible acts of violence are committed. In many cases, people who have committed school shootings, for instance, have indicated their intentions online prior to acting. By detecting these indications, future acts of violence could be prevented.
One of the great challenges in detecting antisocial behaviour is first defining what precisely counts as antisocial behaviour and then determining how to detect such phenomena. Thus, using an exploratory and interdisciplinary approach, Munezero’s study applied natural language processing techniques to identify, extract, and utilise the linguistic features, including emotional features, pertaining to antisocial behaviour.
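The kind of linguistic and emotional feature extraction described above can be illustrated with a toy sketch. The lexicons, feature names, and weights below are invented for demonstration; the actual study built its resources from real corpora and learned models.

```python
# Hypothetical feature-based scoring for antisocial language.
# The lexicons and weights are made up for illustration only.

ANGER_LEXICON = {"hate", "destroy", "revenge"}   # toy emotion lexicon
INTENT_MARKERS = {"will", "going"}               # toy future-intent markers

def extract_features(text: str) -> dict:
    """Count simple lexical features in a tokenized text."""
    tokens = text.lower().split()
    return {
        "anger_terms": sum(t in ANGER_LEXICON for t in tokens),
        "intent_markers": sum(t in INTENT_MARKERS for t in tokens),
        "second_person": sum(t in {"you", "your"} for t in tokens),
    }

def antisocial_score(features: dict) -> float:
    # toy linear scorer; a real system would learn these weights from data
    weights = {"anger_terms": 0.5, "intent_markers": 0.3, "second_person": 0.2}
    return sum(weights[k] * v for k, v in features.items())

text = "I hate you and I will destroy everything"
feats = extract_features(text)
print(feats, antisocial_score(feats))
```

A deployed detector would replace the hand-picked weights with a classifier trained on an annotated corpus, but the pipeline shape (extract features, then score) is the same.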
The study investigated emotions and their role or presence in antisocial behaviour. Literature in the fields of psychology and cognitive science shows that emotions have a direct or indirect role in instigating antisocial behaviour. Thus, for the analysis of emotions in written language, the study created a novel resource for analysing emotions. This resource further contributes to subfields of natural language processing, such as emotion and sentiment analysis.
The study also created a novel corpus of antisocial behaviour texts, allowing for a deeper insight into and understanding of how antisocial behaviour is expressed in written language.
“Finding representative corpora to study harmful behaviours is usually difficult,” Munezero says.
As the results are encouraging, Munezero finds that further progress within this topic can be made with continued research on the relationships between natural language and societal concerns.
Myriam Munezero’s PhD was approved on April 12 at the University of Eastern Finland. She also appears in an article in the newspaper Karjalainen. Munezero currently works as a researcher at the faculty of Data Science at the University of Helsinki and is a member of the Immersive Automation team.