The automation of news has been in the pipeline for decades. However, it is still humans who are writing the news. Why is that so?
The most apparent challenge for automation is language. Algorithms are already able to conjugate words successfully. However, the subtle nuances of human language do not conjugate in conditional sentences of “if A, then B”. This limitation makes the language stiff and in the long run rather monotonous. An even bigger problem is content. Algorithms are producing numeric and highly structured result data from companies, sports and elections. This enables news about these subjects to be successfully automated, including in Finland and in Finnish. However, a real scoop is all about new, unexpected, and hard-earned information. A pre-coded algorithm cannot get a grip on such issues.
Thirdly, the hesitation of media companies and software developers is hindering the development. “One would imagine that there is a lot going on in the industry,” says news automation researcher Carl-Gustav Lindén, “but with a few exceptions, there really isn’t”. Technology itself is not a foreign issue in the field of editing, as the newsrooms are full of it. However, the talk of “robotic journalists” has frightened human publishers, although “there is no sign that development in automation would have reduced journalists’ work,” Lindén recalls. “We should rather see this development as a step forward in the co-operation between journalists and technology.”
It is certain that “robots” will not be writing analytical and engaging stories in a matter of years or even decades. It is also certain that the collaboration between people and software will be developed. A few Finnish editorial offices are already locating potential news topics from public protocols with the help of algorithms. Why could the same software not also compile background material, reveal hidden correlations between distant variables, produce copies and make different versions of ready texts for diverse distribution channels?
Given the speed at which technology is developing, predicting a hundred years in the future feels quite ridiculous. We are still going to need to select and communicate news a hundred years from now. Perhaps we’ll be transmitting news to human consciousness directly, without the use of verbal language, which we will think of as a useless bottleneck in the process.
Nearhood was a platform for hyperlocal journalism. Some scholars and analysts say media consumers are increasingly interested in hyperlocal news, so why did Nearhood not become a success story?
During the Immersive Automation project’s second workshop for media partners, senior lecturer John Grönvall from Arcada University of Applied Sciences used a case study as a means to explain why good ideas do not always lead to successful results.
“Nearhood was a research project on the sharing economy in Helsinki, and it combined interesting open data with a multiplatform web service,” Grönvall explains.
By the sharing economy Grönvall refers to underutilised resources and peer-to-peer networks enabled by different platforms. While Uber and Airbnb represent global services built on the idea of a sharing economy, sites such as huuto.fi, events like Saunapäivä, and Facebook groups such as Haaga kiertoon are also examples of sharing economies.
“The idea is to facilitate peer-to-peer interaction and transaction, and to shift the communication flow from one-to-many towards many-to-many. The role and power of the middleman is reduced through technology.”
Nearhood aimed to combine hyperlocal social media and news aggregation for neighbourhoods in Helsinki. The application was supposed to work as a bulletin or notice board containing information from local businesses, residents, and municipalities.
“So, when I found an abandoned sofa close to my house, I could post a photo on Nearhood to let people in my area know that there was a piece of furniture up for grabs. While hyperlocal content was still largely unorganised in individual blogs or Facebook groups, everyone agreed that it had a huge potential. Now the metropolitan cities have begun to open up this data,” Grönvall says and mentions the Helsinki Region Infoshare (HRI) as an example.
HRI is a web service for open data sources in the cities of Helsinki, Espoo, Vantaa, and Kauniainen. The data can be used in research and development activities, decision-making, visualization, data journalism, as well as in the development of apps. Citizens, businesses, research facilities, and other actors can freely use the data at no cost.
Despite the promising starting point, Nearhood did not become a huge success.
“The platform never succeeded in attracting a critical mass, which meant that there were no network effects. It also failed to engage developers due to the mediocre user experiences, and the complexity of the system.”
As with many similar platforms and apps, Facebook’s dominance proved too great to compete against.
“The megaplatform, in this case Facebook, was too dominant to compete against.”
The code and the database still exist, but the platform is no longer available to the public.
Grönvall conducted research interviews with the Nearhood executives. Although Nearhood never became a great success story, it still provided some valuable insights.
“One of the executives urged the developers to focus on one feature and do that well. That simplicity is the key thing. They also recommended focusing on developing an app instead of launching a platform on the web.”
As Finland moves towards more human-centric data management, the importance of dynamic consent increases.
The Immersive Automation project arranged its second workshop for media partners and journalists in September. Senior Adviser Taru Rastas from the Ministry of Transport and Communications gave a presentation of her take on data in media, and how the Ministry is working to increase citizens’ right to decide and monitor their personal information.
“Promoting digital business is a part of the government agenda. There are five key government projects, and big data and MyData are one of those,” Rastas explains.
The project includes, for example, data sharing practices and data protection in digital business. MyData, in turn, refers to a new approach to personal data management, where the aim is to provide individuals with the practical means to access, obtain, and use datasets containing their personal information. This personal information might be, for example, medical records, financial information, or traffic data derived from various online services. The point is to encourage organizations holding personal data to give individuals control over this data, extending beyond their minimum legal requirements to do so.
“In human centric data management the individual is seen as both a connector and controller. This means that data sets concerning the individual can be connected, and the individual decides who can use and how they can use this data.”
In other words, individuals are empowered actors in the management of their personal lives both online and offline.
An essential requirement in order to carry out a more human centric management of data is that the access to personal data must be easy. This is why the MyData approach incorporates the ‘Open Data’ movement philosophy that providing access to information in a free and transparent format increases its usefulness and value. Open Data is technically and legally free for anyone to use, reuse, and distribute. Similarly, data collected about a person will meet the criterion of MyData if it is technically and legally available for the individual to use, reuse, and distribute as they wish.
“MyData can as such be used to create new services which help individuals to manage their lives. The providers of these services can then create new business models, and economic growth to the society.”
The goal is to build trust in personal data services through transparency, interchangeability, public governance, respectable companies, public awareness, and secure technology. This is why the idea of a dynamic consent is so important.
“Consent management is the primary mechanism for permitting and enforcing the legal use of data. Via MyData accounts individuals can instruct the services to fetch and process data in accordance with consents that the individual has granted to data services.”
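To make the idea of dynamic consent a little more concrete, here is a minimal sketch in Python. The names `Consent`, `ConsentRegistry` and `may_process`, as well as the example service and dataset, are hypothetical and not taken from any MyData specification. The sketch only illustrates the principle described above: a service checks for a grant before processing data, and the individual can withdraw that grant at any time.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Consent:
    """One grant: a service may use one dataset for one purpose."""
    service: str
    dataset: str
    purpose: str


@dataclass
class ConsentRegistry:
    """Hypothetical personal account holding the consents currently in force."""
    granted: set = field(default_factory=set)

    def grant(self, consent: Consent) -> None:
        self.granted.add(consent)

    def withdraw(self, consent: Consent) -> None:
        # "Dynamic" consent: a grant can be revoked at any time.
        self.granted.discard(consent)

    def may_process(self, service: str, dataset: str, purpose: str) -> bool:
        # A service is allowed to fetch or process data only if a matching grant exists.
        return Consent(service, dataset, purpose) in self.granted


account = ConsentRegistry()
travel = Consent("route-planner", "traffic-data", "commute suggestions")

account.grant(travel)
allowed_before = account.may_process("route-planner", "traffic-data", "commute suggestions")

account.withdraw(travel)
allowed_after = account.may_process("route-planner", "traffic-data", "commute suggestions")
```

In this sketch a consent is tied to a specific purpose, so the same dataset requested for another purpose, or by another service, would be refused.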
In May, the Immersive Automation project met up with Magnus Aabech from the Norwegian news agency NTB. Aabech is involved in NTB’s election bot project and talked to us about the experiences he has had so far with news automation. After the parliamentary elections in Norway, Aabech also shared some of NTB’s experiences with automated election reporting.
The Norwegian news agency (NTB) has previously focused on automating sports news, mainly football, but during the parliamentary elections in September 2017, the agency also used a simple bot in order to produce partially automated election news. The agency had a channel on Slack, which alerted the reporters when some type of changes occurred in the results. The bot would also produce a news text about the changes. The responsible reporter would then determine whether the changes were worth publishing.
“Just like the Immersive Automation project, we also work with text templates. However, our system is not really using NLG,” Aabech explains.
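The workflow described above, where a bot notices a change in the results and drafts a short text for a reporter to review, can be sketched roughly as follows. This is not NTB’s code: the function names, the template wording and the figures are all invented for illustration, and a real bot would post to a chat channel rather than print.

```python
def detect_changes(previous: dict, current: dict) -> list:
    """Return (party, old_seats, new_seats) for every party whose seat count changed."""
    changes = []
    for party in sorted(current):
        old, new = previous.get(party, 0), current[party]
        if old != new:
            changes.append((party, old, new))
    return changes


def render_alert(party: str, old: int, new: int) -> str:
    """Fill a fixed text template; plain slot filling, no language generation."""
    verb = "gains" if new > old else "loses"
    diff = abs(new - old)
    noun = "seat" if diff == 1 else "seats"
    return f"{party} {verb} {diff} {noun} and now stands at {new}."


# Invented figures, purely for illustration.
earlier = {"Party A": 48, "Party B": 45, "Party C": 27}
latest = {"Party A": 49, "Party B": 45, "Party C": 26}

alerts = [render_alert(*change) for change in detect_changes(earlier, latest)]
for line in alerts:
    print(line)  # a reporter would then decide whether the change is worth publishing
```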
“This could surely have been done in a shorter format, but my colleague and I are not that experienced in writing code,” Aabech says and laughs a little.
The election bot was created in-house, and the work involved three or four employees. One of the reporters worked full time on the bot for two months, while the others were involved part-time in addition to their regular tasks.
Aabech describes the experience as a positive one.
“The election bot provided us with plenty of valuable information, and it also illustrated to a lot of the NTB’s employees what can actually be done by news automation. Although the bot was a success in a lot of ways, we still experienced some technical difficulties in the beginning. Fortunately, we managed to solve them.”
For the NTB the most important thing was to develop the skills of the employees as well as prepare for the Norwegian municipal elections in 2019.
“And, of course, we were happy about the quality of the texts and how well the bot worked. In addition, we have received a lot of attention within the industry, which is always a plus.”
Developing systems for news automation is expensive, and as such Aabech was interested in knowing more about the ways in which our project has attempted to create a reusable system out of the Valtteri election bot.
“Projects like these can be expensive. Automation is also something that I work with on top of my regular tasks, so the progress is quite slow.”
We compared the structures. What makes Valtteri special is that instead of building one big black box, the IA-project has created a chain of smaller black boxes. Each individual box can therefore be altered without starting from scratch and building a completely new system.
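A chain of small boxes can be pictured as a list of functions, each taking the output of the previous one. The sketch below is only an illustration of that design idea, not Valtteri’s actual architecture. The stage names, the threshold and the data are assumptions made for this example.

```python
from typing import Callable, Iterable

Stage = Callable[[list], list]


def select_facts(rows: list) -> list:
    """Keep only rows whose change is big enough to mention (threshold is arbitrary)."""
    return [r for r in rows if abs(r["change"]) >= 1.0]


def order_facts(rows: list) -> list:
    """Put the largest change first."""
    return sorted(rows, key=lambda r: abs(r["change"]), reverse=True)


def realise(rows: list) -> list:
    """Turn each fact into a sentence with a fixed template."""
    return [
        f"{r['party']} {'rose' if r['change'] > 0 else 'fell'} "
        f"{abs(r['change']):.1f} percentage points."
        for r in rows
    ]


def run_pipeline(data: list, stages: Iterable[Stage]) -> list:
    """Feed the data through each box in turn."""
    for stage in stages:
        data = stage(data)
    return data


# Invented figures, purely for illustration.
rows = [
    {"party": "Party A", "change": 0.4},
    {"party": "Party B", "change": -2.5},
    {"party": "Party C", "change": 1.2},
]
sentences = run_pipeline(rows, [select_facts, order_facts, realise])
```

Because each stage only depends on the shape of its input, one box (say, `realise` for another language or another topic) can be swapped out while the others stay untouched.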
While the IA-project aims at producing news that can be published directly for the audience to read, the NTB decided to proofread the automatically produced election texts before they went out on the NTB’s newswire.
“This was a type of pilot project, but in the future we will send out the automatically produced texts directly. That is also what we do with our football texts,” Aabech says.
EDIT: In the original text we wrote that the NTB proofreads all of its automatically produced texts. This was only the case with the election reporting.
The spread of fake news and hateful content is one of the most debated topics right now. As machine learning techniques become more and more sophisticated, numerous fields have begun to utilise these techniques. In her PhD, text and data analyst Myriam Munezero has studied machine learning models that can detect antisocial behaviours. In this blogpost she explains the possibilities of natural language processing in violence prevention.
More than a billion people use Facebook daily, and the social media platform has become one of the most influential news businesses, with an incredible ability to mobilise people. Despite community standards and encouragements to tackle hateful content more efficiently, racist and hateful material still exists on the platform.
“The words we use, as well as our writing styles, can reveal information about our preferences, thoughts, emotions, and behaviours,” Myriam Munezero says.
In her research conducted at the University of Eastern Finland, she and her research team developed machine learning models that can detect antisocial behaviours, such as hate speech and indications of violence, from texts. Historically, most attempts to address antisocial behaviour have been done from educational, social and psychological points of view. This new study has, however, demonstrated the potential of using natural language processing techniques to develop state-of-the-art solutions to combat antisocial behaviour in written communication.
“Natural language processing techniques have been shown to be useful in identifying similar harmful behaviors, such as cyberbullying, harassment, extremism, and terrorism in text, all with varying levels of accuracy. However, little research addresses the broader antisocial behavior, which is characterized by covert and overt hostility and intentional aggression toward others,” Munezero explains.
Munezero and her fellow researchers have created solutions that can be integrated in web forums or social media websites to automatically or semi-automatically detect potential incidences of antisocial behaviour. The high accuracy of these solutions allows for fast and reliable warnings and interventions to be made before the possible acts of violence are committed. In many instances, people who have committed school shootings for instance, have indicated their intentions online prior to action. By detecting these indications, future acts of violence could be prevented.
One of the great challenges in detecting antisocial behaviour is first defining what precisely counts as antisocial behaviour and then determining how to detect such phenomena. Thus, using an exploratory and interdisciplinary approach, Munezero’s study applied natural language processing techniques to identify, extract, and utilise the linguistic features, including emotional features, pertaining to antisocial behaviour.
The study investigated emotions and their role or presence in antisocial behaviour. Literature in the fields of psychology and cognitive science shows that emotions have a direct or indirect role in instigating antisocial behaviour. Thus, for the analysis of emotions in written language, the study created a novel resource for analysing emotions. This resource further contributes to subfields of natural language processing, such as emotion and sentiment analysis.
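As a rough illustration of what an emotion feature extracted from text can look like, here is a toy lexicon-based sketch in Python. It is not the resource created in the study: the word list, the three emotion categories and the function name `emotion_features` are all made up for this example, and a real lexicon would be far larger and more nuanced.

```python
import re
from collections import Counter

# Toy lexicon invented for this example.
EMOTION_LEXICON = {
    "hate": "anger",
    "furious": "anger",
    "afraid": "fear",
    "scared": "fear",
    "happy": "joy",
    "glad": "joy",
}


def emotion_features(text: str) -> dict:
    """Return the share of tokens associated with each emotion in the lexicon."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(EMOTION_LEXICON[t] for t in tokens if t in EMOTION_LEXICON)
    total = len(tokens) or 1  # avoid division by zero on empty input
    return {emotion: counts[emotion] / total for emotion in ("anger", "fear", "joy")}


features = emotion_features("I hate this and I am furious, not scared.")
```

Numbers like these could then be fed, together with other linguistic features, into a classifier. Note that simple word counting misses negation ("not scared" still counts as fear), which is one reason real systems need more than a lexicon.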
The study also created a novel corpus of antisocial behaviour texts, allowing for a deeper insight into and understanding of how antisocial behaviour is expressed in written language.
“Finding representative corpora to study harmful behaviours is usually difficult,” Munezero says.
As the results are encouraging, Munezero finds that further progress within this topic can be made with continued research on the relationships between natural language and societal concerns.
Myriam Munezero’s PhD was approved on April 12 at the University of Eastern Finland. She also appears in an article in the newspaper Karjalainen. Munezero currently works as a researcher at the faculty of Data Science at the University of Helsinki and is a member of the Immersive Automation team.
The other day we received some excellent questions from Peter Carson at Edinburgh Napier University. Carson is currently collecting empirical material for his dissertation, which focuses on the impact of artificial intelligence on journalism. We decided to publish our thoughts in this blog.
Could automated journalism lead to the deskilling of journalists and the loss of jobs?
Considering the skills associated with the current state of the art, what those systems are capable of today, we are not too concerned about this. All new technologies come with challenges and may require new skillsets, whilst other skills may become redundant. Automation may exert increased pressure on the sense-making skills of journalists, which might lead to a higher level of specialisation for reporters and the material they produce. At the same time, if the media industry is willing to invest in depth, automation can free up resources for increased quality of stories. The role of journalists will definitely change, but there are still such wide gaps in the level of sophistication between human journalists and machines, that journalists will be needed for quite some time. In the optimistic scenario journalists will focus on the specific areas and topics that machines are not capable of, while text generators will take on the more mundane routine tasks.
If AI (artificial intelligence) becomes a dominating force in journalism production, is there a potential that journalists will only be essential in an editorial capacity?
Algorithms do not currently operate in a vacuum; they need creators and managers to function properly, so these positions will be required. The timeframe for a full-on automation of creative material is most likely longer than one would expect, and it will most likely take years before we see newsrooms only consisting of editors. We suggest looking into Dörr (2015) and Van Dalen’s (2012) respective work for more thoughts on the possible changes to the professional role of journalists.
Are there dangers in having AI-algorithms curate news on social media feeds without some sort of overseeing regulatory body?
This depends on the skills of machine learning. Algorithms are currently as biased as their creators, meaning that just as people make mistakes and false judgements, automation can make the same errors. The example of Microsoft’s Tay shows that we need to carefully think through the steps, and learn from them as we go along. It is important to keep in mind that social media algorithms learn from our behaviours, meaning that we influence their actions.
Do we need more agreement on the ethics of data collection, when mining data on unaware individuals, to use for advertising or newsgathering purposes?
Yes. We estimate that this will be a big topic for discussion within international bodies during the coming years. We think all media needs to be overseen by an ethics committee or suchlike, and computational journalism or algorithmic text generators are no different from traditional media outlets. However, we need to invest resources into this area, since the production volumes and potential capacity of algorithms can be extremely difficult to monitor due to their immense quantities.
Are personalised news feeds curated by AI drawing people into “bubbles” of information, that shield them from new or challenging views?
The debate on whether this is happening, and on whether it is actually a new phenomenon, is currently taking place. While we might be able to sense that we are in a bubble, we can also actively try to influence that bubble by choosing to “teach” our social media algorithms that we want to see other content as well.
With the advent of “fake news”, is AI the answer to upholding values of truth in journalism and preventing the spread of misinformation?
Fake news is nothing new; nonetheless, it is beneficial for our society that there is an ongoing debate on the impact of the information forming our opinions. Using algorithm-driven intelligence for locating and filtering false information is definitely something that could be beneficial. This, however, simultaneously poses questions of how that would be governed, and who would have the right to make those decisions.
Do we need to establish rules about transparency and accountability when articles are written by algorithms?
Yes, ethical rules and guidelines are always beneficial. At the same time, it is not as clear if a top-down approach is the best one, as the field evolves rapidly.
How do we prevent hidden biases in AI-generated news stories?
That is the million-dollar question, as we cannot even do that in traditional human-created news. Are humans even able to be unbiased? There are examples of how algorithms actually help us expose our inherent biases, as in the case of Google Image search, which favoured images of male CEOs.
Do we need to ensure that new journalism students have a degree of code literacy?
Based on discussions and interviews with data scientists and data journalists, we think the best results are achieved when journalists collaborate with the people who develop the technological aspects of automation (programmers/software engineers) instead of journalists trying to be programmers or programmers trying to be journalists. However, what is essential is that journalists adopt a computational way of thinking in their professional role, in order to better understand the possibilities and added value that algorithms and computation can have in the editorial offices or newsrooms. At the same time, a literacy of the foundations of journalistic norms and values is also relevant for the people who work with the technological aspects of the media industry.
Peter Carson’s questions were discussed and answered by project lead Carl-Gustav Lindén, who is a journalism and media researcher, journalist and PhD-student Stefanie Sirén-Heikel, who focuses on journalistic verification, newsroom innovations, and media management, and research assistant and journalist Laura Klingberg.
After several months of intensive work, the Immersive Automation team is now ready to present its first prototype, Valtteri the election bot. Just in time for the municipal elections in Finland, Valtteri writes short pieces of news based on the election results. In this blogpost, you can learn more about how and why Valtteri was created.
As the Immersive Automation project studies the automation of editorial processes, it was necessary for us to create a prototype, which could illustrate the difficulties of automation and guide us along our journey towards future news ecosystems. Data analyst Leo Leppänen specialises in language technology, and he is the brains behind Valtteri. Over the past couple of months, he has programmed and developed Valtteri with the assistance of the other researchers in the IA-team.
“This is our first prototype and the point is to manually create a system which can illustrate where machine learning could be most useful and profitable,” he says.
Valtteri utilises data from the Finnish Ministry of Justice and combines the data with templates created by the research team.
“This probably sounds very simple and easy, but for a computer this includes some major challenges. The computer does not know what useful and interesting information is, and the amount of data is massive. The human brain possesses vast amounts of information, whereas the computer has no other knowledge than the things we have taught it,” he further explains.
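Combining data with templates can be pictured as slot filling: a structured fact goes in, and a sentence in the chosen language comes out. The sketch below is a deliberately naive illustration and not Valtteri’s code. The template wording, the `render` function and the figures are invented for this example.

```python
# Hypothetical templates, one per language.
TEMPLATES = {
    "en": "{party} received {share:.1f} percent of the votes in {municipality}.",
    "sv": "{party} fick {share:.1f} procent av rösterna i {municipality}.",
}


def render(fact: dict, language: str) -> str:
    """Fill the template for the given language with values from one fact."""
    return TEMPLATES[language].format(**fact)


# Invented figures, purely for illustration.
fact = {"party": "Party A", "share": 23.456, "municipality": "Helsinki"}
english = render(fact, "en")
swedish = render(fact, "sv")
```

Even this tiny example hints at why the task is harder than it sounds: the Swedish sentence would normally use a decimal comma and the Swedish name of the city, and nothing in the template knows whether this particular fact is worth reporting at all.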
After the municipal elections, the research team will gather feedback and analyse the user experiences.
“The next step will be to find all that essential knowledge humans have and transfer it into Valtteri. Our main challenge is that a computer is a slow learner and needs plenty of examples to learn from,” Leppänen says.
He also points out that this is an experiment and a first prototype, and thus a fairly simple system.
You can find Valtteri the newsbot here. The bot works in Finnish, Swedish, and English.
Until Monday April 10, when the latest election results become available, Valtteri practices newswriting with old data from the 2012 municipal elections.
Did you try Valtteri? We are curious to know what you think!
Tweet us @vaalibotti or send us a message info @ immersiveautomation.com
On March 20 and 21 the Immersive Automation project arranged its first training for journalists. During two intensive days of lectures and discussions the participants were given an introduction to computational thinking and the applications of it in a newsroom setting. Among the guest lecturers was assistant professor Nick Diakopoulos from the University of Maryland, who is a leading scholar in algorithmic accountability and social computing in the news.
Nick Diakopoulos defines computational thinking as the integration of a praxis of data, modelling, simulation, and programming into journalistic norms, goals, and epistemology.
“Essentially it’s about finding and telling news stories, with, by, or about algorithms,” he says.
He is very clear about the fact that computational thinking does not mean that we should think like computers.
“Instead it’s about thinking in a way so that we can use our computers in the best way possible to solve a problem.”
And why do we need computational thinking in news automation?
“Because computational thinkers will be more effective at exploiting the capabilities of automation.”
He compares an automated writing pipeline to the process of baking a cake: you have the algorithm and the parameters. The algorithm is like the recipe for the cake and the parameters are the ingredients, which can be altered and changed according to our wishes and needs.
“We have the basic recipe and if we for example want to make a vegan version of the cake, we just simply substitute a few of the ingredients.”
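The recipe analogy maps neatly onto a function and its arguments. In the sketch below, which is our own illustration rather than anything shown at the workshop, the function body is the recipe and the keyword arguments `top_n` and `tone` are the ingredients. All names and figures are hypothetical.

```python
def write_summary(results: dict, *, top_n: int = 3, tone: str = "neutral") -> str:
    """The 'recipe': rank candidates by votes and describe the leaders."""
    ranked = sorted(results.items(), key=lambda item: item[1], reverse=True)[:top_n]
    verb = {"neutral": "leads", "lively": "storms ahead"}[tone]
    leader, votes = ranked[0]
    others = ", ".join(name for name, _ in ranked[1:])
    text = f"{leader} {verb} with {votes} votes"
    return f"{text}, followed by {others}." if others else f"{text}."


# Invented figures, purely for illustration.
results = {"Candidate A": 120, "Candidate B": 340, "Candidate C": 95}

default_version = write_summary(results)
short_lively_version = write_summary(results, top_n=1, tone="lively")
```

The recipe stays the same in both calls. Only the ingredients change, which gives a longer neutral text in one case and a one-line, livelier text in the other.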
Diakopoulos also led a workshop on bots, as they can be excellent at serving niche audiences and the costs of creating a bot are low. Some of the partnering media houses are also working on bots in their newsrooms and as such the topic was very current for the participants.
The participants also got to meet Valtteri, an election news bot, and the first prototype of the Immersive Automation project. The bot is currently training with data from the 2012 municipal elections in Finland so it will be ready for action on April 9. Data analysts Myriam Munezero and Leo Leppänen, who are members of the IA-research team, are the brains behind Valtteri. The rest of the research team has contributed to the creation process by considering news angles, writing templates, and analysing the linguistic capabilities of the bot.
In recent years, automated generation of news content has arrived in the editorial offices. Some people like to talk about ‘robot journalism’. While automation has conquered plenty of industries, the media appears to have fallen behind. If a robot is capable of performing surgery on human bodies, why could it not assist journalists in the newsrooms as well? This is a comparison media and journalism researcher Carl-Gustav Lindén often makes during his lectures on the topic. In this blog post he will present a few key arguments as to why our newsrooms could benefit from automation.
Perhaps we should not call these systems robot journalists at all, as they do not include mechanical parts. In fact, they consist of a snippet of code or an algorithm creating news stories from structured – often numeric – data. The data might for instance originate in sensors detecting seismic activity, or somebody reporting sports results from a local football game.
While journalists are busy working with their more complex editorial tasks, a text generator can produce huge amounts of shorter texts for a wider audience.
“In the case of routine news of low value, I think journalists need to consider how we can reduce the amount of human labour by using smart machines that generate and distribute texts. This could enable journalists to concentrate on unique complicated stories that provide the most value to the audience, engaging content that people are willing to pay for,” Carl-Gustav Lindén says.
Introducing news automation in an editorial office does not mean that journalists, or the human involvement, will be erased from journalism. Algorithms and NLG-systems are not black boxes; they are created by humans, and therefore journalists need to get involved.
According to Lindén, it is a matter of crucial editorial decisions on what machines should do, and algorithmic authority and accountability is not a minor issue. However, new technology is something journalists are used to, so this should not be a problem.
“Automation in the newsrooms has existed for decades. Software has helped journalists with editing, managing, and distributing content. Think about Photoshop or complex CMS-systems. If you walk into a television studio you will find automation everywhere. This is only the next step.”
The automation of news production seems to fit in the media industry, where the commercial pressures and higher profit expectations have heavily increased over the past few years. News automation can also investigate areas that we previously have not been able to cover. Essentially this means that algorithms and text generators can work alongside journalists and perform tasks that humans are incapable of doing.
“I see so many applications that I do not even know where to start. One very exciting opportunity is to use sensor data monitoring human activity, in say traffic or other movements of people,” Lindén says.
Carl-Gustav Lindén’s newest article Decades of Automation in the Newsroom was recently published in Digital Journalism.
Quotations in journalistic texts are regarded as word-for-word recollections of what an interviewee has stated. However, there is very little research on actual quoting practices. This is why journalist and scholar Lauri Haapanen decided to focus on quoting in his PhD. In this blog post he will reflect upon how NLG-systems could benefit from knowledge of how journalists actually quote.
When a reader enjoys a story in a magazine, they have no way of knowing how an interview between a journalist and a source was conducted. Even quotations – which are widely considered to be verbatim repetitions of what has been said in the interview – might be very accurate, but they might just as well be heavily modified, or even partially made up.
“For journalists, and their editors, the most important thing is of course to produce a good piece of writing. This means they might be forced to make compromises, since the citations must serve a purpose for the story,” Lauri Haapanen explains.
The Immersive Automation-project focuses on news automation and algorithmically produced news. Since human-written journalistic texts often contain quotations, automated content should also include them to meet the traditional expectations of readers.
In the development process of news automation, it is realistic to expect human journalists and machines to collaborate. “A text generator could write a story and a journalist could interview sources and add quotations in suitable places,” says Haapanen.
At a later stage, when the algorithms that create texts become more sophisticated, Haapanen suggests software developers also include criteria regarding the selection, positioning, and text modification of quotes.
This is where Haapanen’s research within journalistic quoting practices could be useful. In his dissertation he categorised nine essential quoting strategies used by journalists when writing articles.
Based on empirical data, Haapanen found that when journalists extract selected stretches from the interview discourse, they aim at (1) constructing the persona of the interviewee, (2) disclaiming the responsibility for the content, and/or (3) adding plausibility to the article.
As such, machines should be able to mine these kinds of segments from the source data available.
When journalists then position the selected stretches into the emerging article, they aim at (4) constructing the narration and (5) pacing the structure. When journalists modify the linguistic form and meaning of the selected stretches, they aim at (6) standardising the linguistic form, although they occasionally (7) allow some vernacular aspects that serve a particular purpose in the storyline. Furthermore, journalists aim at (8) clarifying the original message and (9) sharpening the function of the quotation.
Within the scope of the Immersive Automation project we look at how these nine quoting practices can be incorporated in automated news generation. “After all, computers must learn to ‘think’ like human journalists in the process of quoting,” Haapanen says.
Lauri Haapanen defends his thesis at the University of Helsinki on Saturday March 11. He also appeared on YLE’s radio program Julkinen sana on Wednesday March 8. He has written a blog post for The Media Industry Research Foundation of Finland, the advocacy organisation for the Finnish media industry, and appears in an article in Suomen Lehdistö.