Quotations in journalistic texts are regarded as word-for-word recollections of what an interviewee has stated. However, there is very little research on actual quoting practices. This is why journalist and scholar Lauri Haapanen decided to focus on quoting in his PhD research. In this blog post, he reflects on how natural language generation (NLG) systems could benefit from knowledge of how journalists actually quote.
When a reader enjoys a story in a magazine, they have no way of knowing how the interview between the journalist and the source was conducted. Even quotations, which are widely considered to be verbatim repetitions of what was said in the interview, may be highly accurate, but they may just as well be heavily modified or even partially fabricated.
“For journalists and their editors, the most important thing is of course to produce a good piece of writing. This means they might be forced to make compromises, since the quotations must serve a purpose in the story,” Lauri Haapanen explains.
The Immersive Automation project focuses on news automation and algorithmically produced news. Since human-written journalistic texts often contain quotations, automated content should also include them to meet readers' traditional expectations.
As news automation develops, it is realistic to expect human journalists and machines to collaborate.
“A text generator could write a story, and a journalist could interview sources and add quotations in suitable places,” says Haapanen.
At a later stage, when text-generating algorithms become more sophisticated, Haapanen suggests that software developers also build in criteria for selecting, positioning, and modifying quotations.
This is where Haapanen’s research on journalistic quoting practices could be useful. In his dissertation, he categorised nine essential quoting strategies that journalists use when writing articles.
Based on empirical data, Haapanen found that when journalists extract selected stretches from the interview discourse, they aim at (1) constructing the persona of the interviewee, (2) disclaiming responsibility for the content, and/or (3) adding plausibility to the article.
Accordingly, machines should be able to mine such segments from the available source data.
When journalists then position the selected stretches into the emerging article, they aim at (4) constructing the narration and (5) pacing the structure. When journalists modify the linguistic form and meaning of the selected stretches, they aim at (6) standardising the linguistic form, although they occasionally (7) allow some vernacular aspects that serve a particular purpose in the storyline. Furthermore, journalists aim at (8) clarifying the original message and (9) sharpening the function of the quotation.
Within the scope of the Immersive Automation project, we are examining how these nine quoting practices could be incorporated into automated news generation.
“After all, computers must learn to ‘think’ like human journalists in the process of quoting,” Haapanen says.
Lauri Haapanen defends his thesis at the University of Helsinki on Saturday, March 11. He also appeared on YLE’s radio program Julkinen sana on Wednesday, March 8. He has written a blog post for The Media Industry Research Foundation of Finland, the advocacy organisation for the Finnish media industry, and is featured in an article in Suomen Lehdistö.