<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>The Bright Future of News Automation</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Carl-Gustav Linden</string-name>
          <email>carl-gustav.linden@helsinki.fi</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Swedish School of Social Science, University of Helsinki and Södertörn University</institution>
        </aff>
      </contrib-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Sometimes, bad metaphors can cause trouble. The idea that robot journalists
are coming to take the jobs in newsrooms is gaining traction [10] [8]. A picture
search produces illustrations of robots banging away at keyboards. However, it
should not come as a surprise to anyone that there are no robots moving around.
Instead, we are talking about new software introduced in the newsroom, a
process that has already been going on for decades [13]. That is the background
for this paper, which aims to present some experiences of news automation,
identify some important obstacles and explain why the future is still bright.</p>
      <p>News automation can mean many things, but here we refer to Natural Language
Generation (NLG) and the automated generation of articles, from numbers to text.
That is a much smaller field within language technologies than Natural
Language Processing (NLP), which goes from text to numbers. For an overview of NLG, see [17],
and specifically for research into news automation, see [17] [12] [11]. The media
industry has just taken the first small steps on a journey towards a more advanced
software environment, with systems that will help journalists perform tasks they
cannot manage, such as handling too much data, and work they do not want to do. Focusing
the discussion on the metaphor "robot journalist" is detrimental to this positive
development. We should stop talking about robots when it comes to news
stories produced by software. Most of the work, writing text templates,
is pure handicraft in which journalists are very much in charge [14]. If there were
any mechanical parts involved, the metaphor should not be a humanoid, but the
non-intrusive and useful washing machine. Just like sorting clothes according to
fabric and colour, journalists clean the data (select and sort it), dump it in the
washing machine, choose the right programme and push the start button. The
washing machine metaphor does not evoke imagined existential threats to jobs.</p>
      <p>Another word of caution is needed. Journalists, researchers and service providers
should also stop calling NLG news automation systems "smart" or "AI" for now.
There is no intelligence here, because these systems can only say what happened
but have no clue why it happened [21]. Automated
systems can report a figure, but they cannot yet say what it means; on their own,
computer-generated stories contain no context and no analysis of trends, anomalies
or deeper forces at work. Machine learning and neural networks, for instance,
are still far away from newsroom reality.</p>
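      <p>To make the washing-machine metaphor concrete, here is a minimal sketch of template-based generation in Python, under stated assumptions: the stock-price record, the template wording and the rule that picks a template are all hypothetical and are not taken from any production system. The journalist writes the templates and the selection rule; the software only fills in the figures. The output reports what happened, never why.</p>
      <preformat preformat-type="code">
```python
from dataclasses import dataclass


@dataclass
class StockRecord:
    """Hypothetical input row: one company's previous and latest closing price."""
    company: str
    previous_close: float
    close: float


# Journalist-written templates: the software never invents wording of its own.
TEMPLATES = {
    "rise": "{company} shares rose {pct:.1f} per cent to {close:.2f}.",
    "fall": "{company} shares fell {pct:.1f} per cent to {close:.2f}.",
    "flat": "{company} shares were unchanged at {close:.2f}.",
}


def choose_template(change: float) -> str:
    """Journalist-defined rule that maps the data to a template key."""
    if change > 0:
        return "rise"
    if change == 0:
        return "flat"
    return "fall"


def generate(record: StockRecord) -> str:
    """Fill the chosen template with figures from the record (what happened, never why)."""
    change = record.close - record.previous_close
    pct = abs(change) / record.previous_close * 100
    key = choose_template(change)
    # str.format ignores unused keyword arguments, so the "flat" template simply omits pct.
    return TEMPLATES[key].format(company=record.company, pct=pct, close=record.close)


if __name__ == "__main__":
    for rec in [StockRecord("Acme", 100.0, 105.0), StockRecord("Acme", 100.0, 90.0)]:
        print(generate(rec))
```
      </preformat>
      <p>Called as generate(StockRecord("Acme", 100.0, 105.0)), the sketch returns "Acme shares rose 5.0 per cent to 105.00." Everything interesting about that sentence, its wording and the choice between rise, fall and unchanged, was decided by a person.</p>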
      <p>Copyright held by the authors. NOBIDS 2018</p>
    </sec>
    <sec id="sec-2">
      <title>Historical background</title>
      <p>The Surrealist movement had already experimented with automated drawing and
writing, known as automatism. However, news automation as we know it was
first conceptualised in a debut novel from 1965, The Tin Men by Michael
Frayn [<xref ref-type="bibr" rid="ref5">5</xref>]. The satire is about the lives of workers at the fictitious William Morris
Institute for Automation Research, a British think tank trying to develop robots
that are supposed to relieve the employees of work. It is filled with bizarre
characters and schemes. In the book, Dr. Goldwasser is working on the
automation of newspapers and has invented a fictional language technology, UHL (Unit
Headline Language), for that purpose. He has collected multi-purpose
monosyllables used by headline writers, such as "fear", "ban", "dash" and "strike", and fed them
into a computer. UHL takes standard headline words and creates headlines from
them. These newspaper headlines are completely meaningless. Examples such
as "Strike Threat Probe" and "Lab Row Looms" produce sentences, a Headline
Pidgin, that everybody recognises but nobody can explain the meaning of. Dr.
Goldwasser invented three different ways of creating headlines with UHL: new
headlines are produced by adding one unit at random to the formula each day, by adding
units cumulatively, or entirely at random.</p>
      <p>A decade later, in the late 1970s, Tale-Spin,
a conceptual NLG program that writes stories, was introduced [10]. In the mid-1980s,
German researchers created a system designed to produce German
newspaper stories about labour market developments, with the somewhat unfortunate
name Semtex [18]. And the FoG weather-report generator, the first-ever
deployed operational NLG system, goes back to the early 1990s [6]. Frayn's novel
was published more than 50 years ago, and to people who have studied the topic
and been involved in R&amp;D, it seems in the darkest moments that there has not been
much development beyond the properties of UHL. It is not that difficult to
generate sentences and texts that seem to work quite well at first sight, but
adding complexity to the system, that is, more data and correlations,
creates unsolvable problems.</p>
      <p>Looking at the state of the art in news automation,
the company that first comes to mind is Narrative Science, created as the
student project StatsMonkey at Northwestern University in Chicago in 2008-2009
[<xref ref-type="bibr" rid="ref1">1</xref>] and patented and commercialised by two professors of computer science, Kris
Hammond and Larry Birnbaum. It was recently named the most innovative
company in Chicago. Narrative Science started by providing sports texts, but
the company soon realised that news media are not the best customers, so
it now serves other business areas, mainly in the financial sector. This
seems to be quite common in the NLG business and should make the media
sector very worried. Ehud Reiter, an NLG founding father and professor in Aberdeen,
who is also chief scientist at the leading British NLG company Arria, carried out
a small study and found that the most important sector for commercial
NLG is business reporting. This is perhaps because (a) there is a lot of money
in finance, and (b) data and use cases are similar enough to allow systems to
be replicated [16]. News media are not even mentioned in the analysis. Against
this backdrop, it might seem bold to claim that the future of news automation
is bright.</p>
      <p>One critical issue is the willingness of journalists to be involved in
creating NLG systems, and their own perceived roles, especially when it comes
to professional ethics. However, this discussion can be considered an extension
of the already existing online challenge to professionalism [20]. So why should
journalists get involved, and in what way can journalists add value? The main
reasons come down to five points: 1) truth and accuracy, 2) independence and integrity,
3) fairness and impartiality, 4) humanity, and 5) accountability. In short,
journalists need to make sure that the outcomes of NLG development are ethically,
morally and socially tolerable. We can probably think of many other reasons,
such as the brutally obvious probability that engineers and their managers will
do this anyway if journalists do not want to be involved.</p>
    </sec>
    <sec id="sec-3">
      <title>The ethical dimension</title>
      <p>For some practical guidance, let us turn to the Society of Professional Journalists
in the United States and its SPJ Code of Ethics. There we find five central
rules where the ethical dimensions of news automation are most obvious. Here
is a brief explanation of how to translate these ethics into preferred action.</p>
      <list list-type="bullet">
        <list-item>
          <p><italic>Journalists should take responsibility for the information they provide,
regardless of medium.</italic> This means that journalists should be willing to get
involved in how news automation is designed, where the data comes from,
how it is relevant from a journalistic point of view and what kind of
decisions algorithms make. There might be some obstacles here, such as
newsrooms being short of journalists trained in computational thinking and
able to work with programmers. Further, these NLG applications are often
software systems bought from a service provider and are practically black boxes
to journalists.</p>
        </list-item>
        <list-item>
          <p><italic>Identify sources clearly. The public is entitled to as much information as
possible to judge the reliability and motivations of sources.</italic> The media should
be accountable and transparent about where the data comes from and how it is
processed. Journalists are supposed to explain ethical choices and processes
to audiences. Is it possible or necessary to start a dialogue with the public
about journalistic practices, coverage and news content when it comes to
news automation? Is the public actually that interested?</p>
        </list-item>
        <list-item>
          <p><italic>Provide access to source material when it is relevant and appropriate.</italic> Here
a problem may arise when the data sources or the software are proprietary
and owned by an external commercial company. Media companies might also
be afraid to open up their processes too much, since inputs to algorithms could be
gamed by, for instance, public relations experts.</p>
        </list-item>
        <list-item>
          <p><italic>Avoid stereotyping. Journalists should examine the ways their values and
experiences may shape their reporting.</italic> We need to ask how values are built
into these systems and what biases exist in how news is personalised,
for instance with respect to gender, race or religion. There is an understanding among
media companies that certain types of bias should be avoided, but there is no clear
consensus and there are no written rules.</p>
        </list-item>
        <list-item>
          <p><italic>Acknowledge mistakes and correct them promptly and prominently. Explain
corrections and clarifications carefully and clearly.</italic> Journalists should be able
to explain errors that come from faulty data or wrong assumptions built into
news automation systems. The idea of explainable machines in algorithmic
decision-making has a lot of appeal [19]. The news agency Associated Press has
actually solved this problem with built-in features that allow its engineers
to backtrack decisions made by the system and check where it went wrong.
These features are not shared with journalists in the AP newsroom. According to the
EU General Data Protection Regulation (GDPR), owners of
automated decision systems should be able to explain how the system works;
they cannot have black boxes. What this means in practice is still unclear
to everyone, as we are all eagerly waiting for cases to end up in court.</p>
        </list-item>
      </list>
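      <p>The backtracking idea can be illustrated with a small sketch in Python. It is not AP's system, whose internals are not described here; the class, the rule names and the unemployment example are hypothetical. The assumption is simply that a template generator records, for every sentence it emits, which rule fired and which input values it read, so that a journalist can trace a published error back to faulty data or to a wrong assumption in a rule.</p>
      <preformat preformat-type="code">
```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TraceEntry:
    """One recorded decision: the rule that fired, the inputs it read, the text it produced."""
    rule: str
    inputs: dict
    sentence: str


@dataclass
class TracedGenerator:
    """Hypothetical template generator that logs every decision it makes."""
    trace: list = field(default_factory=list)

    def _emit(self, rule: str, inputs: dict, sentence: str) -> str:
        # Record the decision before returning the sentence, so it can be audited later.
        self.trace.append(TraceEntry(rule, inputs, sentence))
        return sentence

    def unemployment_sentence(self, region: str, rate: float, prev_rate: float) -> str:
        inputs = {"region": region, "rate": rate, "prev_rate": prev_rate}
        if rate > prev_rate:
            return self._emit("rate_up", inputs,
                              f"Unemployment in {region} rose to {rate:.1f} per cent.")
        if rate == prev_rate:
            return self._emit("rate_same", inputs,
                              f"Unemployment in {region} remained at {rate:.1f} per cent.")
        return self._emit("rate_down", inputs,
                          f"Unemployment in {region} fell to {rate:.1f} per cent.")

    def explain(self, sentence: str) -> Optional[TraceEntry]:
        """Return the first recorded decision that produced this sentence, if any."""
        for entry in self.trace:
            if entry.sentence == sentence:
                return entry
        return None


if __name__ == "__main__":
    gen = TracedGenerator()
    text = gen.unemployment_sentence("Uusimaa", 8.2, 7.9)
    print(text)
    print(gen.explain(text))
```
      </preformat>
      <p>If the previous month's figure in the feed was wrong, the trace shows that the rule rate_up fired because prev_rate was 7.9, which points the journalist to the faulty input rather than to the template. Sharing such a trace with the newsroom, and not only with engineers, is a design choice rather than a technical hurdle.</p>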
      <p>Transparency and accountability are two key features of responsible
professional journalism. Stuart Myles, a manager at the US news agency AP, has
proposed a way to discuss transparency that might be useful [15]. The model
consists of four layers:</p>
      <list list-type="bullet">
        <list-item>
          <p>The minimum type of algorithmic news transparency is disclosure, by
which Myles means making it known that algorithms have been used in
creating, or making decisions about, news items.</p>
        </list-item>
        <list-item>
          <p>A step up from disclosure is justification. A justification aims to
show that the results of the algorithmic news are reasonable in a particular
instance.</p>
        </list-item>
        <list-item>
          <p>An explanation is a more comprehensive form of transparency than a
justification. It indicates why a particular decision, categorization or arrangement
of news was selected and not some other.</p>
        </list-item>
        <list-item>
          <p>By reproduction, Myles means providing sufficient information that the
results of a news algorithm could be independently replicated. News
organizations sometimes provide the underlying data and algorithms that they
used in a particular report, with the goal of making their use of news
algorithms transparent. This feature comes quite close to the concept of open
source.</p>
        </list-item>
      </list>
    </sec>
    <sec id="sec-4">
      <title>Why work with journalists?</title>
      <p>Few computer or data scientists have a history of working with journalists. Here
are some general stereotypes of their professional identity: journalists generally
do not think in structures; they are preoccupied with stories and narratives;
they are creative with words and hate repetition, which can become a hindrance
in the mass production of texts; and they believe they are bad at math. Journalists
usually want to be in charge when working with other professional groups, and
they often tend to be difficult people, nay-sayers with a preference for asking
critical questions [3]. Indeed, media work is a form of affective labour, which is
passionate: extreme emotions are part of making the work meaningful [<xref ref-type="bibr" rid="ref4">4</xref>]. At the
same time, journalism and news media are risk-averse due to the nature of their
industry, which has traditionally been characterised by deadline thinking: getting
the news delivered on a tight schedule. For computer scientists, maybe the most
problematic feature is the inability of journalists to explain central concepts in
their profession, such as what news is [7]. Yet in NLG development, journalists
need to work in teams with computer scientists for several reasons. On their own,
journalists cannot handle a large diversity in news input or output, with data sources of high
dimensionality or high degrees of personalisation. They also aim for maximum
impact and do not want to serve micro-audiences with news, which NLG makes possible at a
small marginal cost. Can these two groups work together? Yes,
definitely, and media business scholar Lucy Küng, who has studied Silicon Valley
modes of organisation and culture and compared them with those of legacy media, thinks [9]
they have at least three things in common: 1) journalists and engineers love
clarity of thought, whether expressed as code or as language, 2) they have a strong
commitment to the craft, and 3) they share high intrinsic motivation, a sense of
purpose.</p>
    </sec>
    <sec id="sec-5">
      <title>Conclusions</title>
      <p>So, finally, what is the bright future of news automation? Our own research
group, Immersive Automation, has written an industry report for the
newspaper publishers' association WAN-IFRA (forthcoming 2019). One of the chapters
deals with the future, and besides drawing on our own knowledge, we asked experts for their
views. This is the result of the query. The future of news automation will be
about two parallel processes: 1) decomposition, or deconstruction, of the
fundamental principles of journalism, and 2) breaking down journalistic work into the
actual information artefacts and micro-processes. The most important question
to ask here is: what can be automated, and what are inherently human tasks?</p>
      <p>Further, access to open data of high quality is a key issue. Without
interesting and dynamic data feeding the NLG systems, they cannot produce updated versions of
texts with new information and new angles depending on the story-finding
process. There will also be more flexible NLG systems, which will make it easier and
cheaper to develop versions for chatbots or talking/listening machines. With
the implementation of updated mobile networks, namely 5G, we will see new
opportunities for creating and distributing immersive and meaningful content
(text, video, sound, social signals) based on personal interests, time, location
and activity. Early examples of this have already been developed, such as the German
Ambient News project funded by the Google News Initiative [<xref ref-type="bibr" rid="ref2">2</xref>]. There will be
many more opportunities for story discovery with automated analysis of large
datasets used as raw material for interesting stories: first drafts, alerts, updates.</p>
      <p>As always, development comes down to money, and there is reason to worry about
the slow development in the field. Basic use of news automation, mainly
creating stories about football, stock prices and real estate, is growing quickly, but
more sophisticated features are not there yet. Besides financial resources, another
problem is the general lack of an innovation culture in media. Unfortunately,
doing product development and process innovation with money from Google is
not a substitute for serious research and development [<xref ref-type="bibr" rid="ref2">2</xref>], even though Google's
representative claims, "We are all in this together" [<xref ref-type="bibr" rid="ref3">3</xref>].</p>
    </sec>
    <sec id="sec-6">
      <title>Background</title>
      <p>This paper is adapted from a talk that Carl-Gustav Linden, Adjunct Professor (Docent)
at the University of Helsinki (Swedish School of Social Science) and Affiliated
Lecturer at Södertörn University, gave at the Norwegian Big Data
Symposium 2018. There he shared his experiences of introducing news automation
to journalists. He was the manager of the R&amp;D project Immersive
Automation (www.immersiveautomation.com). The primary aim of the project was
to create a roadmap and a demonstration of a future news ecosystem based on
automated storytelling and intense audience engagement, paired with the belief
that stories powered by data and machine learning will lead to a dramatically
more personal and customized news experience, with localisation of content as a
key feature.</p>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgement</title>
      <p>This paper is supported by the European Union's Horizon 2020 research and
innovation programme under grant agreement No 825153, project EMBEDDIA
(Cross-Lingual Embeddings for Less-Represented Languages in European News
Media).</p>
    </sec>
    <sec id="sec-8">
      <title>Disclaimer clause</title>
      <p>"The results of this [publication] reflect only the author's view and the
Commission is not responsible for any use that may be made of the information it
contains."
      </p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <label>1</label>
        <mixed-citation>Nicholas D. Allen, John R. Templon, Patrick Summerhays McNally, Larry Birnbaum, and Kristian J. Hammond. StatsMonkey: A data-driven sports narrative writer. In AAAI Fall Symposium: Computational Models of Narrative, volume 2, 2010.</mixed-citation>
      </ref>
      <ref id="ref2">
        <label>2</label>
        <mixed-citation>M. Burkhardt. Ambient news lessons learned. https://medium.com/datenfreunde/https-medium-com-datenfreunde-ambientnews-lessons-learned-f190a48d8102, 2018. [Online; accessed 9-December-2018].</mixed-citation>
      </ref>
      <ref id="ref3">
        <label>3</label>
        <mixed-citation>Madhav Chinnappa. We are all in this together. British Journalism Review, 28(3):50–55, 2017.</mixed-citation>
      </ref>
      <ref id="ref4">
        <label>4</label>
        <mixed-citation>Mark Deuze and Mirjam Prenger. Making Media: Production, Practices, and Professions. Amsterdam University Press, 2019.</mixed-citation>
      </ref>
      <ref id="ref5">
        <label>5</label>
        <mixed-citation>M. Frayn. The Tin Men. London: Collins, 1965. Also pp. 191–194 in Stanley Cohen and Jock Young (eds.), The Manufacture of News. Beverly Hills.</mixed-citation>
      </ref>
      <ref id="ref6">
        <label>6</label>
        <mixed-citation>Eli Goldberg. FoG: Synthesizing forecast text directly from weather maps. In Artificial Intelligence for Applications, 1993. Proceedings., Ninth Conference on, pages 156–162. IEEE, 1993.</mixed-citation>
      </ref>
      <ref id="ref7">
        <label>7</label>
        <mixed-citation>Tony Harcup and Deirdre O'Neill. What is news? News values revisited (again). Journalism Studies, 18(12):1470–1488, 2017.</mixed-citation>
      </ref>
      <ref id="ref8">
        <label>8</label>
        <mixed-citation>Jaemin Jung, Haeyeop Song, Youngju Kim, Hyunsuk Im, and Sewook Oh. Intrusion of software robots into journalism: The public's and journalists' perceptions of news written by algorithms and human journalists. Computers in Human Behavior, 71:291–298, 2017.</mixed-citation>
      </ref>
      <ref id="ref9">
        <label>9</label>
        <mixed-citation>L. Küng. Going digital: how are legacy leaders transforming strategy, leadership and culture? Keynote at the Mediapiv conference, 2018.</mixed-citation>
      </ref>
      <ref id="ref10">
        <label>10</label>
        <mixed-citation>Noam Lemelshtrich Latar. Robot Journalism: Can Human Journalism Survive? World Scientific, 2018.</mixed-citation>
      </ref>
      <ref id="ref11">
        <label>11</label>
        <mixed-citation>Leo Leppänen, Myriam Munezero, Mark Granroth-Wilding, and Hannu Toivonen. Data-driven news generation for automated journalism. In Proceedings of the 10th International Conference on Natural Language Generation, pages 188–197, 2017.</mixed-citation>
      </ref>
      <ref id="ref12">
        <label>12</label>
        <mixed-citation>Leo Leppänen, Myriam Munezero, Stefanie Sirén-Heikel, Mark Granroth-Wilding, and Hannu Toivonen. Finding and expressing news from structured data. In Proceedings of the 21st International Academic Mindtrek Conference, pages 174–183. ACM, 2017.</mixed-citation>
      </ref>
      <ref id="ref13">
        <label>13</label>
        <mixed-citation>Carl-Gustav Linden. Decades of automation in the newsroom: Why are there still so many jobs in journalism? Digital Journalism, 5(2):123–140, 2017.</mixed-citation>
      </ref>
      <ref id="ref14">
        <label>14</label>
        <mixed-citation>Carl-Gustav Linden. Robot journalism. http://datadrivenjournalism.net/news_and_analysis/robot_journalism_the_damage_done_by_a_metaphor, 2017. [Online; accessed 16-February-2018].</mixed-citation>
      </ref>
      <ref id="ref15">
        <label>15</label>
        <mixed-citation>S. Myles. How can we make algorithmic news more transparent? Paper presented at the conference Algorithms, Automation, and News, 2018.</mixed-citation>
      </ref>
      <ref id="ref16">
        <label>16</label>
        <mixed-citation>E. Reiter. Where is NLG most successful commercially? https://ehudreiter.com/2018/10/30/most-successful-commercial-nlg, 2018. [Online; accessed 9-November-2018].</mixed-citation>
      </ref>
      <ref id="ref17">
        <label>17</label>
        <mixed-citation>Ehud Reiter and Robert Dale. Building Natural Language Generation Systems. Cambridge University Press, 2000.</mixed-citation>
      </ref>
      <ref id="ref18">
        <label>18</label>
        <mixed-citation>Dietmar Rösner. The automated news agency: Semtex, a text generator for German. In Natural Language Generation, pages 133–148. Springer, 1987.</mixed-citation>
      </ref>
      <ref id="ref19">
        <label>19</label>
        <mixed-citation>Andrew D. Selbst and Solon Barocas. The intuitive appeal of explainable machines. 2018.</mixed-citation>
      </ref>
      <ref id="ref20">
        <label>20</label>
        <mixed-citation>Jane B. Singer. Who are these guys? The online challenge to the notion of journalistic professionalism. Journalism, 4(2):139–163, 2003.</mixed-citation>
      </ref>
      <ref id="ref21">
        <label>21</label>
        <mixed-citation>J. Stray. The age of the cyborg. https://www.cjr.org/analysis/cyborg_virtual_reality_reuters_tracer.php, 2016. [Online; accessed 9-November-2018].</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>