<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>journalistic angles</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <article-id pub-id-type="doi">10.1007/978-3-030-34885-4_35</article-id>
      <title-group>
        <article-title>Challenges and Opportunities for Journalistic Knowledge Platforms</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Marc Gallofré Ocaña</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Andreas L. Opdahl</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>University of Bergen</institution>
          ,
          <addr-line>Fosswinckelsgt. 6, Postboks 7802, 5020 Bergen</addr-line>
          ,
          <country country="NO">Norway</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2020</year>
      </pub-date>
      <volume>26</volume>
      <fpage>1</fpage>
      <lpage>17</lpage>
      <abstract>
        <p>Journalism is under pressure from loss of advertisement and revenues, while experiencing an increase in digital consumption and user demands for quality journalism and trusted sources. Journalistic Knowledge Platforms (JKPs) are an emerging generation of platforms which combine state-of-the-art artificial intelligence (AI) techniques such as knowledge graphs, linked open data (LOD), and natural-language processing (NLP) for transforming newsrooms and leveraging information technologies to increase the quality and lower the cost of news production. In order to drive research and design better JKPs that allow journalists to get most benefits out of them, we need to understand what challenges and opportunities JKPs are facing. This paper presents an overview of the main challenges and opportunities involved in JKPs which have been manually extracted from literature with the support of natural language processing and understanding techniques. These challenges and opportunities are organised in: stakeholders, information, functionalities, components, techniques and other aspects.</p>
      </abstract>
      <kwd-group>
        <kwd>Newsroom</kwd>
        <kwd>Knowledge Graph</kwd>
        <kwd>Digitalization</kwd>
        <kwd>Overview</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Journalism is under pressure from loss of advertisement and revenues, in combination with competing online distribution channels that stream free content, while experiencing an increase in digital consumption and readers who demand quality journalism and trusted sources [1]. Information is no longer consumed from a single newspaper. Instead, readers have access to and can contrast fresh and first-hand information sources available on the internet and social media at any time.</p>
      <p>News organisations are constantly adapting their business models to digital media innovations, to improve information quality, competitiveness and growth [2]. Journalistic Knowledge Platforms (JKPs) are an emerging type of platform that combines state-of-the-art artificial intelligence (AI) techniques such as knowledge graphs and natural-language processing (NLP), and exploits news and social media information over the net in real time, using linked open data (LOD), encyclopaedic sources and news archives to construct knowledge graphs and provide fresh and unexpected information to journalists, helping them to dive deeply into information, events and story-lines. JKPs are increasingly driving innovation and transforming newsrooms, leveraging information technologies to increase the quality and lower the cost of news production. In order to drive research and design JKPs that allow journalists to get the most benefit out of them and support newsrooms with better solutions, we need to understand the challenges and opportunities that JKPs present for both users and developers. To do so, we have reviewed the research literature in light of our own experience with developing News Hunter [3, 4, 5], a series of JKP prototypes in collaboration with a developer of newsroom tools for the international market.</p>
      <p>This paper presents a synthesis of the challenges and opportunities for journalistic knowledge platforms that we have found in the literature, hopefully describing the most central factors that are driving development of JKPs today. These factors have been grouped into six categories: stakeholders, information, functionalities, components, techniques and other aspects. We conclude that JKPs offer many opportunities for effective production of high-quality journalism, real-time information, enriched background information, and multilingual and cross-platform solutions for monitoring worldwide multimedia output, by offering solutions to problems such as language independence, complex newsroom workflows, and dispersed information. Central challenges include leveraging pre-news information from social media and multimedia sources, precise semantic lifting and enrichment of texts, scaling semantic technologies to big data, and detecting and reasoning over events.</p>
      <p>Proceedings of the CIKM 2020 Workshops,
October 19-20, Galway, Ireland.
email: Marc.Gallofre@uib.no (M. Gallofré Ocaña);
Andreas.Opdahl@uib.no (A.L. Opdahl)
orcid: 0000-0001-7637-3303 (M. Gallofré Ocaña);
0000-0002-3141-1385 (A.L. Opdahl)</p>
      <p>© 2020 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org), ISSN 1613-0073.</p>
      <p>This paper is organised as follows: Section 2
summarises the methodology used for screening the
challenges and opportunities. Section 3 briefly
reviews the research literature. Section 4 explains the
coding process. Sections 5 to 10 synthesise the main
challenges and opportunities for each factor
respectively — stakeholders, information, functionalities,
components, techniques and other aspects.</p>
      <sec id="sec-1-1">
        <title>2. Method</title>
        <p>Our research method consists of four steps. Firstly, we selected the most relevant research papers that we had identified in our previous studies on JKP architectures and news angles [
          <xref ref-type="bibr" rid="ref24">4, 6, 7, 8, 9, 10, 11, 12</xref>
          ]. From these selected papers we manually extracted claims, i.e., sentences that express potential challenges or opportunities.
        </p>
        <p>
          Secondly, a purposive sampling was conducted independently by two expert coders (the authors). The coders generated multiple codes for each extracted claim and the codes were cleaned with the support of NLP and NLU techniques (i.e., Damerau-Levenshtein distance [13], word2vec [
          <xref ref-type="bibr" rid="ref6">14</xref>
          ], and WordNet [15])<sup>1</sup>. From the resulting cleaned codes, we selected the most representative ones as preliminary codes and divided them into categories.
        </p>
        <p>Thirdly, based on the preliminary codes, claims were independently coded once again by both authors. This time, the coders were allowed to code each claim with multiple codes for each category. The coding agreement was estimated using Gwet’s AC1 [19] inter-rater reliability coefficient with nominal ratings.</p>
        <p>Because coders were allowed not to code, empty codes were not treated as missing values when computing Gwet’s AC1; instead, they were treated as if they were coded as “undefined”. Hence, to compute the contingency tables for multiple codes we applied the following rule: agreement between coders A and B is counted only for matching codes (A∩B), while the remaining codes (A△B) were matched with missing values and treated as disagreements.</p>
        <p>Finally, when both coders agreed on the final codes for each claim, challenges and opportunities were extracted from each claim following the assigned codes.</p>
        <p><sup>1</sup> Implemented in Python with support of Scikit-learn [16], NLTK [17], SpaCy [18] and other libraries.</p>
        <p>After a broad survey of the literature, we selected eleven papers describing five research projects related to JKPs as the starting point of our review: NEWS [20, 21], EventRegistry [22], NewsReader [23, 24, 25], SUMMA [26, 27, 28, 29] and ASRAEL [30].</p>
        <p>NEWS is a project, in collaboration with the Spanish Agencia EFE and the Italian ANSA news agencies, that makes use of semantic technologies to improve news agencies’ workflows, productiveness and revenues by focusing on the annotation, intelligent information retrieval and user interface aspects [21]. EventRegistry is focused on collecting news articles, identifying and extracting information about events, and summarising and visualising them [22]. NewsReader extracts information about what, who, where, when from multilingual news articles and represents events in time using RDF in a knowledge graph, allowing users to find networks of actors along time [25]. SUMMA collaborates with BBC Monitoring and Deutsche Welle to develop a multilingual and multimedia platform using state-of-the-art NLP techniques to monitor internal and external media work and provide data journalism services [27]. ASRAEL aggregates news articles and leverages the Wikidata knowledge base to describe and cluster news events and provides information retrieval tools to interact with the resulting news representations [30].</p>
        <p>4. Coding process</p>
        <p>In the purposive sampling step, we extracted 322 claims from the related literature and marked them up using 406 codes. After cleaning and tidying up the initial codes, we identified six top-level categories which we divided into 62 sub-categories to be used for preliminary coding. The following six top-level categories were used:</p>
        <p>• Stakeholder: the agent that the challenge or opportunity is for. The agent can be either a technical agent or a social agent.
• Information: the information needed to meet the challenge or exploit the opportunity.
• Functionality: the service or functionality that the platform should offer to meet a challenge or exploit an opportunity.
• Component: the part of a platform that must be
created or improved to meet the challenge or exploit the opportunity.
• Technique: the IT solution used to meet the challenge or exploit the opportunity.
• Other aspects: another type of concern that the challenge or opportunity involves, such as customer heterogeneity, performance or maintenance.</p>
        <p>We computed the inter-rater agreement for the preliminary coding with the AC1 coefficient for each category: 0.77 for Stakeholders, 0.65 for Components, 0.71 for Techniques, 0.71 for Aspects, 0.72 for Information types and 0.57 for Functionalities. The average AC1 is 0.69 with a standard deviation of 0.063, which, according to Landis-Koch and Altman’s benchmark scales, expresses an acceptable agreement among coders [19]. Finally, the assigned codes were discussed between and agreed on by the two coders.</p>
        <p>5. Stakeholders</p>
        <p>Stakeholders are agents that represent the forces and interests that drive the future of JKPs. The identified sub-categories of stakeholders are: general user, news professional, fact checker, archivist, ICT professional, audience, customer, researcher, news agency, public organisation and technical agent.</p>
        <p>General users interact with services provided by the JKPs or newsrooms. These can be divided between the internal users that belong to newsrooms and the external ones. The internal users are news professionals like journalists who use JKPs for creating histories [20]; fact checkers who conduct an essential task in combating fake news and misinformation [28]; archivists who keep the ontology and news archives up to date [20]; and ICT professionals and knowledge engineers who represent those users involved in the development and maintenance of JKPs [21]. The external users, in turn, are the audience [22]; the customers to whom news agencies offer services; and researchers who investigate JKPs or analyse data, as in the SUMMA project where “[political scientists want] to perform data analyses based on large amounts of news reports” [27, p. 2].</p>
        <p>The organisations influencing the JKPs are: the news agencies, including newsrooms; the public organisations, which are those governmental agencies that interact with or consume services from newsrooms’ JKPs, as in the SUMMA project which “provides media monitoring and analysis services to the BBC own newsrooms, the British government, and other subscribers” [27, p. 1]; and the organisations that are responsible for controlling news media standards, vocabulary and ontologies (e.g., IPTC organisation<sup>2</sup>), which are indirectly influencing JKPs because the work of many news agencies and JKPs depends on those standards, as in the NEWS project where “most of the NewsCodes defined by IPTC do not have alternative versions in different languages, only in English” [20, p. 9].</p>
        <p>Finally, the technical agent is a stakeholder that represents the JKPs and any system or technical infrastructure in newsrooms that supports or interacts with JKPs. A particular subtype of technical agent is the external system that communicates with newsroom systems, such as the information systems of potential customers [20].</p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>6. Information</title>
      <sec id="sec-2-1">
        <p>JKPs cover the whole information pipeline from gathering information and news creation to knowledge exploitation and distribution. Our study identified the following sub-categories of information to be considered in JKPs: news content, textual data, multimedia data, data format, metadata, LOD, events and information needs.</p>
        <p>News agencies produce both textual and multimedia news content, which has to be managed and distributed to their customers and audience [21, 20]. As textual data we consider the raw text in the form of news articles, documents, markup files, PDF, web pages, biographies, history and geopolitical data of countries, reports, social media feeds and social blogs. As multimedia we consider live broadcast, spoken content, photographs, audio and video. In addition, news agencies produce content in different formats like plain text, Information Interchange Model (IIM), News Industry Text Format (NITF), NewsML and RDF [20].</p>
        <p>Metadata is used to annotate and manage the produced content. Metadata can describe, e.g., author, language, creation timestamp, location, keywords, category, provenance, priority, urgency, status, updates, rights, interest, description or media type. JKPs use Linked Open Data (LOD) to annotate and enrich content using semantic vocabularies and leveraging knowledge bases, as in the ASRAEL project where they “leverage the Wikidata knowledge base to produce semantic annotations of news articles” [30, p. 1].</p>
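        <p>As an illustration only, a news item carrying some of the metadata fields named above, together with LOD annotations, could be held in a record like the following Python sketch. The class names, field names and the example.org identifiers are hypothetical and are not taken from any of the reviewed platforms; a real JKP would more likely rely on standard vocabularies and an RDF store.</p>
        <preformat>
```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class EntityAnnotation:
    """Hypothetical link from a text span to a LOD resource."""
    surface_form: str   # text as it appears in the news item
    resource_iri: str   # IRI of the linked resource (placeholder values below)


@dataclass
class NewsItem:
    """Hypothetical news item with a subset of the metadata named in the text."""
    text: str
    author: str
    language: str
    created: datetime
    keywords: list = field(default_factory=list)
    media_type: str = "text"
    annotations: list = field(default_factory=list)

    def annotate(self, surface_form, resource_iri):
        """Attach an annotation only if the surface form occurs in the text."""
        if surface_form not in self.text:
            return False
        self.annotations.append(EntityAnnotation(surface_form, resource_iri))
        return True


# Example usage with placeholder identifiers (not real knowledge-base IRIs).
item = NewsItem(
    text="The University of Bergen hosts a newsroom research project.",
    author="Example Author",
    language="en",
    created=datetime(2020, 10, 19, tzinfo=timezone.utc),
    keywords=["newsroom", "research"],
)
linked = item.annotate("University of Bergen", "https://example.org/entity/E1")
```
        </preformat>
        <p>In this sketch an annotation is accepted only when the annotated string actually occurs in the item text, which is one simple way to keep annotations traceable to their source.</p>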
        <p>News agencies create stories describing events and deliver them to their customers and audience [21], making events the central information need. Despite that, social stakeholders have other information needs. General users are interested in knowing who, what, with whom, where and when events took place, in networks of actors and their implications over time, and want to find the events of a certain type or in a certain place, obtain facts and retrieve evidence [24]. News professionals need access to news agencies’ archives and knowledge bases for documentation purposes, and want to find connections from past events, follow histories and identify emerging topics [20, 23, 27]. Customers have different information needs, mainly depending on their business or interests, e.g., “the press cabinet of a company is usually interested in news items talking about the company or its rivals, whereas a sports TV channel is interested mostly in news items describing sports events” [20].</p>
        <p>7. Functionalities</p>
        <p>JKPs provide different functionalities to their users. We identified twelve main sub-categories of functionality: news creation, verification, source selection, monitoring, knowledge discovery, trends, alert, summarisation, clustering, personalisation, business support and content management.</p>
        <p>News professionals use the JKPs for the news creation process. JKPs guide journalists in writing up their stories, support them with contextual background knowledge for those stories [21], provide means for comparing current events with other similar events [30], and facilitate access to previous work for creating similar content for a different audience, region or language [27]. JKPs also support news professionals with verification tasks like fact checking, provenance [24], rights and authorship management [20, 21], which are typically time-consuming tasks for news professionals, as explained in “manual verification of claims is a tedious task, that consumes a lot of time and effort from journalists and professional fact-checkers” [28, p. 1].</p>
        <p>Source selection and monitoring are two common functionalities across the studied JKPs, which harvest and store content from internal and external sources and monitor them in real time. By doing this, JKPs relieve journalists from these time-consuming tasks, as was the case at the BBC where “each of its ca. 300 journalist monitors up to four live broadcasts in parallel, plus several other information sources such as social media feeds” [27, p. 1].</p>
        <p>Knowledge discovery is one of the most attractive functionalities of JKPs. Knowledge discovery allows users to obtain news insights, analysis and relevant information, like in NewsReader where it “increases the user understanding of the domain, facilitates the reconstruction of news story lines, and enables users to perform exploratory investigation of news hidden facts” [24, p. 1]. Other interesting functionalities among JKPs are: trends used to discover emerging topics, long-term developments and changes in events over time [22, 25]; alerts to keep users up to date with the latest incoming items [26]; summarisation of news histories and events to provide additional insights [22]; clustering of story-lines and events [27]; and personalisation of both the JKPs and their functionalities according to users’ preferences and profiles [21].</p>
        <p>JKPs provide functionalities to support the organisation and workflows of news agencies and newsrooms. JKPs are used as business support systems to manage internal newsroom output; monitor what is being broadcast, produced and covered [27]; overcome limitations in newsrooms’ workflows; and improve productivity and revenues [20]. Another functionality provided by the JKPs is content management, which allows news agencies to produce, store, organise, manage, maintain and distribute the content and metadata produced every day [20].</p>
        <p>8. Components</p>
        <p>JKPs rely on different components to fulfil their functionalities and support users. We split JKP components into five sub-categories: input, processing, storage, interaction and output.</p>
        <p>As input, we consider the different sources of content and information used in JKPs that are relevant for stakeholders. The textual and multimedia sources are sources of interest. However, not all analysed projects treat the information in the same way or use the same information types, like ASRAEL which only uses the title and first paragraph to represent the events [30]; and not all content receives the same interest from news professionals, as in SUMMA which considers “entertainment programming such as movies and sitcoms, commercial breaks, and repetitions of content (e.g., on 24/7 news channels) [...] of limited interest to monitoring operations” [27, p. 1].</p>
        <p>The processing components cover tasks from harvesting and annotating input sources to processing and lifting them, following an ETL process (i.e.,
Extract, Transform, Load). Input sources are harvested using different components, each with a specific purpose: harvesting, translating, filtering and transcribing.</p>
        <p>A common characteristic of the analysed projects is that source selection and monitoring functionalities are conducted in real time by harvesting information sources [22, 23, 27]. The harvested content is then translated [27] and filtered according to the different stakeholders’ interests and needs. Spoken content is transcribed [27] and images are textually described [21].</p>
        <p>JKPs use specific components to automatically annotate the harvested content with metadata to support functionalities like business support, content management and personalisation [20]. The annotated content is typically processed by different components which are organised in an NLP pipeline. The NLP pipeline processes the content through state-of-the-art NLP and NLU modules to perform linguistic tasks [25, 24]. These tasks are focused on capturing and extracting the different information types described in section 6. Both the results of the NLP pipeline and the annotated content are disambiguated and represented semantically using lifting components. The lifting component links the semantic representation of news items to a knowledge base, for example an RDF-based knowledge graph [25], and enriches the semantic interpretations with facts from external knowledge bases, for example from the LOD cloud [24, 30].</p>
        <p>The JKP storage infrastructure is normally composed of an archive, a knowledge base and an ontology. The archive stores news articles, biographies, reports [25] and other textual and multimedia items; the knowledge base is where the lifted semantic representations of news items are stored and enriched with external information [24]; and the ontology is used to represent the structure of the news items, leveraged information, metadata and vocabulary [20].</p>
        <p>JKP users interact with the previous components mainly using three types of interaction components: front-ends, tools and query engines. JKPs provide front-end components [21] to allow stakeholders to access the system functionalities; tools which offer features to journalists when creating news articles or to general users when interacting with the system, like money converters or dictionaries [20]; and query engines that allow users to query, analyse or visualise the database through APIs [27].</p>
        <p>News agencies use two types of distribution components for delivering content to their audience and customers [20]: push and pull. Push components offer interfaces where information consumers can select and subscribe to streams of news [20], whereas with pull components, news agencies offer interfaces to access, browse and query their repositories [20].</p>
        <p>9. Techniques</p>
        <p>Techniques used in JKPs can be grouped in eight sub-categories: semantic technology, fact extraction, conceptual model, reasoning, network analysis, event analysis, NLP and training.</p>
        <p>Semantic technology is used to support functionalities like knowledge discovery, news creation, verification, clustering, trends, and content management. Semantic technologies support knowledge discovery by providing means for lifting news items, and disambiguating, enriching and leveraging them with information from external knowledge bases [21, 25] (processes carried out by the lifting, ontology and knowledge base components); news creation, by providing systems and vocabulary to automatically annotate news in annotation components [21]; and verification, by combining semantic technologies with the lifting and knowledge base components and linking factual claims to their sources and external knowledge bases [24, 27]. Semantic technologies and semantic representation techniques facilitate clustering news items and events [30], and detecting trends and story lines [24]. Moreover, semantic technologies provide shared semantic resources and formats which are used to support content management and facilitate conceptual interoperability [25].</p>
        <p>Fact extraction techniques extract facts from news items and link them to facts in external knowledge bases (e.g., Wikidata, Wikipedia). These techniques are used to provide functionalities like verification and knowledge discovery [27] and are common features of lifting, knowledge base and query components.</p>
        <p>Conceptual models provide vocabularies and ontologies which are used in conjunction with semantic technologies to support and standardise functionalities like content management and personalisation. Ontologies can be used for defining user interests and preferences based on the provided vocabulary or as shared models [20]. Conceptual models are applied in distribution, lifting, annotation, ontology, query, knowledge base and source components.</p>
        <p>Both conceptual models and semantic technologies facilitate the usage of other techniques like reasoning, network analysis and event analysis. These techniques support functionalities like knowledge discovery, clustering and trends, and are applied in the lifting, knowledge base, ontology and annotation components. Reasoning techniques abstract and infer new knowledge
from news items, events and temporal aspects [24, 25]. Network analysis is used to find networks of actors and organisation implications through different events and time [24]. Event analysis is applied to detect, identify and annotate the events described in news [21, 20].</p>
        <p>The above techniques are supported by NLP tasks like named entity detection, role detection, topic detection, temporal expression normalisation, temporal relation detection, factual claims extraction and natural language understanding [25, 29, 27]. These NLP tasks, among others, are also used in JKPs’ functionalities such as knowledge discovery, content management, summarising, verification, trends, clustering, query, lifting and annotation. In order to obtain optimal results from the NLP tasks, different training techniques have to be used over extensive news corpora [30].</p>
      </sec>
      <sec id="sec-2-2">
        <title>10. Other aspects</title>
        <p>Stakeholders, information, functionalities, components and techniques are influenced or affected by additional concerns of various types. We organised these other aspects into the following sub-categories: standards, proprietary, human factors, customers heterogeneity, big data, multilingual, timeliness, quality, software architecture, performance, maintenance, and legacy.</p>
        <p>Before moving into JKPs, news agencies used their own terms, categories and vocabularies to describe their items. Yet, the interoperability between news agencies and customers was difficult. The usage of standards like IPTC news codes and media topics, semantic vocabularies, NAF and RDF improved the interoperability between news agencies and other stakeholders [20].</p>
        <p>JKPs keep track of proprietary news information like authorship, copyrights and sources [21, 20] as a part of the content management functionalities. Property information is used as metadata in annotation components and provides provenance and reliability information [24, p. 4].</p>
        <p>There are different human factors influencing JKPs and stakeholders. Before JKPs, news professionals were performing many processes by hand, like news tagging, verification tasks, fact searching, finding related articles, and source monitoring. Performing these tasks manually is time-consuming and error-prone, takes a lot of effort, and reduces the amount and precision of the added metadata [21, 20, 28, 22]. Therefore, customers have to manually filter irrelevant content received from news agencies, creating an information overload problem which is contrary to the information relevance that customers expect from news agencies [21, 20]. Moreover, because of the difficulty of manually monitoring and finding related articles from other news providers, the audience, customers and news professionals can get biased or incomplete information [22].</p>
        <p>Customers are heterogeneous: they have different information needs and use different systems to interact with news agencies [20].</p>
        <p>According to our study, JKPs deal with big data requirements like volume, velocity and variety. The EventRegistry project estimated that “the number of collected articles ranges between 100.000 and 200.000 articles per day” and collected “news articles from around 75.000 news sources” [22, p. 1]. NewsReader used an archive that “contains billions of articles, biographies, and reports” [25, p. 1]. The SUMMA platform “[was] able to ingest 400 TV streams simultaneously” [27, p. 6].</p>
        <p>Other information aspects that JKPs deal with are the multilingual and timeliness data aspects. Information and news are produced in multiple languages (e.g., Catalan, Norwegian, Spanish, English, Italian, French, Portuguese and Chinese) and need to be translated, transcribed and delivered to customers and audiences in their languages of preference [20, 27, 25, 30]. The timeliness aspect refers to the temporal aspect of events: news professionals, audience and customers want to receive the information as soon as it is generated [21] and reconstruct story-lines or histories over time [24, 27].</p>
        <p>The quality requirements on the results and outputs of JKPs are summarised in “news agencies are required to provide fresh, relevant, high-quality information to their customers” [21, p. 1], and ignoring these quality requirements can imply economic losses for customers [20].</p>
        <p>Aspects concerning technical agents and their components include the software architecture, performance, maintenance and relation of JKPs with other systems. The software architecture of JKPs should consider scalability to deal with big data requirements [21, 24, 27], distribution to run its components and systems over multiple machines [20, 26], component independence so they can be used for other purposes [26], interoperability between components and systems [20, 25], and performance for reducing the processing and distributing time of information and live feeds [21, 24]. Manual maintenance is a time-consuming and error-prone task [20] which is automated with JKPs to keep the JKP and ontology up to date [26]. As JKPs communicate with customers’ systems, legacy components and other
newsroom systems, JKPs need to be designed to facilitate the integration with other technologies and systems [20, 26].</p>
        <p>11. Conclusion</p>
        <p>JKPs are a new type of platform which offers many opportunities for newsrooms and journalists by combining AI techniques such as knowledge graphs, LOD and NLP to improve and facilitate the production of high-quality journalism. We collected the challenges and opportunities that JKPs present and organised them into six categories that we assume are important for the evolution of JKPs (stakeholders, information, functionalities, components, techniques and other aspects).</p>
        <p>JKPs offer new opportunities for consuming and interacting with news by providing enriched content from external sources like Wikipedia or Wikidata to stakeholders seeking relevant information, such as news professionals and general audiences. News texts are enriched with additional information about, e.g., involved actors, places and organisations, and the connections with other news and related events. Information and data sources in JKPs are no longer split across dispersed and disconnected repositories, as happens in traditional solutions. Instead, the information pieces are connected by the knowledge graph. JKPs enhance functionalities like news creation and content management. News creation is improved with background information, providing journalists with better information for their stories. Automatic metadata annotation and the usage of standards like IPTC relieve archivists from manually annotating news and improve the content management capabilities of JKPs and newsroom workflows. Knowledge graphs in JKPs bring new forms of representing news-related content and exploiting it. Techniques like network analysis, event analysis and reasoning improve the background information and knowledge discovery in JKPs while opening new research questions for researchers. JKPs can use standards such as RDF, IPTC’s media topics and semantic vocabularies, which simplify the interoperability and understanding between news agencies and stakeholders. The most highlighted opportunities that have been identified in the literature include event detection and analysis over time, real-time and up-to-date trustworthy information, access to enriched background information for supporting news creation, multilingual and multimedia cross-platform solutions, and tools for monitoring worldwide media output and internal newsroom production.</p>
        <p>On the other hand, providing one-size-fits-all JKP solutions for all possible stakeholders is challenging because of their diversity and differing information needs. Newsworthy information comes from diverse news sources, like pre-news information from social media or multimedia sources such as TV news programs. Leveraging these information sources is a complex task which requires new techniques to distinguish potentially newsworthy information from non-relevant content and to extract information from multimedia items like images or videos. Summarising and presenting news-related information in JKPs, like background information, events in time or actor networks, to users with different information needs and skills is not a trivial task. JKPs consist of different components which interact together and with external components that need to be integrated in JKP systems. Extracting precise semantic representations of and reasoning over relations and time remain open research questions. JKPs deal with big data, but some semantic technologies, reasoning and AI techniques are not yet ready for it. Among the reviewed JKPs, the most common challenges are problems such as language independence, multiple news channels, complex newsroom workflows, dispersed and diverse information, lack of facts, and integration with legacy and customer systems.</p>
        <p>After reviewing the literature, we have realised that there is no clear definition of, or agreement about, what constitutes an event. The event concept is used in different ways in the literature, from a handshake between two actors to bigger events like the Spanish Civil War, or events in between such as a trial process. In this study, we have only reviewed five JKP-related research projects, although they are the five most central ones we have found. Hence, we may have omitted important issues that were not represented or brought up in these projects. We are therefore planning to extend the number of considered projects through a systematic literature review and to contrast and expand our findings with published works on data and digital journalism. A logical continuation of this expanded study is the formal identification and modelling of goals, requirements and use cases for JKPs, which we have not yet found in the literature. Furthermore, we plan to formalise a reference framework for JKPs and continue the development of our JKP to validate and integrate our findings.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>[…] <article-title>semantic web technologies into news agencies</article-title>, in: <source>The Semantic Web - ISWC 2006</source>, <year>2006</year>, pp. <fpage>778</fpage>-<lpage>791</lpage>. doi:10.1007/11926078_56.</mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>[22] <string-name><given-names>G.</given-names> <surname>Leban</surname></string-name>, <string-name><given-names>B.</given-names> <surname>Fortuna</surname></string-name>, <string-name><given-names>J.</given-names> <surname>Brank</surname></string-name>, <string-name><given-names>M.</given-names> <surname>Grobelnik</surname></string-name>, <article-title>[…] from news</article-title>, in: <source>Proceedings of the 23rd International Conference on World Wide Web</source>, WWW '14 Companion, Association for Computing Machinery, <year>2014</year>, pp. <fpage>107</fpage>-<lpage>110</lpage>. doi:10.1145/2567948.2577024.</mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>[23] <string-name><given-names>M.</given-names> <surname>Kattenberg</surname></string-name>, <string-name><given-names>Z.</given-names> <surname>Beloki</surname></string-name>, <string-name><given-names>A.</given-names> <surname>Soroa</surname></string-name>, <string-name><given-names>X.</given-names> <surname>Artola</surname></string-name>, <string-name><given-names>A.</given-names> <surname>Fokkens</surname></string-name>, <string-name><given-names>P.</given-names> <surname>Huygen</surname></string-name>, <string-name><given-names>K.</given-names> <surname>Verstoep</surname></string-name>, <article-title>Two architectures for parallel processing for huge amounts of text</article-title>, in: <source>Proceedings of Language Resources and Evaluation Conference (LREC)</source>, European Language Resources Association (ELRA), <year>2016</year>, pp. <fpage>4513</fpage>-<lpage>4519</lpage>. URL: https://www.aclweb.org/anthology/L16-1714.</mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>[24] <string-name><given-names>M.</given-names> <surname>Rospocher</surname></string-name>, <string-name><given-names>M.</given-names> <surname>van Erp</surname></string-name>, <string-name><given-names>P.</given-names> <surname>Vossen</surname></string-name>, <string-name><given-names>A.</given-names> <surname>Fokkens</surname></string-name>, […] Semantics 37-38 (<year>2016</year>) <fpage>132</fpage>-<lpage>151</lpage>. doi:10.1016/j.websem.2015.12.004.</mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>[25] <string-name><given-names>P.</given-names> <surname>Vossen</surname></string-name>, <string-name><given-names>R.</given-names> <surname>Agerri</surname></string-name>, <string-name><given-names>I.</given-names> <surname>Aldabe</surname></string-name>, <string-name><given-names>A.</given-names> <surname>Cybulska</surname></string-name>, […] Systems, Elsevier <volume>110</volume> (<year>2016</year>) <fpage>60</fpage>-<lpage>85</lpage>. doi:10.1016/j.knosys.2016.07.013.</mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>[26] <string-name><given-names>U.</given-names> <surname>Germann</surname></string-name>, <string-name><given-names>R.</given-names> <surname>Liepins</surname></string-name>, <string-name><given-names>D.</given-names> <surname>Gosko</surname></string-name>, <string-name><given-names>G.</given-names> <surname>Barzdins</surname></string-name>, […] Association for Computational Linguistics, <year>2018</year>, pp. <fpage>47</fpage>-<lpage>51</lpage>. doi:10.18653/v1/W18-2508.</mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>[27] <string-name><given-names>U.</given-names> <surname>Germann</surname></string-name>, <string-name><given-names>R.</given-names> <surname>Liepins</surname></string-name>, <string-name><given-names>G.</given-names> <surname>Barzdins</surname></string-name>, <string-name><given-names>D.</given-names> <surname>Gosko</surname></string-name>, […] <source>ACL 2018, System Demonstrations</source>, Association for Computational Linguistics, <year>2018</year>, pp. <fpage>99</fpage>-<lpage>104</lpage>. doi:10.18653/v1/P18-4017.</mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>[28] <string-name><given-names>S.</given-names> <surname>Miranda</surname></string-name>, <string-name><given-names>D.</given-names> <surname>Nogueira</surname></string-name>, <string-name><given-names>A.</given-names> <surname>Mendes</surname></string-name>, A. Vla[…], in: <source>The World Wide Web Conference</source>, WWW '19, Association for Computing Machinery, <year>2019</year>, pp. <fpage>3579</fpage>-<lpage>3583</lpage>. doi:10.1145/3308558.3314135.</mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>[29] <string-name><given-names>P.</given-names> <surname>Paikens</surname></string-name>, <string-name><given-names>G.</given-names> <surname>Barzdins</surname></string-name>, <string-name><given-names>A.</given-names> <surname>Mendes</surname></string-name>, <string-name><given-names>D. C.</given-names> <surname>Ferreira</surname></string-name>, <string-name><given-names>S.</given-names> <surname>Broscheit</surname></string-name>, <string-name><given-names>M. S.</given-names> <surname>Almeida</surname></string-name>, <string-name><given-names>S.</given-names> <surname>Miranda</surname></string-name>, <string-name><given-names>D.</given-names> <surname>Nogueira</surname></string-name>, <string-name><given-names>P.</given-names> <surname>Balage</surname></string-name>, <string-name><given-names>A. F.</given-names> <surname>Martins</surname></string-name>, <article-title>SUMMA at TAC knowledge base population task 2016</article-title>, in: <source>[…] Conference (TAC)</source>, <year>2016</year>. URL: https://tac.nist.gov/publications/2016/participant.papers/TAC2016.summa.proceedings.pdf.</mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>[30] <string-name><given-names>C.</given-names> <surname>Rudnik</surname></string-name>, <string-name><given-names>T.</given-names> <surname>Ehrhart</surname></string-name>, <string-name><given-names>O.</given-names> <surname>Ferret</surname></string-name>, <string-name><given-names>D.</given-names> <surname>Teyssou</surname></string-name>, <string-name><given-names>R.</given-names> <surname>Troncy</surname></string-name>, <string-name><given-names>X.</given-names> <surname>Tannier</surname></string-name>, <article-title>Searching news articles using an event knowledge graph leveraged by wikidata</article-title>, in: <source>Companion Proceedings of The 2019 World Wide Web Conference</source>, WWW '19, Association for Computing Machinery, <year>2019</year>, pp. <fpage>1232</fpage>-<lpage>1239</lpage>. doi:10.1145/3308560.3316761.</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>