Challenges and Opportunities for Journalistic Knowledge Platforms

Marc Gallofré Ocaña

Andreas L. Opdahl

University of Bergen, Fosswinckelsgt. 6, Postboks 7802, 5020 Bergen, Norway

2020

Journalism is under pressure from loss of advertisement and revenues, while experiencing an increase in digital consumption and user demands for quality journalism and trusted sources. Journalistic Knowledge Platforms (JKPs) are an emerging generation of platforms which combine state-of-the-art artificial intelligence (AI) techniques such as knowledge graphs, linked open data (LOD), and natural-language processing (NLP) for transforming newsrooms and leveraging information technologies to increase the quality and lower the cost of news production. In order to drive research and design better JKPs that allow journalists to get most benefits out of them, we need to understand what challenges and opportunities JKPs are facing. This paper presents an overview of the main challenges and opportunities involved in JKPs which have been manually extracted from literature with the support of natural language processing and understanding techniques. These challenges and opportunities are organised in: stakeholders, information, functionalities, components, techniques and other aspects.

Keywords: Newsroom, Knowledge Graph, Digitalization, Overview

1. Introduction

Journalism is under pressure from loss of advertisement and revenues, in combination with competing online distribution channels that stream free content, while experiencing an increase in digital consumption and readers who demand quality journalism and trusted sources [1]. Information is no longer consumed from a single newspaper. Instead, readers have access to and can contrast fresh and first-hand information sources available on the internet and social media at any time.

News organisations are constantly adapting their business models to digital media innovations, to improve information quality, competitiveness and growth [2]. Journalistic Knowledge Platforms (JKPs) are an emerging type of platform that combines state-of-the-art artificial intelligence (AI) techniques such as knowledge graphs and natural-language processing (NLP), and exploits news and social media information over the net in real-time, using linked open data (LOD), encyclopaedic sources and news archives to construct knowledge graphs and provide fresh and unexpected information to journalists, helping them to dive deeply into information, events and story-lines. JKPs are increasingly driving innovation and transforming newsrooms, leveraging information technologies to increase the quality and lower the cost of news production. In order to drive research and design JKPs that allow journalists to get the most benefit out of them and support newsrooms with better solutions, we need to understand the challenges and opportunities that JKPs present for both users and developers. To do so, we have reviewed the research literature in light of our own experience with developing News Hunter [3, 4, 5], a series of JKP prototypes in collaboration with a developer of newsroom tools for the international market.

This paper presents a synthesis of the challenges and opportunities for journalistic knowledge platforms that we have found in the literature, hopefully describing the most central factors that are driving development of JKPs today. These factors have been grouped into six categories: stakeholders, information, functionalities, components, techniques and other aspects. We conclude that JKPs offer many opportunities for effective production of high-quality journalism, real-time information, enriched background information, and multilingual and cross-platform solutions for monitoring worldwide multimedia output, by offering solutions to problems such as language independence, complex newsroom workflows, and dispersed information. Central challenges include leveraging pre-news information from social media and multimedia sources, precise semantic lifting and enrichment of texts, scaling semantic technologies to big data, and detecting and reasoning over events.

Proceedings of the CIKM 2020 Workshops, October 19-20, Galway, Ireland. email: Marc.Gallofre@uib.no (M. Gallofré Ocaña); Andreas.Opdahl@uib.no (A.L. Opdahl) orcid: 0000-0001-7637-3303 (M. Gallofré Ocaña); 0000-0002-3141-1385 (A.L. Opdahl)

© 2020 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org), http://ceur-ws.org, ISSN 1613-0073.

This paper is organised as follows: Section 2 summarises the methodology used for screening the challenges and opportunities. Section 3 briefly reviews the research literature. Section 4 explains the coding process. Sections 5 to 10 synthesise the main challenges and opportunities for each factor respectively — stakeholders, information, functionalities, components, techniques and other aspects.

2. Method

Our research method consists of four steps: Firstly, we selected the most relevant research papers that we have identified in our previous studies on JKP architectures and news angles [4, 6, 7, 8, 9, 10, 11, 12]. From these selected papers we manually extracted claims, i.e., sentences that express potential challenges or opportunities.

Secondly, a purposive sampling was conducted independently by two expert coders (the authors). The coders generated multiple codes for each extracted claim and the codes were cleaned with the support of NLP and NLU techniques (i.e., Damerau-Levenshtein distance [13], word2vec [14], and WordNet [15]), implemented in Python with support of Scikit-learn [16], NLTK [17], SpaCy [18] and other libraries. From the resulting cleaned codes, we selected the most representative ones as preliminary codes and divided them into categories.

Thirdly, based on the preliminary codes, claims were independently coded once again by both authors. This time, the coders were allowed to code each claim with multiple codes for each category. The coding agreement was estimated using Gwet’s AC1 [19] inter-rater reliability coefficient with nominal ratings.

Because coders were allowed not to assign a code, empty codes were not treated as missing values when computing Gwet’s AC1; instead, they were treated as if they were coded as “undefined”. Hence, to compute the contingency tables for multiple codes we applied the following rule: agreement between coders A and B only happens between correctly matching codes (A∩B), and the other codes (A△B) were matched with missing values and treated as disagreements.

Finally, when both coders agreed on the final codes for each claim, challenges and opportunities were extracted from each claim following the assigned codes.

3. Related work

After a broad survey of the literature, we selected eleven papers describing five research projects related to JKPs as the starting point of our review: NEWS [20, 21], EventRegistry [22], NewsReader [23, 24, 25], SUMMA [26, 27, 28, 29] and ASRAEL [30].

NEWS is a project, in collaboration with the Spanish Agencia EFE and the Italian ANSA news agencies, that makes use of semantic technologies to improve news agencies’ workflows, productiveness and revenues by focusing on the annotation, intelligent information retrieval and user interface aspects [21]. EventRegistry is focused on collecting news articles, identifying and extracting information about events, and summarising and visualising them [22]. NewsReader extracts information about what, who, where, when from multilingual news articles and represents events in time using RDF in a knowledge graph, allowing users to find networks of actors along time [25]. SUMMA collaborates with BBC Monitoring and Deutsche Welle to develop a multilingual and multimedia platform using state-of-the-art NLP techniques to monitor internal and external media work and provide data journalism services [27]. ASRAEL aggregates news articles and leverages the Wikidata knowledge base to describe and cluster news events and provides information retrieval tools to interact with the resulting news representations [30].

4. Coding process

In the purposive sampling step, we extracted 322 claims from the related literature and marked them up using 406 codes. After cleaning and tidying up the initial codes, we identified six top-level categories which we divided into 62 sub-categories to be used for preliminary coding. The following six top-level categories were used:

• Stakeholder: the agent that the challenge or opportunity is for. The agent can be either a technical agent or social agent.
• Information: the information needed to meet the challenge or exploit the opportunity.
• Functionality: the service or functionality that the platform should offer to meet a challenge or exploit an opportunity.
• Component: the part of a platform that must be created or improved to meet the challenge or exploit the opportunity.
• Technique: the IT solution used to meet the challenge or exploit the opportunity.
• Other aspects: another type of concern that the challenge or opportunity involves, such as customer heterogeneity, performance or maintenance.

We computed the inter-rater agreement for the preliminary coding with the AC1 coefficient for each category: 0.77 for Stakeholders, 0.65 for Components, 0.71 for Techniques, 0.71 for Aspects, 0.72 for Information types and 0.57 for Functionalities. The average AC1 is 0.69 with a standard deviation of 0.063, which according to Landis-Koch and Altman’s benchmark scales, expresses an acceptable agreement among coders [19]. Finally, the assigned codes were discussed between and agreed on by the two coders.

5. Stakeholders

Stakeholders are agents that represent the forces and interests that drive the future of JKPs. The identified sub-categories of stakeholders are: general user, news professional, fact checker, archivist, ICT professional, audience, customer, researcher, news agency, public organisation and technical agent.

General users interact with services provided by the JKPs or newsrooms. These can be divided between the internal users that belong to newsrooms and the external ones. The internal users are news professionals like journalists who use JKPs for creating histories [20]; fact checkers who conduct an essential task in combating fake news and misinformation [28]; archivists who keep the ontology and news archives up to date [20]; and ICT professionals and knowledge engineers who represent those users involved in the development and maintenance of JKPs [21].

The external users, in turn, are the audience [22]; the customers to whom news agencies offer services; and researchers who investigate JKPs or analyse data, as in the SUMMA project where “[political scientists want] to perform data analyses based on large amounts of news reports” [27, p. 2].

The organisations influencing the JKPs are: the news agencies, including newsrooms; the public organisations which are those governmental agencies that interact with or consume services from newsrooms’ JKPs, as in the SUMMA project which “provides media monitoring and analysis services to the BBC own newsrooms, the British government, and other subscribers” [27, p. 1]; and the organisations that are responsible for controlling news media standards, vocabulary and ontologies (e.g., the IPTC organisation), which are indirectly influencing JKPs because the work of many news agencies and JKPs depends on those standards, as in the NEWS project where “most of the NewsCodes defined by IPTC do not have alternative versions in different languages, only in English” [20, p. 9].

Finally, the technical agent is a stakeholder that represents the JKPs and any system or technical infrastructure in newsrooms that supports or interacts with JKPs. A particular subtype of technical agent is the external systems that communicate with newsroom systems, like the information systems of potential customers [20].

6. Information

JKPs cover the whole information pipeline from gathering information and news creation to knowledge exploitation and distribution. Our study identified the following sub-categories of information to be considered in JKPs: news content, textual data, multimedia data, data format, metadata, LOD, events and information needs.

News agencies produce both textual and multimedia news content which have to be managed and distributed to their customers and audience [21, 20]. As textual data we consider the raw text in form of news articles, documents, markup files, PDF, web pages, biographies, history and geopolitical data of countries, reports, social media feeds and social blogs. As multimedia we consider live broadcast, spoken content, photographs, audio and video. In addition, news agencies produce contents in different formats like plain text, Information Interchange Model (IIM), News Industry Text Format (NITF), NewsML and RDF [20].

Metadata is used to annotate and manage the produced content. Metadata can describe e.g., author, language, creation timestamp, location, keywords, category, provenance, priority, urgency, status, updates, rights, interest, description or media type. JKPs use Linked Open Data (LOD) to annotate and enrich content using semantic vocabularies and leveraging knowledge bases, as in the ASRAEL project where they “leverage the Wikidata knowledge base to produce semantic annotations of news articles” [30, p. 1].
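To make the relation between metadata and LOD annotation concrete, the following Python sketch represents a news item as a small record and turns it into subject-predicate-object triples. It is only an illustration: the `NewsItem` class, the `to_triples` function, the `example.org` namespace and the `mentions` property are assumptions made for this example and are not taken from any of the reviewed projects. The Dublin Core terms and the Wikidata entity namespace are real vocabularies, and `Q42` (Douglas Adams) is used purely as a well-known example identifier.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Tuple

# Hypothetical namespace for locally minted identifiers (an assumption).
EX = "http://example.org/news/"
# Real, public vocabularies used for the annotations.
WD = "http://www.wikidata.org/entity/"
DCT = "http://purl.org/dc/terms/"

Triple = Tuple[str, str, str]


@dataclass
class NewsItem:
    """A news item carrying a few of the metadata fields named in the text."""
    item_id: str
    title: str
    author: str
    language: str
    created: datetime
    keywords: List[str] = field(default_factory=list)
    # Entities mentioned in the text, given as Wikidata identifiers.
    entities: List[str] = field(default_factory=list)


def to_triples(item: NewsItem) -> List[Triple]:
    """Express the item's metadata and entity links as simple triples."""
    subject = EX + item.item_id
    triples: List[Triple] = [
        (subject, DCT + "title", item.title),
        (subject, DCT + "creator", item.author),
        (subject, DCT + "language", item.language),
        (subject, DCT + "created", item.created.isoformat()),
    ]
    # Keywords stay plain literals; entities become links into Wikidata.
    triples += [(subject, DCT + "subject", kw) for kw in item.keywords]
    triples += [(subject, EX + "mentions", WD + qid) for qid in item.entities]
    return triples


if __name__ == "__main__":
    example = NewsItem(
        item_id="item-1",
        title="Example headline",
        author="A. Reporter",
        language="en",
        created=datetime(2020, 10, 19, tzinfo=timezone.utc),
        keywords=["media"],
        entities=["Q42"],
    )
    for triple in to_triples(example):
        print(triple)
```

In this sketch the descriptive metadata stays attached to the item as literals, while each mentioned entity becomes a link into an external knowledge base, which is what allows a platform to enrich the item with facts stored there.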

News agencies create stories describing events and deliver them to their customers and audience [21], making the events the central information need. Despite that, social stakeholders have other information needs: General users are interested in knowing who, what, with whom, where and when events took place, and want to trace the networks and implications of actors over time, find events of a certain type or in a certain place, obtain facts and retrieve evidence [24]. News professionals need access to news agencies’ archives and knowledge bases for documentation purposes, to find connections from past events, follow histories and identify emerging topics [20, 23, 27]. Customers, in turn, have different information needs mainly depending on their business or interests, e.g., “the press cabinet of a company is usually interested in news items talking about the company or its rivals, whereas a sports TV channel is interested mostly in news items describing sports events” [20].

7. Functionalities

JKPs provide different functionalities to their users. We identified twelve main sub-categories of functionality: news creation, verification, source selection, monitoring, knowledge discovery, trends, alert, summarisation, clustering, personalisation, business support and content management.

News professionals use the JKPs for the news creation process. JKPs guide journalists in writing up their stories, support them with contextual background knowledge for those stories [21], provide means for comparing current events with other similar events [30], and facilitate access to previous work for creating similar content for a different audience, region or language [27]. JKPs also support news professionals with verification tasks like fact checking, provenance [24], rights and authorship management [20, 21], which are typically time-consuming tasks for news professionals as explained in “manual verification of claims is a tedious task, that consumes a lot of time and effort from journalists and professional fact-checkers” [28, p. 1].

Source selection and monitoring functionalities are two common functionalities across the studied JKPs, which harvest and store content from internal and external sources and monitor them in real-time. By doing this, JKPs relieve journalists from these time-consuming tasks, as was happening in the BBC where “each of its ca. 300 journalist monitors up to four live broadcasts in parallel, plus several other information sources such as social media feeds” [27, p. 1].

Knowledge discovery is one of the most attractive functionalities of JKPs. Knowledge discovery allows users to obtain news insights, analysis and relevant information, like in NewsReader where it “increases the user understanding of the domain, facilitates the reconstruction of news story lines, and enables users to perform exploratory investigation of news hidden facts” [24, p. 1]. Other interesting functionalities among JKPs are: trends used to discover emerging topics, long-term developments and changes in events over time [22, 25]; alerts to keep users up-to-date with the last incoming items [26]; summarisation of news histories and events to provide additional insights [22]; clustering of story-lines and events [27]; and personalisation of both the JKPs and their functionalities according to users’ preferences and profiles [21].

JKPs provide functionalities to news agencies and newsrooms organisation and workflows. JKPs are used as business support systems to manage internal newsrooms output; monitor what is being broadcast, produced and covered [27]; overcome limitations in newsrooms’ workflows; and improve productivity and revenues [20]. Another functionality provided by the JKPs is content management, which allows news agencies to produce, store, organise, manage, maintain and distribute the content and metadata produced every day [20].

8. Components

JKPs rely on different components to fulfil their functionalities and support users. We split JKP components into five sub-categories: input, processing, storage, interaction and output.

As input, we consider the different sources of content and information used in JKPs that are relevant for stakeholders. The textual and multimedia sources are sources of interest. However, not all analysed projects treat the information in the same way or use the same information types, like ASRAEL which only uses the title and first paragraph to represent the events [30]; and not all contents receive the same interest by news professionals, as in SUMMA which considers “entertainment programming such as movies and sitcoms, commercial breaks, and repetitions of content (e.g., on 24/7 news channels) [...] of limited interest to monitoring operations” [27, p. 1].

The processing components cover tasks from harvesting and annotating input sources to processing and lifting them, following an ETL process (i.e., Extract, Transform, Load). Input sources are harvested using different components, each with a specific purpose: harvesting, translating, filtering and transcribing.

A common characteristic of the analysed projects is that source selection and monitoring functionalities are conducted in real-time by harvesting information sources [22, 23, 27]. The harvested content is then translated [27] and filtered according to the different stakeholders’ interests and needs. Spoken content is transcribed [27] and images are textually described [21].

JKPs use specific components to automatically annotate the harvested content with metadata to support functionalities like business support, content management and personalisation [20]. The annotated content is typically processed by different components which are organised in an NLP pipeline. The NLP pipeline processes the content through state-of-the-art NLP and NLU modules to perform linguistic tasks [25, 24]. These tasks are focused on capturing and extracting the different information types described in Section 6. Both the results of the NLP pipeline and the annotated content are disambiguated and represented semantically using lifting components. The lifting component links the semantic representation of news items to a knowledge base, for example an RDF-based knowledge graph [25], and enriches the semantic interpretations with facts from external knowledge bases, for example from the LOD cloud [24, 30].

The JKP storage infrastructure is normally composed of an archive, a knowledge base and an ontology. The archive stores news articles, biographies, reports [25] and other textual and multimedia items; the knowledge base is where the lifted semantic representations of news items are stored and enriched with external information [24]; and the ontology is used to represent the structure of the news items, leveraged information, metadata and vocabulary [20].

JKP users interact with the previous components mainly using three types of interaction components: front-ends, tools and query engines. JKPs provide front-end components [21] to allow stakeholders to access the system functionalities; tools which offer features to journalists when creating news articles or to general users when interacting with the system, like money converters or dictionaries [20]; and query engines that allow users to query, analyse or visualise the database through APIs [27].

News agencies use two types of distribution components for delivering content to their audience and customers [20]: push and pull. Push components offer interfaces where information consumers can select and subscribe to streams of news [20], whereas with pull components, news agencies offer interfaces to access, browse and query their repositories [20].

9. Techniques

Techniques used in JKPs can be grouped in eight sub-categories: semantic technology, fact extraction, conceptual model, reasoning, network analysis, event analysis, NLP and training.

Semantic technology is used to support functionalities like knowledge discovery, news creation, verification, clustering, trends, and content management. Semantic technologies support knowledge discovery by providing means for lifting news items, and disambiguating, enriching and leveraging them with information from external knowledge bases [21, 25] (processes carried out by the lifting, ontology and knowledge base components); news creation, by providing systems and vocabulary to automatically annotate the news in annotation components [21]; and verification, by combining semantic technologies with the lifting and knowledge base components and linking factual claims to their sources and external knowledge bases [24, 27]. Semantic technologies and semantic representation techniques facilitate clustering news items and events [30], and detecting trends and story lines [24]. Moreover, semantic technologies provide shared semantic resources and formats which are used to support content management and facilitate conceptual interoperability [25].

Fact extraction techniques extract facts from news items and link them to facts in external knowledge bases (e.g., Wikidata, Wikipedia). These techniques are used to provide functionalities like verification and knowledge discovery [27] and are common features of lifting, knowledge base and query components.

Conceptual models provide vocabularies and ontologies which are used in conjunction with semantic technologies to support and standardise functionalities like content management and personalisation. Ontologies can be used for defining user interests and preferences based on the provided vocabulary or as shared models [20]. Conceptual models are applied in distribution, lifting, annotation, ontology, query, knowledge base and source components.

Both conceptual models and semantic technologies facilitate the usage of other techniques like reasoning, network analysis and event analysis. These techniques support functionalities like knowledge discovery, clustering and trends, and are applied in the lifting, knowledge base, ontology and annotation components. Reasoning techniques abstract and infer new knowledge from news items, events and temporal aspects [24, 25]. Network analysis is used to find networks of actors and organisation implications through different events and time [24]. Event analysis is applied to detect, identify and annotate the events described in news [21, 20].

The above techniques are supported by NLP tasks like named entity detection, role detection, topic detection, temporal expression normalisation, temporal relation detection, factual claims extraction and natural language understanding [25, 29, 27]. These NLP tasks, among others, are also used in JKPs’ functionalities such as knowledge discovery, content management, summarising, verification, trends, clustering, query, lifting and annotation. In order to obtain optimal results from the NLP tasks, different training techniques have to be used over extensive news corpora [30].

10. Other aspects

Stakeholders, information, functionalities, components and techniques are influenced or affected by additional concerns of various types. We organised these other aspects into the following sub-categories: standards, proprietary, human factors, customers heterogeneity, big data, multilingual, timeliness, quality, software architecture, performance, maintenance, and legacy.

Before moving into JKPs, news agencies used their own terms, categories and vocabularies to describe their items, which made interoperability between news agencies and customers difficult. The usage of standards like IPTC news codes and media topics, semantic vocabularies, NAF and RDF improved the interoperability between news agencies and other stakeholders [20].

JKPs keep track of proprietary news information like authorship, copyrights and sources [21, 20] as a part of the content management functionalities. Property information is used as metadata in annotation components and provides provenance and reliability information [24, p. 4].

There are different human factors influencing JKPs and stakeholders. Before JKPs, news professionals were performing many processes by hand like news tagging, verification tasks, fact searching, finding related articles, and source monitoring. Performing these tasks manually is time-consuming and error-prone, consumes a lot of effort, and reduces the amount and precision of the added metadata [21, 20, 28, 22]. As a result, customers have to manually filter irrelevant content received from news agencies, creating an information overload problem which is contrary to the information relevance that customers expect from news agencies [21, 20]. Moreover, because of the difficulty of manually monitoring and finding related articles from other news providers, the audience, customers and news professionals can get biased or incomplete information [22].

Customers are heterogeneous: they have different information needs and use different systems to interact with news agencies [20].

According to our study, JKPs deal with big data requirements like volume, velocity and variety: The EventRegistry project estimated that “the number of collected articles ranges between 100.000 and 200.000 articles per day” and collected “news articles from around 75.000 news sources” [22, p. 1]. NewsReader used an archive that “contains billions of articles, biographies, and reports” [25, p. 1]. The SUMMA platform “[was] able to ingest 400 TV streams simultaneously” [27, p. 6].

Other information aspects that JKPs deal with are the multilingual and timeliness data aspects. Information and news are produced in multiple languages (e.g., Catalan, Norwegian, Spanish, English, Italian, French, Portuguese and Chinese) and need to be translated, transcribed and delivered to customers and audiences in their languages of preference [20, 27, 25, 30]. The timeliness aspect refers to the temporal aspect of events: news professionals, audience and customers want to receive the information as soon as it is generated [21] and reconstruct story-lines or histories over time [24, 27].

Quality of the results and outputs of JKPs is summarised in “news agencies are required to provide fresh, relevant, high-quality information to their customers” [21, p. 1], and ignoring these quality requirements can imply economic losses for customers [20].

Aspects concerning technical agents and their components include the software architecture, performance, maintenance and relation of JKPs with other systems. The software architecture of JKPs should consider scalability to deal with big data requirements [21, 24, 27], distribution to run its components and systems over multiple machines [20, 26], components independence so they can be used for other purposes [26], interoperability between components and systems [20, 25], and performance for reducing the processing and distributing time of information and live feeds [21, 24]. Manual maintenance is a time-consuming and error-prone task [20] which is automated with JKPs to keep the JKP and ontology up-to-date [26]. As JKPs communicate with customers’ systems, legacy components and other newsroom systems, JKPs need to be designed to facilitate the integration with other technologies and systems [20, 26].

11. Conclusion

JKPs are a new type of platform which offers many opportunities for newsrooms and journalists by combining AI techniques such as knowledge graphs, LOD and NLP to improve and facilitate the production of high-quality journalism. We collected challenges and opportunities that JKPs present and organised them into six categories that we assume are important for the evolution of JKPs (stakeholders, information, functionalities, components, techniques and other aspects).

JKPs offer new opportunities for consuming and interacting with news by providing enriched content from external sources like Wikipedia or Wikidata to stakeholders seeking relevant information, such as news professionals and general audiences. News texts are enriched with additional information about, e.g., involved actors, places and organisations, the connections with other news and related events. Information and data sources in JKPs are no longer split along dispersed and disconnected repositories as happens in traditional solutions. Instead, the information pieces are connected by the knowledge graph. JKPs enhance functionalities like news creation and content management. News creation is improved with background information providing journalists with better information for their stories. Automatic metadata annotation and the usage of standards like IPTC relieve archivists from manually annotating news and improve the content management capabilities of JKPs and newsroom workflows. Knowledge graphs in JKPs bring new forms of representing news-related content and exploiting it. Techniques like network analysis, event analysis and reasoning improve the background information and knowledge discovery in JKPs while opening new research questions for researchers. JKPs can use standards such as RDF, IPTC’s media topics and semantic vocabularies which simplify the interoperability and understanding between news agencies and stakeholders. The most highlighted opportunities that have been identified in the literature include event detection and analysis over time, real-time and up-to-date trustworthy information, access to enriched background information for supporting news creation, multilingual and multimedia cross-platform solutions, and tools for monitoring worldwide media output and internal newsrooms production.

On the other hand, providing one-size-fits-all JKP solutions for all possible stakeholders is challenging, because of their diversity and differing information needs. Newsworthy information comes from diverse news sources like pre-news information from social media or multimedia sources such as TV news programs. Leveraging these information sources is a complex task which requires new techniques to distinguish potentially newsworthy information from non-relevant content and extract information from multimedia items like images or videos. Summarising and presenting news-related information in JKPs like background information, events in time or actor networks to users with different information needs and skills is not a trivial task. JKPs consist of different components which interact together and with external components that need to be integrated in JKP systems. Extracting precise semantic representations of and reasoning over relations and time remain open research questions. JKPs deal with big data, but some semantic technologies, reasoning and AI techniques are not yet ready for it. Among the reviewed JKPs, the most common challenges are problems such as language independence, multiple news channels, complex newsroom workflows, dispersed and diverse information, lack of facts, and integration with legacy and customer systems.

After reviewing the literature, we have realised that there is not a clear definition and agreement about what constitutes an event. The event concept is used in different ways in the literature, from a handshake between two actors to bigger events like the Spanish Civil War or events in between such as a trial process.

In this study, we have only reviewed five JKP-related research projects, although they are the five most central ones we have found. Hence, we may have omitted important issues that were not represented or brought up in these projects. We are therefore planning to extend the number of considered projects through a systematic literature review and contrast and expand our findings with published works on data and digital journalism. A logical continuation of this expanded study is the formal identification and modelling of goals, requirements and use cases for JKPs, which we did not find yet in the literature. Furthermore, we plan to formalise a reference framework for JKPs and continue the development of our JKP to validate and integrate our findings.

[...]mantic web technologies into news agencies, in: The Semantic Web - ISWC 2006, 2006, pp. 778-791. doi:10.1007/11926078_56.

[22] Leban, Fortuna, Brank, M. Grobelnik, [...] from news, in: Proceedings of the 23rd International Conference on World Wide Web, WWW '14 Companion, Association for Computing Machinery, 2014, pp. 107-110. doi:10.1145/2567948.2577024.

[23] Kattenberg, Beloki, Soroa, X. Artola, Fokkens, Huygen, Verstoep, Two architectures for parallel processing for huge amounts of text, in: Proceedings of Language Resources and Evaluation Conference (LREC), European Language Resources Association (ELRA), 2016, pp. 4513-4519. URL: https://www.aclweb.org/anthology/L16-1714.

[24] Rospocher, M. van Erp, Vossen, A. Fokkens, [...] Semantics 37-38 (2016) 132-151. doi:10.1016/j.websem.2015.12.004.

[25] Vossen, Agerri, I. Aldabe, A. Cybulska, [...]tems, Elsevier 110 (2016) 60-85. doi:10.1016/j.knosys.2016.07.013.

[26] Germann, Liepins, Gosko, G. Barzdins, [...]tion for Computational Linguistics, 2018, pp. 47-51. doi:10.18653/v1/W18-2508.

[27] Germann, Liepins, Barzdins, Gosko, [...] ACL 2018, System Demonstrations, Association for Computational Linguistics, 2018, pp. 99-104. doi:10.18653/v1/P18-4017.

[28] S. Miranda, Nogueira, Mendes, A. Vla[...], in: The World Wide Web Conference, WWW '19, Association for Computing Machinery, 2019, pp. 3579-3583. doi:10.1145/3308558.3314135.

[29] Paikens, Barzdins, Mendes, D. C. Ferreira, S. Broscheit, M. S. Almeida, S. Miranda, Nogueira, Balage, A. F. Martins, Summa at tac knowledge base population task 2016, in: [...]ence (TAC), 2016. URL: https://tac.nist.gov/publications/2016/participant.papers/TAC2016.summa.proceedings.pdf.

[30] Rudnik, Ehrhart, Ferret, Teyssou, Troncy, Tannier, Searching news articles using an event knowledge graph leveraged by wikidata, in: Companion Proceedings of The 2019 World Wide Web Conference, WWW '19, Association for Computing Machinery, 2019, pp. 1232-1239. doi:10.1145/3308560.3316761.