Challenges and Opportunities for Journalistic Knowledge Platforms

Marc Gallofré Ocañaᵃ, Andreas L. Opdahlᵃ

ᵃ University of Bergen, Fosswinckelsgt. 6, Postboks 7802, 5020 Bergen, Norway

Proceedings of the CIKM 2020 Workshops, October 19-20, Galway, Ireland.
email: Marc.Gallofre@uib.no (M. Gallofré Ocaña); Andreas.Opdahl@uib.no (A.L. Opdahl)
orcid: 0000-0001-7637-3303 (M. Gallofré Ocaña); 0000-0002-3141-1385 (A.L. Opdahl)
© 2020 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org), http://ceur-ws.org, ISSN 1613-0073

Abstract

Journalism is under pressure from loss of advertisement and revenues, while experiencing an increase in digital consumption and user demands for quality journalism and trusted sources. Journalistic Knowledge Platforms (JKPs) are an emerging generation of platforms which combine state-of-the-art artificial intelligence (AI) techniques such as knowledge graphs, linked open data (LOD), and natural-language processing (NLP) for transforming newsrooms and leveraging information technologies to increase the quality and lower the cost of news production. In order to drive research and design better JKPs that allow journalists to get the most benefit from them, we need to understand what challenges and opportunities JKPs are facing. This paper presents an overview of the main challenges and opportunities involved in JKPs, which have been manually extracted from the literature with the support of natural language processing and understanding techniques. These challenges and opportunities are organised into: stakeholders, information, functionalities, components, techniques and other aspects.

Keywords

Newsroom, Knowledge Graph, Digitalization, Overview

1. Introduction

Journalism is under pressure from loss of advertisement and revenues, in combination with competing online distribution channels that stream free content, while experiencing an increase in digital consumption and readers who demand quality journalism and trusted sources [1]. Information is no longer consumed from a single newspaper. Instead, readers have access to and can contrast fresh and first-hand information sources available on the internet and social media at any time.

News organisations are constantly adapting their business models to digital media innovations, to improve information quality, competitiveness and growth [2]. Journalistic Knowledge Platforms (JKPs) are an emerging type of platform that combines state-of-the-art artificial intelligence (AI) techniques such as knowledge graphs and natural-language processing (NLP), and exploits news and social media information over the net in real time, using linked open data (LOD), encyclopaedic sources and news archives to construct knowledge graphs and provide fresh and unexpected information to journalists, helping them to dive deeply into information, events and story-lines. JKPs are increasingly driving innovation and transforming newsrooms, leveraging information technologies to increase the quality and lower the cost of news production.

In order to drive research and design JKPs that allow journalists to get the most benefit from them and support newsrooms with better solutions, we need to understand the challenges and opportunities that JKPs present for both users and developers. To do so, we have reviewed the research literature in light of our own experience with developing News Hunter [3, 4, 5], a series of JKP prototypes, in collaboration with a developer of newsroom tools for the international market.

This paper presents a synthesis of the challenges and opportunities for journalistic knowledge platforms that we have found in the literature, hopefully describing the most central factors that are driving development of JKPs today. These factors have been grouped into six categories: stakeholders, information, functionalities, components, techniques and other aspects. We conclude that JKPs offer many opportunities for effective production of high-quality journalism, real-time information, enriched background information, and multilingual and cross-platform solutions for monitoring worldwide multimedia output, by offering solutions to problems such as language independence, complex newsroom workflows, and dispersed information. Central challenges include leveraging pre-news information from social media and multimedia sources, precise semantic lifting and enrichment of texts, scaling semantic technologies to big data, and detecting and reasoning over events.

This paper is organised as follows: Section 2 summarises the methodology used for screening the challenges and opportunities. Section 3 briefly reviews the research literature. Section 4 explains the coding process. Sections 5 to 10 synthesise the main challenges and opportunities for each factor respectively: stakeholders, information, functionalities, components, techniques and other aspects.

2. Method

Our research method consists of four steps. Firstly, we selected the most relevant research papers that we have identified in our previous studies on JKP architectures and news angles [4, 6, 7, 8, 9, 10, 11, 12]. From these selected papers we manually extracted claims, i.e., sentences that express potential challenges or opportunities.

Secondly, a purposive sampling was conducted independently by two expert coders (the authors). The coders generated multiple codes for each extracted claim and the codes were cleaned with the support of NLP and NLU techniques (i.e., Damerau-Levenshtein distance [13], word2vec [14], and Wordnet [15]).¹ From the resulting cleaned codes, we selected the most representative ones as preliminary codes and divided them into categories.

Thirdly, based on the preliminary codes, claims were independently coded once again by both authors. This time, the coders were allowed to code each claim with multiple codes for each category. The coding agreement was estimated using Gwet's AC1 [19] inter-rater reliability coefficient with nominal ratings. Because coders were allowed not to code a claim, empty codes were not treated as missing values when computing Gwet's AC1; instead, they were treated as if they were coded as “undefined”. Hence, to compute the contingency tables for multiple codes we applied the following rule: agreement between coders A and B only happens between correctly matching codes (A∩B), and the other codes (A△B) were matched with missing values and treated as disagreements.

Finally, when both coders agreed on the final codes for each claim, challenges and opportunities were extracted from each claim following the assigned codes.

¹ Implemented in Python with support of Scikit-learn [16], NLTK [17], SpaCy [18] and other libraries.

3. Reviewed papers

After a broad survey of the literature, we selected eleven papers describing five research projects related to JKPs as the starting point of our review: NEWS [20, 21], EventRegistry [22], NewsReader [23, 24, 25], SUMMA [26, 27, 28, 29] and ASRAEL [30].

NEWS is a project, in collaboration with the Spanish Agencia EFE and the Italian ANSA news agencies, that makes use of semantic technologies to improve news agencies' workflows, productiveness and revenues by focusing on the annotation, intelligent information retrieval and user interface aspects [21]. EventRegistry is focused on collecting news articles, identifying and extracting information about events, and summarising and visualising them [22]. NewsReader extracts information about what, who, where and when from multilingual news articles and represents events in time using RDF in a knowledge graph, allowing users to find networks of actors over time [25]. SUMMA collaborates with BBC Monitoring and Deutsche Welle to develop a multilingual and multimedia platform using state-of-the-art NLP techniques to monitor internal and external media work and provide data journalism services [27]. ASRAEL aggregates news articles and leverages the Wikidata knowledge base to describe and cluster news events, and provides information retrieval tools to interact with the resulting news representations [30].

4. Coding process

In the purposive sampling step, we extracted 322 claims from the related literature and marked them up using 406 codes. After cleaning and tidying up the initial codes, we identified six top-level categories which we divided into 62 sub-categories to be used for preliminary coding. The following six top-level categories were used:

• Stakeholder: the agent that the challenge or opportunity is for. The agent can be either a technical agent or a social agent.
• Information: the information needed to meet the challenge or exploit the opportunity.
• Functionality: the service or functionality that the platform should offer to meet a challenge or exploit an opportunity.
• Component: the part of a platform that must be created or improved to meet the challenge or exploit the opportunity.
• Technique: the IT solution used to meet the challenge or exploit the opportunity.
• Other aspects: another type of concern that the challenge or opportunity involves, such as customer heterogeneity, performance or maintenance.

We computed the inter-rater agreement for the preliminary coding with the AC1 coefficient for each category: 0.77 for Stakeholders, 0.65 for Components, 0.71 for Techniques, 0.71 for Aspects, 0.72 for Information types and 0.57 for Functionalities. The average AC1 is 0.69 with a standard deviation of 0.063, which, according to Landis-Koch and Altman's benchmark scales, expresses an acceptable agreement among coders [19]. Finally, the assigned codes were discussed between and agreed on by the two coders.

5. Stakeholders

Stakeholders are agents that represent the forces and interests that drive the future of JKPs. The identified sub-categories of stakeholders are: general user, news professional, fact checker, archivist, ICT professional, audience, customer, researcher, news agency, public organisation and technical agent.

General users interact with services provided by the JKPs or newsrooms. These can be divided between the internal users that belong to newsrooms and the external ones. The internal users are news professionals like journalists who use JKPs for creating stories [20]; fact checkers who conduct an essential task in combating fake news and misinformation [28]; archivists who keep the ontology and news archives up to date [20]; and ICT professionals and knowledge engineers who represent those users involved in the development and maintenance of JKPs [21]. The external users are the audience [22]; the customers to whom news agencies offer services; and researchers who investigate JKPs or analyse data, as in the SUMMA project where “[political scientists want] to perform data analyses based on large amounts of news reports” [27, p. 2].

The organisations influencing the JKPs are: the news agencies, including newsrooms; the public organisations, which are those governmental agencies that interact with or consume services from newsrooms' JKPs, as in the SUMMA project which “provides media monitoring and analysis services to the BBC own newsrooms, the British government, and other subscribers” [27, p. 1]; and the organisations that are responsible for controlling news media standards, vocabulary and ontologies (e.g., the IPTC organisation²), which indirectly influence JKPs because the work of many news agencies and JKPs depends on those standards, as in the NEWS project where “most of the NewsCodes defined by IPTC do not have alternative versions in different languages, only in English” [20, p. 9].

Finally, the technical agent is a stakeholder that represents the JKPs and any system or technical infrastructure in newsrooms that supports or interacts with JKPs. A particular subtype of technical agent are the external systems that communicate with newsroom systems, like the information systems of potential customers [20].

² https://iptc.org/

6. Information

JKPs cover the whole information pipeline from gathering information and news creation to knowledge exploitation and distribution. Our study identified the following sub-categories of information to be considered in JKPs: news content, textual data, multimedia data, data format, metadata, LOD, events and information needs.

News agencies produce both textual and multimedia news content which has to be managed and distributed to their customers and audience [21, 20]. As textual data we consider raw text in the form of news articles, documents, markup files, PDF, web pages, biographies, history and geopolitical data of countries, reports, social media feeds and social blogs. As multimedia data we consider live broadcast, spoken content, photographs, audio and video. In addition, news agencies produce content in different formats like plain text, Information Interchange Model (IIM), News Industry Text Format (NITF), NewsML and RDF [20].

Metadata is used to annotate and manage the produced content. Metadata can describe, e.g., author, language, creation timestamp, location, keywords, category, provenance, priority, urgency, status, updates, rights, interest, description or media type. JKPs use Linked Open Data (LOD) to annotate and enrich the content using semantic vocabularies and leveraging knowledge bases, as in the ASRAEL project where they “leverage the Wikidata knowledge base to produce semantic annotations of news articles” [30, p. 1].

News agencies create stories describing events and deliver them to their customers and audience [21], making events the central information need. Nevertheless, social stakeholders have other information needs. General users are interested in knowing who, what, with whom, where and when events took place and in the networks and implications of actors over time, and want to find events of a certain type or in a certain place, obtain facts and retrieve evidence [24]. News professionals need access to news agencies' archives and knowledge bases for documentation purposes, and want to find connections with past events, follow histories and identify emerging topics [20, 23, 27]. Customers have different information needs mainly depending on their business or interests, e.g., “the press cabinet of a company is usually interested in news items talking about the company or its rivals, whereas a sports TV channel is interested mostly in news items describing sports events” [20].

7. Functionalities

JKPs provide different functionalities to their users. We identified twelve main sub-categories of functionality: news creation, verification, source selection, monitoring, knowledge discovery, trends, alert, summarisation, clustering, personalisation, business support and content management.

News professionals use the JKPs for the news creation process. JKPs guide journalists in writing up their stories, support them with contextual background knowledge for those stories [21], provide means for comparing current events with other similar events [30], and facilitate access to previous work for creating similar content for a different audience, region or language [27]. JKPs also support news professionals with verification tasks like fact checking, provenance [24], and rights and authorship management [20, 21], which are typically time-consuming tasks for news professionals, as explained in “manual verification of claims is a tedious task, that consumes a lot of time and effort from journalists and professional fact-checkers” [28, p. 1].

Source selection and monitoring are two common functionalities across the studied JKPs, which harvest and store content from internal and external sources and monitor them in real time. By doing this, JKPs relieve journalists from these time-consuming tasks, as was happening at the BBC where “each of its ca. 300 journalist monitors up to four live broadcasts in parallel, plus several other information sources such as social media feeds” [27, p. 1].

Knowledge discovery is one of the most attractive functionalities of JKPs. Knowledge discovery allows users to obtain news insights, analysis and relevant information, like in NewsReader where it “increases the user understanding of the domain, facilitates the reconstruction of news story lines, and enables users to perform exploratory investigation of news hidden facts” [24, p. 1]. Other interesting functionalities among JKPs are: trends, used to discover emerging topics, long-term developments and changes in events over time [22, 25]; alerts to keep users up to date with the latest incoming items [26]; summarisation of news histories and events to provide additional insights [22]; clustering of story-lines and events [27]; and personalisation of both the JKPs and their functionalities according to users' preferences and profiles [21].

JKPs also provide functionalities that support the organisation and workflows of news agencies and newsrooms. JKPs are used as business support systems to manage internal newsroom output; monitor what is being broadcast, produced and covered [27]; overcome limitations in newsrooms' workflows; and improve productivity and revenues [20]. Another functionality provided by the JKPs is content management, which allows news agencies to produce, store, organise, manage, maintain and distribute the content and metadata produced every day [20].

8. Components

JKPs rely on different components to fulfil their functionalities and support users. We split JKP components into five sub-categories: input, processing, storage, interaction and output.

As input, we consider the different sources of content and information used in JKPs that are relevant for stakeholders. The textual and multimedia sources are sources of interest. However, not all analysed projects treat the information in the same way or use the same information types, like ASRAEL which only uses the title and first paragraph to represent the events [30]; and not all content receives the same interest from news professionals, as in SUMMA which considers “entertainment programming such as movies and sitcoms, commercial breaks, and repetitions of content (e.g., on 24/7 news channels) [...] of limited interest to monitoring operations” [27, p. 1].

The processing components cover tasks from harvesting and annotating input sources to processing and lifting them, following an ETL process (i.e., Extract, Transform, Load). Input sources are harvested using different components, each with a specific purpose: harvesting, translating, filtering and transcribing. A common characteristic of the analysed projects is that source selection and monitoring functionalities are conducted in real time by harvesting information sources [22, 23, 27]. The harvested content is then translated [27] and filtered according to the different stakeholders' interests and needs. Spoken content is transcribed [27] and images are textually described [21].

JKPs use specific components to automatically annotate the harvested content with metadata to support functionalities like business support, content management and personalisation [20]. The annotated content is typically processed by different components which are organised in an NLP pipeline. The NLP pipeline processes the content through state-of-the-art NLP and NLU modules to perform linguistic tasks [25, 24]. These tasks are focused on capturing and extracting the different information types described in Section 6. Both the results of the NLP pipeline and the annotated content are disambiguated and represented semantically using lifting components. The lifting component links the semantic representation of news items to a knowledge base, for example an RDF-based knowledge graph [25], and enriches the semantic interpretations with facts from external knowledge bases, for example from the LOD cloud [24, 30].

The JKP storage infrastructure is normally composed of an archive, a knowledge base and an ontology. The archive stores news articles, biographies, reports [25] and other textual and multimedia items; the knowledge base is where the lifted semantic representations of news items are stored and enriched with external information [24]; and the ontology is used to represent the structure of the news items, leveraged information, metadata and vocabulary [20].

JKP users interact with the previous components mainly using three types of interaction components: front-ends, tools and query engines. JKPs provide front-end components [21] to allow stakeholders to access the system functionalities; tools which offer features to journalists when creating news articles or to general users when interacting with the system, like money converters or dictionaries [20]; and query engines that allow users to query, analyse or visualise the database through APIs [27].

News agencies use two types of distribution components for delivering content to their audience and customers [20]: push and pull. Push components offer interfaces where information consumers can select and subscribe to streams of news [20], whereas with pull components, news agencies offer interfaces to access, browse and query their repositories [20].

9. Techniques

Techniques used in JKPs can be grouped into eight sub-categories: semantic technology, fact extraction, conceptual model, reasoning, network analysis, event analysis, NLP and training.

Semantic technology is used to support functionalities like knowledge discovery, news creation, verification, clustering, trends, and content management. Semantic technologies support knowledge discovery by providing means for lifting news items, and disambiguating, enriching and leveraging them with information from external knowledge bases [21, 25] (processes carried out by the lifting, ontology and knowledge base components); news creation, by providing systems and vocabulary to automatically annotate news in annotation components [21]; and verification, by combining semantic technologies with the lifting and knowledge base components and linking factual claims to their sources and external knowledge bases [24, 27]. Semantic technologies and semantic representation techniques facilitate clustering news items and events [30], and detecting trends and story lines [24]. Moreover, semantic technologies provide shared semantic resources and formats which are used to support content management and facilitate conceptual interoperability [25].

Fact extraction techniques extract facts from news items and link them to facts in external knowledge bases (e.g., Wikidata, Wikipedia). These techniques are used to provide functionalities like verification and knowledge discovery [27] and are common features of lifting, knowledge base and query components.

Conceptual models provide vocabularies and ontologies which are used in conjunction with semantic technologies to support and standardise functionalities like content management and personalisation. Ontologies can be used for defining user interests and preferences based on the provided vocabulary or as shared models [20]. Conceptual models are applied in distribution, lifting, annotation, ontology, query, knowledge base and source components.

Both conceptual models and semantic technologies facilitate the usage of other techniques like reasoning, network analysis and event analysis. These techniques support functionalities like knowledge discovery, clustering and trends, and are applied in the lifting, knowledge base, ontology and annotation components. Reasoning techniques abstract and infer new knowledge from news items, events and temporal aspects [24, 25]. Network analysis is used to find networks of actors and organisation implications through different events and time [24]. Event analysis is applied to detect, identify and annotate the events described in news [21, 20].

The above techniques are supported by NLP tasks like named entity detection, role detection, topic detection, temporal expression normalisation, temporal relation detection, factual claims extraction and natural language understanding [25, 29, 27]. These NLP tasks, among others, are also used in JKPs' functionalities such as knowledge discovery, content management, summarising, verification, trends, clustering, query, lifting and annotation. In order to obtain optimal results from the NLP tasks, different training techniques have to be used over extensive news corpora [30].

10. Other aspects

Stakeholders, information, functionalities, components and techniques are influenced or affected by additional concerns of various types. We organised these other aspects into the following sub-categories: standards, proprietary, human factors, customer heterogeneity, big data, multilingual, timeliness, quality, software architecture, performance, maintenance, and legacy.

Before moving to JKPs, news agencies used their own terms, categories and vocabularies to describe their items. As a result, interoperability between news agencies and customers was difficult. The usage of standards like IPTC news codes and media topics, semantic vocabularies, NAF and RDF improved the interoperability between news agencies and other stakeholders [20].

JKPs keep track of proprietary news information like authorship, copyrights and sources [21, 20] as a part of the content management functionalities. Proprietary information is used as metadata in annotation components and provides provenance and reliability information [24, p. 4].

There are different human factors influencing JKPs and stakeholders. Before JKPs, news professionals were performing many processes by hand, like news tagging, verification tasks, fact searching, finding related articles, and source monitoring. Performing these tasks manually is time-consuming and error-prone, consumes a lot of effort, and reduces the amount and precision of the added metadata [21, 20, 28, 22]. As a result, customers have to manually filter irrelevant content received from news agencies, creating an information overload problem which is contrary to the information relevance that customers expect from news agencies [21, 20]. Moreover, because of the difficulty of manually monitoring and finding related articles from other news providers, the audience, customers and news professionals can get biased or incomplete information [22].

Customers are heterogeneous: they have different information needs and use different systems to interact with news agencies [20].

According to our study, JKPs deal with big data requirements like volume, velocity and variety. The EventRegistry project estimated that “the number of collected articles ranges between 100.000 and 200.000 articles per day” and collected “news articles from around 75.000 news sources” [22, p. 1]. NewsReader used an archive that “contains billions of articles, biographies, and reports” [25, p. 1]. The SUMMA platform “[was] able to ingest 400 TV streams simultaneously” [27, p. 6].

Other information aspects that JKPs deal with are the multilingual and timeliness data aspects. Information and news are produced in multiple languages (e.g., Catalan, Norwegian, Spanish, English, Italian, French, Portuguese and Chinese) and need to be translated, transcribed and delivered to customers and audiences in their languages of preference [20, 27, 25, 30]. The timeliness aspect refers to the temporal aspect of events: news professionals, audience and customers want to receive the information as soon as it is generated [21] and to reconstruct story-lines or histories over time [24, 27].

The quality requirements on the results and outputs of JKPs are summarised in “news agencies are required to provide fresh, relevant, high-quality information to their customers” [21, p. 1], and ignoring these quality requirements can imply economic losses for customers [20].

Aspects concerning technical agents and their components include the software architecture, performance, maintenance and relation of JKPs with other systems. The software architecture of JKPs should consider scalability to deal with big data requirements [21, 24, 27], distribution to run its components and systems over multiple machines [20, 26], component independence so they can be used for other purposes [26], interoperability between components and systems [20, 25], and performance for reducing the processing and distribution time of information and live feeds [21, 24]. Manual maintenance is a time-consuming and error-prone task [20] which is automated with JKPs to keep the JKP and ontology up to date [26]. As JKPs communicate with customers' systems, legacy components and other newsroom systems, JKPs need to be designed to facilitate the integration with other technologies and systems [20, 26].

11. Conclusion

JKPs are a new type of platform which offers many opportunities for newsrooms and journalists by combining AI techniques such as knowledge graphs, LOD and NLP to improve and facilitate the production of high-quality journalism. We collected challenges and opportunities that JKPs present and organised them into six categories that we assume are important for the evolution of JKPs (stakeholders, information, functionalities, components, techniques and other aspects).

JKPs offer new opportunities for consuming and interacting with news by providing enriched content from external sources like Wikipedia or Wikidata to stakeholders seeking relevant information, such as news professionals and general audiences. News texts are enriched with additional information about, e.g., involved actors, places and organisations, and the connections with other news and related events. Information and data sources in JKPs are no longer split along dispersed and disconnected repositories as happens in traditional solutions. Instead, the information pieces are connected by the knowledge graph. JKPs enhance functionalities like news creation and content management. News creation is improved with background information, providing journalists with better information for their stories. Automatic metadata annotation and the usage of standards like IPTC relieve archivists from manually annotating news and improve the content management capabilities of JKPs and newsroom workflows. Knowledge graphs in JKPs bring new forms of representing news-related content and exploiting it. Techniques like network analysis, event analysis and reasoning improve the background information and knowledge discovery in JKPs while opening new research questions for researchers. JKPs can use standards such as RDF, IPTC's media topics and semantic vocabularies which simplify the interoperability and understanding between news agencies and stakeholders. The most highlighted opportunities that have been identified in the literature include event detection and analysis over time, real-time and up-to-date trustworthy information, access to enriched background information for supporting news creation, multilingual and multimedia cross-platform solutions, and tools for monitoring worldwide media output and internal newsroom production.

On the other hand, providing one-size-fits-all JKP solutions for all possible stakeholders is challenging, because of their diversity and differing information needs. Newsworthy information comes from diverse news sources like pre-news information from social media or multimedia sources such as TV news programs. Leveraging these information sources is a complex task which requires new techniques to distinguish potentially newsworthy information from non-relevant content and extract information from multimedia items like images or videos. Summarising and presenting news-related information in JKPs, like background information, events in time or actor networks, to users with different information needs and skills is not a trivial task. JKPs consist of different components which interact together and with external components that need to be integrated in JKP systems. Extracting precise semantic representations of and reasoning over relations and time remain open research questions. JKPs deal with big data, but some semantic technologies, reasoning and AI techniques are not yet ready for it. Among the reviewed JKPs, the most common challenges are problems such as language independence, multiple news channels, complex newsroom workflows, dispersed and diverse information, lack of facts, and integration with legacy and customer systems.

After reviewing the literature, we have realised that there is no clear definition of and agreement about what constitutes an event. The event concept is used in different ways in the literature, from a handshake between two actors to bigger events like the Spanish Civil War or events in between such as a trial process.

In this study, we have only reviewed five JKP-related research projects, although they are the five most central ones we have found. Hence, we may have omitted important issues that were not represented or brought up in these projects. We are therefore planning to extend the number of considered projects through a systematic literature review and to contrast and expand our findings with published works on data and digital journalism. A logical continuation of this expanded study is the formal identification and modelling of goals, requirements and use cases for JKPs, which we have not yet found in the literature. Furthermore, we plan to formalise a reference framework for JKPs and continue the development of our JKP to validate and integrate our findings.

Acknowledgments

This work has been supported by the Norwegian Research Council IKTPLUSS project 275872 News Angler, which is a collaboration with Wolftech AB, Bergen, Norway.

References

[1] PwC, Global entertainment & media outlook 2019–2023, 2020. URL: https://www.pwc.com/gx/en/industries/tmt/media/outlook.html.
[2] J. Vázquez Herrero, S. Direito-Rebollal, A. S. Rodríguez, X. García, Journalistic Metamorphosis: Media Transformation in the Digital Age, Springer International Publishing, 2020. doi:10.1007/978-3-030-36315-4.
[3] A. Berven, O. Christensen, S. Moldeklev, A. Opdahl, K. Villanger, News hunter: building and mining knowledge graphs for newsroom systems, in: NOKOBIT - Norsk konferanse for organisasjoners bruk av informasjonsteknologi, volume 26, 2018. URL: https://ojs.bibsys.no/index.php/Nokobit/article/view/548/467.
[4] M. Gallofré Ocaña, L. Nyre, A. L. Opdahl, B. Tessem, C. Trattner, C. Veres, Towards a big data platform for news angles, in: 4th Norwegian Big Data Symposium (NOBIDS) 2018, 2018, pp. 17–29. URL: http://ceur-ws.org/Vol-2316/paper1.pdf.
[5] A. Berven, O. Christensen, S. Moldeklev, A. Opdahl, K. Villanger, A knowledge graph platform for newsrooms, Computers in Industry (2020). To appear.
[6] B. Tessem, A. L. Opdahl, Supporting journalistic news angles with models and analogies, in: 2019 13th International Conference on Research Challenges in Information Science (RCIS), IEEE, 2019,
Springer International Publishing, 2019, pp. 449–455. doi:10.1007/978-3-030-34885-4_35.
[10] A. L. Opdahl, B. Tessem, Ontologies for finding journalistic angles, Software and Systems Modeling (2020) 1–17.
[11] E. Motta, E. Daga, A. L. Opdahl, B. Tessem, Analysis and design of computational news angles, IEEE Access (2020).
[12] T. Al-Moslmi, M. Gallofré Ocaña, Lifting news into a journalistic knowledge platform, in: Proceedings of the CIKM 2020 Workshops, Galway, Ireland, 2020. To appear.
[13] F. J. Damerau, A technique for computer detection and correction of spelling errors, Commun. ACM 7 (1964) 171–176. doi:10.1145/363958.363994.
[14] T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, J. Dean, Distributed representations of words and phrases and their compositionality, in: Advances in Neural Information Processing Systems 26 (NIPS 2013), 2013, pp. 3111–3119. URL: http://papers.nips.cc/paper/5021-distributed-representations-of-words-and-phrases-and-their-compositionality.pdf.
[15] G. A. Miller, Wordnet: A lexical database for english, Commun. ACM 38 (1995) 39–41. doi:10.1145/219717.219748.
[16] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, E. Duchesnay, Scikit-learn: Machine learning in Python, Journal of Machine Learning Research 12 (2011) 2825–2830. URL: http://www.jmlr.org/papers/v12/pedregosa11a.
[17] S. Bird, E. Loper, E. Klein, Natural language processing with python, O'Reilly Media, Inc., 2009.
[18] M. Honnibal, I.
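The code-cleaning step of Section 2 mentions Damerau-Levenshtein distance [13] as one of the supporting techniques. As an illustration only, the following minimal sketch computes the restricted variant of that distance (optimal string alignment) and uses it to fold near-duplicate codes together. The function names, the lower-casing step and the threshold of 2 are assumptions made for this example and are not taken from the authors' implementation.

```python
from typing import Dict, Iterable, List


def osa_distance(a: str, b: str) -> int:
    """Restricted Damerau-Levenshtein (optimal string alignment) distance."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert all of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # Transposition of two adjacent characters.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


def merge_similar_codes(codes: Iterable[str], max_distance: int = 2) -> Dict[str, str]:
    """Map each code to the first earlier code within max_distance (assumed threshold)."""
    canonical: List[str] = []
    mapping: Dict[str, str] = {}
    for code in codes:
        key = code.strip().lower()
        match = next((c for c in canonical if osa_distance(key, c) <= max_distance), None)
        if match is None:
            canonical.append(key)
            match = key
        mapping[code] = match
    return mapping


# Hypothetical codes, for illustration only.
example_mapping = merge_similar_codes(
    ["fact checker", "fact-checker", "Fact checker", "archivist"]
)
```

A fixed threshold can over-merge very short codes, so in practice such a step would probably be combined with manual review, as the paper's mention of word2vec and Wordnet alongside it suggests.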
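The agreement computation described in Sections 2 and 4 can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: each coder's codes for a claim are modelled as a set, matching codes (A∩B) count as agreements, every code in A△B is paired with "undefined" and counts as a disagreement, and a claim left uncoded by both coders is paired as ("undefined", "undefined"). The two-rater form of Gwet's AC1 is used, with the number of categories taken from the observed ratings. The example claims are invented.

```python
import statistics
from collections import Counter
from typing import List, Sequence, Set, Tuple

UNDEFINED = "undefined"


def paired_ratings(codes_a: Sequence[Set[str]],
                   codes_b: Sequence[Set[str]]) -> List[Tuple[str, str]]:
    """Turn per-claim code sets of two coders into (rating_a, rating_b) pairs."""
    pairs: List[Tuple[str, str]] = []
    for a, b in zip(codes_a, codes_b):
        if not a and not b:
            # Assumption: both coders abstained, so both ratings are "undefined".
            pairs.append((UNDEFINED, UNDEFINED))
            continue
        pairs.extend((c, c) for c in sorted(a & b))          # agreements
        pairs.extend((c, UNDEFINED) for c in sorted(a - b))  # only coder A
        pairs.extend((UNDEFINED, c) for c in sorted(b - a))  # only coder B
    return pairs


def gwet_ac1(pairs: Sequence[Tuple[str, str]]) -> float:
    """Gwet's AC1 for two raters and nominal ratings."""
    n = len(pairs)
    categories = {rating for pair in pairs for rating in pair}
    q = len(categories)
    if n == 0 or q < 2:
        raise ValueError("need at least one pair and two categories")
    observed = sum(a == b for a, b in pairs) / n
    count_a = Counter(a for a, _ in pairs)
    count_b = Counter(b for _, b in pairs)
    # pi_k is the average share of ratings that fall into category k.
    shares = [(count_a[k] + count_b[k]) / (2 * n) for k in categories]
    chance = sum(pi * (1 - pi) for pi in shares) / (q - 1)
    return (observed - chance) / (1 - chance)


# Invented example: four claims coded by two coders.
coder_a = [{"journalist"}, {"journalist", "archivist"}, set(), {"audience"}]
coder_b = [{"journalist"}, {"journalist"}, set(), {"customer"}]
example_ac1 = gwet_ac1(paired_ratings(coder_a, coder_b))

# The per-category values reported in Section 4.
reported = [0.77, 0.65, 0.71, 0.71, 0.72, 0.57]
reported_mean = statistics.mean(reported)
reported_spread = statistics.pstdev(reported)
```

With the six values reported in Section 4, the mean rounds to 0.69 and the population standard deviation (`statistics.pstdev`) rounds to 0.063, which matches the figures in the text; the sample standard deviation would be about 0.069.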
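The processing chain that Section 8 describes (harvest, translate, filter, annotate) can be pictured as a sequence of small stages. The sketch below is purely illustrative: the `Item` record, the stage names and the keyword matching are hypothetical stand-ins, and the translation stage is a placeholder that only tags the text rather than calling a real translation service.

```python
from dataclasses import dataclass, field, replace
from typing import Dict, Iterable, Iterator, List


@dataclass(frozen=True)
class Item:
    """A harvested piece of content (hypothetical structure)."""
    source: str
    text: str
    language: str = "en"
    metadata: Dict[str, object] = field(default_factory=dict)


def translate(items: Iterable[Item], target: str = "en") -> Iterator[Item]:
    """Placeholder translation: tags the text and records the original language."""
    for item in items:
        if item.language != target:
            item = replace(
                item,
                text=f"[{item.language}->{target}] {item.text}",
                language=target,
                metadata={**item.metadata, "original_language": item.language},
            )
        yield item


def keep_relevant(items: Iterable[Item], interests: List[str]) -> Iterator[Item]:
    """Filter stage: keep items that mention at least one term of interest."""
    for item in items:
        if any(term in item.text.lower() for term in interests):
            yield item


def annotate(items: Iterable[Item], interests: List[str]) -> Iterator[Item]:
    """Annotation stage: attach the matched terms as keyword metadata."""
    for item in items:
        found = sorted(term for term in interests if term in item.text.lower())
        yield replace(item, metadata={**item.metadata, "keywords": found})


def run_pipeline(items: Iterable[Item], interests: List[str]) -> List[Item]:
    """Chain the stages in the order harvest -> translate -> filter -> annotate."""
    return list(annotate(keep_relevant(translate(items), interests), interests))


# Invented input standing in for harvested content.
harvested = [
    Item("wire", "Election results announced"),
    Item("tv", "Resultados de la eleccion", language="es"),
    Item("blog", "Cooking tips"),
]
processed = run_pipeline(harvested, ["election", "eleccion"])
```

Writing the stages as generators keeps each one independent of the others, which mirrors the component independence that Section 10 lists as an architectural concern.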
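To make the lifting and enrichment idea from Sections 8 and 9 more concrete, the sketch below represents an extracted event as subject-predicate-object triples and adds external facts about entities the event already mentions. It is a simplified illustration in plain Python rather than an RDF library; all identifiers with the `ex:` prefix, including the predicate names, are made up for this example and do not come from any of the reviewed projects.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]


def lift_event(event_id: str, event_type: str, actors: List[str],
               place: str, date: str) -> Set[Triple]:
    """Represent one extracted event as subject-predicate-object triples."""
    triples: Set[Triple] = {
        (event_id, "rdf:type", event_type),
        (event_id, "ex:place", place),
        (event_id, "ex:date", date),
    }
    triples.update((event_id, "ex:actor", actor) for actor in actors)
    return triples


def enrich(graph: Set[Triple], external: Set[Triple]) -> Set[Triple]:
    """Add external facts whose subject is already mentioned as an object."""
    mentioned = {obj for _, _, obj in graph}
    return graph | {t for t in external if t[0] in mentioned}


# Hypothetical event and external knowledge base.
event_graph = lift_event("ex:event1", "ex:Election", ["ex:PartyA"],
                         "ex:Bergen", "2020-10-19")
external_facts = {
    ("ex:Bergen", "ex:country", "ex:Norway"),
    ("ex:Oslo", "ex:country", "ex:Norway"),
}
enriched_graph = enrich(event_graph, external_facts)
```

Only the fact about the place mentioned in the event is pulled in; a real lifting component would also need entity disambiguation before linking, which this sketch leaves out.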
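The network analysis mentioned in Section 9, finding networks of actors across events and time, can be approximated with a simple co-occurrence structure. The sketch below is an assumption-laden illustration: events are modelled as (date, actors) pairs with ISO-formatted dates, and two actors are linked whenever they appear in the same event. The reviewed projects may use quite different models.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Iterable, List, Tuple

Event = Tuple[str, List[str]]  # (ISO date, actors involved), hypothetical shape


def actor_network(events: Iterable[Event]) -> Dict[Tuple[str, str], List[str]]:
    """Link actors that appear in the same event and record when they did."""
    edges: Dict[Tuple[str, str], List[str]] = defaultdict(list)
    for date, actors in events:
        # Sorting gives each pair one canonical orientation.
        for a, b in combinations(sorted(set(actors)), 2):
            edges[(a, b)].append(date)
    return {pair: sorted(dates) for pair, dates in edges.items()}


# Invented events for illustration.
example_events = [
    ("2020-01-10", ["A", "B"]),
    ("2020-03-02", ["B", "A", "C"]),
]
example_network = actor_network(example_events)
```

The list of dates on each edge gives a crude timeline of when two actors were reported together, which is the kind of information a user could browse when following a story-line.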
Montani, spaCy 2: Natural lan- pp. 1–7. doi:10.1109/RCIS.2019.8877058. guage understanding with Bloom embeddings, [7] A. L. Opdahl, B. Tessem, Towards ontological convolutional neural networks and incremental support for journalistic angles, in: Enterprise, parsing, 2017. To appear. Business-Process and Information Systems Mod- [19] K. L. Gwet, Handbook of inter-rater reliability: eling, Springer International Publishing, 2019, The definitive guide to measuring the extent of pp. 279–294. doi:10.1007/978-3-030-20618- agreement among raters, Advanced Analytics, 5\_19. LLC, 2014. [8] T. A. A. Al-Moslmi, M. Gallofré Ocaña, A. L. Op- [20] N. Fernández, D. Fuentes, L. Sánchez, J. A. dahl, B. Tessem, Detecting newsworthy events Fisteus, The news ontology: Design and in a journalistic platform, in: The 3rd European applications, Expert Systems with Appli- Data and Computational Journalism Conference, cations 37 (2010) 8694 – 8704. doi:10.1016/ j.eswa.2010.06.055. 2019, pp. 3–5. [9] B. Tessem, Analogical news angles from text [21] N. Fernández, J. M. Blázquez, J. A. Fisteus, similarity, in: Artificial Intelligence XXXVI, L. Sánchez, M. Sintek, A. Bernardi, M. Fuentes, A. Marrara, Z. Ben-Asher, News: Bringing se- mantic web technologies into news agencies, in: [29] P. Paikens, G. Barzdins, A. Mendes, D. C. Fer- The Semantic Web - ISWC 2006, 2006, pp. 778– reira, S. Broscheit, M. S. Almeida, S. Miranda, 791. doi:10.1007/11926078\_56. D. Nogueira, P. Balage, A. F. Martins, Summa [22] G. Leban, B. Fortuna, J. Brank, M. Grobelnik, at tac knowledge base population task 2016, in: Event registry: Learning about world events Proceedings of the Ninth Text Analysis Confer- from news, in: Proceedings of the 23rd In- ence (TAC), 2016. URL: https://tac.nist.gov/ ternational Conference on World Wide Web, publications/2016/participant.papers/ WWW ’14 Companion, Association for Comput- TAC2016.summa.proceedings.pdf. ing Machinery, 2014, pp. 107—-110. doi:10.1145/ [30] C. Rudnik, T. 
Ehrhart, O. Ferret, D. Teyssou, 2567948.2577024. R. Troncy, X. Tannier, Searching news articles us- [23] M. Kattenberg, Z. Beloki, A. Soroa, X. Artola, ing an event knowledge graph leveraged by wiki- A. Fokkens, P. Huygen, K. Verstoep, Two archi- data, in: Companion Proceedings of The 2019 tectures for parallel processing for huge amounts World Wide Web Conference, WWW ’19, Associ- of text, in: Proceedings of Language Resources ation for Computing Machinery, 2019, pp. 1232– and Evaluation Conference (LREC), European –1239. doi:10.1145/3308560.3316761. Language Resources Association (ELRA), 2016, pp. 4513––4519. URL: https://www.aclweb.org/ anthology/L16-1714. [24] M. Rospocher, M. van Erp, P. Vossen, A. Fokkens, I. Aldabe, G. Rigau, A. Soroa, T. Ploeger, T. Bogaard, Building event-centric knowl- edge graphs from news, Journal of Web Semantics 37-38 (2016) 132–151. doi:10.1016/ j.websem.2015.12.004. [25] P. Vossen, R. Agerri, I. Aldabe, A. Cybulska, M. van Erp, A. Fokkens, E. Laparra, A.-L. Mi- nard, A. P. Aprosio, G. Rigau, M. Rospocher, R. Segers, Newsreader: Using knowledge re- sources in a cross-lingual reading machine to generate more knowledge from massive streams of news, Special Issue Knowledge-Based Sys- tems, Elsevier 110 (2016) 60–85. doi:10.1016/ j.knosys.2016.07.013. [26] U. Germann, R. Liepins, D. Gosko, G. Barzdins, Integrating multiple NLP technologies into an open-source platform for multilingual media monitoring, in: Proceedings of Workshop for NLP Open Source Software (NLP-OSS), Associa- tion for Computational Linguistics, 2018, pp. 47– 51. doi:10.18653/v1/W18-2508. [27] U. Germann, R. Liepins, G. Barzdins, D. Gosko, S. Miranda, D. Nogueira, The SUMMA plat- form: A scalable infrastructure for multi-lingual multi-media monitoring, in: Proceedings of ACL 2018, System Demonstrations, Association for Computational Linguistics, 2018, pp. 99–104. doi:10.18653/v1/P18-4017. [28] S. a. Miranda, D. Nogueira, A. Mendes, A. Vla- chos, A. Secker, R. 
Garrett, J. Mitchel, Z. Mar- inho, Automated fact checking in the news room, in: The World Wide Web Conference, WWW ’19, Association for Computing Machinery, 2019, pp. 3579—-3583. doi:10.1145/3308558.3314135.