
Lifting News into a Journalistic Knowledge Platform

Tareq Al-Moslmi

Marc Gallofré Ocaña

University of Bergen, Fosswinckelsgt. 6, Postboks 7802, 5020 Bergen, Norway

A massive amount of news is being shared online by individuals and news agencies, making it difficult to take advantage of this news and analyse it in traditional ways. In view of this, there is an urgent need to use recent technologies to analyse all news-relevant information that is being shared in natural language and convert it into forms that can be more easily and precisely processed by computers. Knowledge Graphs (KGs) offer a good solution for such processing. Natural Language Processing (NLP) offers the possibility of mining and lifting natural language texts to knowledge graphs, allowing their semantic capabilities to be exploited and facilitating new possibilities for news analysis and understanding. However, the currently available techniques are still far from perfect. Many approaches and frameworks have been proposed to track and analyse news in the last few years. The shortcomings of those systems are that they are static and not updateable, are not designed for large-scale data volumes, do not support real-time processing, deal with limited data resources, use traditional lifting pipelines and support limited tasks, or neglect the use of knowledge graphs to represent news in a computer-processable form. Therefore, there is a need to better support lifting natural language into a KG. With the continuous development of NLP techniques, the design of new dynamic NLP lifters that can cope with all the previous shortcomings is required. This paper introduces a general NLP lifting architecture for automatically lifting and processing news reports in real-time based on recent developments in NLP methods.

Keywords: Natural language processing (NLP), Journalistic knowledge platforms, Knowledge Graphs, Computational journalism, Stream data processing, Semantic technologies, Big data

1. Introduction

For several years we have seen how the traditional press has moved to online content and new online press has appeared, publishing more online content than ever. Social networks enhanced that phenomenon, facilitating real-time interactions and sharing, allowing pre-news to come to the surface, and giving users newer ways to digest news. Analysing news in real-time to support journalistic work requires lifting that news to machine-understandable formats. Semantic representation of news using knowledge graphs is one such format that could be employed. Since news texts are expressed in natural language, there is a crucial need for processing and lifting these texts into a knowledge graph.

This paper presents an NLP lifting architecture component of the Journalistic Knowledge Platforms (JKP) for lifting natural language news text into knowledge graphs. A JKP is a system intended for analysing, lifting, and representing news using knowledge graphs to support journalists in exploiting knowledge from and about news being shared on the web and social media networks. JKPs have become crucial for the press industry. Yet, many works have proposed to process the news texts in many different ways in order to apply different JKP processes. Our group has been developing a series of JKP prototypes called News Hunter [1, 2, 3] in collaboration with a developer of newsroom tools for the international market. News Hunter moves the JKP forward to address journalistic needs, proposing a system to harvest real-time news stories from RSS feeds and social media, lift news using SOTA approaches, and represent stories in knowledge graphs using Semantic Web standard technologies, Linked Open Data and NIF formats. News Hunter also explores detection and suggestion of news angles and exploitation of the Semantic Web to support journalistic work [4, 5, 6, 7, 8].

Differently from previous works, our introduced NLP subsystem architecture for News Hunter aims to lift all processed news into a semantic knowledge graph in real-time. Moreover, two Natural Language Processing (NLP) lifting tracks could be chosen: the traditional pipeline and the end-to-end track, which follows the state-of-the-art (SOTA) development of deep neural networks. That would avoid some limitations reported in previous lifting tasks [9, 10].

The rest of the paper is organised as follows: Section 2 presents the background for our work. Section 3 introduces the general architecture of the JKP. Section 4 constitutes the bulk of the paper and introduces the general NLP lifting process for real-time news lifting to a knowledge graph. Section 5 concludes the paper and outlines plans for future work.

Proceedings of the CIKM 2020 Workshops, October 19-20, Galway, Ireland.
email: Tareq.Al-Moslmi@uib.no (T. Al-Moslmi); Marc.Gallofre@uib.no (M. Gallofré Ocaña)
orcid: 0000-0002-5296-2709 (T. Al-Moslmi); 0000-0001-7637-3303 (M. Gallofré Ocaña)
© 2020 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org), http://ceur-ws.org, ISSN 1613-0073

2. Related Work

Current JKPs [11, 12, 13, 14, 15, 16] deal with big data, multilingual text and multimedia sources of news-related items, for which they have implemented different NLP pipelines. These JKPs implemented NLP pipelines for lifting news into knowledge graphs and detecting events, normally by using traditional Named Entity Recognition (NER) and Named Entity Linking (NEL) systems, and pre-processed news text using linguistic techniques such as Part-of-Speech (PoS) tagging, tokenisation, lemmatization and translation. In addition, the NEWS project [11] used pattern matching to detect events, implemented NEL using PageRank and classified items, concepts and events using IPTC codes. NewsReader [13] used DBpedia Spotlight for NEL and mined opinion, causal, factual, temporal and semantic role information from news. ASRAEL [16] used spaCy for NER, ADEL for NEL and Wikidata for linking events. SUMMA [15] used support vector machines (SVM) for NEL and classified topics from news. Both Event Registry [12] and SUMMA [15] used clustering techniques to detect events.

The NEWS project [17, 11] aimed to provide fresh multilingual information to news agencies (the Spanish EFE and Italian ANSA agencies), analysing both textual and multimedia items. NEWS uses Ontology Ltd. (currently part of the EXFO Nova Context real-time active topology platform, https://www.exfo.com/en/ontology/) to implement the NLP pipeline to provide item categorization, concept representation, abstract generation, event recognition and NER using the IPTC codes. The NLP pipeline combines both linguistic techniques (patterns and rules such as PoS tagging) and traditional NER and NEL techniques (statistical techniques and PageRank). For recognizing events, the NEWS project used pattern recognition techniques to describe and find the desired events.

The process of recognizing events is a relevant feature of such systems, which is approached in many different ways. For example, Event Registry [12] uses clustering algorithms to detect and group similar articles which represent the same event. Following the central idea of events, the NewsReader project [13, 18] proposed a method, tools and a system to automatically leverage and represent events from news.

The NewsReader NLP pipeline performs language-specific NER and NEL, event and semantic role detection and temporal relation detection over four different languages and millions of news articles. The NLP pipeline processes each item starting with linguistic techniques (tokenizer, PoS, multiwords tagger), followed by traditional NER and NEL (based on DBpedia Spotlight), opinion miner, semantic role labeler, event resolution, temporal recognizer and causal and factuality relation extraction. To cope with the large amount of news articles, NewsReader implemented its NLP pipeline using big data oriented technologies (i.e., Hadoop and Storm) as a scalable and real-time system [14].

Big data, multimedia and multilingual sources are encountered together in the SUMMA project [15], which is an open-source platform for automated, scalable and distributed monitoring of real-time media broadcasts to support the work of news agencies like BBC or Deutsche Welle. The platform is built using big data oriented technologies and services running in Docker (www.docker.com) containers. SUMMA converts multimedia sources into text, which is translated into English when found in other languages. Then, the text is processed through an NLP pipeline which classifies items by topic using a hierarchical attention model, clusters them into storylines using clustering algorithms, and represents them using traditional NER (dependency parsing) and NEL (SVM-Ranking) techniques.

Like the previous works, the ASRAEL project [16] uses knowledge graphs to represent events in news articles for searching purposes. To do so, they map AFP articles to Wikidata using NER (based on spaCy) and the NEL system ADEL.

As observed in the previous works, there is a need for big data, real-time and semantic technology approaches to deal with high volumes of news items that come from multilingual and multimedia sources, and a common interest in detecting events among journalists and the different projects. Moreover, the proposed NLP techniques follow traditional approaches and similar pipelines, which may not always be suitable for big data and real-time processing or for providing the best results.

Many approaches for lifting natural language to knowledge graphs are based on previous-generation NER techniques, and new lifting approaches that add disambiguation and linking to recent best-of-breed NE recognisers are needed. There is also a lack of standards for comparing lifting approaches [10]. This can partly be attributed to a lack of commonly accepted benchmarks, but it is also a consequence of the recognition-disambiguation-linking pipeline. For example, it is hard to fairly compare pure NER with combined NER-NED-NEL techniques, when the latter is restricted to identifying named entities in the KB that is used for disambiguation and linking. Moreover, traditional sequential steps are now being integrated by joint learning or end-to-end processes. Consequently, mentions and entities that were previously analysed in isolation are now being lifted in each other's context. The current culmination of these trends is the deep-learning approaches that have reported promising results recently. Most of those developments are not considered in previous works, and this paper aims to address these gaps.

3. Journalistic Knowledge Platform architecture

In our previous work on News Hunter [2] we have proposed a general architecture for journalistic knowledge platforms (Figure 1) which is intended for big data real-time news lifting and processing. The still evolving architecture consists of 5 main parts: (1) The harvesting system, which harvests the news from the web (e.g., RSS feeds, Facebook, Twitter) or daily produced in-house news (e.g., agency daily news activity) and its associated metadata (e.g., URL, source, author, ID, timestamp), and represents them using JSON in order to facilitate parsing and transfer and to simplify further processing. (2) The data lake or storage system for big data and real-time, which is designed for sharing the news items across the different processes. (3) The semantic news component, which contains the NLP lifter and the semantic DB (knowledge graph). (4) The semantic and streaming news analysis services, which due to the importance of social media can provide real-time analysis like trend monitoring and event detection. (5) The service layer, which allows users to interact with the JKP.

Figure 1: News Hunter architecture [2]

News items can be collected from multiple sources: online news (e.g., RSS feeds), social media (e.g., Facebook, Twitter), archives or daily produced in-house news (e.g., agency daily news activity). The news crawler is oriented to harvest news from any source or multiple sources of interest. Due to the high amount of news items and their velocity of production, the harvested items are represented using standard lightweight formats like JSON, in order to facilitate their parsing, execution, transfer, sharing between components and temporary storage. News items are gathered together with their associated metadata (e.g., URL, source, author, ID, timestamp), which is included in the JSON files to speed up and simplify further processing and NLP tasks.

News items are processed according to their source: social media or news agencies. The news stories coming from news agencies (RSS feeds, news websites or archives) in JSON format are lifted into the knowledge graph as RDF triples using the NLP lifter, which can be adapted to the specific domain of the news story (e.g., economics, politics, sports). On the other hand, the news items coming from social media can be either pre-news (i.e., real-time information about events or something that is happening at the moment but is not yet, or only incompletely, available as news stories) or small summaries/abstracts about news. Thus, identifying the topic they are related to and clustering them into groups of pre-news items that represent the same event and topic facilitates their processing. As these clusters of pre-news items represent a potential event with richer information than a single item, they can be lifted using NLP techniques into the knowledge graph.

Furthermore, as the social media items are potential real-time pre-news or events which can be breaking news, they are of high importance for journalists. The clusters are therefore analysed and monitored in order to find trends or breaking news events, which are reported in real-time to journalists.

In this paper, we introduce the NLP lifting architecture that receives its input from the harvester that has been explained previously [2]. The harvester takes care of getting the data from different sources and standardising the data into a unified format like JSON, XML, or NIF. The text can be stored in big data oriented databases such as Apache Cassandra (https://cassandra.apache.org) or HBase (https://hbase.apache.org), which are oriented towards distribution and large-scale processing pipelines. Moreover, the text can be distributed among the different NLP tasks using an API or a distribution framework like Kafka (https://kafka.apache.org) or RabbitMQ (https://www.rabbitmq.com). The NLP lifter then has to deal with the data and lift it into a proper semantic format that will then be inserted into the KG.
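The harvesting and routing steps described in this section can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the News Hunter implementation: the `NewsItem` fields mirror the metadata listed above, while the `topic` field, the `SOCIAL_SOURCES` labels and the `route` helper are hypothetical names, and grouping by a pre-assigned topic label stands in for a real clustering step.

```python
import json
from collections import defaultdict
from dataclasses import dataclass, asdict

# Hypothetical source labels; the text only distinguishes
# social media from news agencies.
SOCIAL_SOURCES = {"twitter", "facebook"}


@dataclass
class NewsItem:
    id: str
    url: str
    source: str
    author: str
    timestamp: str
    text: str
    topic: str = ""  # assumed to be filled in by an upstream topic detector


def to_json(item):
    """Serialise a harvested item, with its metadata, to a JSON string."""
    return json.dumps(asdict(item), ensure_ascii=False)


def from_json(payload):
    """Rebuild a NewsItem from its JSON representation."""
    return NewsItem(**json.loads(payload))


def route(items):
    """Split items into agency stories and topic groups of social-media pre-news."""
    stories = []
    clusters = defaultdict(list)
    for item in items:
        if item.source.lower() in SOCIAL_SOURCES:
            clusters[item.topic or "unknown"].append(item)
        else:
            stories.append(item)
    return stories, dict(clusters)


items = [
    NewsItem("1", "https://example.org/a", "rss", "Agency desk",
             "2020-10-19T08:00:00Z", "Parliament passes budget.", "politics"),
    NewsItem("2", "https://example.org/b", "twitter", "user_a",
             "2020-10-19T08:01:00Z", "Smoke seen near the harbour", "harbour-fire"),
    NewsItem("3", "https://example.org/c", "twitter", "user_b",
             "2020-10-19T08:02:00Z", "Fire trucks heading to the harbour", "harbour-fire"),
]

# Items travel between components as JSON and are parsed again on arrival.
roundtrip = [from_json(to_json(item)) for item in items]
stories, clusters = route(roundtrip)
```

In this sketch the agency story would go straight to the NLP lifter, whereas the two social-media items end up in one group that could be lifted and monitored as a single potential event.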

4. NLP lifter

This section describes the NLP lifter for lifting natural language news texts to knowledge graphs. The NLP lifter, which is a component of the JKP architecture, consists of the main NLP lifting tasks as well as some additional related tasks. Differently from other proposed systems, our proposed NLP lifter is Docker-based and covers as many tasks as possible (traditional and recently developed ones), as shown in Figure 2. This allows the platform to evolve and ensures that the most recent technology can be used at all times. There will be two main NLP tracks: the traditional pipeline, which is updated with recent technologies, and the end-to-end track, which is the SOTA in many tasks. In addition, there is the ensemble service that could combine more than one lifter to produce better results. The purpose and advantage of this is that users can choose the most suitable track for their case and data as well as the most recent techniques. In the traditional pipeline, tasks like NER, NED, and NEL are implemented separately and mostly using off-the-shelf software. The off-the-shelf systems are usually based on old approaches and their performance is not the SOTA. Moreover, traditional lifting methods neglect the relations between entity types and entity context. However, our introduced architecture makes it possible to use the most up-to-date systems, or the newest ones, by just replacing or adding their Docker containers in the related component. The news item annotation ontology that has already been designed by [7] defines how the semantic annotations of news items should be represented in the knowledge graph. Each harvested news item is associated with one or more annotations, which may be, for example, named entities, concepts, topics, times or geolocations or relations between annotations. The ontology also describes how the sources of news items and annotations are represented in the knowledge graph to maintain provenance [7]. We describe the general NLP lifter components in the following:

4.1. Pre-processing

The quality of the data plays a key role in determining the suitable pre-processing techniques. Since we are dealing with real-time streaming, cleaning and normalization are required to remove unnecessary or noisy terms (like ASCII codes, currency symbols, hashtags, and so forth). The most frequently used pre-processing techniques are tokenization and POS tagging [19, 20]. Other common steps are sentence splitting, lemmatisation, chunking and dependency parsing, and structural parsing. Recent works indicate that robust lifting systems require accurate tuning of several steps, especially tokenization and semantic similarity [21]. Recently, deep neural networks, especially end-to-end methods, have reduced the need for pre-processing steps. Moreover, using deep neural networks for pre-processing tasks such as tokenization has recently produced promising results [22]. The proposed NLP lifter could include as many pre-processing steps as possible, each in a separate Docker container, so the user can choose all suitable ones for the target data.

4.2. Named entity recognition

Named entity recognition is the task that identifies the named entities contained in the text, like persons, locations, organizations, time, date, money, etc. NER approaches could be categorised into three main groups: knowledge-based approaches, learning-based methods, and feature-inferring neural network methods. Despite the existence of recent SOTA NER results (especially recent deep NN approaches) such as [23, 24, 25, 26], these approaches have not been utilized and exploited in the process of lifting natural language to knowledge graphs, as mentioned earlier. This paper aims to implement those SOTA NER methods in Docker-based components to tackle this shortcoming.

4.3. Named entity linking

NEL annotates each mention in a text with the identifier of its corresponding entity that is described in a KB in the LOD cloud. This paper treats NEL as a wider task that includes NED as one of its processes.

Many NEL approaches utilize off-the-shelf systems for the NER task. It is, however, challenging to choose which particular model to use for those systems. That is because, according to [27], it requires estimating the similarity between the system's training datasets and the dataset to be processed, in which entities must be accurately recognized. The most recent SOTA systems on the AIDA-CoNLL dataset include [28, 29, 30, 31]. There is no perfect NEL model for all datasets, and one model might be the best on one dataset but perform poorly on others. Accordingly, having the top N best SOTA systems implemented in Docker containers will allow users to pick the most suitable model for their data and to replace or update them at any time when needed.

4.4. End-to-end track

The majority of previous studies mostly assumed the availability of mentions and entities and focused on the disambiguation process only. The mutual dependency between mentions and their entities is thereby neglected. Moreover, this assumption is not practical in a real-world application. To overcome those shortcomings, the end-to-end approach deals with raw text and aims to extract all mentions and link them to their entities in the knowledge base. End-to-end entity linking has been recently proposed and is receiving increasing attention. A few studies have been published that claim to apply the end-to-end approach [32, 33, 34, 35]. The most interesting ones are the most recent neural-based end-to-end linking models [36, 37, 38, 39]. One of the most recent SOTA systems is [38], followed by [36]. Our NLP lifter aims at including such techniques as an alternative recent track for lifting news texts into a semantic knowledge graph.

4.5. Relation and concept extraction

Our NLP lifter aims at covering lifting of general concepts and of relations between entities. Many recent approaches also lift relations jointly with entities (both named entities and concepts) and have reported SOTA results. Similar to previous components, the proposed lifter will implement those methods and offer them to the user as optional tasks, like many of the others.

4.6. User-oriented tasks

User-oriented tasks are those tasks that are specific to and personalised for the project where the NLP lifting architecture is implemented. Apart from including SOTA NLP tasks like those previously described, the NLP lifting architecture takes into account purpose-specific tasks such as news angle detection, event detection, IPTC media code annotation, rumour detection and text completion.

4.7. Knowledge graph

In a knowledge graph, the nodes represent either concrete objects, concepts, information resources, or data about them, and the edges represent semantic relations between the nodes [40]. Knowledge graphs thus offer a widely used format for representing information in computer-processable form. They build on, and are heavily inspired by, Tim Berners-Lee's vision of the semantic web, a machine-processable web of data that augments the original web of human-readable documents [41]. Knowledge graphs can therefore leverage existing standards such as RDF, RDFS, and OWL.

Moreover, the constructed knowledge graph could be used to implement more operations like question answering, knowledge graph-based sentence autocompletion, storytelling, fact-checking and so forth using semantic news analysis.

5. Conclusion

Lifting high-volume streams of news texts involves representing their content in machine-understandable formats. KGs are one such format that has received much attention recently. NLP lifters are an important prerequisite for making the abundance of natural language news on the internet available as computer-processable knowledge graphs. Thus, the presented NLP lifting pipeline provides a structured and formalised process for transforming natural language text into computer-processable knowledge graphs.

The presented pipeline can incorporate any NLP technique, such as traditional or end-to-end approaches, and combine their results or expand them with specific-purpose NLP methods like sentiment analysis. Moreover, the introduced NLP lifter is designed to make its components easy to replace by making use of Docker technology, facilitating, e.g., the update of all tasks and methods to SOTA approaches. Although the proposed JKP is designed mainly to help journalists, it could be used and customized for the public. The presented NLP lifting architecture aims to be used as a reference for developers and researchers of JKPs interested in real-time NEL. News organisations may need to adapt their systems, replace components, add new SOTA technologies, or integrate them with other JKPs; having such an NLP lifting pipeline as a reference facilitates its management and understanding. Furthermore, it is not restricted to news text and could be used to lift other types of texts.

In our future work, we plan to validate the results of our proposed NLP lifter by using both a manually collected and annotated corpus of news and gold standards, and to compare the results of our proposed lifter with current NEL systems such as ADEL, spaCy, NewsReader, Stanford CoreNLP or DBpedia Spotlight. Besides, we want to explore the possibilities that validation tools like GERBIL [42] can provide when applied inside our NLP lifter. We believe that validation tools can provide insights about the evolution and performance of the applied NLP processes, which can be incorporated to reinforce, improve and keep updated the NLP models.

Acknowledgments

Supported by the News Angler project funded by the Norwegian Research Council's IKTPLUSS programme as project 275872.

References

[1] Berven, Christensen, Moldeklev, Opdahl, Villanger, News hunter: building and mining knowledge graphs for newsroom systems, in: NOKOBIT, volume 26, 2018.

[2] Gallofré Ocaña, Nyre, A. L. Opdahl, Tessem, Trattner, Veres, Towards a big data platform for news angles, in: 4th Norwegian Big Data Symposium (NOBIDS) 2018, 2018.

[3] Berven, Christensen, Moldeklev, Opdahl, Villanger, A knowledge graph platform for newsrooms, Computers in Industry (2020). To appear.

[4] Tessem, A. L. Opdahl, Supporting journalistic news angles with models and analogies, in: 2019 13th RCIS, IEEE, 2019, pp. 1–7.

[5] A. L. Opdahl, Tessem, Towards ontological support for journalistic angles, in: Enterprise, Business-Process and Information Systems Modeling, Springer International Publishing, 2019, pp. 279–294.

[6] Tessem, Analogical news angles from text similarity, in: Artificial Intelligence XXXVI, Springer International Publishing, 2019, pp. 449–455.

[7] A. L. Opdahl, Tessem, Ontologies for finding journalistic angles, Software and Systems Modeling (2020) 1–17.

[8] Motta, Daga, A. L. Opdahl, Tessem, Analysis and design of computational news angles, IEEE Access (2020).

[9] Albared, M. Gallofré Ocaña, Ghareb, T. Al-Moslmi, Recent progress of named entity recognition over the most popular datasets, in: 2019 First International Conference of Intelligent Computing and Engineering (ICOICE), 2019, pp. 1–9.

[10] Al-Moslmi, M. Gallofré Ocaña, A. L. Opdahl, Veres, Named entity extraction for knowledge graphs: A literature overview, IEEE Access 8 (2020) 32862–32881.

[11] Fernández, Fuentes, Sánchez, J. A. Fisteus, The news ontology: Design and applications, Expert Systems with Applications 37 (2010) 8694–8704.

[12] G. Leban, B. Fortuna, J. Brank, M. Grobelnik, Event registry: learning about world events from news, in: Proceedings of the 23rd WWW, ACM, 2014, pp. 107–110.

[13] P. Vossen, R. Agerri, I. Aldabe, A. Cybulska, M. van Erp, A. Fokkens, E. Laparra, A.-L. Minard, A. P. Aprosio, G. Rigau, M. Rospocher, R. Segers, Newsreader: Using knowledge resources in a cross-lingual reading machine to generate more knowledge from massive streams of news, Knowledge-Based Systems 110 (2016) 60–85.

[14] M. Kattenberg, Z. Beloki, A. Soroa, X. Artola, A. Fokkens, P. Huygen, K. Verstoep, Two architectures for parallel processing of huge amounts of text, in: Proceedings of the Tenth LREC (LREC'16), European Language Resources Association (ELRA), 2016, pp. 4513–4519.

[15] U. Germann, P. v. d. Kreeft, G. Barzdins, A. Birch, The summa platform: Scalable understanding of multilingual media, in: Proceedings of the 21st Annual Conference of the European Association for Machine Translation, 2018.

[16] C. Rudnik, T. Ehrhart, O. Ferret, D. Teyssou, R. Troncy, X. Tannier, Searching news articles using an event knowledge graph leveraged by Wikidata, in: 30th WWW Conference, 13-17 May 2019, 2019.

[17] N. Fernández, J. M. Blázquez, J. A. Fisteus, L. Sánchez, M. Sintek, A. Bernardi, M. Fuentes, A. Marrara, Z. Ben-Asher, News: Bringing semantic web technologies into news agencies, in: I. Cruz, S. Decker, D. Allemang, C. Preist, D. Schwabe, P. Mika, M. Uschold, L. M. Aroyo (Eds.), The Semantic Web - ISWC 2006, Springer Berlin Heidelberg, Berlin, Heidelberg, 2006, pp. 778–791.

[18] M. Rospocher, M. van Erp, P. Vossen, A. Fokkens, I. Aldabe, G. Rigau, A. Soroa, T. Ploeger, T. Bogaard, Building event-centric knowledge graphs from news, Journal of Web Semantics 37-38 (2016) 132–151.

[19] G. Zhu, C. A. Iglesias, Exploiting semantic similarity for named entity disambiguation in knowledge graphs, Expert Systems with Applications 101 (2018) 8–24.

[20] M. Fossati, E. Dorigatti, C. Giuliano, N-ary relation extraction for simultaneous t-box and a-box knowledge base augmentation, Semantic Web 9 (2018) 413–439.

[21] M. Conover, M. Hayes, S. Blackburn, P. Skomoroch, S. Shah, Pangloss: Fast entity linking in noisy text environments, in: Proceedings of the 24th ACM SIGKDD, KDD '18, ACM, 2018, pp. 168–176.

[22] T. Boros, S. D. Dumitrescu, R. Burtica, NLP-cube: End-to-end raw text processing with neural networks, in: Proceedings of the CoNLL 2018, ACL, 2018, pp. 171–179.

[23] A. Baevski, S. Edunov, Y. Liu, L. Zettlemoyer, M. Auli, Cloze-driven pretraining of self-attention networks, in: Proceedings of the 2019 Conference on EMNLP and the 9th IJCNLP, ACL, 2019, pp. 5359–5368.

[24] J. Devlin, M.-W. Chang, K. Lee, K. Toutanova, Bert: Pre-training of deep bidirectional transformers for language understanding, 2018. arXiv:1810.04805.

[25] M. Peters, M. Neumann, M. Iyyer, M. Gardner, C. Clark, K. Lee, L. Zettlemoyer, Deep contextualized word representations, in: Proceedings of the 2018 Conference of the NAACL, ACL, 2018, pp. 2227–2237.

[26] L. Liu, X. Ren, J. Shang, X. Gu, J. Peng, J. Han, Efficient contextualized representation: Language model pruning for sequence labeling, in: Proceedings of the 2018 Conference on EMNLP, ACL, 2018, pp. 1215–1225.

[27] J. Plu, G. Rizzo, R. Troncy, Enhancing entity linking by combining ner models, in: H. Sack, S. Dietze, A. Tordai, C. Lange (Eds.), Semantic Web Challenges, Springer International Publishing, Cham, 2016, pp. 17–32.

[28] J. Raiman, O. Raiman, Deeptype: Multilingual entity linking by neural type system evolution, 2018. arXiv:1802.01021.

[29] I. Yamada, H. Shindo, Pre-training of deep contextualized embeddings of words and entities for named entity disambiguation, 2019. arXiv:1909.00426.

[30] Z. Fang, Y. Cao, Q. Li, D. Zhang, Z. Zhang, Y. Liu, Joint entity linking with deep reinforcement learning, in: The WWW Conference, WWW '19, ACM, 2019, pp. 438–447.

[31] A. Luo, S. Gao, Y. Xu, Deep semantic match model for entity linking using knowledge graph and text, Procedia Computer Science 129 (2018) 110–114. 2017 International Conference on Identification, Information and Knowledge in the Internet of Things.

[32] A. Moro, A. Raganato, R. Navigli, Entity linking meets word sense disambiguation: a unified approach, Transactions of the ACL 2 (2014) 231–244. doi:10.1162/tacl_a_00179.

[33] O.-E. Ganea, Lucchi, Eickhoff, Hofmann, Probabilistic bag-of-hyperlinks model for entity linking, in: Proceedings of the 25th WWW, WWW '16, WWW Conference, 2016, pp. 927–938.

[34] D. B. Nguyen, M. Theobald, G. Weikum, Jnerd: Joint named entity recognition and disambiguation with rich linguistic features, Transactions of the ACL 4 (2016) 215–229. doi:10.1162/tacl_a_00094.

[35] O.-E. Ganea, T. Hofmann, Deep joint entity disambiguation with local neural attention, in: Proceedings of the 2017 Conference on EMNLP, ACL, 2017, pp. 2619–2629.

[36] Kolitsas, O.-E. Ganea, T. Hofmann, End-to-end neural entity linking, in: Proceedings of the 22nd Conference on Computational Natural Language Learning, ACL, 2018, pp. 519–529.

[37] Cao, Hou, Li, Liu, Neural collective entity linking, 2018. arXiv:1811.08603.

[38] Le, I. Titov, Improving entity linking by modeling latent relations between mentions, in: Proceedings of the 56th ACL, ACL, 2018, pp. 1595–1604.

[39] P. H. Martins, Marinho, A. F. T. Martins, Joint learning of named entity recognition and entity linking, in: Proceedings of the 57th ACL, ACL, 2019, pp. 190–196.

[40] Allemang, Hendler, Semantic Web for the Working Ontologist, second edition ed., Morgan Kaufmann, 2011.

[41] Berners-Lee, Hendler, Lassila, et al., The semantic web, Scientific American 284 (2001) 28–37.

[42] Röder, Usbeck, A. N. Ngomo, GERBIL - benchmarking named entity recognition and linking consistently, Semantic Web 9 (2018) 605–625.