=Paper=
{{Paper
|id=Vol-2699/paper42
|storemode=property
|title=Lifting News into a Journalistic Knowledge Platform
|pdfUrl=https://ceur-ws.org/Vol-2699/paper42.pdf
|volume=Vol-2699
|authors=Tareq Al-Moslmi,Marc Gallofré Ocaña
|dblpUrl=https://dblp.org/rec/conf/cikm/Al-MoslmiO20
}}
==Lifting News into a Journalistic Knowledge Platform==
Tareq Al-Moslmi, Marc Gallofré Ocaña

University of Bergen, Fosswinckelsgt. 6, Postboks 7802, 5020 Bergen, Norway

Abstract: A massive amount of news is being shared online by individuals and news agencies, making it difficult to take advantage of this news and analyse it in traditional ways. In view of this, there is an urgent need to use recent technologies to analyse all news-relevant information that is being shared in natural language and convert it into forms that can be more easily and precisely processed by computers. Knowledge Graphs (KGs) offer a good solution for such processing. Natural Language Processing (NLP) offers the possibility of mining and lifting natural language texts to knowledge graphs, making it possible to exploit their semantic capabilities and opening new possibilities for news analysis and understanding. However, the currently available techniques are still far from perfect. Many approaches and frameworks have been proposed to track and analyse news in the last few years. The shortcomings of those systems are that they are static and not updateable, are not designed for large-scale data volumes, do not support real-time processing, deal with limited data resources, use traditional lifting pipelines and support limited tasks, or neglect the use of knowledge graphs to represent news in a computer-processable form. Therefore, there is a need to better support lifting natural language into a KG. With the continuous development of NLP techniques, the design of new dynamic NLP lifters that can cope with all the previous shortcomings is required. This paper introduces a general NLP lifting architecture for automatically lifting and processing news reports in real-time, based on recent developments in NLP methods.
Keywords: Natural language processing (NLP), Journalistic knowledge platforms, Knowledge Graphs, Computational journalism, Stream data processing, Semantic technologies, Big data

Proceedings of the CIKM 2020 Workshops, October 19-20, Galway, Ireland. Email: Tareq.Al-Moslmi@uib.no (T. Al-Moslmi); Marc.Gallofre@uib.no (M. Gallofré Ocaña). ORCID: 0000-0002-5296-2709 (T. Al-Moslmi); 0000-0001-7637-3303 (M. Gallofré Ocaña). © 2020 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org), http://ceur-ws.org, ISSN 1613-0073.

===1. Introduction===

For several years we have seen how the traditional press has moved to online content and how new online press has appeared, publishing more online content than ever. Social networks have enhanced that phenomenon, facilitating real-time interactions and sharing, allowing pre-news to come to the surface, and giving users newer ways to digest news.

Analysing news in real-time to support journalistic work requires lifting that news to machine-understandable formats. Semantic representation of news using knowledge graphs is one such format that could be employed. Since news texts are expressed in natural language, there is a crucial need for processing and lifting these texts into a knowledge graph.

This paper presents an NLP lifting architecture component of the Journalistic Knowledge Platforms (JKP) for lifting natural language news text into knowledge graphs. A JKP is a system intended for analysing, lifting, and representing news using knowledge graphs to support journalists in exploiting knowledge from and about news being shared on the web and social media networks. JKPs have become crucial for the press industry. Yet, many works have proposed to process the news texts in many different ways in order to apply different JKP processes.

Our group has been developing a series of JKP prototypes called News Hunter [1, 2, 3] in collaboration with a developer of newsroom tools for the international market. News Hunter moves the JKP forward to address journalistic needs by proposing a system that harvests real-time news stories from RSS feeds and social media, lifts news using state-of-the-art (SOTA) approaches, and represents stories in knowledge graphs using Semantic Web standard technologies, Linked Open Data and NIF formats. News Hunter also explores detection and suggestion of news angles and exploitation of the Semantic Web to support journalistic work [4, 5, 6, 7, 8].

Unlike previous works, the NLP subsystem architecture we introduce for News Hunter aims to lift all processed news into a semantic knowledge graph in real-time. Moreover, one of two NLP lifting tracks can be chosen: the traditional pipeline or the end-to-end track, which follows the SOTA development of deep neural networks. That would avoid some limitations reported in previous lifting tasks [9, 10].

The rest of the paper is organised as follows: Section 2 presents the background for our work. Section 3 introduces the general architecture of JKP. Section 4 constitutes the bulk of the paper and introduces the general NLP lifting process for real-time news lifting to a knowledge graph. Section 5 concludes the paper and outlines plans for future work.

===2. Related Work===

Current JKPs [11, 12, 13, 14, 15, 16] deal with big data, multilingual text and multimedia sources of news-related items, for which they have implemented their different NLP pipelines. These JKPs implemented NLP pipelines for lifting news into knowledge graphs and detecting events, normally by using traditional Named Entity Recognition (NER) and Named Entity Linking (NEL) systems, and pre-processed news text using linguistic techniques such as Part-of-Speech tagging (PoS), tokenisation, lemmatization and translation. In addition, the NEWS project [11] used pattern matching to detect events, implemented NEL using PageRank and classified items, concepts and events using IPTC codes. NewsReader [13] used DBpedia Spotlight for NEL and mined opinion, causal, factual, temporal and semantic role information from news. ASRAEL [16] used SpaCy for NER, ADEL for NEL and Wikidata for linking events. SUMMA [15] used support vector machines (SVM) for NEL and classified topics from news. Both Event Registry [12] and SUMMA [15] used clustering techniques to detect events.

The NEWS project [17, 11] aimed to provide fresh multilingual information to news agencies (the Spanish EFE and Italian ANSA agencies), analysing both textual and multimedia items. NEWS uses Ontology Ltd. (currently part of the EXFO Nova Context real-time active topology platform, https://www.exfo.com/en/ontology/) to implement the NLP pipeline, which provides item categorization, concept representation, abstract generation, event recognition and NER using the IPTC codes. The NLP pipeline combines both linguistic techniques (patterns and rules such as PoS tagging) and traditional NER and NEL techniques (statistical techniques and PageRank). For recognizing events, the NEWS project used pattern recognition techniques to describe and find the desired events.

The process of recognizing events is a relevant feature of such systems, and it is approached in many different ways. For example, Event Registry [12] uses clustering algorithms to detect and group similar articles which represent the same event. Following the central idea of events, the NewsReader project [13, 18] proposed a method, tools and a system to automatically leverage and represent events from news.

The NewsReader NLP pipeline performs language-specific NER and NEL, event and semantic role detection and temporal relation detection over four different languages, dealing with millions of news articles. The NLP pipeline processes each item starting with linguistic techniques (tokenizer, PoS, multiwords tagger), followed by traditional NER and NEL (based on DBpedia Spotlight), opinion miner, semantic role labeler, event resolution, temporal recognizer and causal and factuality relation extraction. To cope with the large amount of news articles, NewsReader implemented its NLP pipeline using big-data-oriented technologies (i.e., Hadoop and Storm) as a scalable and real-time system [14].

Big data, multimedia and multilingual sources are encountered together in the SUMMA project [15], an open-source platform for automated, scalable and distributed monitoring of real-time media broadcasts to support the work of news agencies like the BBC or Deutsche Welle. The platform is built using big-data-oriented technologies and services running in Docker (www.docker.com) containers. SUMMA converts multimedia sources into text, which is translated into English when found in other languages. Then, the text is processed through an NLP pipeline which classifies items by topic using a hierarchical attention model, clusters them into storylines using clustering algorithms, and represents them using traditional NER (dependency parsing) and NEL (SVM-Ranking) techniques.

Like the previous works, the ASRAEL project [16] uses knowledge graphs to represent events in news articles for searching purposes. To do so, it maps AFP articles to Wikidata using NER (based on spaCy) and the NEL system ADEL.

As observed in the previous works, there is a need for big data, real-time and semantic technology approaches to deal with high volumes of news items that come from multilingual and multimedia sources, and there is a common interest in detecting events among journalists and the different projects. Moreover, the proposed NLP techniques follow traditional approaches and similar pipelines, which may not always be suitable for big data and real-time processing or for providing the best results.

Many approaches for lifting natural language to knowledge graphs are based on previous-generation NER techniques, and new lifting approaches that add disambiguation and linking to recent best-of-breed NE recognisers are needed. There is also a lack of standards for comparing lifting approaches [10]. This can partly be attributed to a lack of commonly accepted benchmarks, but it is also a consequence of the recognition-disambiguation-linking pipeline. For example, it is hard to fairly compare pure NER with combined NER, Named Entity Disambiguation (NED) and NEL techniques, when the latter are restricted to identifying named entities in the knowledge base (KB) that is used for disambiguation and linking. Moreover, traditional sequential steps are now being integrated by joint learning or end-to-end processes. Consequently, mentions and entities that were previously analysed in isolation are now being lifted in each other's context. The current culmination of these trends is the deep-learning approaches that have recently reported promising results. Most of those developments are not considered in previous works, and this paper aims to address these gaps.

===3. Journalistic Knowledge Platform architecture===

In our previous work on News Hunter [2] we proposed a general architecture for journalistic knowledge platforms (Figure 1), which is intended for big data real-time news lifting and processing. The still evolving architecture consists of 5 main parts: (1) The harvesting system, which harvests the news from the web (e.g., RSS feeds, Facebook, Twitter) or daily produced in-house news (e.g., agency daily news activity) together with its associated metadata (e.g., URL, source, author, ID, timestamp), and represents them using JSON in order to facilitate parsing and transfer and to simplify further processing. (2) The data lake or storage system for big data and real-time, which is designed for sharing the news items across the different processes. (3) The semantic news component, which contains the NLP lifter and the semantic DB (knowledge graph). (4) The semantic and streaming news analysis services, which due to the importance of social media can provide real-time analysis like trend monitoring and event detection. (5) The service layer, which allows users to interact with the JKP.

Figure 1: News Hunter architecture [2]

News items can be collected from multiple sources: online news (e.g., RSS feeds), social media (e.g., Facebook, Twitter), archives or daily produced in-house news (e.g., agency daily news activity). The news crawler is oriented to harvest news from any source or multiple sources of interest. Due to the high amount of news items and their velocity of production, the harvested items are represented using standard lightweight formats like JSON, in order to facilitate their parsing, execution, transfer, sharing between components and temporary storage. News items are gathered together with their associated metadata (e.g., URL, source, author, ID, timestamp), which is included in the JSON files to speed up and simplify further processing and NLP tasks.

News items are processed according to their source: social media or news agencies. The news stories coming from news agencies (RSS feeds, news websites or archives) in JSON format are lifted into the knowledge graph as RDF triples using the NLP lifter, which can be adapted to the specific domain of the news story (e.g., economics, politics, sports). On the other hand, the news items coming from social media can be either pre-news (i.e., real-time information about events or something that is happening at the moment but is not yet available, or is incomplete, as news stories) or small summaries/abstracts about news. Thus, identifying the topic they are related to and clustering them into groups of pre-news items that represent the same event and topic facilitates their processing. As these clusters of pre-news items represent a potential event with richer information than a single item, they can be lifted using NLP techniques into the knowledge graph.

Furthermore, as the social media items are potential real-time pre-news or events which can be breaking news, they are of high importance for journalists. Hence, the clusters are analysed and monitored in order to find trends or breaking news events, which are reported in real-time to journalists.

In this paper, we introduce the NLP lifting architecture that receives its input from the harvester explained previously [2]. The harvester takes care of getting the data from different sources and standardising the data type into a unified format like JSON, XML, or NIF. The text can be stored in big-data-oriented databases such as Apache Cassandra (https://cassandra.apache.org) or HBase (https://hbase.apache.org), which are oriented towards distribution and large-scale processing pipelines. Moreover, the text can be distributed among the different NLP tasks using an API or a distribution framework like Kafka (https://kafka.apache.org) or RabbitMQ (https://www.rabbitmq.com). The NLP lifter then has to deal with the data and lift it into a proper semantic format that will then be inserted into the KG.

===4. NLP lifter===

This section describes the NLP lifter for lifting natural language news texts to knowledge graphs. The NLP lifter, which is a component of the JKP architecture, consists of the main NLP lifting tasks as well as some additional related tasks. Unlike other proposed systems, our proposed NLP lifter is Docker-based and contains as many tasks as possible (traditional and recently developed ones), as shown in Figure 2. This allows the platform to evolve and ensures that the most recent technology can be used at all times. There will be two main NLP tracks: the traditional pipeline, updated with recent technologies, and the end-to-end track, which is the SOTA in many tasks. In addition, there is the ensemble service that could combine more than one lifter to produce better results. The purpose and advantage of this is that users can choose the most suitable track for their case and data as well as the most recent techniques.

Figure 2: General NLP lifting architecture

In the traditional pipeline, tasks like NER, NED, and NEL are implemented separately and mostly using off-the-shelf software. The off-the-shelf systems are usually based on old approaches and their performance is not the SOTA. Moreover, traditional lifting methods neglect the relations between entity types and entity context. However, our introduced architecture makes it possible to use the most up-to-date systems, or the newest ones, by just replacing or adding their Docker containers in the related component.

The news item annotation ontology that has already been designed in [7] defines how the semantic annotations of news items should be represented in the knowledge graph. Each harvested news item is associated with one or more annotations, which may be, for example, named entities, concepts, topics, times or geolocations or relations between annotations. The ontology also describes how the sources of news items and annotations are represented in the knowledge graph to maintain provenance [7]. We describe the general NLP lifter components as follows:

====4.1. Pre-processing====

The quality of the data plays a key role in determining the suitable pre-processing techniques. Since we are dealing with real-time streaming, cleaning and normalization are required to remove unnecessary or noisy terms (like ASCII codes, currency symbols, hashtags, and so forth). The most frequently used pre-processing techniques are tokenization and POS tagging [19, 20]. Other common steps are sentence splitting, lemmatisation, chunking and dependency parsing, and structural parsing. Recent works indicate that robust lifting systems require accurate tuning of several steps, especially tokenization and semantic similarity [21]. Recently, deep neural networks, especially end-to-end methods, have reduced the need for pre-processing steps. Moreover, using deep neural networks for pre-processing tasks such as tokenization has recently produced promising results [22]. The proposed NLP lifter could include as many pre-processing steps as possible, each in a separate Docker container, so the user can choose all suitable ones for the target data.

====4.2. Named entity recognition====

Named entity recognition is the task that identifies the named entities contained in the text, like persons, locations, organizations, time, date, money, etc. NER approaches could be categorised into three main groups: knowledge-based approaches, learning-based methods, and feature-inferring neural network methods. Despite the existence of recent SOTA NER results (especially recent deep NN approaches) such as [23, 24, 25, 26], these approaches have not been utilized and exploited in the process of lifting natural language to knowledge graphs, as mentioned earlier. This paper aims to implement those SOTA NER methods in Docker-based components to tackle this shortcoming.

====4.3. Named entity linking====

NEL annotates each mention in a text with the identifier of its corresponding entity that is described in a KB in the Linked Open Data (LOD) cloud. Our paper defines NEL as a wider task that includes NED as one of its processes. Many NEL approaches utilize off-the-shelf systems for the NER task. It is, however, challenging to choose which particular model to use for those systems. That is because it requires estimating the similarity level between the system's training datasets and the dataset that needs to be processed, in which we strive to accurately recognize entities, according to [27]. The most recent SOTA systems on the AIDA-CoNLL dataset include [28, 29, 30, 31]. There is no perfect NEL model for all datasets, and one model might be the best on one dataset but perform poorly on others. Accordingly, having the top N SOTA models implemented in Docker containers will allow users to pick the most suitable model for their data and/or replace or update them at any time when needed.

====4.4. End-to-end track====

The majority of previous studies mostly assumed the availability of mentions and entities and focused on the disambiguation process only. However, this neglects the mutual dependency between mentions and their entities. Moreover, it is not a practical assumption in a real-world application. In contrast, and to overcome those shortcomings, the end-to-end track deals with raw text and aims to extract all mentions and link them to their entities in the knowledge base. End-to-end entity linking has been recently proposed and is receiving increasing attention. A few studies have been published that claim to apply the end-to-end approach [32, 33, 34, 35]. The most interesting ones are the most recent neural-based end-to-end linking models [36, 37, 38, 39]. One of the most recent SOTA is [38], followed by [36]. Our NLP lifter aims at including such techniques as an alternative recent track for lifting news texts into a semantic knowledge graph.

====4.5. Relation and concept extraction====

Our NLP lifter aims at covering lifting of general concepts and of relations between entities. Many recent approaches also lift relations jointly with entities (both named entities and concepts) and have reported SOTA results. Similar to previous components, the proposed lifter will implement those methods and, like many others, offer them to the user as optional tasks.

====4.6. User-oriented tasks====

User-oriented tasks are those tasks that are specific to and personalised for the project where the NLP lifting architecture is implemented. Apart from including SOTA NLP tasks like those previously described, the NLP lifting architecture takes into account purpose-specific tasks such as news angle detection, event detection, IPTC media code annotation, rumour detection and text completion.

====4.7. Knowledge graph====

In a knowledge graph, the nodes represent either concrete objects, concepts, information resources, or data about them, and the edges represent semantic relations between the nodes [40]. Knowledge graphs thus offer a widely used format for representing information in computer-processable form. They build on, and are heavily inspired by, Tim Berners-Lee's vision of the semantic web, a machine-processable web of data that augments the original web of human-readable documents [41]. Knowledge graphs can therefore leverage existing standards such as RDF, RDFS, and OWL. Moreover, the constructed knowledge graph could be used to implement more operations like question answering, knowledge graph-based sentence auto-completion, storytelling, fact-checking and so forth using semantic news analysis.

===5. Conclusion===

Lifting high-volume streams of news texts involves representing their content in machine-understandable formats. KGs are one such format that has received much attention recently. NLP lifters are an important prerequisite for making the abundance of natural language news on the internet available as computer-processable knowledge graphs. Thus, the presented NLP lifting pipeline provides a structured and formalised process for transforming natural language text into computer-processable knowledge graphs. The presented pipeline can incorporate any NLP technique, such as traditional or end-to-end approaches, and can combine their results or expand them with specific-purpose NLP methods like sentiment analysis. Moreover, the introduced NLP lifter is designed to simplify the replacement of its components by making use of Docker technology, facilitating, e.g., the update of all tasks and methods to SOTA approaches. Although the proposed JKP is designed mainly to help journalists, it could be used and customized for the public. The presented NLP lifting architecture aims to be used as a reference for developers and researchers of JKPs interested in real-time NEL. News organisations may need to adapt their systems, replace components, add new SOTA technologies, or integrate them with other JKPs; having such an NLP lifting pipeline as a reference facilitates their management and understanding. Furthermore, it is not restricted to news text and could be used to lift other types of texts.

In our future work, we plan to validate the results of our proposed NLP lifter by using both a manually collected and annotated corpus of news and gold standards, and to compare the results of our proposed lifter with current NEL systems such as ADEL, the SpaCy lifter, NewsReader, Stanford CoreNLP or DBpedia Spotlight. Besides, we want to explore the possibilities that validation tools like GERBIL [42] can provide when applied inside our NLP lifter. We believe that validation tools can provide insights about the evolution and performance of the applied NLP processes, which can be incorporated to reinforce, improve and keep the NLP models updated.

===Acknowledgments===

Supported by the News Angler project funded by the Norwegian Research Council's IKTPLUSS programme as project 275872.

===References===

[1] A. Berven, O. Christensen, S. Moldeklev, A. Opdahl, K. Villanger, News hunter: building and mining knowledge graphs for newsroom systems, in: NOKOBIT, volume 26, 2018.

[2] M. Gallofré Ocaña, L. Nyre, A. L. Opdahl, B. Tessem, C. Trattner, C. Veres, Towards a big data platform for news angles, in: 4th Norwegian Big Data Symposium (NOBIDS) 2018, 2018.

[3] A. Berven, O. Christensen, S. Moldeklev, A. Opdahl, K. Villanger, A knowledge graph platform for newsrooms, Computers in Industry (2020). To appear.

[4] B. Tessem, A. L. Opdahl, Supporting journalistic news angles with models and analogies, in: 2019 13th RCIS, IEEE, 2019, pp. 1–7.

[5] A. L. Opdahl, B. Tessem, Towards ontological support for journalistic angles, in: Enterprise, Business-Process and Information Systems Modeling, Springer International Publishing, 2019, pp. 279–294.

[6] B. Tessem, Analogical news angles from text similarity, in: Artificial Intelligence XXXVI, Springer International Publishing, 2019, pp. 449–455.

[7] A. L. Opdahl, B. Tessem, Ontologies for finding journalistic angles, Software and Systems Modeling (2020) 1–17.

[8] E. Motta, E. Daga, A. L. Opdahl, B. Tessem, Analysis and design of computational news angles, IEEE Access (2020).

[9] M. Albared, M. Gallofré Ocaña, A. Ghareb, T. Al-Moslmi, Recent progress of named entity recognition over the most popular datasets, in: 2019 First International Conference of Intelligent Computing and Engineering (ICOICE), 2019, pp. 1–9.

[10] T. Al-Moslmi, M. Gallofré Ocaña, A. L. Opdahl, C. Veres, Named entity extraction for knowledge graphs: A literature overview, IEEE Access 8 (2020) 32862–32881.

[11] N. Fernández, D. Fuentes, L. Sánchez, J. A. Fisteus, The news ontology: Design and applications, Expert Systems with Applications 37 (2010) 8694–8704.

[12] G. Leban, B. Fortuna, J. Brank, M. Grobelnik, Event registry: learning about world events from news, in: Proceedings of the 23rd WWW, ACM, 2014, pp. 107–110.

[13] P. Vossen, R. Agerri, I. Aldabe, A. Cybulska, M. van Erp, A. Fokkens, E. Laparra, A.-L. Minard, A. P. Aprosio, G. Rigau, M. Rospocher, R. Segers, Newsreader: Using knowledge resources in a cross-lingual reading machine to generate more knowledge from massive streams of news, Knowledge-Based Systems 110 (2016) 60–85.

[14] M. Kattenberg, Z. Beloki, A. Soroa, X. Artola, A. Fokkens, P. Huygen, K. Verstoep, Two architectures for parallel processing of huge amounts of text, in: Proceedings of the Tenth LREC (LREC’16), European Language Resources Association (ELRA), 2016, pp. 4513–4519.

[15] U. Germann, P. v. d. Kreeft, G. Barzdins, A. Birch, The summa platform: Scalable understanding of multilingual media, in: Proceedings of the 21st Annual Conference of the European Association for Machine Translation, 2018.

[16] C. Rudnik, T. Ehrhart, O. Ferret, D. Teyssou, R. Troncy, X. Tannier, Searching news articles using an event knowledge graph leveraged by Wikidata, in: 30th WWW Conference, 13-17 May 2019, 2019.

[17] N. Fernández, J. M. Blázquez, J. A. Fisteus, L. Sánchez, M. Sintek, A. Bernardi, M. Fuentes, A. Marrara, Z. Ben-Asher, News: Bringing semantic web technologies into news agencies, in: I. Cruz, S. Decker, D. Allemang, C. Preist, D. Schwabe, P. Mika, M. Uschold, L. M. Aroyo (Eds.), The Semantic Web - ISWC 2006, Springer Berlin Heidelberg, Berlin, Heidelberg, 2006, pp. 778–791.

[18] M. Rospocher, M. van Erp, P. Vossen, A. Fokkens, I. Aldabe, G. Rigau, A. Soroa, T. Ploeger, T. Bogaard, Building event-centric knowledge graphs from news, Journal of Web Semantics 37-38 (2016) 132–151.

[19] G. Zhu, C. A. Iglesias, Exploiting semantic similarity for named entity disambiguation in knowledge graphs, Expert Systems with Applications 101 (2018) 8–24.

[20] M. Fossati, E. Dorigatti, C. Giuliano, N-ary relation extraction for simultaneous t-box and a-box knowledge base augmentation, Semantic Web 9 (2018) 413–439.

[21] M. Conover, M. Hayes, S. Blackburn, P. Skomoroch, S. Shah, Pangloss: Fast entity linking in noisy text environments, in: Proceedings of the 24th ACM SIGKDD, KDD ’18, ACM, 2018, pp. 168–176.

[22] T. Boros, S. D. Dumitrescu, R. Burtica, NLP-cube: End-to-end raw text processing with neural networks, in: Proceedings of the CoNLL 2018, ACL, 2018, pp. 171–179.

[23] A. Baevski, S. Edunov, Y. Liu, L. Zettlemoyer, M. Auli, Cloze-driven pretraining of self-attention networks, in: Proceedings of the 2019 Conference on EMNLP and the 9th IJCNLP, ACL, 2019, pp. 5359–5368.

[24] J. Devlin, M.-W. Chang, K. Lee, K. Toutanova, Bert: Pre-training of deep bidirectional transformers for language understanding, 2018. arXiv:1810.04805.

[25] M. Peters, M. Neumann, M. Iyyer, M. Gardner, C. Clark, K. Lee, L. Zettlemoyer, Deep contextualized word representations, in: Proceedings of the 2018 Conference of the NAACL, ACL, 2018, pp. 2227–2237.

[26] L. Liu, X. Ren, J. Shang, X. Gu, J. Peng, J. Han, Efficient contextualized representation: Language model pruning for sequence labeling, in: Proceedings of the 2018 Conference on EMNLP, ACL, 2018, pp. 1215–1225.

[27] J. Plu, G. Rizzo, R. Troncy, Enhancing entity linking by combining ner models, in: H. Sack, S. Dietze, A. Tordai, C. Lange (Eds.), Semantic Web Challenges, Springer International Publishing, Cham, 2016, pp. 17–32.

[28] J. Raiman, O. Raiman, Deeptype: Multilingual entity linking by neural type system evolution, 2018. arXiv:1802.01021.

[29] I. Yamada, H. Shindo, Pre-training of deep contextualized embeddings of words and entities for named entity disambiguation, 2019. arXiv:1909.00426.

[30] Z. Fang, Y. Cao, Q. Li, D. Zhang, Z. Zhang, Y. Liu, Joint entity linking with deep reinforcement learning, in: The WWW Conference, WWW ’19, ACM, 2019, pp. 438–447.

[31] A. Luo, S. Gao, Y. Xu, Deep semantic match model for entity linking using knowledge graph and text, Procedia Computer Science 129 (2018) 110–114. 2017 International Conference on Identification, Information and Knowledge in the Internet of Things.

[32] A. Moro, A. Raganato, R. Navigli, Entity linking meets word sense disambiguation: a unified approach, Transactions of the ACL 2 (2014) 231–244. arXiv:10.1162/tacl_a_00179.

[33] O.-E. Ganea, M. Ganea, A. Lucchi, C. Eickhoff, T. Hofmann, Probabilistic bag-of-hyperlinks model for entity linking, in: Proceedings of the 25th WWW, WWW ’16, WWW Conference, 2016, pp. 927–938.

[34] D. B. Nguyen, M. Theobald, G. Weikum, J-nerd: Joint named entity recognition and disambiguation with rich linguistic features, Transactions of the ACL 4 (2016) 215–229. arXiv:10.1162/tacl_a_00094.

[35] O.-E. Ganea, T. Hofmann, Deep joint entity disambiguation with local neural attention, in: Proceedings of the 2017 Conference on EMNLP, ACL, 2017, pp. 2619–2629.

[36] N. Kolitsas, O.-E. Ganea, T. Hofmann, End-to-end neural entity linking, in: Proceedings of the 22nd Conference on Computational Natural Language Learning, ACL, 2018, pp. 519–529.

[37] Y. Cao, L. Hou, J. Li, Z. Liu, Neural collective entity linking, 2018. arXiv:1811.08603.

[38] P. Le, I. Titov, Improving entity linking by modeling latent relations between mentions, in: Proceedings of the 56th ACL, ACL, 2018, pp. 1595–1604.

[39] P. H. Martins, Z. Marinho, A. F. T. Martins, Joint learning of named entity recognition and entity linking, in: Proceedings of the 57th ACL, ACL, 2019, pp. 190–196.

[40] D. Allemang, J. Hendler, Semantic Web for the Working Ontologist, second edition ed., Morgan Kaufmann, 2011.

[41] T. Berners-Lee, J. Hendler, O. Lassila, et al., The semantic web, Scientific american 284 (2001) 28–37.

[42] M. Röder, R. Usbeck, A. N. Ngomo, GERBIL - benchmarking named entity recognition and linking consistently, Semantic Web 9 (2018) 605–625.
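
The paper describes interchangeable lifting tracks (a traditional pipeline and an end-to-end track) whose outputs an ensemble service may combine before the result is inserted into the knowledge graph as triples. The following is only a minimal illustrative sketch of that idea, not the News Hunter implementation. Every name in it (`Annotation`, `REGISTRY`, `ensemble`, `lift`), the toy gazetteer standing in for real NER/NEL models, the voting rule, and the `example.org` identifiers are assumptions made for illustration; the paper does not specify how the ensemble combines results, and the real system would use Docker-hosted models and the annotation ontology of [7].

```python
import json
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Annotation:
    """A mention in the text linked to an entity identifier."""
    mention: str
    entity: str


# Hypothetical toy knowledge base: mention text -> placeholder entity IRI.
TOY_KB: Dict[str, str] = {
    "Bergen": "http://example.org/entity/Bergen",
    "Norway": "http://example.org/entity/Norway",
}


def traditional_track(text: str) -> List[Annotation]:
    """Separate steps: recognise mentions first, then link each one."""
    mentions = [m for m in TOY_KB if m in text]        # NER stand-in
    return [Annotation(m, TOY_KB[m]) for m in mentions]  # NEL stand-in


def end_to_end_track(text: str) -> List[Annotation]:
    """Stand-in for a joint model; assumed here to know only 'Bergen'."""
    return [Annotation("Bergen", TOY_KB["Bergen"])] if "Bergen" in text else []


# Swapping a component means replacing one registry entry (assumed design).
REGISTRY: Dict[str, Callable[[str], List[Annotation]]] = {
    "traditional": traditional_track,
    "end_to_end": end_to_end_track,
}


def ensemble(text: str, tracks: List[str], min_votes: int = 1) -> List[Annotation]:
    """Keep annotations proposed by at least `min_votes` of the chosen tracks."""
    votes = Counter(a for name in tracks for a in set(REGISTRY[name](text)))
    kept = [a for a, n in votes.items() if n >= min_votes]
    return sorted(kept, key=lambda a: a.mention)


def lift(item_json: str, tracks: List[str], min_votes: int = 1) -> List[Tuple[str, str, str]]:
    """Turn one harvested JSON news item into (subject, predicate, object) triples."""
    item = json.loads(item_json)
    predicate = "http://example.org/vocab/mentions"  # placeholder property
    return [(item["url"], predicate, a.entity)
            for a in ensemble(item["text"], tracks, min_votes)]


item = json.dumps({"url": "http://example.org/news/1",
                   "text": "Storm hits Bergen, Norway"})
print(lift(item, ["traditional", "end_to_end"], min_votes=2))
```

With `min_votes=2` only annotations on which both toy tracks agree are kept; with `min_votes=1` the ensemble returns the union of the tracks. Replacing an entry in `REGISTRY` mirrors, in miniature, swapping a containerised component.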