<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Lifting News into a Journalistic Knowledge Platform</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Tareq Al-Moslmi</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Marc Gallofré Ocaña</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>University of Bergen</institution>
          ,
          <addr-line>Fosswinckelsgt. 6, Postboks 7802, 5020 Bergen</addr-line>
          ,
          <country country="NO">Norway</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>A massive amount of news is being shared online by individuals and news agencies, making it difficult to take advantage of this news and analyse it in traditional ways. In view of this, there is an urgent need to use recent technologies to analyse all news-relevant information that is shared in natural language and convert it into forms that computers can process more easily and precisely. Knowledge Graphs (KGs) offer a good solution for such processing. Natural Language Processing (NLP) offers the possibility of mining and lifting natural language texts to knowledge graphs, allowing their semantic capabilities to be exploited and opening new possibilities for news analysis and understanding. However, the currently available techniques are still far from perfect. Many approaches and frameworks have been proposed to track and analyse news in the last few years. The shortcomings of those systems are that they are static and not updatable, are not designed for large-scale data volumes, do not support real-time processing, deal with limited data resources, use traditional lifting pipelines and support limited tasks, or neglect the use of knowledge graphs to represent news in a computer-processable form. Therefore, there is a need to better support lifting natural language into a KG. With the continuous development of NLP techniques, new dynamic NLP lifters that can cope with all of these shortcomings are required. This paper introduces a general NLP lifting architecture for automatically lifting and processing news reports in real time, based on recent developments in NLP methods.</p>
      </abstract>
      <kwd-group>
        <kwd>Natural language processing (NLP)</kwd>
        <kwd>Journalistic knowledge platforms</kwd>
        <kwd>Knowledge Graphs</kwd>
        <kwd>Computational journalism</kwd>
        <kwd>Stream data processing</kwd>
        <kwd>Semantic technologies</kwd>
        <kwd>Big data</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>For several years we have seen how the traditional news press has moved to online content and how new online press has appeared, publishing more online content than ever. Social networks have amplified this phenomenon by facilitating real-time interaction and sharing, allowing pre-news to come to the surface and giving users new ways to digest news. Analysing news in real time to support journalistic work requires lifting news into machine-understandable formats, and semantic representation of news using knowledge graphs is one such format. Since news texts are expressed in natural language, there is a crucial need for processing and lifting these texts into a knowledge graph.</p>
      <p>This paper presents an NLP lifting architecture, a component of Journalistic Knowledge Platforms (JKPs) for lifting natural language news text into knowledge graphs. A JKP is a system intended for analysing, lifting, and representing news using knowledge graphs to support journalists in exploiting knowledge from and about news shared on the web and on social media networks. JKPs have become crucial for the press industry, and many works have proposed processing news texts in different ways in order to support different JKP processes.</p>
      <p>Our group has been developing a series of JKP prototypes called News Hunter [<xref ref-type="bibr" rid="ref1 ref2 ref3">1, 2, 3</xref>] in collaboration with a developer of newsroom tools for the international market. News Hunter moves the JKP forward to address journalistic needs by proposing a system that harvests real-time news stories from RSS feeds and social media, lifts news using SOTA approaches, and represents stories in knowledge graphs using Semantic Web standard technologies, Linked Open Data and NIF formats. News Hunter also explores the detection and suggestion of news angles and the exploitation of the Semantic Web to support journalistic work [<xref ref-type="bibr" rid="ref4 ref5 ref6 ref7 ref8">4, 5, 6, 7, 8</xref>].</p>
      <p>Differently from previous works, the architecture of the NLP subsystem we introduce for News Hunter aims to lift all processed news into a semantic knowledge graph in real time. Moreover, two Natural Language Processing (NLP) lifting tracks can be chosen: the traditional pipeline and the end-to-end track, which follows the state-of-the-art (SOTA) development of deep neural networks. This is intended to avoid some limitations reported in previous lifting tasks [<xref ref-type="bibr" rid="ref10 ref9">9, 10</xref>].</p>
      <p>The rest of the paper is organised as follows: Section 2 presents the background for our work. Section 3 introduces the general architecture of JKPs. Section 4 constitutes the bulk of the paper and introduces the general NLP lifting process for real-time lifting of news into a knowledge graph. Section 5 concludes the paper and outlines plans for future work.</p>
      <fn-group>
        <fn id="fn-conf">
          <p>Proceedings of the CIKM 2020 Workshops, October 19-20, Galway, Ireland. Email: Tareq.Al-Moslmi@uib.no (T. Al-Moslmi); Marc.Gallofre@uib.no (M. Gallofré Ocaña). ORCID: 0000-0002-5296-2709 (T. Al-Moslmi); 0000-0001-7637-3303 (M. Gallofré Ocaña). © 2020 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org, http://ceur-ws.org), ISSN 1613-0073.</p>
        </fn>
      </fn-group>
    </sec>
    <sec id="sec-related-work">
      <title>2. Related Work</title>
      <p>Current JKPs [<xref ref-type="bibr" rid="ref11">11, 12, 13, 14, 15, 16</xref>] deal with big data, multilingual text and multimedia sources of news-related items, for which they have implemented different NLP pipelines. These JKPs implemented NLP pipelines for lifting news into knowledge graphs and detecting events, normally by using traditional Named Entity Recognition (NER) and Named Entity Linking (NEL) systems, and they pre-processed news text using linguistic techniques such as Part-of-Speech (PoS) tagging, tokenisation, lemmatisation and translation. In addition, the NEWS project [<xref ref-type="bibr" rid="ref11">11</xref>] used pattern matching to detect events, implemented NEL using PageRank, and classified items, concepts and events using IPTC codes. NewsReader [13] used DBpedia Spotlight for NEL and mined opinion, causal, factual, temporal and semantic role information from news. ASRAEL [16] used spaCy for NER, ADEL for NEL and Wikidata for linking events. SUMMA [15] used support vector machines (SVM) for NEL and classified topics from news. Both Event Registry [12] and SUMMA [15] used clustering techniques to detect events.</p>
      <p>The NEWS project [<xref ref-type="bibr" rid="ref11">17, 11</xref>] aimed to provide fresh multilingual information to news agencies (the Spanish EFE and Italian ANSA agencies), analysing both textual and multimedia items. NEWS uses Ontology Ltd. (now part of the EXFO Nova Context real-time active topology platform<fn id="fn1"><label>1</label><p>https://www.exfo.com/en/ontology/</p></fn>) to implement an NLP pipeline that provides item categorisation, concept representation, abstract generation, event recognition and NER using IPTC codes. The pipeline combines linguistic techniques (patterns and rules such as PoS tagging) with traditional NER and NEL techniques (statistical techniques and PageRank). For recognising events, the NEWS project used pattern recognition techniques to describe and find the desired events.</p>
      <p>Recognising events is a relevant feature of such systems, and it is approached in many different ways. For example, Event Registry [12] uses clustering algorithms to detect and group similar articles that represent the same event. Following the central idea of events, the NewsReader project [13, 18] proposed a method, tools and a system to automatically extract and represent events from news.</p>
      <p>The NewsReader NLP pipeline performs language-specific NER and NEL, event and semantic role detection, and temporal relation detection over four different languages, dealing with millions of news articles. The pipeline processes each item starting with linguistic techniques (tokeniser, PoS, multiword tagger), followed by traditional NER and NEL (based on DBpedia Spotlight), opinion mining, semantic role labelling, event resolution, temporal recognition, and causal and factuality relation extraction. To cope with the large number of news articles, NewsReader implemented its NLP pipeline using Big Data oriented technologies (i.e., Hadoop and Storm) in a scalable, real-time system [14].</p>
      <p>Big data, multimedia and multilingual sources are also combined in the SUMMA project [15], an open-source platform for automated, scalable and distributed monitoring of real-time media broadcasts that supports the work of news organisations like BBC or Deutsche Welle. The platform is built using big data-oriented technologies and services running in Docker<fn id="fn2"><label>2</label><p>www.docker.com</p></fn> containers. SUMMA converts multimedia sources into text, which is translated into English when it is in another language. The text is then processed through an NLP pipeline that classifies it by topic using a hierarchical attention model, clusters it into storylines using clustering algorithms, and represents it using traditional NER (dependency parsing) and NEL (SVMRanking) techniques.</p>
      <p>Like the previous works, the ASRAEL project [16] uses knowledge graphs to represent events in news articles, in its case for search purposes. To do so, it maps AFP articles to Wikidata using NER (based on spaCy) and the NEL system ADEL.</p>
      <p>As observed in the previous works, there is a need for big data, real-time and semantic technology approaches to deal with the high volumes of news items that come from multilingual and multimedia sources, and there is a common interest in detecting events among journalists and the different projects. Moreover, the proposed NLP techniques follow traditional approaches and similar pipelines, which may not always be suitable for big data and real-time processing or provide the best results.</p>
      <p>Many approaches for lifting natural language to knowledge graphs are based on previous-generation NER techniques, and new lifting approaches that add disambiguation and linking to recent best-of-breed NE recognisers are needed. There is also a lack of standards for comparing lifting approaches [<xref ref-type="bibr" rid="ref10">10</xref>]. This can partly be attributed to a lack of commonly accepted benchmarks, but it is also a consequence of the recognition-disambiguation-linking pipeline. For example, it is hard to fairly compare pure NER with combined NER-NED-NEL techniques when the latter are restricted to identifying named entities in the KB that is used for disambiguation and linking. Moreover, traditional sequential steps are now being integrated through joint learning or end-to-end processes. Consequently, mentions and entities that were previously analysed in isolation are now being lifted in each other's context. The current culmination of these trends is the deep-learning approaches that have recently reported promising results. Most of these developments are not considered in previous works, and this paper aims to address these gaps.</p>
    </sec>
    <sec id="sec-jkp-architecture">
      <title>3. Journalistic Knowledge Platform architecture</title>
      <p>In our previous work on News Hunter [<xref ref-type="bibr" rid="ref2">2</xref>] we proposed a general architecture for journalistic knowledge platforms (Figure 1), intended for big data, real-time news lifting and processing. The still-evolving architecture consists of five main parts: (1) the harvesting system, which harvests news from the web (e.g., RSS feeds, Facebook, Twitter) or daily produced in-house news (e.g., agency daily news activity) together with its associated metadata (e.g., URL, source, author, ID, timestamp), and represents it using JSON in order to facilitate parsing, transfer and further processing; (2) the data lake or storage system for big data and real time, which is designed for sharing news items across the different processes; (3) the semantic news component, which contains the NLP lifter and the semantic DB (knowledge graph); (4) the semantic and streaming news analysis services, which, given the importance of social media, can provide real-time analysis such as trend monitoring and event detection; and (5) the service layer, which allows users to interact with the JKP.</p>
      <fig id="fig1">
        <label>Figure 1</label>
        <caption>
          <p>News Hunter architecture [<xref ref-type="bibr" rid="ref2">2</xref>]</p>
        </caption>
      </fig>
      <p>News items can be collected from multiple sources: online news (e.g., RSS feeds), social media (e.g., Facebook, Twitter), archives or daily produced in-house news (e.g., agency daily news activity). The news crawler is designed to harvest news from any single source or from multiple sources of interest. Due to the high volume of news items and their velocity of production, the harvested items are represented using standard lightweight formats like JSON, in order to facilitate their parsing, execution, transfer, sharing between components and temporary storage. News items are gathered together with their associated metadata (e.g., URL, source, author, ID, timestamp), which is included in the JSON files to speed up and simplify further processing and NLP tasks.</p>
      <p>News items are processed according to their source: social media or news agencies. The news stories coming from news agencies (RSS feeds, news websites or archives) in JSON format are lifted into the knowledge graph as RDF triples using the NLP lifter, which can be adapted to the specific domain of the news story (e.g., economics, politics, sports). The news items coming from social media, on the other hand, can be either pre-news (i.e., real-time information about events or something that is happening at the moment but is not yet, or only incompletely, reported as a news story) or short summaries/abstracts of news. Identifying the topic they relate to and clustering them into groups of pre-news items that represent the same event and topic therefore facilitates their processing. As these clusters of pre-news items represent a potential event with richer information than a single item, they can be lifted into the knowledge graph using NLP techniques.</p>
      <p>Furthermore, as social media items are potential real-time pre-news or events that can become breaking news, they are highly important for journalists. Therefore, the clusters are analysed and monitored in order to find trends or breaking news events, which are reported to journalists in real time.</p>
      <p>In this paper, we introduce the NLP lifting architecture that receives its input from the harvester explained previously [<xref ref-type="bibr" rid="ref2">2</xref>]. The harvester takes care of getting the data from different sources and standardising it into a unified format like JSON, XML, or NIF. The text can be stored in big-data oriented databases such as Apache Cassandra<fn id="fn3"><label>3</label><p>https://cassandra.apache.org</p></fn> or HBase<fn id="fn4"><label>4</label><p>https://hbase.apache.org</p></fn>, which are designed for distributed and large-scale processing pipelines. Moreover, the text can be distributed across the different NLP tasks using an API or a distribution framework like Kafka<fn id="fn5"><label>5</label><p>https://kafka.apache.org</p></fn> or RabbitMQ<fn id="fn6"><label>6</label><p>https://www.rabbitmq.com</p></fn>. The NLP lifter then has to process the data and lift it into a proper semantic format that is then inserted into the KG.</p>
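      <p>A minimal Python sketch of the harvesting hand-off described above may help make the data flow concrete. It assumes the field names shown (the paper lists the metadata but does not fix a schema): each item is serialised to JSON together with its metadata, and a hypothetical <monospace>route</monospace> function sends news-agency stories to the NLP lifter and social media items to pre-news clustering. In a deployment the returned names could correspond to Kafka topics or RabbitMQ queues, but no broker API is used or implied here.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import json
from dataclasses import asdict, dataclass

# Hypothetical grouping of harvested sources into the two processing paths.
AGENCY_SOURCES = {"rss", "website", "archive", "in-house"}
SOCIAL_SOURCES = {"twitter", "facebook"}


@dataclass
class NewsItem:
    """A harvested item with the metadata listed in Section 3.

    The field names are illustrative assumptions, not a published schema.
    """
    id: str
    url: str
    source: str      # e.g. "rss", "archive", "twitter"
    author: str
    timestamp: str   # ISO 8601
    text: str


def to_json(item: NewsItem) -> str:
    """Serialise an item into the lightweight JSON passed between components."""
    return json.dumps(asdict(item), ensure_ascii=False)


def route(message: str) -> str:
    """Return the (hypothetical) destination for a JSON-encoded item.

    News-agency stories go straight to the NLP lifter, while social media
    items are first clustered into groups of pre-news about the same event.
    """
    source = json.loads(message)["source"]
    if source in AGENCY_SOURCES:
        return "nlp-lifter"
    if source in SOCIAL_SOURCES:
        return "pre-news-clustering"
    raise ValueError(f"unknown source: {source}")


if __name__ == "__main__":
    story = NewsItem(id="n-001", url="https://example.org/news/1",
                     source="rss", author="Newsroom",
                     timestamp="2020-10-19T09:00:00Z",
                     text="Example agency story.")
    print(route(to_json(story)))  # nlp-lifter
```
]]></preformat>
      <p>Deciding on the serialised message rather than on an in-memory object is one way to keep the harvester and the downstream components decoupled, in line with the JSON-based exchange described above.</p>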
    </sec>
    <sec id="sec-2">
      <title>4. NLP lifter</title>
      <p>This section describes the NLP lifter that lifts natural language news texts into knowledge graphs. The NLP lifter, a component of the JKP architecture, consists of the main NLP lifting tasks as well as some additional related tasks. Differently from other proposed systems, our NLP lifter is Docker-based and contains as many tasks as possible (both traditional and recently developed ones), as shown in Figure 2. This allows the platform to keep evolving and ensures that the most recent technology can be used at all times. There will be two main NLP tracks: the traditional pipeline, updated with recent technologies, and the end-to-end track, which is the SOTA in many tasks. In addition, an ensemble service can combine more than one lifter to produce better results. The purpose and advantage of this design is that users can choose the track that best suits their case and data, as well as the most recent techniques.</p>
      <p>In the traditional pipeline, tasks like NER, NED, and NEL are implemented separately, mostly using off-the-shelf software. Off-the-shelf systems are usually based on older approaches, and their performance is not SOTA. Moreover, traditional lifting methods neglect the relations between entity types and entity context. Our architecture, however, makes it possible to use the most up-to-date methods or the newest systems simply by replacing or adding their Docker containers in the related component.</p>
      <p>The news item annotation ontology designed in [<xref ref-type="bibr" rid="ref7">7</xref>] defines how the semantic annotations of news items should be represented in the knowledge graph. Each harvested news item is associated with one or more annotations, which may be, for example, named entities, concepts, topics, times, geolocations, or relations between annotations. The ontology also describes how the sources of news items and annotations are represented in the knowledge graph to maintain provenance [<xref ref-type="bibr" rid="ref7">7</xref>]. We describe the general NLP lifter components as follows.</p>
      <sec id="sec-2-1">
        <title>4.1. Pre-processing</title>
        <p>The quality of the data plays a key role in determining suitable pre-processing techniques. Since we are dealing with real-time streams, cleaning and normalisation are required to remove unnecessary or noisy terms (like ASCII codes, currency symbols, hashtags, and so forth). The most frequently used pre-processing techniques are tokenisation and PoS tagging [19, 20]. Other common steps are sentence splitting, lemmatisation, chunking, dependency parsing, and structural parsing. Recent works indicate that robust lifting systems require accurate tuning of several steps, especially tokenisation and semantic similarity [21]. Recently, deep neural networks, especially end-to-end methods, have reduced the need for pre-processing steps. Moreover, using deep neural networks for pre-processing tasks such as tokenisation has recently produced promising results [22]. The proposed NLP lifter can include as many pre-processing steps as possible, each in a separate Docker container, so that users can choose the ones suitable for the target data.</p>
      </sec>
      <sec id="sec-2-ner">
        <title>4.2. Named entity recognition</title>
        <p>Named entity recognition is the task of identifying the named entities contained in a text, such as persons, locations, organisations, times, dates, monetary amounts, etc. NER approaches can be categorised into three main groups: knowledge-based approaches, learning-based methods, and feature-inferring neural network methods. Despite the existence of recent SOTA NER results (especially from deep neural network approaches) such as [23, 24, 25, 26], these approaches have not yet been exploited for lifting natural language to knowledge graphs, as mentioned earlier. This paper aims to implement such SOTA NER methods as Docker-based components to tackle this shortcoming.</p>
      </sec>
      <sec id="sec-2-nel">
        <title>4.3. Named entity linking</title>
        <p>NEL annotates each mention in a text with the identifier of the corresponding entity described in a knowledge base (KB) in the LOD cloud. In this paper, NEL is defined as a wider task that includes named entity disambiguation (NED) as one of its processes.</p>
        <p>Many NEL approaches use off-the-shelf systems for the NER task. According to [27], however, choosing which particular model to use for those systems is challenging, because it requires estimating how similar the system's training datasets are to the dataset in which we strive to recognise entities accurately. The most recent SOTA systems on the AIDA-CoNLL dataset include [28, 29, 30, 31]. There is no perfect NEL model for all datasets: one model might be the best on one dataset but perform poorly on others. Accordingly, having the top N SOTA systems implemented in Docker containers will allow users to pick the most suitable model for their data and to replace or update models whenever needed.</p>
      </sec>
      <sec id="sec-2-e2e">
        <title>4.4. End-to-end track</title>
        <p>Most previous studies assumed that mentions and entities were available and focused on the disambiguation process only, neglecting the mutual dependency between mentions and their entities. Moreover, this assumption is not practical in real-world applications. To overcome these shortcomings, end-to-end approaches work on raw text and aim to extract all mentions and link them to their entities in the knowledge base. End-to-end entity linking has been proposed recently and is receiving increasing attention. A few studies have been published that apply the end-to-end approach [<xref ref-type="bibr" rid="ref12 ref13 ref14">32, 33, 34, 35</xref>]. The most interesting ones are the recent neural end-to-end linking models [<xref ref-type="bibr" rid="ref15 ref16 ref17 ref18">36, 37, 38, 39</xref>]. One of the most recent SOTA systems is [<xref ref-type="bibr" rid="ref17">38</xref>], followed by [<xref ref-type="bibr" rid="ref15">36</xref>]. Our NLP lifter aims to include such techniques as an alternative, recent track for lifting news texts into a semantic knowledge graph.</p>
      </sec>
      <sec id="sec-2-rel">
        <title>4.5. Relation and concept extraction</title>
        <p>Our NLP lifter aims to cover the lifting of general concepts and of relations between entities. Many recent approaches also lift relations jointly with entities (both named entities and concepts) and have reported SOTA results. As with the previous components, the proposed lifter will implement those methods and include them, like many others, as optional tasks for the user.</p>
      </sec>
      <sec id="sec-2-user">
        <title>4.6. User-oriented tasks</title>
        <p>User-oriented tasks are tasks that are specific to, and personalised for, the project in which the NLP lifting architecture is implemented. Apart from including SOTA NLP tasks like those described previously, the NLP lifting architecture takes into account purpose-specific tasks such as news angle detection, event detection, IPTC media code annotation, rumour detection and text completion.</p>
      </sec>
      <sec id="sec-2-kg">
        <title>4.7. Knowledge graph</title>
        <p>In a knowledge graph, the nodes represent either concrete objects, concepts, information resources, or data about them, and the edges represent semantic relations between the nodes [<xref ref-type="bibr" rid="ref19">40</xref>]. Knowledge graphs thus offer a widely used format for representing information in computer-processable form. They build on, and are heavily inspired by, Tim Berners-Lee's vision of the semantic web, a machine-processable web of data that augments the original web of human-readable documents [<xref ref-type="bibr" rid="ref20">41</xref>]. Knowledge graphs can therefore leverage existing standards such as RDF, RDFS, and OWL.</p>
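        <p>To make the lifting output more concrete, the following Python sketch turns the NER/NEL annotations of one news item into RDF triples serialised as N-Triples, using only the standard library. The annotation ontology of [7] is not reproduced in this paper, so the <monospace>http://example.org/newshunter#</monospace> namespace and all predicates below (<monospace>hasAnnotation</monospace>, <monospace>mention</monospace>, <monospace>begin</monospace>, <monospace>end</monospace>, <monospace>entity</monospace>, <monospace>producedBy</monospace>) are hypothetical placeholders; only <monospace>rdf:type</monospace> and the XSD integer datatype are standard IRIs. The <monospace>producedBy</monospace> triple hints at how provenance could be kept.</p>
        <preformat preformat-type="code"><![CDATA[
```python
from typing import Dict, List, Tuple

# Hypothetical vocabulary standing in for the annotation ontology of [7].
NH = "http://example.org/newshunter#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"

Triple = Tuple[str, str, str]


def iri(value: str) -> str:
    return f"<{value}>"


def literal(value: str) -> str:
    """Plain string literal with the escapes N-Triples requires."""
    escaped = (value.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'


def int_literal(value: int) -> str:
    return f'"{value}"^^{iri(XSD_INTEGER)}'


def lift_annotations(item_id: str, annotations: List[Dict]) -> List[Triple]:
    """Turn the NER/NEL output for one news item into RDF triples.

    Each annotation is assumed to hold the surface form ("mention"), its
    character offsets ("begin", "end"), the KB IRI chosen by the linker
    ("entity") and the name of the lifter that produced it ("lifter").
    """
    item = iri(f"{NH}item/{item_id}")
    triples: List[Triple] = [(item, iri(RDF_TYPE), iri(NH + "NewsItem"))]
    for n, ann in enumerate(annotations):
        node = iri(f"{NH}annotation/{item_id}/{n}")
        triples += [
            (item, iri(NH + "hasAnnotation"), node),
            (node, iri(RDF_TYPE), iri(NH + "Annotation")),
            (node, iri(NH + "mention"), literal(ann["mention"])),
            (node, iri(NH + "begin"), int_literal(ann["begin"])),
            (node, iri(NH + "end"), int_literal(ann["end"])),
            (node, iri(NH + "entity"), iri(ann["entity"])),
            (node, iri(NH + "producedBy"), literal(ann["lifter"])),  # provenance
        ]
    return triples


def to_ntriples(triples: List[Triple]) -> str:
    """One 'subject predicate object .' statement per line."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


if __name__ == "__main__":
    anns = [{"mention": "Bergen", "begin": 0, "end": 6,
             "entity": "http://dbpedia.org/resource/Bergen",
             "lifter": "ner-nel-a"}]
    print(to_ntriples(lift_annotations("n-001", anns)))
```
]]></preformat>
        <p>Reifying each annotation as its own node, rather than linking the news item directly to the entity, is what leaves room to attach offsets and provenance; the output could be loaded into any triple store that accepts N-Triples.</p>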
        <p>Moreover, the constructed knowledge graph could be used to implement further operations, such as question answering, knowledge graph-based sentence autocompletion, storytelling, and fact-checking, using semantic news analysis.</p>
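        <p>As a rough illustration of how a lifted graph can serve question answering, the Python sketch below answers "which news items mention a given entity?" by joining two triple patterns, the in-memory equivalent of the SPARQL query given in the docstring. It uses a small hand-written graph and a hypothetical vocabulary (the <monospace>example.org</monospace> IRIs are placeholders, not the annotation ontology of [7]); a production system would send such a query to a triple store instead.</p>
        <preformat preformat-type="code"><![CDATA[
```python
from typing import Iterator, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical predicates; a real deployment would use the annotation ontology.
NH = "http://example.org/newshunter#"
HAS_ANNOTATION = NH + "hasAnnotation"
ENTITY = NH + "entity"


def match(graph: List[Triple], s: Optional[str] = None,
          p: Optional[str] = None, o: Optional[str] = None) -> Iterator[Triple]:
    """Yield triples matching a pattern; None works like an unbound variable."""
    for triple in graph:
        if ((s is None or triple[0] == s) and (p is None or triple[1] == p)
                and (o is None or triple[2] == o)):
            yield triple


def items_mentioning(graph: List[Triple], entity: str) -> List[str]:
    """Equivalent SPARQL (with nh: bound to the NH namespace):

        SELECT DISTINCT ?item WHERE {
          ?item nh:hasAnnotation ?ann .
          ?ann  nh:entity        <entity> .
        }
    """
    anns: Set[str] = {s for s, _, _ in match(graph, p=ENTITY, o=entity)}
    return sorted({s for s, _, o in match(graph, p=HAS_ANNOTATION) if o in anns})


if __name__ == "__main__":
    bergen = "http://dbpedia.org/resource/Bergen"
    graph = [
        (NH + "item/1", HAS_ANNOTATION, NH + "ann/1"),
        (NH + "ann/1", ENTITY, bergen),
        (NH + "item/2", HAS_ANNOTATION, NH + "ann/2"),
        (NH + "ann/2", ENTITY, "http://dbpedia.org/resource/Oslo"),
    ]
    print(items_mentioning(graph, bergen))
    # ['http://example.org/newshunter#item/1']
```
]]></preformat>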
      </sec>
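      <p>The ensemble service and the swappable lifters described at the beginning of this section could be organised roughly as in the following Python sketch. It is an illustration under stated assumptions, not the News Hunter implementation: toy gazetteer functions stand in for the Docker-based NER/NEL services, a hypothetical <monospace>REGISTRY</monospace> groups them by track (<monospace>pipeline</monospace> or <monospace>end-to-end</monospace>) so that one can be replaced without touching the others, and the ensemble keeps an annotation only when at least <monospace>min_votes</monospace> lifters agree on the same span and entity. Simple majority voting is our assumption; the paper does not specify a combination rule.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from collections import Counter
from typing import Callable, Dict, List, Set, Tuple

Annotation = Tuple[int, int, str]        # (begin offset, end offset, entity IRI)
Lifter = Callable[[str], Set[Annotation]]

# Hypothetical registry: each entry stands in for a Docker-based service.
REGISTRY: Dict[str, Dict[str, Lifter]] = {"pipeline": {}, "end-to-end": {}}


def register(track: str, name: str, lifter: Lifter) -> None:
    """Add or replace a lifter without touching the others."""
    REGISTRY[track][name] = lifter


def gazetteer_lifter(entries: Dict[str, str]) -> Lifter:
    """Toy NER+NEL: link the first occurrence of each known surface form."""
    def lift(text: str) -> Set[Annotation]:
        found: Set[Annotation] = set()
        for surface, entity in entries.items():
            start = text.find(surface)
            if start != -1:
                found.add((start, start + len(surface), entity))
        return found
    return lift


def ensemble(text: str, lifters: List[Lifter], min_votes: int) -> List[Annotation]:
    """Keep annotations proposed by at least `min_votes` lifters."""
    votes: Counter = Counter()
    for lift in lifters:
        votes.update(lift(text))  # a set, so each lifter votes at most once
    return sorted(a for a, n in votes.items() if n >= min_votes)


DBR = "http://dbpedia.org/resource/"
register("pipeline", "lifter-a",
         gazetteer_lifter({"Bergen": DBR + "Bergen", "Norway": DBR + "Norway"}))
register("pipeline", "lifter-b", gazetteer_lifter({"Bergen": DBR + "Bergen"}))
register("end-to-end", "lifter-c",
         gazetteer_lifter({"Bergen": DBR + "Bergen",
                           "Norway": "http://example.org/wrong/Norway"}))

if __name__ == "__main__":
    lifters = [f for track in REGISTRY.values() for f in track.values()]
    print(ensemble("Bergen is a city in Norway.", lifters, min_votes=2))
    # [(0, 6, 'http://dbpedia.org/resource/Bergen')]
```
]]></preformat>
      <p>Majority voting is only one possible combination strategy. Since no NEL model performs best on all datasets, weighting each lifter by its measured accuracy on the target data would be a natural refinement.</p>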
    </sec>
    <sec id="sec-2-2">
      <title>Acknowledgments</title>
      <p>Supported by the News Angler project funded by the Norwegian Research Council’s IKTPLUSS programme as project 275872.</p>
    </sec>
    <sec id="sec-3">
      <title>5. Conclusion</title>
      <p>Lifting high-volume streams of news texts involves representing their content in machine-understandable formats. KGs are one such format that has received much attention recently. NLP lifters are an important prerequisite for making the abundance of natural language news on the internet available as computer-processable knowledge graphs. The presented NLP lifting pipeline thus provides a structured and formalised process for transforming natural language text into computer-processable knowledge graphs.</p>
      <p>The presented pipeline can incorporate any NLP technique, such as traditional or end-to-end approaches, combine their results, or extend them with specific-purpose NLP methods like sentiment analysis. Moreover, the introduced NLP lifter is designed to make its components easily replaceable by using Docker technology, which facilitates, e.g., updating all tasks and methods to SOTA approaches. Although the proposed JKP is designed mainly to help journalists, it could also be used and customised for the general public. The presented NLP lifting architecture is intended as a reference for developers and researchers of JKPs interested in real-time NEL. News organisations may need to adapt their systems, replace components, add new SOTA technologies, or integrate them with other JKPs; having such an NLP lifting pipeline as a reference facilitates their management and understanding. Furthermore, the architecture is not restricted to news text and could be used to lift other types of texts.</p>
      <p>In our future work, we plan to validate the results of our proposed NLP lifter using both a manually collected and annotated corpus of news and gold standards, and to compare its results with current NEL systems such as ADEL, the SpaCy lifter, NewsReader, Stanford CoreNLP or DBpedia Spotlight. We also want to explore the possibilities that validation tools like GERBIL [<xref ref-type="bibr" rid="ref21">42</xref>] can provide when applied inside our NLP lifter. We believe that validation tools can provide insights into the evolution and performance of the applied NLP processes, which can be used to reinforce, improve and keep the NLP models up to date.</p>
    </sec>
    <sec id="sec-references">
      <title>References</title>
      <p>[12] G. Leban, B. Fortuna, J. Brank, M. Grobelnik, Event registry: learning about world events from news, in: Proceedings of the 23rd WWW, ACM, 2014, pp. 107–110.</p>
      <p>[13] P. Vossen, R. Agerri, I. Aldabe, A. Cybulska, M. van Erp, A. Fokkens, E. Laparra, A.-L. Minard, A. P. Aprosio, G. Rigau, M. Rospocher, R. Segers, Newsreader: Using knowledge resources in a cross-lingual reading machine to generate more knowledge from massive streams of news, Knowledge-Based Systems 110 (2016) 60–85.</p>
      <p>[14] M. Kattenberg, Z. Beloki, A. Soroa, X. Artola, A. Fokkens, P. Huygen, K. Verstoep, Two architectures for parallel processing of huge amounts of text, in: Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16), European Language Resources Association (ELRA), 2016, pp. 4513–4519.</p>
      <p>[15] U. Germann, P. v. d. Kreeft, G. Barzdins, A. Birch, The summa platform: Scalable understanding of multilingual media, in: Proceedings of the 21st Annual Conference of the European Association for Machine Translation, 2018.</p>
      <p>[16] C. Rudnik, T. Ehrhart, O. Ferret, D. Teyssou, R. Troncy, X. Tannier, Searching news articles using an event knowledge graph leveraged by Wikidata, in: 30th WWW Conference, 13-17 May 2019, 2019.</p>
      <p>[17] N. Fernández, J. M. Blázquez, J. A. Fisteus, L. Sánchez, M. Sintek, A. Bernardi, M. Fuentes, A. Marrara, Z. Ben-Asher, News: Bringing semantic web technologies into news agencies, in: I. Cruz, S. Decker, D. Allemang, C. Preist, D. Schwabe, P. Mika, M. Uschold, L. M. Aroyo (Eds.), The Semantic Web - ISWC 2006, Springer Berlin Heidelberg, Berlin, Heidelberg, 2006, pp. 778–791.</p>
      <p>[18] M. Rospocher, M. van Erp, P. Vossen, A. Fokkens, I. Aldabe, G. Rigau, A. Soroa, T. Ploeger, T. Bogaard, Building event-centric knowledge graphs from news, Journal of Web Semantics 37-38 (2016) 132–151.</p>
      <p>[19] G. Zhu, C. A. Iglesias, Exploiting semantic similarity for named entity disambiguation in knowledge graphs, Expert Systems with Applications 101 (2018) 8–24.</p>
      <p>[20] M. Fossati, E. Dorigatti, C. Giuliano, N-ary relation extraction for simultaneous t-box and a-box knowledge base augmentation, Semantic Web 9 (2018) 413–439.</p>
      <p>[21] M. Conover, M. Hayes, S. Blackburn, P. Skomoroch, S. Shah, Pangloss: Fast entity linking in noisy text environments, in: Proceedings of the 24th ACM SIGKDD, KDD '18, ACM, 2018, pp. 168–176.</p>
      <p>[22] T. Boros, S. D. Dumitrescu, R. Burtica, NLP-cube: End-to-end raw text processing with neural networks, in: Proceedings of the CoNLL 2018, ACL, 2018, pp. 171–179.</p>
      <p>[23] A. Baevski, S. Edunov, Y. Liu, L. Zettlemoyer, M. Auli, Cloze-driven pretraining of self-attention networks, in: Proceedings of the 2019 Conference on EMNLP and the 9th IJCNLP, ACL, 2019, pp. 5359–5368.</p>
      <p>[24] J. Devlin, M.-W. Chang, K. Lee, K. Toutanova, Bert: Pre-training of deep bidirectional transformers for language understanding, 2018. arXiv:1810.04805.</p>
      <p>[25] M. Peters, M. Neumann, M. Iyyer, M. Gardner, C. Clark, K. Lee, L. Zettlemoyer, Deep contextualized word representations, in: Proceedings of the 2018 Conference of the NAACL, ACL, 2018, pp. 2227–2237.</p>
      <p>[26] L. Liu, X. Ren, J. Shang, X. Gu, J. Peng, J. Han, Efficient contextualized representation: Language model pruning for sequence labeling, in: Proceedings of the 2018 Conference on EMNLP, ACL, 2018, pp. 1215–1225.</p>
      <p>[27] J. Plu, G. Rizzo, R. Troncy, Enhancing entity linking by combining ner models, in: H. Sack, S. Dietze, A. Tordai, C. Lange (Eds.), Semantic Web Challenges, Springer International Publishing, Cham, 2016, pp. 17–32.</p>
      <p>[28] J. Raiman, O. Raiman, Deeptype: Multilingual entity linking by neural type system evolution, 2018. arXiv:1802.01021.</p>
      <p>[29] I. Yamada, H. Shindo, Pre-training of deep contextualized embeddings of words and entities for named entity disambiguation, 2019. arXiv:1909.00426.</p>
      <p>[30] Z. Fang, Y. Cao, Q. Li, D. Zhang, Z. Zhang, Y. Liu, Joint entity linking with deep reinforcement learning, in: The WWW Conference, WWW '19, ACM, 2019, pp. 438–447.</p>
      <p>[31] A. Luo, S. Gao, Y. Xu, Deep semantic match model for entity linking using knowledge graph and text, Procedia Computer Science 129 (2018) 110–114. 2017 International Conference on Identification, Information and Knowledge in the Internet of Things.</p>
      <p>[32] A. Moro, A. Raganato, R. Navigli, Entity linking meets word sense disambiguation: a unified approach, Transactions of the ACL 2 (2014) 231–244. doi:10.1162/tacl_a_00179.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>A.</given-names>
            <surname>Berven</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Christensen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Moldeklev</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Opdahl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Villanger</surname>
          </string-name>
          ,
          <article-title>News hunter: building and mining knowledge graphs for newsroom systems</article-title>
          ,
          <source>in: NOKOBIT</source>
          , volume
          <volume>26</volume>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>M.</given-names>
            <surname>Gallofré Ocaña</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Nyre</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. L.</given-names>
            <surname>Opdahl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Tessem</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Trattner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Veres</surname>
          </string-name>
          ,
          <article-title>Towards a big data platform for news angles</article-title>
          ,
          <source>in: 4th Norwegian Big Data Symposium (NOBIDS 2018)</source>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>A.</given-names>
            <surname>Berven</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Christensen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Moldeklev</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Opdahl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Villanger</surname>
          </string-name>
          ,
          <article-title>A knowledge graph platform for newsrooms</article-title>
          ,
          <source>Computers in Industry</source>
          (
          <year>2020</year>
          ). To appear.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>B.</given-names>
            <surname>Tessem</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. L.</given-names>
            <surname>Opdahl</surname>
          </string-name>
          ,
          <article-title>Supporting journalistic news angles with models and analogies</article-title>
          ,
          <source>in: 2019 13th RCIS</source>
          , IEEE,
          <year>2019</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>7</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>A. L.</given-names>
            <surname>Opdahl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Tessem</surname>
          </string-name>
          ,
          <article-title>Towards ontological support for journalistic angles</article-title>
          ,
          <source>in: Enterprise, Business-Process and Information Systems Modeling</source>
          , Springer International Publishing,
          <year>2019</year>
          , pp.
          <fpage>279</fpage>
          -
          <lpage>294</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>B.</given-names>
            <surname>Tessem</surname>
          </string-name>
          ,
          <article-title>Analogical news angles from text similarity</article-title>
          ,
          <source>in: Artificial Intelligence XXXVI</source>
          , Springer International Publishing,
          <year>2019</year>
          , pp.
          <fpage>449</fpage>
          -
          <lpage>455</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>A. L.</given-names>
            <surname>Opdahl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Tessem</surname>
          </string-name>
          ,
          <article-title>Ontologies for finding journalistic angles</article-title>
          ,
          <source>Software and Systems Modeling</source>
          (
          <year>2020</year>
          )
          <fpage>1</fpage>
          -
          <lpage>17</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>E.</given-names>
            <surname>Motta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Daga</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. L.</given-names>
            <surname>Opdahl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Tessem</surname>
          </string-name>
          ,
          <article-title>Analysis and design of computational news angles</article-title>
          , IEEE Access (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>M.</given-names>
            <surname>Albared</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Gallofré Ocaña</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Ghareb</surname>
          </string-name>
          , T. Al-Moslmi,
          <article-title>Recent progress of named entity recognition over the most popular datasets</article-title>
          ,
          <source>in: 2019 First International Conference of Intelligent Computing and Engineering (ICOICE)</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>9</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>T.</given-names>
            <surname>Al-Moslmi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Gallofré Ocaña</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. L.</given-names>
            <surname>Opdahl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Veres</surname>
          </string-name>
          ,
          <article-title>Named entity extraction for knowledge graphs: A literature overview</article-title>
          ,
          <source>IEEE Access 8</source>
          (
          <year>2020</year>
          )
          <fpage>32862</fpage>
          -
          <lpage>32881</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>N.</given-names>
            <surname>Fernández</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Fuentes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Sánchez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. A.</given-names>
            <surname>Fisteus</surname>
          </string-name>
          ,
          <article-title>The news ontology: Design and appli-</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [33]
          <string-name>
            <given-names>O.-E.</given-names>
            <surname>Ganea</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ganea</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Lucchi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Eickhoff</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Hofmann</surname>
          </string-name>
          ,
          <article-title>Probabilistic bag-of-hyperlinks model for entity linking</article-title>
          ,
          <source>in: Proceedings of the 25th WWW, WWW '16</source>
          , WWW Conference,
          <year>2016</year>
          , pp.
          <fpage>927</fpage>
          -
          <lpage>938</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [34]
          <string-name>
            <given-names>D. B.</given-names>
            <surname>Nguyen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Theobald</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Weikum</surname>
          </string-name>
          ,
          <article-title>J-NERD: Joint named entity recognition and disambiguation with rich linguistic features</article-title>
          ,
          <source>Transactions of the ACL 4</source>
          (
          <year>2016</year>
          )
          <fpage>215</fpage>
          -
          <lpage>229</lpage>
          . doi:10.1162/tacl_a_00094.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [35]
          <string-name>
            <given-names>O.-E.</given-names>
            <surname>Ganea</surname>
          </string-name>
          , T. Hofmann,
          <article-title>Deep joint entity disambiguation with local neural attention</article-title>
          ,
          <source>in: Proceedings of the 2017 Conference on EMNLP, ACL</source>
          ,
          <year>2017</year>
          , pp.
          <fpage>2619</fpage>
          -
          <lpage>2629</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [36]
          <string-name>
            <given-names>N.</given-names>
            <surname>Kolitsas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.-E.</given-names>
            <surname>Ganea</surname>
          </string-name>
          , T. Hofmann,
          <article-title>End-to-end neural entity linking</article-title>
          ,
          <source>in: Proceedings of the 22nd Conference on Computational Natural Language Learning</source>
          , ACL,
          <year>2018</year>
          , pp.
          <fpage>519</fpage>
          -
          <lpage>529</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [37]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Cao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Hou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <article-title>Neural collective entity linking</article-title>
          ,
          <year>2018</year>
          . arXiv:1811.08603.
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [38]
          <string-name>
            <given-names>P.</given-names>
            <surname>Le</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Titov</surname>
          </string-name>
          ,
          <article-title>Improving entity linking by modeling latent relations between mentions</article-title>
          ,
          <source>in: Proceedings of the 56th ACL, ACL</source>
          ,
          <year>2018</year>
          , pp.
          <fpage>1595</fpage>
          -
          <lpage>1604</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [39]
          <string-name>
            <given-names>P. H.</given-names>
            <surname>Martins</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Marinho</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. F. T.</given-names>
            <surname>Martins</surname>
          </string-name>
          ,
          <article-title>Joint learning of named entity recognition and entity linking</article-title>
          ,
          <source>in: Proceedings of the 57th ACL, ACL</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>190</fpage>
          -
          <lpage>196</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [40]
          <string-name>
            <given-names>D.</given-names>
            <surname>Allemang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Hendler</surname>
          </string-name>
          ,
          <article-title>Semantic Web for the Working Ontologist</article-title>
          , 2nd ed., Morgan Kaufmann,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [41]
          <string-name>
            <given-names>T.</given-names>
            <surname>Berners-Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Hendler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Lassila</surname>
          </string-name>
          , et al.,
          <article-title>The semantic web</article-title>
          ,
          <source>Scientific American 284</source>
          (
          <year>2001</year>
          )
          <fpage>28</fpage>
          -
          <lpage>37</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [42]
          <string-name>
            <given-names>M.</given-names>
            <surname>Röder</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Usbeck</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. N.</given-names>
            <surname>Ngomo</surname>
          </string-name>
          ,
          <article-title>GERBIL - Benchmarking named entity recognition and linking consistently</article-title>
          ,
          <source>Semantic Web</source>
          <volume>9</volume>
          (
          <year>2018</year>
          )
          <fpage>605</fpage>
          -
          <lpage>625</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>