<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Relation Linking to Knowledge Bases via CLOCQ</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Philipp Christmann</string-name>
          <email>pchristm@mpi-inf.mpg.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Rishiraj Saha Roy</string-name>
          <email>rsaharo@mpi-inf.mpg.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Gerhard Weikum</string-name>
          <email>weikum@mpi-inf.mpg.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="editor">
          <string-name>Question Answering, Knowledge Bases, Entity Linking, Relation Linking</string-name>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Max Planck Institute for Informatics and Saarland University</institution>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2022</year>
      </pub-date>
      <abstract>
<p>Curated knowledge bases (KBs) contain billions of facts with millions of entities and thousands of predicates. Question answering (QA) systems are supposed to access this knowledge to answer users' factoid questions. Entity linking and relation linking are integral ingredients of many such QA systems, and aim to link mentions in the question to concepts in the KB. The quality of these linking modules is of high importance: a single error in linking can result in a failure for the whole QA system. The SMART 2022 Task poses challenges for entity and relation linking to evaluate the performance of different approaches. In this work, we adapt and extend our prior work CLOCQ. CLOCQ computes top-k linkings for each mention to make up for potential errors, with k set automatically based on an ambiguity score. As an extension, we design a module that prunes linkings for irrelevant mentions, which helps to improve precision. We found that there is a trade-off between recall and precision: higher k boosts recall (up to 0.87 for entity linking), while lower k leads to high precision. The best choice for the linking modules may highly depend on the specific QA system, and whether it can make use of higher recall in the presence of noise.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Question answering (QA) systems provide natural interfaces for accessing human knowledge.
Such human knowledge can be stored in large-scale knowledge bases (KBs) like Wikidata [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ],
DBpedia [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], YAGO [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], Freebase [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], and industrial counterparts (at Amazon, Apple, Google,
Microsoft, etc.). KBs contain facts consisting of entities, relations, types, and literals. The standard
way of storing KB facts is as triples consisting of a subject, a predicate, and an object.
Motivation and problem. QA systems operating over KBs mostly follow one of the following
two themes [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]: (i) approaches with an explicit query create a logical form, for example, a
SPARQL query, and fill the query slots with entities and relations linked with
mentions in the
question [
        <xref ref-type="bibr" rid="ref6 ref7">6, 7</xref>
        ], or (ii) approaches without an explicit query first link entities and relations
to retrieve a search space consisting of KB facts, which is then searched for identifying the
answer [
        <xref ref-type="bibr" rid="ref8 ref9">8, 9</xref>
        ]. For either of the two approaches, being able to identify mentions of entities and
relations, and linking these mentions to KB items, is a key obstacle in the QA pipeline (linking
may also be referred to as disambiguating). Even single errors in these entity linking or relation
linking modules can lead to a complete failure of the QA pipeline, which is why their quality
is integral to the performance of the entire QA system. Note that without loss of generality,
mentions of types or general concepts may be linked as well, and can be used in the remainder
of the QA pipeline.
      </p>
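<p>The triple representation described above can be sketched as follows; the facts are made-up toy data for the running example, not actual Wikidata content:</p>

```python
# KB facts stored as (subject, predicate, object) triples (toy data).
KB = [
    ("House of the Dragon", "cast member", "Paddy Considine"),
    ("House of the Dragon", "original broadcaster", "HBO"),
    ("Paddy Considine", "character role", "Viserys I Targaryen"),
]

def facts_about(item):
    """Return all triples that mention `item` as subject or object."""
    return [t for t in KB if item in (t[0], t[2])]

print(facts_about("Paddy Considine"))
```

<p>Retrieving all facts that mention a linked item in this way is the basic building block of the search-space retrieval sketched in the second theme above.</p>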
      <p>Consider the following running example on the TV series House of the Dragon following the
narratives of George R.R. Martin:</p>
      <p>“Who plays Viserys in GRRM’s latest HBO series?”</p>
      <p>Linking the mentions in the question to the KB (Wikidata for this example) is a non-trivial
task that requires an understanding of the question as a whole. The entity mention “HBO”
may refer to the HBO company, the HBO network, or to the Hollywood Bowl Orchestra. Understanding
that the question is on a TV series helps to identify HBO network as the correct entity. Similarly,
“plays” semantically or lexically matches with many relations in the KB, like plays for team,
instrument, number of plays, character role, or time played. The intended relation character role
only becomes clear from the question context.</p>
      <p>“Viserys” is even harder to link, since there are different characters named Viserys in the
Game of Thrones universe: Viserys III Targaryen, the more well-known character from Game
of Thrones, and Viserys I Targaryen from the more recent House of the Dragon series. Thus,
the mention “Viserys” is quite ambiguous, even if the general context of the question is clear. A
deep understanding of the question is required to correctly link “Viserys” to Viserys I Targaryen.
Note that in case any of these disambiguations is incorrect, there is little hope of returning the
correct answer Paddy Considine to the user.</p>
      <p>
        To further investigate QA modules and pinpoint failure cases, the SMART Task 3.0<sup>1</sup>
(co-located with ISWC 2022) poses tasks for entity linking (Task 3) and relation linking (Task 2).
There is also a task on answer type prediction (Task 1), which is not targeted in this work.
Approach and contribution. In this work we adapt our recently proposed CLOCQ
framework [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] to these tasks. CLOCQ is an unsupervised framework that provides many
functionalities related to QA, and is made available as open-source code and as a public API<sup>2</sup>. These
functionalities include basic KB methods like retrieving the aliases, frequency, or 1-hop
neighborhood of a KB item, or computing the KB connectivity or the shortest path between two KB
items. The core algorithm presented in the paper aims to retrieve a search space for a user
question, facilitating QA methods without an explicit query. As an intermediate step and result,
mentions in the question are linked to KB items when retrieving the search space. For linking
to KB items, CLOCQ implements two key ideas. First, all mentions should be linked jointly,
considering the coherence of the disambiguated KB items. This follows the intuition that the
question needs to be considered as a whole. CLOCQ links not only entities and relations, but
also types and general concepts, providing disambiguations for each mention in the question.
Second, the linking modules should make up for potential errors. When disambiguating highly
ambiguous mentions, like “Viserys” in the running example, the linking modules should take
this ambiguity into account and provide the QA system with several possible linkings. CLOCQ
provides a mechanism to detect the ambiguity of a mention, based on an entropy measure over
the frequency distribution of the candidate KB items (detailed in Sec. 2.4).
      </p>
      <p>1: https://smart-task.github.io/2022/ 2: https://clocq.mpi-inf.mpg.de</p>
      <p>[Fig. 1: Overview of the CLOCQ linking process for the running example, showing the ranked
candidate lists for the mentions “plays” (play, plays for team, instrument, …, character role),
“Viserys” (Viserys III Targaryen, Viserys I Targaryen, Viserys II Targaryen, …), and “HBO”
(HBO company, HBO network, Hollywood Bowl Orchestra, HBO Max, …), together with their
coherence (coh) and relatedness (rel) scores and the resulting top-k linkings.]</p>
      <p>One obstacle with adapting CLOCQ to entity or relation linking tasks is that it, by design,
disambiguates all mentions in the question. It does not differentiate between entities, relations,
types or other concepts. This helps when retrieving a search space, but can hurt the precision of
linking results. For example, CLOCQ might link “latest” and “series” to the KB, even if these
mentions are irrelevant. We therefore propose a simple pruning module that identifies which
mentions should be linked, and prunes linkings for other mentions. The module is implemented
with a fine-tuned sequence generation model that is trained using distant supervision.</p>
      <p>By evaluating CLOCQ on the entity and relation linking tasks of the SMART 3.0 challenge, we
essentially investigate its applicability to QA approaches generating an explicit query. We show
that top-k disambiguations can help boost recall, at the cost of decreasing precision
and F1 score. Further, we find that the mention-pruning module helps to improve the precision
and F1 score substantially on the entity linking task.</p>
    </sec>
    <sec id="sec-2">
      <title>2. The CLOCQ linking process</title>
      <p>We first introduce the complete workflow of the CLOCQ algorithm. For further discussion
and details (e.g. on the fact-centric KB index underlying the CLOCQ framework), please refer
to the original paper [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. Fig. 1 shows an overview of the linking process.</p>
      <sec id="sec-2-2">
        <sec id="sec-2-2-1">
          <title>2.1. Retrieving disambiguation candidates</title>
          <p>Consider our running example “Who plays Viserys in GRRM’s latest HBO series?”. Our goal is to
link mentions in the question (“plays”, “Viserys”, “GRRM”, “HBO”, “series”) to items in the KB.
Mentions in the question can be single question words or phrases. Named entity phrases can
for example be detected using named entity recognition (NER).</p>
          <p>We first collect candidates from the KB using a standard lexical matching score (like
TF-IDF or BM25) for each mention m<sub>1</sub>, …, m<sub>n</sub>; n would be 5 in our example, and stopwords are
dropped. Here m<sub>i</sub> is analogous to a search query, while each item c in the KB resembles
a document in a corpus. This “document” is created by concatenating the item label with
textual aliases and descriptions available in most KBs [
            <xref ref-type="bibr" rid="ref1 ref4">1, 4</xref>
            ]. This results in n ranked lists
{E<sub>1</sub> = ⟨c<sub>11</sub>, c<sub>12</sub>, …⟩; E<sub>2</sub> = ⟨c<sub>21</sub>, c<sub>22</sub>, …⟩; …; E<sub>n</sub> = ⟨c<sub>n1</sub>, c<sub>n2</sub>, …⟩} of KB items c<sub>ij</sub>, one list E<sub>i</sub> for each m<sub>i</sub>,
scored by degree of match between the mentions and KB items.</p>
          <p>A ranked lexical match list for “plays” could look like:
E<sub>1</sub> = ⟨1: play, 2: plays for team, 3: instrument, 4: number of plays, 5: time played,
6: playwright, 7: guitarist, 8: Plays collection, …, 15: <bold>character role</bold>, …⟩
with the ideal disambiguation being shown in bold. The list for “HBO” could be:
E<sub>4</sub> = ⟨1: HBO company, 2: <bold>HBO network</bold>, 3: Hollywood Bowl Orchestra, …⟩
Note that the correct KB item for m<sub>i</sub> can sometimes be very deep in an individual list E<sub>i</sub>. For example,
character role is at rank 15 in E<sub>1</sub>.</p>
          <p>
            Next, each list E<sub>i</sub> is traversed up to a depth l to fetch the top-l items per mention. The goal
is to find combinations ⟨c<sub>1</sub>, …, c<sub>n</sub>⟩ of KB items, one from each list, that best match the question. For instance, an ideal
combination for us would be:
{character role, Viserys I Targaryen, George R.R. Martin, HBO network, TV series}
These combinations come from the Cartesian product of items in the n lists, and would have l<sup>n</sup>
possibilities if each combination is explicitly enumerated and scored. This is cost-prohibitive:
since we are only interested in some top-k combinations, as opposed to a full or even extended
partial ordering, a more efficient way of doing this would be to apply top-k algorithms [
            <xref ref-type="bibr" rid="ref10 ref11">10, 11</xref>
            ].
These prevent complete scans and return the top-k best combinations efficiently.
          </p>
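<p>The retrieval step can be sketched as follows; the token-overlap scoring is a simple stand-in for TF-IDF/BM25, and the KB items with their aliases are toy examples. Note the blowup the paragraph above describes: for n=5 mentions and depth l=20, explicit enumeration would score 20<sup>5</sup> = 3.2 million combinations.</p>

```python
# Sketch of candidate retrieval (Sec. 2.1): each KB item's label and aliases
# form a pseudo-document; mentions act as search queries. Token overlap is a
# toy stand-in for TF-IDF/BM25; the item inventory is illustrative.
KB_ITEMS = {
    "character role": ["character role", "plays", "played by"],
    "plays for team": ["plays for team", "member of team"],
    "HBO network": ["HBO", "Home Box Office network"],
    "Hollywood Bowl Orchestra": ["HBO", "Hollywood Bowl Orchestra"],
}

def candidate_list(mention, l=20):
    """Rank KB items for a mention by lexical overlap; keep the top-l."""
    query_tokens = set(mention.lower().split())
    scores = {}
    for item, aliases in KB_ITEMS.items():
        doc_tokens = set(" ".join(aliases).lower().split())
        overlap = len(query_tokens & doc_tokens)
        if overlap:
            scores[item] = overlap
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:l]

print(candidate_list("HBO"))  # both HBO-like items match lexically
```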
        </sec>
        <sec id="sec-2-2-2">
          <title>2.2. Scoring candidates</title>
          <p>Thus, we propose an approach using top-k algorithms to overcome this challenge. To go beyond
shallow lexical matching, our proposal is to construct multiple lists per question token, each
reflecting a different relevance signal. Specifically, we obtain one list E<sub>i</sub><sup>s</sup> for each mention m<sub>i</sub> and
score s. Then, we apply top-k algorithms on these lists to obtain the disambiguation of each
question token individually.</p>
          <p>Note that considering the question as a whole is a key criterion for our scoring mechanism.
Therefore, we integrate two global relevance signals. Specifically, a candidate KB item
combination that fits well with the intent in the question is expected to have high semantic coherence
and high graph connectivity within its constituents. These can be viewed as proximity in latent
and symbolic spaces. Further, candidates should match well on question and mention levels.
These motivate our four relevance signals for each item c<sub>ij</sub> in list E<sub>i</sub>, described below.</p>
          <p>
            Coherence. We consider global signals for semantic coherence and graph connectivity, which
are inherently defined for KB item pairs, on a global level, instead of single items. Therefore,
we need a technique to convert these signals into item-level scores. The idea is to use a max
operator over all candidate KB item pairs involving the candidate at hand. More precisely, the
coherence score of an item is defined as the maximum item-item similarity (averaged over
pairs of lists) this item can contribute to a combination. The pairwise similarity is obtained
by the cosine value between the embedding vectors of the two KB items, min-max normalized
from [−1, +1] to [0, 1]:
coh(c<sub>ij</sub>) = 1/(n−1) ⋅ ∑<sub>i′≠i</sub> max<sub>j′</sub> cos(c⃗<sub>ij</sub>, c⃗<sub>i′j′</sub>) (1)
          </p>
          <p>
            Connectivity. This is the second global signal considered, and captures a different form of
proximity. Every KB can be viewed as an equivalent knowledge graph (KG), where entities,
predicates and other KB items are nodes, and edges run between components of the same
fact [
            <xref ref-type="bibr" rid="ref12 ref13">12, 13</xref>
            ]. We define KB items that are part of the same fact to be in the 1-hop neighborhood
of each other, those that are connected via members of another fact as in the 2-hop, and so on [
            <xref ref-type="bibr" rid="ref8">8</xref>
            ].
We assign items in one hop of each other to have a distance of 1, those in two hops to have a
distance of 2, and ∞ otherwise. Almost all KB items are within three or four hops of each other,
and thus distances beyond two hops cease to be a discriminating factor. We define pairwise connectivity
scores as the inverse of this KB distance, so we obtain 1, 0.5, and 0, respectively, for 1-, 2-, and
&gt;2-hop neighbors. The global connectivity score is then converted to an item-level score analogously
to the coherence, using max aggregation over pairs. Formally, we define the connectivity of c<sub>ij</sub> as:
conn(c<sub>ij</sub>) = 1/(n−1) ⋅ ∑<sub>i′≠i</sub> max<sub>j′</sub> conn(c<sub>ij</sub>, c<sub>i′j′</sub>) (2)
Note that conn(c<sub>ij</sub>) ∈ [0, 1] for all c<sub>ij</sub>.
          </p>
          <p>
            Term match. This score is intended to take into account the degree of lexical term match (via
TF-IDF, BM25, or similar) for which c<sub>ij</sub> was admitted into E<sub>i</sub>. However, such TF-IDF-like weights
are often unbounded and may have a disproportionate influence when aggregated with the
other signals, which are all in the closed interval [0, 1]. Thus, we simply take the reciprocal rank
of c<sub>ij</sub> in E<sub>i</sub> as a representative match score to have it in the same [0, 1] interval:
match(c<sub>ij</sub>) = 1/rank(c<sub>ij</sub>, E<sub>i</sub>) (3)
          </p>
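<p>The item-level signals above can be sketched as follows; the embeddings, hop distances, and item names are toy stand-ins, not actual CLOCQ values:</p>

```python
import math

# Sketch of the item-level scores (Sec. 2.2): coherence via max-aggregated
# normalized cosine similarity to candidates in the other lists, pairwise
# connectivity from hop distances, and term match as reciprocal rank.
EMB = {
    "character role": [0.9, 0.1],
    "play": [0.2, 0.8],
    "Viserys I Targaryen": [0.8, 0.2],
}
HOPS = {frozenset({"character role", "Viserys I Targaryen"}): 1}  # toy KG

def cos01(a, b):
    """Cosine similarity, min-max normalized from [-1, +1] to [0, 1]."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return (dot / norm + 1) / 2

def conn_pair(a, b):
    """Pairwise connectivity: 1 for 1-hop, 0.5 for 2-hop, 0 otherwise."""
    distance = HOPS.get(frozenset({a, b}), math.inf)
    return {1: 1.0, 2: 0.5}.get(distance, 0.0)

def coherence(item, other_lists):
    """Eq.-(1)-style score: average over the other lists of the best
    pairwise similarity the item can contribute."""
    return sum(max(cos01(EMB[item], EMB[c]) for c in lst)
               for lst in other_lists) / len(other_lists)

def match_score(rank):
    """Eq.-(3)-style reciprocal rank, always within [0, 1]."""
    return 1.0 / rank

# "character role" coheres better with the entity candidate than "play" does.
others = [["Viserys I Targaryen"]]
assert coherence("character role", others) > coherence("play", others)
```

<p>These per-item scores are what the linear aggregation in Sec. 2.3 combines with the tuned weights.</p>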
          <p>
            Question relatedness. We estimate semantic relatedness of the KB item c<sub>ij</sub> to the whole
input question q by averaging pairwise cosine similarities between the embeddings of the item
and each question term. The same min-max normalization as for coherence is applied. To avoid
confounding this estimate with the question term for which c<sub>ij</sub> was retrieved, we exclude this
term from the average. We define semantic relatedness as:
rel(c<sub>ij</sub>) = avg<sub>i′≠i</sub> cos(c⃗<sub>ij</sub>, q⃗<sub>i′</sub>) (4)
          </p>
        </sec>
        <sec id="sec-2-2-3">
          <title>2.3. Finding top-k across sorted lists</title>
          <p>We then sort each of these 4 ⋅ n lists in descending score order. Note that for each m<sub>i</sub> and each
score s, all lists E<sub>i</sub><sup>s</sup> hold the same items (those in the original E<sub>i</sub>). Top-k algorithms operating
over such multiple score-ordered lists, where each list holds the same set of items, require
a monotonic aggregation function over the item scores in each list [
            <xref ref-type="bibr" rid="ref10 ref11 ref14 ref15">10, 11, 14, 15</xref>
            ]. Here,
we use a linear combination of the four relevance scores as this aggregate: score(c<sub>ij</sub>) =
h<sub>coh</sub> ⋅ coh(c<sub>ij</sub>) + h<sub>conn</sub> ⋅ conn(c<sub>ij</sub>) + h<sub>match</sub> ⋅ match(c<sub>ij</sub>) + h<sub>rel</sub> ⋅ rel(c<sub>ij</sub>), where the hyperparameters are
tuned on a dev set, and h<sub>coh</sub> + h<sub>conn</sub> + h<sub>match</sub> + h<sub>rel</sub> = 1. Since each score lies in [0, 1], we also
have score(⋅) ∈ [0, 1]. We use the threshold algorithm (TA) with early pruning [
            <xref ref-type="bibr" rid="ref11">11</xref>
            ] on these
score-ordered lists. TA is run over each set of 4 sorted lists ⟨E<sub>i</sub><sup>1</sup>, E<sub>i</sub><sup>2</sup>, E<sub>i</sub><sup>3</sup>, E<sub>i</sub><sup>4</sup>⟩, corresponding to one
mention m<sub>i</sub>, to obtain the top-k best KB items per m<sub>i</sub>. These KB items are then the top-k
linkings for a specific mention as predicted by our system.</p>
        </sec>
        <sec id="sec-2-2-4">
          <title>2.4. Automatically setting k</title>
          <p>Choosing an appropriate k is non-trivial, and often mention-specific. Intuitively,
one would like to increase k for ambiguous mentions in the question. For example, “plays”
can refer to many KB items. By increasing k one can account for potential disambiguation
errors. On the other hand, “GRRM” is not as ambiguous, which is why setting k=1 should suffice.
The ambiguity of a mention is closely connected to the notions of uncertainty or randomness: the
more uncertainty there is in predicting what a mention refers to, the more ambiguous it is.
This makes entropy a suitable measure of ambiguity. More specifically, for each mention, l KB
items are retrieved initially. These items form the sample space of size l for the probability
distribution. The numbers of KB facts with these items form a frequency distribution that
can be normalized to obtain the required probability distribution. We compute the entropy of
this probability distribution as the ambiguity score of a mention, and denote it as AS(m<sub>i</sub>). By
definition, 0 ≤ AS(m<sub>i</sub>) ≤ log<sub>2</sub> l. Practical choices of k and l do not exceed 5 and 50 respectively,
and hence k and log<sub>2</sub> l are in the same ballpark (log<sub>2</sub> 50 ≈ 5.6). This motivates us to make the
simple choice of directly setting k as AS(m<sub>i</sub>). Specifically, we use k = ⌊AS(m<sub>i</sub>)⌋ + 1 to avoid the
situation of k=0. Fig. 1 shows a possible “auto-k” (automatic choice of k) setting for our running
example, and the corresponding top-k linkings.</p>
          <p>“plays” is highly ambiguous, and thus k is set to a relatively high value. “Viserys” and “HBO”
can also refer to different concepts. The word “GRRM” is relatively unambiguous.</p>
        </sec>
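<p>A minimal sketch of this auto-k mechanism, assuming made-up fact frequencies for the retrieved candidates:</p>

```python
import math

# Sketch of auto-k (Sec. 2.4): fact frequencies of the initially retrieved
# candidates are normalized into a probability distribution; its entropy
# serves as the ambiguity score, and k = floor(entropy) + 1.
# The frequency values below are invented for illustration.
def auto_k(fact_frequencies):
    total = sum(fact_frequencies)
    probs = [f / total for f in fact_frequencies if f > 0]
    entropy = -sum(p * math.log2(p) for p in probs)
    return math.floor(entropy) + 1

# An unambiguous mention: one candidate dominates -> low entropy, k = 1.
print(auto_k([980, 10, 10]))
# A highly ambiguous mention: near-uniform frequencies -> higher k.
print(auto_k([40, 35, 30, 30, 25]))
```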
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Adapting CLOCQ to linking tasks</title>
      <p>The native CLOCQ method was primarily designed for retrieving a search space of relevant KB
facts for a given user question. Therefore, the linking of entities and relations is more of a means
to an end here, and is not optimized for the specific tasks. We identified two key obstacles when
using the plain CLOCQ method for entity and relation linking.</p>
      <p>The first obstacle is that CLOCQ links all mentions in the question. Not only entities and
relations are linked, but also types (like “series”) and other mentions (e.g. “latest” in the running
example). While linking such mentions is beneficial for coherence among linkings, and can
improve initiating a search space, it often adds undesired noise to the outputs when evaluating
entity or relation linking capabilities.</p>
      <p>For example, CLOCQ would link “series” to the entity TV series and “latest” to the relation
latest start date for the running example, which would both decrease precision.</p>
      <p>The second obstacle is that CLOCQ does not differentiate between entity and relation
mentions. Any mention is disambiguated to the KB items that score best w.r.t. the specified scoring
mechanism. For example, the relation mention “plays” could also be linked to the entities play or
playwright; similarly, “director” could be linked to the relation director or to the type director.
Again, this does not hurt when initiating the search space, but definitely restricts the relation
linking capabilities of CLOCQ.</p>
      <p>In the following, we will discuss how we optimized CLOCQ for the entity and relation linking
tasks of the SMART 2022 challenge. The same intuitions apply to other entity or relation linking
problems as well.</p>
      <sec id="sec-3-1">
        <title>3.1. Post-hoc pruning module for entity linking</title>
        <p>As discussed, linking all mentions jointly is beneficial for linking results, since it considers
information on the whole question in the linking stage. This follows our intuition of
understanding the question in its entirety. Also, we did not want to touch the main CLOCQ algorithm
itself. Instead, our idea is to prune the linkings returned by CLOCQ. We propose a simple
approach: the decision, whether an entity should be included in the linking results or not,
should be made depending on the mention the entity was disambiguated for. If the mention
should be disambiguated, we add the linking, if not it is dropped from the results. For example,
the mentions “plays”, “latest” and “series” should not be disambiguated when solving an entity
linking task.</p>
        <p>
          Training. We aim to learn which mentions should be linked (and which not) using distant
supervision on the training data provided as part of the SMART task. Given a training instance,
we first obtain all ⟨mention, KB entity⟩ pairs (i.e. the linkings) using the native CLOCQ
method. From the training instance, we know the gold entities that should be linked. We
then consider all mentions, that are linked with a gold entity by CLOCQ, as mentions that
should be disambiguated. For the running example we would obtain the mentions “Viserys”,
“GRRM”, and “HBO”, assuming the gold entity set ⟨Viserys I Targaryen, George R.R. Martin, HBO
network⟩. With this information, we can create a training instance for learning the relevant
entity mentions. The input is the question, and the output is the concatenation of the mentions
linked to gold entities, “Viserys|GRRM|HBO”, separated by the special token “|”. We then simply
fine-tune a pre-trained sequence generation model on this data. For this purpose we used
BART [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ], which was found to be effective when text is copied and manipulated from the input
to autoregressively generate the output.
        </p>
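<p>The construction of a distant-supervision target sequence can be sketched as follows; the CLOCQ linkings and the gold set are hypothetical values for the running example:</p>

```python
# Sketch of the distant-supervision setup (Sec. 3.1): mentions whose CLOCQ
# linking hits a gold entity become the target sequence, joined by "|".
def build_target(clocq_linkings, gold_entities):
    """clocq_linkings: list of (mention, kb_entity) pairs."""
    mentions = [m for m, entity in clocq_linkings if entity in gold_entities]
    return "|".join(mentions)

linkings = [
    ("Viserys", "Viserys I Targaryen"),
    ("GRRM", "George R.R. Martin"),
    ("latest", "latest start date"),
    ("HBO", "HBO network"),
    ("series", "TV series"),
]
gold = {"Viserys I Targaryen", "George R.R. Martin", "HBO network"}
print(build_target(linkings, gold))  # "Viserys|GRRM|HBO"
```

<p>The question serves as the model input and the generated string as the output sequence during fine-tuning.</p>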
        <p>Inference. At inference time, the pruning module is applied in a post-hoc manner. We first run
CLOCQ and our trained pruning module on the input question. We then keep a ⟨mention, KB
entity⟩ pair only if the disambiguated mention matches with any mention generated by our
pruning module. Here, matching is relaxed to substring matching. For example, if the pruning
module generates “in GRRM” or “GRR”, linking pairs for “GRRM” are still kept. In addition, for
the entity linking task, we remove all relations from the linkings (relation identifiers start with
a “P” in Wikidata).</p>
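<p>A minimal sketch of this pruning step; the linking format and the KB identifiers are illustrative, not CLOCQ's actual API:</p>

```python
# Sketch of post-hoc pruning at inference time (Sec. 3.1): a linking is kept
# if its mention has a (relaxed) substring match with any generated mention,
# and Wikidata relations (IDs starting with "P") are dropped for entity
# linking. IDs below are illustrative.
def prune(linkings, generated_mentions):
    """linkings: list of (mention, kb_id, label) triples (hypothetical format)."""
    kept = []
    for mention, kb_id, label in linkings:
        if kb_id.startswith("P"):  # drop relations for entity linking
            continue
        if any(mention in g or g in mention for g in generated_mentions):
            kept.append((mention, kb_id, label))
    return kept

linkings = [
    ("Viserys", "Q55294120", "Viserys I Targaryen"),
    ("plays", "P453", "character role"),
    ("GRRM", "Q181677", "George R.R. Martin"),
]
# Relaxed matching: the generated "in GRRM" still covers the mention "GRRM".
print(prune(linkings, ["Viserys", "in GRRM"]))
```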
        <p>Note that the post-hoc pruning module is also capable of learning benchmark-specific
properties. The SMART 2022 entity linking task can often (but not always) require linking types or
concepts. For example, “airline” in “DC-3 is operated by which airline?” should be linked, but
not “continents” in “How many continents are in Antarctica?”. Such benchmark characteristics
are learned implicitly by our pruning module, which can help improve the performance.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Increasing k for relation linking</title>
        <p>As mentioned earlier, relation mentions may also be linked to entities that are coherent with
the other linkings. We found that this can often be the case for CLOCQ linkings, and that the
appropriate relation can be deeper in the ranked linkings of a mention than the automatically set
cut-off length k. However, relations can easily be differentiated from entities via the identifier
(relation identifiers start with a “P”, entity identifiers with a “Q” in Wikidata). We therefore
simply set l=50 and k=40 to increase the probability of obtaining relations, and prune all entities
from the linkings. Finally, we explore the effect of keeping either the top-ranked relation per
mention, or all relations per mention as the final result.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Experiments</title>
      <sec id="sec-4-1">
        <title>4.1. Experimental setup</title>
        <p>SMART 2022 tasks. Statistics on the entity linking and relation linking tasks of the SMART
Task 2022 can be found in Tables 1 and 2. For both tasks, the question and the corresponding
gold entities or relations are given for the train set. For the test set, only the question is given.
The datasets are made publicly available<sup>3</sup>.</p>
        <p>Metrics. We use the standard metrics of the SMART 2022 Task for both tasks: i) precision,
that measures what fraction of the predicted linkings are correct, ii) recall, that measures what
fraction of the gold linkings are found, and iii) F1 score, the harmonic mean of precision and
recall. The results on the test set were provided by the task organizers, after we submitted our
system results (since the gold standard is not publicly available).</p>
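<p>For concreteness, a small example of these metrics over toy sets of predicted and gold linkings:</p>

```python
# Precision, recall, and F1 over predicted vs. gold linkings (toy sets).
def prf1(predicted, gold):
    correct = len(predicted & gold)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

pred = {"Viserys I Targaryen", "George R.R. Martin", "TV series"}
gold = {"Viserys I Targaryen", "George R.R. Martin", "HBO network"}
print(prf1(pred, gold))  # 2/3 precision, 2/3 recall -> F1 = 2/3
```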
        <p>
          Initialization of CLOCQ. In our experiments, we use Wikidata [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ] as the knowledge base. We
access CLOCQ via the public API<sup>4</sup>. The API currently uses a cleaned Wikidata dump<sup>5</sup> from 31
January 2022, which has 94 million entities and 3,000 predicates.
        </p>
        <p>
All parameters are kept at their default values (l=20, h<sub>coh</sub>=0.1, h<sub>conn</sub>=0.3, h<sub>match</sub>=0.2, h<sub>rel</sub>=0.4) [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ]
unless stated otherwise. For the entity linking task, we randomly sample 10,000 training
instances and use them as our development set (dev set) for choosing the best pruning module.
Since CLOCQ is an unsupervised method, the train set is only used for training the pruning
module on the entity linking task, and for tuning the parameters l (=50) and k (=40) on the
relation linking task.
        </p>
        <p>Initialization of pruning module. For implementing the pruning module, we use the
pre-trained BART model available on the Hugging Face library<sup>6</sup>. We make the code for the pruning
module publicly available<sup>7</sup>. We choose k=1 for CLOCQ during distant supervision (Sec. 3.1).
The model is fine-tuned for 5 epochs, with 500 warm-up steps, a batch size of 10, and a weight
decay of 0.01. We employ cross-entropy as the loss function. After each epoch, we run the
model on the withheld dev set, and finally choose the model with the lowest loss there.</p>
        <p>CLOCQ variants. On the entity linking task, we compare the linking results of the native
CLOCQ method with k=1 or k=AUTO to the linking results after applying the post-hoc
pruning module (again, k=1 or k=AUTO). On the relation linking task, we consider either the
top-ranked relation per mention, or all relations per mention, as returned by CLOCQ.</p>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Entity linking</title>
        <p>The results on the entity linking task are shown in Table 3. When considering only the top-1
entity per mention, CLOCQ obtains a recall of 0.766. Setting k=AUTO improves the recall by
≃ 0.1, indicating that potential errors can be overcome. Further, activating the pruning module
can drastically improve the precision of CLOCQ, and thus also the F1 score. When adding the
pruning module for k=1, precision jumps from 0.281 to 0.714. Also, the results indicate that
mostly noise is pruned, since the recall remains fairly stable. Again, recall can be substantially
improved (≃ 0.1) by setting k=AUTO, at the cost of a lower precision and F1 score.</p>
        <p>3: https://github.com/smart-task/smart-2022-datasets 4: https://clocq.mpi-inf.mpg.de
5: https://github.com/PhilippChr/wikidata-core-for-QA 6: https://huggingface.co/facebook/bart-base
7: https://github.com/PhilippChr/CLOCQ-pruning-module</p>
        <p>
          The results indicate that the pruning module can successfully reduce noise in the entity linking
results. Further, we found that there is a trade-off between precision and recall, which makes it
impossible to determine a best variant for all scenarios. The best choice may highly depend on
the specific QA system. Some QA systems require precise linkings for each mention [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ], while
others can cope with some noise [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ], and leverage the boosted recall.
        </p>
        <p>
          For example, a QA system optimized for efficiency may only issue exactly one explicit query
to the KB [
          <xref ref-type="bibr" rid="ref19">19</xref>
          ], and may therefore rather go with k=1 and an activated pruning module. On
the other hand, if executing multiple queries is affordable for the QA system [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ], using top-k
linkings might help.
        </p>
        <p>
          For example, the queries with incorrectly linked entities in top positions might not return
any result, while queries with lower ranked entities are able to identify the correct answer.
Re-ranking results after query execution might also be an option. When following a graph-based
approach without explicit queries, setting k=AUTO was found to be beneficial [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ].
        </p>
        <p>An anecdotal example from the dev set for which an automatically increased k helps is: “What
was Toby Wright’s profession?”. There are different persons named “Toby Wright” in the KB, and
the context does not help to resolve the ambiguity. With k = 1, only the incorrect Toby Wright
(football player) is returned. When setting k = AUTO, CLOCQ identifies the ambiguity of the
mention and sets k = 2 for this mention. The correct entity Toby Wright (record producer) is then
fetched at the second rank of the results.</p>
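        <p>One simple way to operationalize such an ambiguity-dependent k is to keep every candidate whose score is close to the top score. This is an illustrative sketch only: the scores and the relative threshold are invented, and CLOCQ's actual ambiguity score differs.</p>

```python
# Illustrative sketch of choosing k per mention from the score distribution
# of its candidates. Scores are assumed sorted in descending order; the
# concrete values and the 0.8 ratio are made up, not CLOCQ's actual score.

def choose_k(scores, ratio=0.8, k_max=5):
    """Keep every candidate scoring within `ratio` of the top score."""
    if not scores:
        return 0
    top = scores[0]
    k = sum(1 for s in scores if s >= ratio * top)
    return min(k, k_max)

# Clear mention: large gap after rank 1 -> k = 1.
print(choose_k([0.9, 0.3, 0.1]))    # -> 1
# Ambiguous mention (e.g., "Toby Wright"): close scores -> k = 2.
print(choose_k([0.52, 0.47, 0.1]))  # -> 2
```

        <p>With such a rule, unambiguous mentions keep a single linking, while ambiguous ones automatically retain more candidates.</p>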
        <p>Another interesting question from the dev set is “which footballer was born in middlesbrough?”.
The mention “footballer” indicates that the question is on the topic of football, and therefore
CLOCQ provides Middlesbrough F.C. (football club) as the top-ranked linking for
“middlesbrough”. However, in this question “middlesbrough” refers to the corresponding town. Again, in
the auto-k mode, CLOCQ chooses a higher k (k = 3), and includes the correct entity in the top-3
linkings ⟨Middlesbrough F.C. (football club), Middlesbrough (borough), Middlesbrough (town)⟩.</p>
      </sec>
      <sec id="sec-4-3">
        <title>4.3. Relation linking</title>
        <p>The results on the relation linking task are shown in Table 4.</p>
        <p>As for the entity linking task, considering more linkings per mention can help to boost
recall, by 0.05 in this case. Again, precision drops substantially, leading to a decreased F1 score.
Considering only the top-ranked relation per mention achieves the better F1 score. Interestingly,
the average number of relations per question in the system results is quite close to the average
number of gold relations (we assume that this property is similar on the train and test sets).</p>
        <p>
          Overall, the results indicate that relation linking may require methods optimized specifically
for this purpose. Still, being a general linking method, CLOCQ can provide the correct relation
for a substantial part of the questions, often bridging the lexical gap between the relation
mention and the surface form of the relation in the KB. Note that there are very few existing
systems that can perform both entity and relation linking: this is one of the novelties in CLOCQ.
Another such system is [
          <xref ref-type="bibr" rid="ref20">20</xref>
          ].
        </p>
        <p>For example, for the question “Which child of John Adams died on February 23, 1848?” from
the training set, CLOCQ correctly links “child” to child, and “died on” to date of death. However,
for questions like “What is the point in time that Nicolaus Cusanus was made cardinal by the
Holy Roman Church?”, CLOCQ failed to link the correct relations start time and position held.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Related work</title>
      <sec id="sec-5-1">
        <title>5.1. Entity linking</title>
        <p>
          There has been extensive research on entity linking and we discuss some prominent works
here. TagMe [
          <xref ref-type="bibr" rid="ref21">21</xref>
          ], one of the early yet effective systems, makes use of Wikipedia anchors to
detect entity mentions, looks up possible mappings, and scores these with regard to a collective
agreement implemented by a voting scheme. In AIDA [
          <xref ref-type="bibr" rid="ref22">22</xref>
          ], a mention-entity graph is established.
Then, the entity mentions are linked jointly by approximating the densest subgraph.
        </p>
        <p>
          Coming to more recent neural systems, REL [
          <xref ref-type="bibr" rid="ref23">23</xref>
          ] is a framework for end-to-end entity linking,
building on state-of-the-art neural components. ELQ [
          <xref ref-type="bibr" rid="ref24">24</xref>
          ] jointly performs mention detection
and linking, leveraging a BERT-based bi-encoder. These methods are optimized for computing
the top-1 entity per mention, and mostly give only the top-ranked entity in the disambiguation.
Top-1 entity linking is prone to errors that can affect the whole QA pipeline [
          <xref ref-type="bibr" rid="ref25">25, 26</xref>
          ].
S-MART [27] introduces structured multiple additive regression trees, and applies this statistical
model to a set of (mention, entity)-pairs and corresponding features. Unlike most other works,
S-MART returns the top-k disambiguations per mention. However, since it is a proprietary
entity linking system, its code is not available.
        </p>
      </sec>
      <sec id="sec-5-2">
        <title>5.2. Relation linking</title>
        <p>Relation linking is particularly useful for QA systems constructing an explicit query. Early
approaches used paraphrase-based dictionaries [28] or patterns [29] to link relation mentions.
Subsequent approaches often leveraged semantic parses [30] for relation linking, which has also
been shown to be effective in combination with neural models [31]. There is also a line of work
that approaches relation linking as a classification task [32, 26, 33]. While these methods often
achieve high accuracy, a common bottleneck is that they can recognize only a fraction of the
KB relations provided in the benchmark. Therefore, they are mostly applied in the
context of information extraction (IE), rather than QA. Finally, for previous iterations of the
SMART Task in 2020 and 2021, a range of relation linking methodologies has been proposed
and evaluated [34, 35].</p>
      </sec>
      <sec id="sec-5-3">
        <title>5.3. Joint entity and relation linking</title>
        <p>
          Entity and relation linking are complementary problems, where the results of one task can help
solve the other. Thus, linking is often an intrinsic part of the QA pipeline itself, in
which entity and relation linking are implicitly solved in a joint manner [
          <xref ref-type="bibr" rid="ref13">13, 28, 36</xref>
          ]. EARL [
          <xref ref-type="bibr" rid="ref20">20</xref>
          ]
is a dedicated linking system that aims to leverage this intuition of joint disambiguation for
entity and relation linking tasks. CLOCQ generalizes this idea further, by initially linking any
mention to the KB. In this work, we evaluate the applicability of CLOCQ to both tasks, entity
and relation linking.
        </p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusion</title>
      <p>
        We apply CLOCQ [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] on the entity and relation linking challenges of the SMART 2022 Task.
Since the original unsupervised algorithm links all mentions in the question, leading to a
substantial amount of noise, we propose a post-hoc pruning module. This supervised module
works on top of the linking results from CLOCQ, and prunes linkings for irrelevant mentions. The
pipeline is thus a hybrid of supervised and unsupervised modules, leveraging the strengths of
both worlds. The results on the SMART entity linking task indicate that the module successfully
reduces noise in the linkings, and helps to achieve the overall best F1 score of the CLOCQ
variants. Future work could target entity linking and relation linking in conversational settings,
where linking mentions can require understanding the whole conversation [
        <xref ref-type="bibr" rid="ref18">18, 37</xref>
        ].
      </p>
      <p>[26] W.-t. Yih, M.-W. Chang, X. He, J. Gao, Semantic parsing via staged query graph generation: Question answering with knowledge base, in: ACL-IJCNLP, 2015.
[27] Y. Yang, M.-W. Chang, S-MART: Novel tree-based structured learning algorithms applied to tweet entity linking, in: ACL-IJCNLP, 2015.
[28] M. Yahya, K. Berberich, S. Elbassuoni, M. Ramanath, V. Tresp, G. Weikum, Natural language questions for the web of data, in: EMNLP, 2012.
[29] C. Unger, L. Bühmann, J. Lehmann, A.-C. Ngonga Ngomo, D. Gerber, P. Cimiano, Template-based question answering over RDF data, in: WWW, 2012.
[30] W.-t. Yih, X. He, C. Meek, Semantic parsing for single-relation question answering, in: ACL, 2014.
[31] T. Naseem, S. Ravishankar, N. Mihindukulasooriya, I. Abdelaziz, Y.-S. Lee, P. Kapanipathi, S. Roukos, A. Gliozzo, A. Gray, A semantics-aware transformer model of relation linking for knowledge base question answering, in: ACL-IJCNLP, 2021.
[32] D. Zeng, K. Liu, S. Lai, G. Zhou, J. Zhao, Relation classification via convolutional deep neural network, in: COLING, 2014.
[33] J. Feng, M. Huang, L. Zhao, Y. Yang, X. Zhu, Reinforcement learning for relation classification from noisy data, in: AAAI, 2018.
[34] N. Mihindukulasooriya, M. Dubey, A. Gliozzo, J. Lehmann, A.-C. N. Ngomo, R. Usbeck, Semantic answer type prediction task (SMART) at ISWC 2020 Semantic Web Challenge, arXiv (2020).
[35] N. Mihindukulasooriya, M. Dubey, A. Gliozzo, J. Lehmann, A.-C. N. Ngomo, R. Usbeck, G. Rossiello, U. Kumar, Semantic answer type and relation prediction task (SMART 2021), arXiv (2021).
[36] A. Abujabal, M. Yahya, M. Riedewald, G. Weikum, Automated template generation for question answering over knowledge graphs, in: WWW, 2017.
[37] H. Joko, F. Hasibi, K. Balog, A. P. de Vries, Conversational entity linking: Problem definition and datasets, 2021.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>D.</given-names>
            <surname>Vrandečić</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Krötzsch</surname>
          </string-name>
          ,
          <article-title>Wikidata: A free collaborative knowledgebase</article-title>
          ,
          <source>in: CACM</source>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>S.</given-names>
            <surname>Auer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Bizer</surname>
          </string-name>
          , G. Kobilarov,
          <string-name>
            <given-names>J.</given-names>
            <surname>Lehmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Cyganiak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Ives</surname>
          </string-name>
          ,
          <article-title>DBpedia: A nucleus for a Web of open data</article-title>
          ,
          <source>in: The Semantic Web</source>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>F. M.</given-names>
            <surname>Suchanek</surname>
          </string-name>
          , G. Kasneci, G. Weikum,
          <article-title>YAGO: A core of semantic knowledge</article-title>
          ,
          <source>in: WWW</source>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>K.</given-names>
            <surname>Bollacker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Evans</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Paritosh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Sturge</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Taylor</surname>
          </string-name>
          , Freebase:
          <article-title>A collaboratively created graph database for structuring human knowledge</article-title>
          ,
          <source>in: SIGMOD</source>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>R. Saha</given-names>
            <surname>Roy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Anand</surname>
          </string-name>
          ,
          <article-title>Question Answering for the Curated Web: Tasks and Methods in QA over Knowledge Bases</article-title>
          and Text Collections, Springer,
          <year>2022</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>D.</given-names>
            <surname>Guo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Tang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Duan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Yin</surname>
          </string-name>
          ,
          <article-title>Dialog-to-action: conversational question answering over a large-scale knowledge base</article-title>
          ,
          <source>in: NeurIPS</source>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>H.</given-names>
            <surname>Bast</surname>
          </string-name>
          , E. Haussmann,
          <article-title>More accurate question answering on freebase</article-title>
          ,
          <source>in: CIKM</source>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>P.</given-names>
            <surname>Christmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. Saha</given-names>
            <surname>Roy</surname>
          </string-name>
          , G. Weikum,
          <article-title>Beyond NED: Fast and effective search space reduction for complex question answering over knowledge bases</article-title>
          ,
          <source>in: WSDM</source>
          ,
          <year>2022</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>H.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Dhingra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Zaheer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Mazaitis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Salakhutdinov</surname>
          </string-name>
          , W. Cohen,
          <article-title>Open domain question answering using early fusion of knowledge bases and text</article-title>
          , in: EMNLP,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>V. N.</given-names>
            <surname>Anh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Moffat</surname>
          </string-name>
          ,
          <article-title>Pruned query evaluation using pre-computed impacts</article-title>
          ,
          <source>in: SIGIR</source>
          ,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>R.</given-names>
            <surname>Fagin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Lotem</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Naor</surname>
          </string-name>
          ,
          <article-title>Optimal aggregation algorithms for middleware</article-title>
          ,
          <source>Journal of computer and system sciences 66</source>
          (
          <year>2003</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>X.</given-names>
            <surname>Lu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Pramanik</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. Saha</given-names>
            <surname>Roy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Abujabal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wang</surname>
          </string-name>
          , G. Weikum,
          <article-title>Answering complex questions by joining multi-document evidence with quasi knowledge graphs</article-title>
          ,
          <source>in: SIGIR</source>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>P.</given-names>
            <surname>Christmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. Saha</given-names>
            <surname>Roy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Abujabal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Singh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Weikum</surname>
          </string-name>
          ,
          <article-title>Look before you hop: Conversational question answering over knowledge graphs using judicious context expansion</article-title>
          ,
          <source>in: CIKM</source>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>H.</given-names>
            <surname>Bast</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Majumdar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Schenkel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Theobald</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Weikum</surname>
          </string-name>
          ,
          <article-title>IO-Top-k: Index-access optimized top-k query processing</article-title>
          , in: VLDB Conference,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>C.</given-names>
            <surname>Buckley</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. F.</given-names>
            <surname>Lewit</surname>
          </string-name>
          ,
          <article-title>Optimization of inverted vector searches</article-title>
          ,
          <source>in: SIGIR</source>
          ,
          <year>1985</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>M.</given-names>
            <surname>Lewis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Goyal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ghazvininejad</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Mohamed</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Levy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Stoyanov</surname>
          </string-name>
          , L. Zettlemoyer,
          <article-title>BART: Denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension</article-title>
          , in: ACL,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>A.</given-names>
            <surname>Abujabal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. Saha</given-names>
            <surname>Roy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Yahya</surname>
          </string-name>
          , G. Weikum,
          <article-title>Never-ending learning for open-domain question answering over knowledge bases</article-title>
          ,
          <source>in: WWW</source>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>P.</given-names>
            <surname>Christmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. Saha</given-names>
            <surname>Roy</surname>
          </string-name>
          , G. Weikum,
          <article-title>Conversational question answering on heterogeneous sources</article-title>
          ,
          <source>in: SIGIR</source>
          ,
          <year>2022</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>D.</given-names>
            <surname>Ziegler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Abujabal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. S.</given-names>
            <surname>Roy</surname>
          </string-name>
          , G. Weikum,
          <article-title>Efficiency-aware answering of compositional questions using answer type prediction</article-title>
          ,
          <source>in: IJCNLP</source>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>M.</given-names>
            <surname>Dubey</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Banerjee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Chaudhuri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Lehmann</surname>
          </string-name>
          ,
          <article-title>EARL: joint entity and relation linking for question answering over knowledge graphs</article-title>
          ,
          <source>in: ISWC</source>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>P.</given-names>
            <surname>Ferragina</surname>
          </string-name>
          , U. Scaiella,
          <article-title>TAGME: On-the-fly annotation of short text fragments (by Wikipedia entities)</article-title>
          ,
          <source>in: CIKM</source>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>J.</given-names>
            <surname>Hoffart</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Yosef</surname>
          </string-name>
          , I. Bordino,
          <string-name>
            <given-names>H.</given-names>
            <surname>Fürstenau</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Pinkal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Spaniol</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Taneva</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Thater</surname>
          </string-name>
          , G. Weikum,
          <article-title>Robust disambiguation of named entities in text</article-title>
          , in: EMNLP,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>J. M.</given-names>
            <surname>van Hulst</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Hasibi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Dercksen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Balog</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. P.</given-names>
            <surname>de Vries</surname>
          </string-name>
          ,
          <article-title>REL: An entity linker standing on the shoulders of giants</article-title>
          , in: SIGIR,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>B. Z.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Min</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Iyer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Mehdad</surname>
          </string-name>
          , W.-t. Yih,
          <article-title>Efficient one-pass end-to-end entity linking for questions</article-title>
          ,
          <source>in: EMNLP</source>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>T.</given-names>
            <surname>Shen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Geng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Tao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Guo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Tang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Duan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Long</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Jiang</surname>
          </string-name>
          ,
          <article-title>Multi-task learning for conversational question answering over a large-scale knowledge base</article-title>
          ,
          <source>in: EMNLP-IJCNLP</source>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>