<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <issn pub-type="ppub">1613-0073</issn>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Evaluation of LLMs on Long-tail Entity Linking in Historical Documents</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Marta Boscariol</string-name>
          <email>marta.boscariol@unito.it</email>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Luana Bulla</string-name>
          <email>luana.bulla@phd.unict.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Lia Draetta</string-name>
          <email>lia.draetta@unito.it</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Beatrice Fiumanò</string-name>
          <email>beatrice.fiumano@unibo.it</email>
          <xref ref-type="aff" rid="aff5">5</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Emanuele Lenzi</string-name>
          <email>emanuele.lenzi@isti.cnr.it</email>
          <xref ref-type="aff" rid="aff2">2</xref>
          <xref ref-type="aff" rid="aff6">6</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Leonardo Piano</string-name>
          <email>leonardo.piano@unica.it</email>
          <xref ref-type="aff" rid="aff4">4</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer Science, University of Catania</institution>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Department of Computer Science, University of Turin</institution>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Department of Information Engineering (DII), University of Pisa</institution>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>Department of Management, University of Turin</institution>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff4">
          <label>4</label>
          <institution>Department of Mathematics and Computer Science, University of Cagliari</institution>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff5">
          <label>5</label>
          <institution>Department of Modern Languages, Literatures and Cultures, University of Bologna</institution>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff6">
          <label>6</label>
          <institution>Institute of Information Science and Technologies (ISTI), National Research Council of Italy (CNR)</institution>
          ,
          <addr-line>Pisa</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2024</year>
      </pub-date>
      <fpage>26</fpage>
      <lpage>28</lpage>
      <abstract>
        <p>Entity Linking (EL) plays a crucial role in Natural Language Processing (NLP) applications, enabling the disambiguation of entity mentions by linking them to their corresponding entries in a reference knowledge base (KB). Thanks to their deep contextual understanding capabilities, Large Language Models (LLMs) offer a new perspective to tackle EL, promising better results than traditional methods. Despite the impressive generalization capabilities of LLMs, linking less popular, long-tail entities remains challenging as these entities are often underrepresented in training data and knowledge bases. Furthermore, the long-tail EL task is an understudied problem, and limited studies address it with LLMs. In the present work, we assess the performance of two popular LLMs, GPT and Llama 3, in a long-tail entity linking scenario. Using MHERCL v0.1, a manually annotated benchmark of sentences from domain-specific historical texts, we quantitatively compare the performance of LLMs in identifying and linking entities to their corresponding Wikidata entries against that of ReLiK, a state-of-the-art Entity Linking and Relation Extraction framework. Our preliminary experiments reveal that LLMs perform encouragingly well in long-tail EL, indicating that this technology can be a valuable adjunct in filling the gap between head and long-tail EL.</p>
      </abstract>
      <kwd-group>
        <kwd>Entity linking</kwd>
        <kwd>Long-tail entities</kwd>
        <kwd>Large language models</kwd>
        <kwd>Historical Documents</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Entity Linking (EL) is a fundamental task in Natural Language Processing (NLP) that involves the
identification and disambiguation of entity mentions in text, linking them to corresponding entries
in a reference Knowledge Base (KB), such as Wikipedia or Wikidata. Accurate EL enhances the
understanding of text by connecting unstructured data to structured knowledge, thereby enriching the
content with contextual meaning and facilitating more advanced text analytics.</p>
      <p>
        Most traditional EL approaches rely on machine learning [
        <xref ref-type="bibr" rid="ref1 ref2">1, 2</xref>
        ], while others are rule-based [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] or based on graph optimization [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
      </p>
      <p>
        These methods, although effective in many cases, often struggle with ambiguous or obscure mentions,
particularly when dealing with long-tail entities [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], i.e. entities that are infrequently mentioned or
have limited representation in available KBs. The scarcity of training data and the inherent diversity of
long-tail entities make accurate linking a persistent challenge [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
      </p>
      <p>
        The advent of Large Language Models (LLMs), such as GPT and Llama, has opened new avenues
for EL. Their ability to understand complex language constructs suggests that they could enhance EL
performance, especially in contexts where traditional methods falter. LLMs’ extensive pre-training on
large and diverse corpora allows them to handle a broad range of entities, including those that are
less common or not explicitly covered in the training data [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
      </p>
      <p>
        To investigate EL performance on long-tail entities and assess the effectiveness of LLMs in this
task, the present study addresses two main research questions:
        <list list-type="bullet">
          <list-item>
            <p>How does the most reliable state-of-the-art EL tool perform with long-tail entities?</p>
          </list-item>
          <list-item>
            <p>Are LLMs suitable for long-tail entity linking?</p>
          </list-item>
        </list>
        To do so, we evaluate the performance of two LLMs (GPT and Llama) in a long-tail entity linking
scenario, using as a benchmark MHERCL v0.1<sup>1</sup> [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], a manually annotated collection of sentences from
domain-specific historical texts. By comparing the performance of these LLMs against that of ReLiK
[
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], a state-of-the-art Entity Linking and Relation Extraction framework, this study aims to shed light
on the potential and limitations of LLMs in handling long-tail entity linking in specialized domains.
      </p>
      <p>While long-tail entities are a fairly well-known phenomenon, relatively few researchers have
addressed the long-tail EL task. For this reason, our work is part of a research area that is still largely
unexplored. Additionally, this study is part of an innovative line of research that leverages LLMs for
various knowledge graph-related applications. There is a clear need for further investigation into the
potential roles these technologies could play across different contexts.</p>
      <p>The present work is organized as follows: in Section 2, we briefly present the related work on entity
linking and long-tail entities. Section 3 describes the methodology adopted in the experiment. In
Section 4, we introduce the experimental setup, including the dataset we use and the state-of-the-art
baseline. Sections 5 and 6 respectively present the results obtained and the final considerations.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related work</title>
      <p>
        Entity linking, the task of associating mentions in text with their corresponding entities in a knowledge
base, has been extensively studied. Early approaches relied on heuristic-based methods [
        <xref ref-type="bibr" rid="ref10 ref9">9, 10</xref>
        ]; a prominent system among them is DBpedia Spotlight [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], which automatically annotates text with
DBpedia URIs, combining lexical matching techniques with context-based disambiguation. Significant
advancements have been achieved by neural entity linking approaches that leverage deep neural
networks and language models. GENRE [12] employs a sequence-to-sequence approach to
autoregressively generate unique entity names. CHOLAN [13] improved EL performance by relying on a
modular approach: first, it detects mentions with a BERT transformer; then it retrieves a list of candidate
Wikidata entities; finally, it employs another BERT model, enhanced with local sentence context and
Wikipedia entity descriptions, to classify and link the mention to the correct entity. The aforementioned
approaches extract entity mentions and then link them to a KB, whereas in [14]
the authors reverse this order by first retrieving candidate entities from the KB and then finding the
respective mentions in the text with a Question Answering strategy. In contrast, ReLiK [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]
introduces a state-of-the-art Entity Linking and Relation Extraction system based on a
Retriever-Reader architecture. Its novelty resides in using a single forward pass to link all candidate entities and
extract relations, unlike previous methods that require separate passes for each candidate. This strategy
permits ReLiK to achieve up to 40x faster inference compared to other methods, while maintaining
strong performance.
      </p>
      <p>
        More recently, with the advent of LLMs, several researchers have implemented EL
solutions that take advantage of this technology. ChatEL [15] is a three-step entity linking framework
where, after retrieving a set of candidate entities with BLINK [16], an LLM is prompted first to augment
the entity mentions with meaningful descriptions to improve disambiguation and then to choose
the correct entity. Similarly, the LLMAEL pipeline [17] leverages LLMs as context augmenters for
traditional EL models such as GENRE and BLINK, coupling their task specification capabilities with the
extensive world knowledge of LLMs. Xin et al. note that this approach also enhances EL performance
in long-tail scenarios, as LLMs enrich EL models with additional knowledge on low-frequency entities,
facilitating entity identification and linking. Despite these advances, however, long-tail
entity linking remains largely underexplored in current research, as most of the developed systems and
datasets are mainly designed to capture head entities [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
      </p>
      <p><sup>1</sup> https://github.com/arianna-graciotti/historical-entity-linking/tree/main/benchmark</p>
    </sec>
    <sec id="sec-3">
      <title>3. LLM-based Entity Linking</title>
      <p>Entity linking usually involves two core tasks: (i) Entity Recognition, which detects and extracts the
entity mentions from the text, and (ii) Entity Disambiguation, where each mention is linked
to its respective knowledge base entry. As LLMs excel at capturing complex relationships between
words, we tackled the EL problem as a sequence-to-sequence translation, jointly performing mention
detection and linking with a single model query. Formally, given a sentence S containing
a set of entities E, where each entity is represented by a unique label (e.g., a Wikipedia
page title), the model needs to identify each entity e ∈ E along with its unique identifier. To accomplish
this, we prompted the LLMs to generate a JSON-style output having as a key the textual span of the
identified mention and as a value the respective Wikipedia page title. In an autoregressive fashion, the
model detects the textual mentions that refer to an entity and translates them into the
corresponding unique identifier by drawing on its internal knowledge. To further assist the model, we supplied
an example in the prompt, thus following a one-shot approach. The employed prompt is detailed below.</p>
      <sec id="sec-3-1">
        <title>Entity Linking Prompt</title>
        <preformat>You are a powerful Entity Linking system.
Given a sentence, identify the key entities and output their exact labels as found on the corresponding Wikipedia pages.
Generate a structured JSON output, formatted as [{"Entities":{"text entity span": "Wikipedia page title"}].
Here there are some examples:
#
Sentence:"of Rameau was represented in 1735, it was a balletopera Les Indes galantes."
Output: [{"Entities":{"Rameau":"Jean-Philippe Rameau","Les Indes galantes":"Les Indes galantes"}]</preformat>
        <p>As an alternative identifier, one could use the QID, the unique identifier of Wikidata
entities. However, in a preliminary experiment we noticed that LLMs tended to fabricate QIDs:
in that experiment, GPT-3.5 achieved a precision of less than 1%. We hypothesize that this behavior
arises because QIDs consist mainly of digits and do not follow linguistic patterns, so
LLMs, which are trained primarily on text, cannot generate them accurately. As
a result, LLMs end up generating plausible-looking yet fictional QIDs based on learned patterns. For
these reasons, we decided to use the Wikipedia page title as the unique identifier. For
clarity, we also specify that we employ the same strategy and prompt for all the compared LLMs, which
are detailed in the following section.</p>
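        <p>As an illustration only, the JSON-style output requested by the prompt could be turned into mention-to-title pairs as sketched below. The function name <monospace>parse_entities</monospace> and the choice to discard malformed outputs are assumptions made for this sketch; they are not taken from the experimental code. The parser expects well-formed JSON, so an output that copies the bracket pattern shown in the prompt literally (which lacks a closing brace) would be treated as unusable.</p>
        <preformat><![CDATA[
```python
import json


def parse_entities(raw_output: str) -> dict:
    """Return {mention span: Wikipedia title}; empty dict if the output is unusable.

    Hypothetical helper: assumes the model answers with a JSON list of objects
    that each carry an "Entities" dictionary, as requested in the prompt.
    """
    try:
        data = json.loads(raw_output)
    except json.JSONDecodeError:
        return {}  # free text or malformed JSON is treated as "no prediction"
    if isinstance(data, dict):
        data = [data]  # tolerate a single object instead of a list
    if not isinstance(data, list):
        return {}
    linked = {}
    for item in data:
        entities = item.get("Entities") if isinstance(item, dict) else None
        if isinstance(entities, dict):
            for span, title in entities.items():
                if isinstance(title, str):
                    linked[span] = title
    return linked


example = '[{"Entities": {"Rameau": "Jean-Philippe Rameau"}}]'
print(parse_entities(example))
```
]]></preformat>
        <p>Returning an empty dictionary for unparseable outputs keeps the downstream scoring simple: every gold entity of that sentence is then counted as missed.</p>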
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Experimental setup</title>
      <p>Models. For our study, we use two advanced LLMs, namely GPT-3.5 and Llama 3 [18], both in
their instruct versions. Specifically, our experiments are conducted using OpenAI’s
GPT-3.5-turbo-instruct<sup>2</sup>, Meta Llama-3-8B-instruct<sup>3</sup> and Llama-3-70B<sup>4</sup>. GPT-3.5 Turbo is a cost-effective,
cutting-edge tool ensuring deep contextual understanding, heightened accuracy and faster processing speed
compared to other GPT models. Llama 3, available in configurations with 8 billion and 70 billion
parameters, is an open-source, highly versatile model that offers state-of-the-art performance in a wide
variety of NLP tasks, outperforming GPT models in different benchmarks and having a longer context
window compared to GPT-3.5 Turbo.</p>
      <p><sup>2</sup> https://platform.openai.com/docs/models/gpt-3-5-turbo</p>
      <p><sup>3</sup> https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct</p>
      <p><sup>4</sup> https://huggingface.co/meta-llama/Meta-Llama-3-70B-Instruct</p>
      <sec id="sec-4-1">
        <p>Dataset. The performance of LLMs is evaluated on the Musical Heritage Historical named Entities
Recognition, Classification and Linking (MHERCL) benchmark. MHERCL v0.1 consists of
English-language sentences extracted from the Periodicals module of the Polifonia Textual Corpus<sup>5</sup> (PTC), a
diachronic corpus covering the domain of Musical Heritage. As part of the PTC creation, Optical
Character Recognition (OCR) was used to extract text from scans of historical documents. Since
OCR on historical documents can be particularly challenging and prone to errors due to factors like
degraded image quality or archaic fonts, this process inevitably introduced some noise into the dataset.
Around 930 sentences were extracted from the PTC to create the MHERCL benchmark. Each sentence
was manually annotated with EL information, including entity type and Wikidata QID. In this work,
we use MHERCL v0.1.2, an expanded version of the benchmark that includes 928 sentences, 969 unique
named entity mentions (NE) identified by a QID, and 59 different NE types. Table 1 provides a summary
of the dataset statistics. Entities that were not assigned a QID, mainly due to the lack of a
corresponding Wikidata entry, were assigned a NIL label and are excluded from the dataset statistics.
Since the Periodicals module of the PTC consists of music-specialised documents published between 1823
and 1900, the MHERCL dataset features a high concentration of niche, domain-specific and historical
knowledge, serving as a robust benchmark for assessing EL performance in long-tail scenarios.</p>
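        <table-wrap id="tab1">
          <label>Table 1</label>
          <caption>
            <p>Overview of the MHERCL v0.1.2 dataset statistics.</p>
          </caption>
          <table>
            <thead>
              <tr>
                <th>Dataset</th>
                <th>Lang.</th>
                <th>Sent.</th>
                <th>Tokens</th>
                <th>Unique NE</th>
              </tr>
            </thead>
            <tbody>
              <tr>
                <td>MHERCL v0.1.2</td>
                <td>EN</td>
                <td>928</td>
                <td/>
                <td/>
              </tr>
            </tbody>
          </table>
        </table-wrap>
        <sec id="sec-4-1-4">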
          <p>Baseline. As our baseline, we run ReLiK, a state-of-the-art framework for Entity Linking
and Relation Extraction, on the MHERCL dataset. Based on a retriever-reader architecture, ReLiK
outperforms its competitors in both in-domain and out-of-domain settings, achieving better results in
terms of performance, inference speed, and flexibility. ReLiK is available in three versions: small, base,
and large. In our study, we use ReLiK-base to identify and link entity mentions within MHERCL
sentences to their corresponding Wikidata entries. ReLiK does not link entities to Wikidata directly:
it returns Wikipedia page IDs instead of Wikidata QIDs. To map the extracted entities to
their corresponding Wikidata entries, we queried ReLiK’s reference knowledge base, KILT [19]. KILT,
derived from a Wikipedia dump from August 1, 2019, allowed us to retrieve Wikidata IDs using either
Wikipedia page IDs or entity titles.</p>
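          <p>The mapping step described above can be pictured with the following minimal sketch. It assumes that the knowledge base has already been loaded as a list of dictionaries; the field names <monospace>wikipedia_id</monospace>, <monospace>wikipedia_title</monospace> and <monospace>wikidata_id</monospace>, the helper names, and the page IDs in the toy records are all assumptions made for illustration and may not match the actual KILT schema or the code used in the experiments. The two QIDs are the ones quoted elsewhere in this paper.</p>
          <preformat><![CDATA[
```python
def build_qid_index(records):
    """Index knowledge-base records by Wikipedia page ID and by page title."""
    by_page_id, by_title = {}, {}
    for record in records:
        qid = record.get("wikidata_id")
        if not qid:
            continue  # page without a Wikidata counterpart
        by_page_id[str(record["wikipedia_id"])] = qid
        by_title[record["wikipedia_title"]] = qid
    return by_page_id, by_title


def lookup_qid(by_page_id, by_title, page_id=None, title=None):
    """Resolve a QID from the page ID first, falling back to the title."""
    if page_id is not None and str(page_id) in by_page_id:
        return by_page_id[str(page_id)]
    if title is not None:
        return by_title.get(title)
    return None


# Toy records: page IDs and titles are invented for the example.
records = [
    {"wikipedia_id": "1001", "wikipedia_title": "Thomas Moore", "wikidata_id": "Q315346"},
    {"wikipedia_id": "1002", "wikipedia_title": "Teatro Sant'Agostino", "wikidata_id": "Q19060499"},
    {"wikipedia_id": "1003", "wikipedia_title": "Page without QID", "wikidata_id": None},
]
by_page_id, by_title = build_qid_index(records)
print(lookup_qid(by_page_id, by_title, page_id=1001))
```
]]></preformat>
          <p>Trying the page ID before the title mirrors the statement that either key can be used; predictions that resolve to no QID would simply not match any gold annotation.</p>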
          <p>Evaluation. To evaluate and compare the performance of the selected models, we employ confusion
matrix metrics such as precision, recall, and F1-score, which are formally defined as:</p>
          <disp-formula>
            <tex-math><![CDATA[\mathit{Precision} = \frac{TP}{TP + FP}, \qquad \mathit{Recall} = \frac{TP}{TP + FN}, \qquad F1 = \frac{2 \cdot \mathit{Precision} \cdot \mathit{Recall}}{\mathit{Precision} + \mathit{Recall}}]]></tex-math>
          </disp-formula>
          <p>In computing these scores, we count a True Positive (TP) when the model’s prediction matches the
ground truth, a False Positive (FP) when the model’s prediction does not match annotations in the
ground truth, and a False Negative (FN) when the model is not able to identify and link the entity.
Entities labeled as NIL in the MHERCL benchmark are excluded from our experiments because they
generally lack corresponding Wikidata or Wikipedia entries. We further specify that, in the case of
ReLiK’s results, we evaluated the match between the QID identified by ReLiK and the corresponding
QID in the ground truth. In contrast, for LLMs we assessed the correct match between the predicted
Wikipedia page title and the Wikipedia title retrieved using the baseline’s QID.</p>
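          <p>A minimal sketch of how these scores could be computed is given below. It assumes, for illustration, that gold annotations and predictions are represented as one set of identifiers (QIDs or Wikipedia titles) per sentence and compared by exact match; the function names and the toy data are hypothetical and do not reproduce the evaluation script used in the experiments.</p>
          <preformat><![CDATA[
```python
def count_outcomes(gold, predicted):
    """Count TP, FP and FN over parallel lists of per-sentence identifier sets."""
    tp = fp = fn = 0
    for gold_ids, pred_ids in zip(gold, predicted):
        tp += len(gold_ids & pred_ids)   # predictions that match the ground truth
        fp += len(pred_ids - gold_ids)   # predictions absent from the ground truth
        fn += len(gold_ids - pred_ids)   # annotated entities the model missed
    return tp, fp, fn


def precision_recall_f1(tp, fp, fn):
    """Precision, recall and F1; each is 0.0 when its denominator is zero."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Toy example: one sentence, one correct link, one spurious link, one miss.
gold = [{"Jean-Philippe Rameau", "Les Indes galantes"}]
predicted = [{"Jean-Philippe Rameau", "Opera"}]
tp, fp, fn = count_outcomes(gold, predicted)
scores = precision_recall_f1(tp, fp, fn)
print(tp, fp, fn, scores)
```
]]></preformat>
          <p>Because matching is exact, a title that differs from the gold one by a single character adds both a false positive and a false negative in this formulation.</p>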
        </sec>
      </sec>
      <sec id="sec-4-2">
        <p><sup>5</sup> https://github.com/polifonia-project/Polifonia-Corpus</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Results</title>
      <p>This section outlines the quantitative results of our preliminary study on historical long-tail Entity
Linking with Large Language Models. Table 2 compares the LLMs against ReLiK
over the entire dataset, with no differentiation by entity popularity. This comparison shows
that ReLiK is highly precise, generating a low number of false positives and reaching a precision of 72.8%.
Still, it struggles to find an adequate number of entities in such a niche domain, retrieving only about 45%
of the annotated entities, as evidenced by its recall. Indeed, with the exception of Llama3-8b, the LLMs
recovered a higher number of entities: Llama3 in the 70b configuration reached a recall
of 60.3%, exceeding the state-of-the-art ReLiK by about 15 percentage points. Given the high recall achieved by
LLMs, we hypothesize that they could serve as entity retrievers or augment existing EL
retrievers, and we believe that this aspect may provide insights for future studies and investigations from
the scientific community. Despite the satisfactory recall, precision dropped,
because the LLMs always tended to generate some text and to output entities that
were not annotated in the dataset, raising the number of false positives. Also, our evaluation metrics
are based on exact matching between the predicted Wikipedia label and the real one, so even a single
incorrectly generated character causes the prediction to be considered incorrect. Overall, it is worth
noting that the obtained results highlight the potential of LLMs in EL: even though they are not
specifically trained on the EL task, they achieved results competitive with the state of the art
in an out-of-domain comparison, and they even exceeded it in long-tail entity retrieval.</p>
      <p>Although the MHERCL benchmark is a historical, domain-specific dataset with a high number of
niche and less popular entities, we conducted an additional analysis to highlight better the models’
performance when varying the entities’ popularity. As a measure of popularity, we leveraged the
number of Wikidata triples associated with each entity, as also done in [20].</p>
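      <p>To make the popularity-based analysis concrete, the sketch below restricts the evaluation to gold entities that have at most a given number of Wikidata triples and computes recall on that subset. The data layout (pairs of sentence index and entity identifier), the function name and the placeholder identifiers are assumptions made for this example, not the code used to produce the reported figures.</p>
      <preformat><![CDATA[
```python
def recall_at_threshold(gold, predicted, triple_counts, threshold):
    """Recall restricted to gold entities with at most `threshold` Wikidata triples.

    gold and predicted are sets of (sentence_index, entity_id) pairs;
    triple_counts maps an entity_id to its number of Wikidata triples.
    Entities with an unknown triple count are left out of the subset.
    """
    selected = {
        pair for pair in gold
        if pair[1] in triple_counts and triple_counts[pair[1]] <= threshold
    }
    if not selected:
        return 0.0
    return len(selected & predicted) / len(selected)


# Placeholder identifiers and counts, invented for the example.
triple_counts = {"Q_rare": 12, "Q_popular": 450}
gold = {(0, "Q_rare"), (1, "Q_popular")}
predicted = {(1, "Q_popular")}

print(recall_at_threshold(gold, predicted, triple_counts, 20))
print(recall_at_threshold(gold, predicted, triple_counts, 500))
```
]]></preformat>
      <p>With these toy values, only the rare entity falls under a threshold of 20 and it is missed, while raising the threshold to 500 also admits the popular entity that was linked correctly.</p>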
      <p>[Figure 1: (a) F1 score and (b) recall of the compared models at varying entity popularity thresholds.]</p>
      <p>The plot in Figure 1a reports how the F1 score in EL varies with a popularity threshold,
which selects the gold and predicted entities according to their notoriety, defined as
the number of Wikidata triples associated with them. For example, a threshold of 20 takes into
account all entities that have at most 20 triples associated with them. The plot shows that higher
thresholds generally lead to better EL performance for all models, likely because
higher thresholds include more frequent, well-represented entities that are easier to disambiguate and
link correctly. Llama3-70b achieves the highest F1, on par with ReLiK, in linking very rare entities
(threshold of 20). The plot in Figure 1b instead highlights how recall varies. GPT-3.5 and
Llama3-70b perform better overall, with recall increasing as the threshold increases. ReLiK,
despite being a specialized tool, stays below the two largest LLMs and performs poorly with infrequent
entities, obtaining the lowest recall at a threshold of 20. Llama3-8b, instead, shows
consistently lower recall and no significant gains with increasing entity frequency. The
conducted analyses clearly show that linking long-tail entities is still an open challenge,
as one of the best-performing state-of-the-art tools was only able to retrieve ≈ 15% of the annotated
entities when they were less known and had a low frequency index. On the other hand, LLMs,
at least in the larger configurations, retrieved a higher number of entities, but the numbers remained
unsatisfactory, with a recall below 30% and an F1 of ≈ 19% on the less popular entities.</p>
      <p>Qualitative evaluation. For the sake of comprehensiveness and further interpretation of the results,
we conducted a brief qualitative analysis. Upon closer examination, we observed that a small number
of entities were not correctly disambiguated by the models due to spelling errors introduced by the
OCR on the original documents. Noisy text elements are common when working with digitized
texts, especially digitized historical documents. While human annotators were able to easily detect
these mistakes and accurately identify the correct entities, many models struggled to move beyond the
surface-level text. Specifically, when limited semantic context was available around the entity, both the
baseline model and the LLMs struggled to accurately perform EL. For instance, given the sentence</p>
      <disp-quote>
        <p>Mr. Mocre is the adaptor of words to this composition, which is a tirana, arranged by Mr. Bishop.</p>
      </disp-quote>
      <p>none of the models were able to associate the form Mocre with Thomas Moore (Q315346). On the
other hand, when provided with sufficient contextual information about the entities, LLMs were more
likely to identify the correct entity even in the presence of lexical errors. For example, in the sentence</p>
      <disp-quote>
        <p>One man may lived, who ean read the heart, and whose power was not: based upon, his own experience but if so, we may well call William Shakspeare superhuman, THenee it was that whiffe i m Rossint’s ‘Barber of Seville,’ ar Cimarosa’s ‘Seeret Marriage’</p>
      </disp-quote>
      <p>despite the inherent difficulty due to the OCR mistakes, both GPT-3.5 and Llama3-70b correctly associated
’Rossint’ with the composer Gioachino Rossini and ’Seeret Marriage’ with Domenico Cimarosa’s work
The Secret Marriage.</p>
      <p>Nevertheless, when it comes to sentences that include less popular entities, even with appropriate
context, LLMs may struggle to properly disambiguate the involved entities. For example, when
encountering the phrase ’Teatro Santo Augustino in Genoa’, which should be linked to Teatro Sant’Agostino
(Q19060499), both GPT 3.5 Turbo and Llama-70B incorrectly linked the entity to the more renowned
Teatro Carlo Felice in Genoa.</p>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusion and future works</title>
      <p>In conclusion, this study highlights the potential of large language models (LLMs), such as GPT and
Llama, for improving entity linking, particularly in challenging long-tail scenarios. While
state-of-the-art systems like ReLiK perform well on frequent entities, LLMs show a significant advantage in
identifying and linking less common, domain-specific entities, as evidenced by their higher recall scores.
Despite lower precision due to occasional over-generation of entities, LLMs demonstrate the potential
to recover more long-tail entities compared to ReLiK. This suggests that LLMs can serve as valuable
tools in bridging the gap between frequent and infrequent entities in historical and domain-specific
contexts. Furthermore, this study represents an early, exploratory effort to understand the efficacy of
LLMs in the long-tail entity linking (EL) scenario, in which we tested relatively simple,
vanilla prompt-based approaches. The results suggest that LLMs, even in their unmodified form,
possess inherent advantages over traditional entity-linking systems. However, there remains significant
potential for further refinement. More emphasis should be placed on optimizing the balance between
recall and precision. While recall is an important metric, especially for long-tail entities, precision
must not be overlooked. Thus, future work should focus on developing more sophisticated prompting
strategies or hybrid systems. Possible investigations include In-Context Learning (ICL) techniques, to
better tailor LLMs to the task of entity linking, or Knowledge Injection, to augment the LLMs’ knowledge
and their contextual understanding. Such methods could potentially mitigate the over-generation issue,
while enhancing their accuracy in identifying and linking entities in narrower contexts.</p>
      <p>[12] N. De Cao, G. Izacard, S. Riedel, F. Petroni, Autoregressive entity retrieval, arXiv preprint arXiv:2010.00904 (2020).</p>
      <p>[13] M. P. K. Ravi, K. Singh, I. O. Mulang, S. Shekarpour, J. Hoffart, J. Lehmann, Cholan: A modular approach for neural entity linking on wikipedia and wikidata, arXiv preprint arXiv:2101.09969 (2021).</p>
      <p>[14] W. Zhang, W. Hua, K. Stratos, Entqa: Entity linking as question answering, arXiv preprint arXiv:2110.02369 (2021).</p>
      <p>[15] Y. Ding, Q. Zeng, T. Weninger, Chatel: Entity linking with chatbots, arXiv preprint arXiv:2402.14858 (2024).</p>
      <p>[16] L. Wu, F. Petroni, M. Josifoski, S. Riedel, L. Zettlemoyer, Scalable zero-shot entity linking with dense entity retrieval, in: B. Webber, T. Cohn, Y. He, Y. Liu (Eds.), Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), Association for Computational Linguistics, Online, 2020, pp. 6397–6407. URL: https://aclanthology.org/2020.emnlp-main.519. doi:10.18653/v1/2020.emnlp-main.519.</p>
      <p>[17] A. Xin, Y. Qi, Z. Yao, F. Zhu, K. Zeng, X. Bin, L. Hou, J. Li, Llmael: Large language models are good context augmenters for entity linking, 2024. URL: https://arxiv.org/abs/2407.04020. arXiv:2407.04020.</p>
      <p>[18] Llama Team, AI @ Meta, The llama 3 herd of models, 2024. URL: https://arxiv.org/abs/2407.21783. arXiv:2407.21783.</p>
      <p>[19] F. Petroni, A. Piktus, A. Fan, P. Lewis, M. Yazdani, N. De Cao, J. Thorne, Y. Jernite, V. Karpukhin, J. Maillard, et al., Kilt: a benchmark for knowledge intensive language tasks, arXiv preprint arXiv:2009.02252 (2020).</p>
      <p>[20] L. Chen, S. Razniewski, G. Weikum, Knowledge base completion for long-tail entities, arXiv preprint arXiv:2306.17472 (2023).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>D.</given-names>
            <surname>Banerjee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Chaudhuri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Dubey</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Lehmann</surname>
          </string-name>
          ,
          <article-title>Pnel: Pointer network based end-to-end entity linking over knowledge graphs</article-title>
          , in:
          <source>The Semantic Web-ISWC 2020: 19th International Semantic Web Conference, Athens, Greece, November 2-6, 2020, Proceedings, Part I 19</source>
          , Springer,
          <year>2020</year>
          , pp.
          <fpage>21</fpage>
          -
          <lpage>38</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>E.</given-names>
            <surname>Boros</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E. L.</given-names>
            <surname>Pontes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. A.</given-names>
            <surname>Cabrera-Diego</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Hamdi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. G.</given-names>
            <surname>Moreno</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Sidère</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Doucet</surname>
          </string-name>
          ,
          <article-title>Robust named entity recognition and linking on historical multilingual documents</article-title>
          , in:
          <source>Conference and Labs of the Evaluation Forum (CLEF 2020)</source>
          , volume
          <volume>2696</volume>
          , CEUR-WS Working Notes
          ,
          <year>2020</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>17</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>A.</given-names>
            <surname>Sakor</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Singh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Patel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.-E.</given-names>
            <surname>Vidal</surname>
          </string-name>
          ,
          <article-title>Falcon 2.0: An entity and relation linking tool over Wikidata</article-title>
          , in:
          <source>Proceedings of the 29th ACM International Conference on Information &amp; Knowledge Management</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>3141</fpage>
          -
          <lpage>3148</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>M.</given-names>
            <surname>Klang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Nugues</surname>
          </string-name>
          ,
          <article-title>Hedwig: A named entity linker</article-title>
          , in:
          <source>Proceedings of the Twelfth Language Resources and Evaluation Conference</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>4501</fpage>
          -
          <lpage>4508</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>F.</given-names>
            <surname>Ilievski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Vossen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Schlobach</surname>
          </string-name>
          ,
          <article-title>Systematic study of long tail phenomena in entity linking</article-title>
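          , in:
          <string-name>
            <given-names>E. M.</given-names>
            <surname>Bender</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Derczynski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Isabelle</surname>
          </string-name>
          (Eds.),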
          <source>Proceedings of the 27th International Conference on Computational Linguistics</source>
          , Association for Computational Linguistics, Santa Fe, New Mexico, USA,
          <year>2018</year>
          , pp.
          <fpage>664</fpage>
          -
          <lpage>674</lpage>
          . URL: https://aclanthology.org/C18-1056.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>B.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Fang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Ye</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <article-title>Evaluating ChatGPT's information extraction capabilities: An assessment of performance, explainability, calibration, and faithfulness</article-title>
          ,
          <source>arXiv preprint arXiv:2304.11633</source>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>A.</given-names>
            <surname>Graciotti</surname>
          </string-name>
          ,
          <article-title>Knowledge extraction from multilingual and historical texts for advanced question answering</article-title>
          , in:
          <string-name>
            <given-names>C.</given-names>
            <surname>d'Amato</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. Z.</given-names>
            <surname>Pan</surname>
          </string-name>
          (Eds.),
          <source>Proceedings of the Doctoral Consortium at ISWC 2023 co-located with 22nd International Semantic Web Conference (ISWC 2023), Athens, Greece, November 7, 2023</source>
          , volume
          <volume>3678</volume>
          of CEUR Workshop Proceedings,
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>R.</given-names>
            <surname>Orlando</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.-L.</given-names>
            <surname>Huguet-Cabot</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Barba</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Navigli</surname>
          </string-name>
          ,
          <article-title>ReLiK: Retrieve and link, fast and accurate entity linking and relation extraction on an academic budget</article-title>
          ,
          <source>arXiv preprint arXiv:2408.00103</source>
          (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>K.</given-names>
            <surname>Nguyen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Ichise</surname>
          </string-name>
          ,
          <article-title>Heuristic-based configuration learning for linked data instance matching</article-title>
          , in:
          <source>Semantic Technology: 5th Joint International Conference, JIST 2015, Yichang, China, November 11-13, 2015, Revised Selected Papers 5</source>
          , Springer,
          <year>2016</year>
          , pp.
          <fpage>56</fpage>
          -
          <lpage>72</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>G.</given-names>
            <surname>Zheng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <article-title>Collective entity linking based on DBpedia</article-title>
          , in:
          <source>Knowledge Graph and Semantic Computing. Language, Knowledge, and Intelligence: Second China Conference, CCKS 2017, Chengdu, China, August 26-29, 2017, Revised Selected Papers 2</source>
          , Springer,
          <year>2017</year>
          , pp.
          <fpage>66</fpage>
          -
          <lpage>79</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>P. N.</given-names>
            <surname>Mendes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Jakob</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>García-Silva</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Bizer</surname>
          </string-name>
          ,
          <article-title>DBpedia Spotlight: shedding light on the web of documents</article-title>
          , in:
          <source>Proceedings of the 7th International Conference on Semantic Systems</source>
          ,
          <year>2011</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>8</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>