<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Combining Automatic Annotation with Human Validation for the Semantic Enrichment of Cultural Heritage Metadata</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Eirini Kaldeli</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alexandros Chortaras</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Vassilis Lyberatos</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jason Liartis</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Spyridon Kantarelis</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Giorgos Stamou</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>AI and Learning Systems Lab, School of Electrical and Computer Engineering, National Technical University of Athens</institution>
          ,
          <country country="GR">Greece</country>
        </aff>
      </contrib-group>
      <fpage>353</fpage>
      <lpage>368</lpage>
      <abstract>
        <p>The addition of controlled terms from linked open datasets and vocabularies to metadata can increase the discoverability and accessibility of digital collections. However, the task of semantic enrichment requires a lot of effort and resources that cultural heritage organizations often lack. State-of-the-art AI technologies can be employed to analyse textual metadata and match it with external semantic resources. Depending on the data characteristics and the objective of the enrichment, different approaches may need to be combined to achieve high-quality results. What is more, human inspection and validation of the automatic annotations should be an integral part of the overall enrichment methodology. In the current paper, we present a methodology and supporting digital platform, which combines a suite of automatic annotation tools with human validation for the enrichment of cultural heritage metadata within the European data space for cultural heritage. The methodology and platform have been applied and evaluated on a set of datasets on crafts heritage, leading to the publication of more than 133K enriched records to the Europeana platform. A statistical analysis of the achieved results is performed, which allows us to draw some interesting insights as to the appropriateness of annotation approaches in different contexts. The process also led to the creation of an openly available annotated dataset, which can be useful for the in-domain adaptation of ML-based enrichment tools.</p>
      </abstract>
      <kwd-group>
        <kwd>semantic enrichment</kwd>
        <kwd>cultural heritage metadata</kwd>
        <kwd>named entity recognition and disambiguation</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Semantic enrichment is the process of adding new semantics to unstructured data, such as free
text, so that machines can make sense of it and build connections to it. In the case of the
metadata that describes Cultural Heritage (CH) items, unstructured data comes in the form of free
text that details several aspects of the item, for example its main characteristics, its location,
creator, etc. Through the process of semantic enrichment, those textual descriptions are
analyzed and augmented with controlled terms from Linked Open datasets, such as Wikidata1
and Geonames2, or controlled vocabularies, such as the Getty Art &amp; Architecture Thesaurus3
(AAT). Those terms represent concepts and attributes (e.g. “costume”, “Renaissance”, colors),
named entities, such as persons, locations, and organisations, or chronological periods. For
example, the strings “Leonardo da Vinci” and “da Vinci, Leonardo” can be both linked to the
Wikidata term representing the Italian Renaissance polymath. This additional piece of
information associated with a CH resource is commonly referred to as an annotation, which links
the CH object with some URI (Uniform Resource Identifier) derived from vocabularies or open
data sources.</p>
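To make the notion of an annotation concrete, it can be viewed as a simple record that links a matched surface form in a metadata field to a controlled URI. The Python sketch below is purely illustrative: the field names are not SAGE's actual data model, although Q762 is the real Wikidata identifier for Leonardo da Vinci.

```python
# Minimal sketch of a semantic annotation: a matched substring in a
# metadata field is linked to a controlled URI (here, the Wikidata
# entity for Leonardo da Vinci). Field names are illustrative only.
annotation = {
    "record_field": "dc:creator",
    "field_value": "da Vinci, Leonardo",
    "matched_substring": "da Vinci, Leonardo",
    "uri": "http://www.wikidata.org/entity/Q762",  # Leonardo da Vinci
}

# Both surface forms resolve to the same entity URI:
surface_forms = ["Leonardo da Vinci", "da Vinci, Leonardo"]
links = {s: "http://www.wikidata.org/entity/Q762" for s in surface_forms}
print(links["Leonardo da Vinci"] == links["da Vinci, Leonardo"])  # True
```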
      <p>
        Semantic enrichment adds meaning and context to digital collections and makes them more
easily discoverable. Given its importance, it has been a main concern and focus of efforts by
the Europeana digital library4 as well as individual data aggregators and providers. Firstly,
linked data makes the meaning of textual metadata unambiguous [
        <xref ref-type="bibr" rid="ref24">25</xref>
        ]. For example, the string
“Leonardo da Vinci” may refer, depending on the context, to the Italian Renaissance polymath
or the homonymous airport in Fiumicino, Italy, or a battleship with the same name. By linking
the text with the correct URI, it becomes clear what the text refers to. Secondly, linked data
allows us to retrieve additional information about a certain entity in an automated way, build
connections between different resources and contextualize them [9]. For example, it allows
us to link items tagged with the term “ring” with the broader concept of “jewelry” and, thus,
interconnect them with items enriched with the term “bracelet”, which is also an instance of
“jewelry”. Moreover, linked data usually comes with translated labels, thus improving the
capabilities for multilingual search [
        <xref ref-type="bibr" rid="ref11 ref9">10, 12</xref>
        ].
      </p>
      <p>Semantic enrichment is a labour-intensive process, which requires efort and resources that
CH institutions often lack. State-of-the-art AI technologies can be employed to automate the
time-consuming and often mundane process of manual metadata enrichment. Natural
language processing (NLP) tools can be used to analyse textual metadata and detect and classify
concepts or named entities mentioned in unstructured text. Machine Learning (ML) approaches
are extensively used for the task of disambiguation, which is responsible for deciding if the
reference to ‘Leonardo da Vinci’ in the text refers to the Italian polymath or to the battleship.
However, the accuracy of the automatic results hinges largely on the specific task at hand
vis-à-vis the algorithm applied. For example, short textual descriptions, which are common in
CH metadata, lack context and thus ML algorithms trained on Wikipedia articles may result
in many incorrect matches. For similar reasons, they may often miss domain-specific matches
that are relevant in the specific CH context. What’s more, even if the automatically detected
links are correct, they may be considered undesirable for a certain case study. For example,
linking metadata records with terms representing colours may be important for a fashion
collection, but it may be undesirable for describing a manuscript that happens to mention a certain
colour.</p>
      <p>
        As a result, depending on a number of factors, such as the text characteristics (e.g. its length
and language), the vocabulary that we wish to link it to, and the type of entities to detect (e.g. do
we wish to identify a broad variety of concepts or to limit ourselves to certain domain-specific
terms?), a different combination of tools and steps is required to achieve the best possible
results for each specific task. For example, for certain tasks with a well-defined restricted context,
a simple lemmatisation and string matching approach may be more appropriate than complex
ML-based algorithms. Besides the need for flexibility in combining and experimenting with
different approaches and tools, another crucial aspect that needs to be considered is the need to
make human inspection and validation an integral part of the end-to-end semantic enrichment
workflow [
        <xref ref-type="bibr" rid="ref12">13</xref>
        ]. Given that manual validation is a resource-consuming task, in practice
evaluation focuses on an appropriately selected sample of all the automatic annotations; depending
on the collected feedback and the objective, appropriate filtering criteria are then applied.
      </p>
      <p>To address the aforementioned challenges, in this paper, we define, implement, and test a
methodology and associated digital platform, called SAGE5, which combines automatic
annotation tools with human validation for the enrichment of CH items at scale. SAGE is an open
source tool6 that streamlines and facilitates the whole workflow of semantic enrichment, from
data import and the automatic production of semantic annotations to human validation and
data publication. The platform has been configured to serve the needs of the cultural sector
and supports seamless interoperability with the common European data space for CH7 and in
particular with Europeana.</p>
      <p>The methodology and platform have been applied to enrich the metadata records from
datasets on various aspects of crafts heritage (from furniture to jewelry and costumes to clocks)
coming from 8 different CH organisations, including the Fashion Museum Antwerp, the
Netherlands Institute for Sound and Vision, the Open University of the Netherlands, the Greek
National Documentation Centre, the Museum of Arts and Crafts in Zagreb, the Palais Galliera and
Mobilier National in Paris, and the Textile Museum of Prato. The rest of the paper is structured
as follows. After discussing related work, we present the steps of the methodology to semantic
enrichment that we followed along with the technical architecture and the supporting SAGE
platform, the evaluation performed and the results achieved. Finally, we conclude the paper
with some general lessons learned.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>
        State-of-the-art Natural Language Processing and Machine Learning technologies have been
extensively used in the CH domain to analyze unstructured text and extract structured
information from it. For automated subject indexing, Annif [22] is an open-source multilingual
toolkit by the National Library of Finland that automatically assigns subjects to documents
from a controlled vocabulary. In [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], a topic detection approach is applied to group historical
documents into thematic collections. Additionally, the HerCulB system [
        <xref ref-type="bibr" rid="ref22">23</xref>
        ] has been
developed to automatically annotate the Balkans’ intangible CH. Other approaches propose the use
of semi-automatic tools to assist humans in the task of manual annotation by identifying
alignments between vocabularies, such as CultuurLINK [16].
      </p>
      <p>
        Among information retrieval approaches, there have been several attempts to apply Named
Entity Recognition (NER) as well as Disambiguation (NED) in the CH and digital humanities
sectors, considering different types of data. In [
        <xref ref-type="bibr" rid="ref10">11</xref>
        ], NERD is applied to enrich metadata for the
exhibits of the Smithsonian Cooper–Hewitt National Design Museum in New York.
5https://pro.europeana.eu/post/close-encounters-with-ai-an-interview-on-automatic-semantic-enrichment
6Source code: https://github.com/ails-lab/sage-backend and https://github.com/ails-lab/sage-frontend
Documentation: https://ails-lab.github.io/SAGE_Documentation/ and https://www.youtube.com/playlist?list=PLZhh656xkjIsxMKShH7aV7aR8TAwmU508
7https://dataspace-culturalheritage.eu/
In [8], an
overview of NER approaches applied to historical documents is provided. An entity matching
approach that works at the level of structured knowledge graphs, aiming to identify duplicate
entities in data sources containing historical data is presented in [
        <xref ref-type="bibr" rid="ref3">2</xref>
        ]. In [
        <xref ref-type="bibr" rid="ref4">3</xref>
        ], the authors
conduct a comparative study of different NERD tools on digital archive collections in order to
link English textual metadata to Wikidata entities. In their study, the multilingual NERD tool
mGENRE [6], which we employ in the current study, outperforms other approaches including
BLINK [
        <xref ref-type="bibr" rid="ref23">24</xref>
        ] and EDGEL [
        <xref ref-type="bibr" rid="ref13">14</xref>
        ]. The need to deal with multilingual text is another important
concern in the CH domain, e.g. named entity recommendation has been explored as a means
to enhance multilingual retrieval on Europeana [
        <xref ref-type="bibr" rid="ref9">10</xref>
        ]. In this respect, the multilingual
autoregressive entity linking approach employed by mGENRE is another advantage of the particular
tool.
      </p>
      <p>
        It should also be noted that NERD tools are trained on generic corpora [
        <xref ref-type="bibr" rid="ref23">6, 24</xref>
        ], which have
limited overlap with CH-related textual metadata [
        <xref ref-type="bibr" rid="ref11">12</xref>
        ]. Adapting these tools to new domains
by fine-tuning them requires large amounts of well-annotated data, with labels that need to
be generated or validated by domain experts, as well as large computational power, time and
funds. These challenges are extensively discussed in [21] for the domain of Digital
Humanities. Although domain adaptation of ML models is beyond the scope of the current paper, the
methodology we advocate can lead to the production of high-quality ground truth data with
reduced costs: validators are provided with datasets that have been already automatically
annotated, an approach that greatly facilitates their manual task, which becomes more focused
and less cumbersome. This process allows us to make openly available a selection of
appropriately processed annotated metadata from the CH domain (see Section 4), thus contributing to
increasing the availability of annotated metadata that can be used for the in-domain tuning of
NERD tools.
      </p>
      <p>
        As the uptake of AI tools expands, there is an increasing need for validation and
moderation by humans to overcome the errors of the machine and achieve higher-quality results [
        <xref ref-type="bibr" rid="ref21">20</xref>
        ].
Crowdsourcing methods and tools have been employed by CH organisations in this respect [
        <xref ref-type="bibr" rid="ref12">13</xref>
        ]
as a means to mobilise human participants in the evaluation and correction of AI algorithm
outcomes, also leading to the preparation of ground-truth data [
        <xref ref-type="bibr" rid="ref11 ref14">15, 12</xref>
        ]. For tasks that require
specialised expertise, in [
        <xref ref-type="bibr" rid="ref6">7</xref>
        ] a niche-sourcing methodology and tool for the annotation of CH
metadata is proposed, which, similar to our approach, uses an RDF triple store to store the
results. However, as opposed to the the current work, the methodology relies solely on manual
selections by experts with no use of automatic annotation tools.
      </p>
      <p>Overall, our work distinguishes itself from previous work on semantic enrichment mainly
in that it is based on a generic data management approach, which allows the combination of
various annotation tools with flexible parameterisation capabilities (such as the definition of
string matching and filtering rules); in that it includes human validation as an integral part
of its workflow; and in that it supports integrations with other CH-specific data representations
and platforms, making it readily reusable in the CH data space. It should be noted that the
integration with external annotation tools and CH-related platforms is loosely coupled, via
interactions with the APIs (Application Programming Interface) and SPARQL endpoints exposed
by the third-party components.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Methodology and Technical Architecture</title>
      <p>The methodology we followed for the semantic enrichment of CH metadata consists of the
following high-level steps:</p>
      <p>1. A: Data aggregation and requirements analysis. The first step concerns the preparatory
tasks of aggregating the data and specifying the requirements for the enrichment (e.g. which
metadata fields to analyse, which vocabularies to link to, etc.).</p>
      <p>2. B: Automatic metadata enrichment. The second step involves the automatic analysis of
the textual metadata, with the aim to derive useful annotations in line with the identified
requirements.</p>
      <p>3. C: Human validation. Humans are solicited to review and validate the automatically
generated annotations as well as to manually add new annotations that the automatic algorithm
has not been able to detect.</p>
      <p>4. D: Filtering and data publication. The outcomes of the human validation are analysed
to establish appropriate thresholds for filtering, and the filtered annotations are embedded as
enrichments to the metadata records. The enriched metadata records are ultimately published
to the Europeana platform.</p>
      <p>Figure 1 provides an overview of the main digital components that support the above
methodology.</p>
      <p>
        MINT is a metadata management tool8 that is part of the data space for CH and is used by
several aggregators to prepare and publish their data to Europeana. It acts as the link between
SAGE and Europeana and supports steps A and D of the aforementioned methodology, serving
the following purposes: (i) aggregating the metadata records from the data providers and
mapping them to the Europeana Data Model (EDM) [
        <xref ref-type="bibr" rid="ref5">4</xref>
        ], which is then passed to SAGE; and (ii)
embedding the annotations produced by SAGE, after filtering in light of the human feedback, into
the original metadata records in line with the EDM extension that accommodates
enrichments9, and ultimately publishing the results to Europeana. It should be noted that
data already published on the Europeana platform can also be sourced directly by SAGE for
annotation, via a direct interconnection with the Europeana search API10.
      </p>
      <sec id="sec-3-1">
        <title>3.1. The SAGE tool for automatic enrichment and validation</title>
        <p>SAGE is a web-based platform for generating, enriching, validating, publishing, and searching
RDF data. In the context of our methodology, it is responsible for the core steps B and C.
The RDF data can be produced from heterogeneous data sources and data formats using the
D2RML mapping language [5], and enriched using annotators that wrap web-based or other
third party services. The enrichments can then be manually validated, and finally, the entire
data can be published in an RDF store and indexed. The SAGE platform has been configured to
facilitate the semantic enrichment of CH metadata. In this respect, it offers a suite of already
set-up annotators, i.e. parameterisable enrichment templates, that are connected with relevant
in-domain vocabularies and knowledge bases. It also facilitates the direct import/publication
of metadata from/to platforms of the European data space for CH, including Europeana and
MINT, making use of established APIs and formats.</p>
        <p>A dataset is annotated per property, i.e. the user can select from the schema preview a
property that links entities to values, and execute an annotator on the values of that property. An
annotator in SAGE is a mediator that retrieves all desired values from the triple store where
the dataset content is published, generates the appropriate calls to the web or other service,
and transforms the results to the RDF annotation specification. As in the case of datasets, the
results of an annotator execution are Terse RDF Triple Language11 files stored in the file system
of SAGE. In the framework of the data space of CH, annotations are also expressed in a
JSON-LD equivalent representation model12, which is based on the W3C’s Web Annotation Model13
supported by Europeana. The annotation model is generic enough to accommodate
various enrichment types (e.g. annotations resulting from automatic translation tools, from image
analysis, etc.) and provides sufficient provenance information, including information about the
annotations’ confidence scores and the validation feedback provided by humans. For
metadata records that are compatible with EDM, the annotations are ultimately embedded in the
metadata in line with the EDM extension that instructs the representation of metadata
statements resulting from semantic enrichment14. This way, the enrichments can be appropriately
handled and presented to the end-user by the Europeana platform.
9https://pro.europeana.eu/files/Europeana_Professional/Share_your_data/Technical_requirements/EDM_profiles/EDM_provenance_profile_external_202111.pdf
10https://www.europeana.eu/en/apis
11https://www.w3.org/TR/turtle/
12https://docs.google.com/document/d/1Cq1Qqx0ji7Vw8iwLVis1CfpYKtv-72ojkcvjnQzrKjs/edit?usp=sharing
13https://www.w3.org/TR/annotation-model/
14https://pro.europeana.eu/files/Europeana_Professional/Share_your_data/Technical_requirements/EDM_profiles/EDM_provenance_profile_external_202111.pdf</p>
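To make the annotation representation concrete, the sketch below builds a minimal annotation body in the spirit of the W3C Web Annotation Model. The exact property set (e.g. the confidence and review fields) and the AAT URI shown are assumptions for illustration, not the normative Europeana profile.

```python
import json

# Hedged sketch of an enrichment expressed as a W3C-style Web Annotation.
# The "confidence"/"reviewed" properties are illustrative placeholders;
# the actual Europeana profile defines its own vocabulary for provenance.
def make_annotation(record_uri, field, matched_text, entity_uri, confidence):
    return {
        "@context": "http://www.w3.org/ns/anno.jsonld",
        "type": "Annotation",
        "motivation": "tagging",
        "target": {
            "source": record_uri,
            "selector": {"type": "TextQuoteSelector", "exact": matched_text},
        },
        "body": entity_uri,
        # Provenance-style extras (placeholder names):
        "confidence": confidence,
        "reviewed": False,
    }

anno = make_annotation(
    "http://data.europeana.eu/item/123/abc", "dc:subject",
    "costume", "http://vocab.getty.edu/aat/300178802", 0.91)
print(json.dumps(anno, indent=2)[:80])
```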
        <p>SAGE supports three main types of annotators, which can be parameterised with respect to
different aspects (e.g. vocabulary, language, preprocessing functions, etc.) to serve different case
studies:
• Thesaurus annotators: They link texts to URIs from thesauri that can be imported to the
platform by performing smart string matching on the thesaurus labels using
lemmatizers (such as the ones provided by the Stanza library15) and other functions to produce
improved results (e.g. apply dedicated Regex rules). They are appropriate for
application both on generic textual fields and on focused short fields. By selecting thesauri that
represent concepts referring to specific domains (e.g. fashion), it is more likely that the
extracted terms are relevant to the object in question. Moreover, such annotators can
perform massive enrichments in a very short time compared to the other annotators,
since they rely on locally stored data. Figure 3 provides an overview of how a Thesaurus
Annotator works on a specific example.
• Generic NERD annotators: They employ pre-trained NERD tools to detect named
entities and link them to respective entities from Wikidata. SAGE supports two different
pipelines for generic NERD. The first pipeline makes use of the AIDA tool [ 18] for entity
detection and disambiguation. The second pipeline makes use of the spaCy library16 for
performing the NER part for different languages, i.e. for recognising entities and their
string boundaries within a sentence, and then of the multilingual mGENRE model [6]
for the disambiguation stage and for linking with a URI from Wikidata. Such
annotators can be used as they are, with minimal or no configurations and are appropriate for
general-purpose enrichments. They conduct disambiguation by using the context
contained in longer texts (e.g. description), since they are trained on textual corpora such as
Wikipedia articles. At the same time, these annotators are more likely than the other
annotator types to link with terms that are too generic or irrelevant in the context of a specific case
study, while it is hard to infer with sufficient accuracy the type of the extracted entity
and its relation to the object in question (e.g. whether it represents the item’s creator, a
place of display etc). As a result, in practice, they often produce more accurate results
when applied in fields with pre-specified focused semantics.
• SPARQL Annotators: SPARQL annotators communicate with external knowledge bases
(such as Wikidata and Geonames) through SPARQL endpoints. Thus, they are the best
fit when dealing with large knowledge bases that cannot be downloaded locally. They
can be applied on focused fields that refer to a single entity. The values of such fields
often follow certain patterns (e.g. “surname, name”, “city/region/country” etc) and, thus,
pre-processing with Regex is key to the success of the method, so that a normal form of
the entity name can be extracted. An example of a query that matches Wikidata entities
with the occupation of a painter is presented in Figure 2.
15https://stanfordnlp.github.io/stanza/
16https://spacy.io/</p>
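As a rough illustration of the thesaurus-annotator idea (lemmatised label matching against an in-domain vocabulary), the sketch below uses a toy lemma table in place of a real lemmatizer such as Stanza; the labels, lemmas, and URIs are all invented for illustration.

```python
import re

# Toy lemma table standing in for a real lemmatizer (e.g. Stanza).
LEMMAS = {"rings": "ring", "bracelets": "bracelet", "dresses": "dress"}

# Toy thesaurus: lemmatised label -> URI (illustrative URIs).
THESAURUS = {
    "ring": "http://vocab.example.org/jewelry/ring",
    "bracelet": "http://vocab.example.org/jewelry/bracelet",
}

def lemmatise(token):
    return LEMMAS.get(token.lower(), token.lower())

def thesaurus_annotate(text):
    """Return (matched substring, URI) pairs by lemmatised label matching."""
    hits = []
    for token in re.findall(r"\w+", text):
        uri = THESAURUS.get(lemmatise(token))
        if uri:
            hits.append((token, uri))
    return hits

print(thesaurus_annotate("Two gold rings and a silver bracelet"))
# [('rings', 'http://vocab.example.org/jewelry/ring'),
#  ('bracelet', 'http://vocab.example.org/jewelry/bracelet')]
```

Because the lookup runs against locally stored labels, such an annotator can process large datasets far faster than annotators that call remote services, as noted above.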
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Human Validation</title>
        <p>Human validation was conducted via a dedicated environment provided by SAGE (see
Figure 4). Humans are invited to inspect the automatic annotations produced by the AI tools and
accept or reject them. Moreover, they can add missed annotations, i.e. relevant annotations
that the automatic algorithm failed to identify. During the validation of the results of the
semantic analysis, validators are also able to edit the predefined target metadata field in which
the URI will end up. It should be noted that SAGE groups together annotations repeated across
many records in a dataset and flags annotations referring to URIs that are already included in
the metadata. In total, 14 CH professionals with specialized knowledge about the considered
collections participated in the validation process, with two to three validators per collection.
Participants were instructed to accept or reject annotations based on what they consider as
desirable for inclusion in the final metadata. That is, they evaluated annotations not only on
whether a match is correct but also on its relevance (e.g. matches with the term “human”
may be considered too generic).</p>
        <p>The appropriate size and characteristics of the sample to be validated depend on the
available resources that can be invested in the validation process and the nature of the use case.
What is considered a “sufficient” amount hinges on many factors, including the total number
of automatically produced annotations, their characteristics (e.g. what metadata fields they
refer to, their granularity, etc), the characteristics of the automated algorithm that produced
them (e.g. its accuracy, the reliability of the automatic confidence scores it assigned to them),
the number of participants and the amount of time they can devote to the task. The following
criteria were used to guide the selection of the annotations sample to be validated, so as to
ensure representativeness across various parameters:
• Inspect annotations that appear in a high number of records and thus will have a high
impact.
• Ensure a balanced representation of metadata fields, including fields with varying
semantics and expected text length.
• Take into consideration automatic confidence levels assigned by automatic algorithms,
if available: inspect annotations with a rather low confidence score but also a sufficient
number of annotations with a rather high one.</p>
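The selection criteria above can be sketched as a simple scoring routine. All field names, caps, and thresholds below are invented for illustration and are not the project's actual sampling rules.

```python
# Hedged sketch of sampling annotations for validation, roughly following
# the criteria in the text: prefer high-impact (many-record) annotations,
# balance metadata fields, and inspect low- as well as high-confidence items.
def select_sample(annotations, per_field=2):
    """annotations: list of dicts with 'field', 'records', 'confidence'."""
    sample, taken = [], {}
    # High-impact first: sort by number of records the annotation touches.
    for a in sorted(annotations, key=lambda a: -a["records"]):
        if taken.get(a["field"], 0) < per_field:   # balance across fields
            sample.append(a)
            taken[a["field"]] = taken.get(a["field"], 0) + 1
    # Ensure some low-confidence annotations are inspected too.
    lows = [a for a in annotations
            if a["confidence"] < 0.5 and a not in sample]
    return sample + lows[:per_field]

anns = [
    {"field": "dc:subject", "records": 40, "confidence": 0.9},
    {"field": "dc:subject", "records": 30, "confidence": 0.8},
    {"field": "dc:subject", "records": 20, "confidence": 0.4},
    {"field": "dc:title",   "records": 10, "confidence": 0.3},
]
picked = select_sample(anns)
print(len(picked))
```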
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Analysis and Filtering of Annotations</title>
        <p>Validation feedback was analysed with the aim of establishing thresholds for annotations that
are considered acceptable for publication. To this end, the following metrics have been
calculated per dataset, per Annotator, and per analysed metadata field:
• Precision considering only unique annotations, that is, unique triples of field textual
values, matched sub-string, and identified URI.
• Precision considering all annotations, that is, without grouping together identical field
textual values (in other words, counting all times the same annotation, defined as a triple,
may appear in diferent items).</p>
        <p>In both cases, precision was calculated as TP/(TP + FP), where TP is the number of accepted
annotations and FP the number of rejected ones. Precision was used as a threshold for filtering out not-reviewed
annotations on a field or Annotator basis. What is considered a sufficiently high precision
depends on the requirements of each case study and the expectations of the data provider.</p>
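Concretely, the two precision variants can be computed as below. The record layout (field value, matched substring, URI, verdict, item count) is an assumption based on the description above, not SAGE's internal schema.

```python
# Sketch of the two precision variants described in the text.
# Each review is (field_value, substring, uri, accepted, item_count);
# this tuple layout is assumed for illustration.
def precisions(reviews):
    # Unique-annotation precision: each distinct triple counts once.
    unique = {}
    for value, sub, uri, accepted, count in reviews:
        unique[(value, sub, uri)] = accepted
    tp_unique = sum(1 for ok in unique.values() if ok)
    p_unique = tp_unique / len(unique)

    # All-annotation precision: weight each triple by its item count.
    tp_all = sum(c for *_, ok, c in reviews if ok)
    total = sum(c for *_, c in reviews)
    return p_unique, tp_all / total

reviews = [
    ("gold ring", "ring", "aat:ring", True, 10),
    ("silver ring", "ring", "aat:ring", True, 5),
    ("boxing ring", "ring", "aat:ring", False, 5),
]
print(precisions(reviews))  # ≈ (0.667, 0.75)
```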
        <p>For the use cases we considered, most human experts did not focus on the manual insertion
of new annotations and the few manually added annotations we collected do not allow us to
sufficiently estimate false negatives and thus compute recall and the F-score. It should also be
noted that for publication to Europeana, a threshold based on precision is considered the most
appropriate metric to be used17.</p>
        <p>
          Human judgments can also be used as a means to assess the trustworthiness of the automatic
confidence scores assigned by the AI algorithms. For example, if humans tend to accept all
sample annotations above a certain score, then we may conclude that all annotations above that
score can be regarded as acceptable. In this vein, we explored whether there is a correlation
between the automatic confidence scores, when available, and human judgments. We therefore
plotted the logistic regression between the two variables considering the following metrics:
• The p-value [
          <xref ref-type="bibr" rid="ref20">19</xref>
          ]. A value greater than 0.05 means that no statistically significant
relationship between the automatic scores and the human judgments was observed.
• The automatic score above which the predicted probability of acceptance is greater than 0.7; that
is, annotations above this score have a probability above 0.7 of being accepted by humans,
based on the sample data.
        </p>
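Given fitted logistic-regression coefficients (intercept a, slope b), the score above which the predicted acceptance probability exceeds 0.7 can be solved in closed form. The coefficients below are invented for illustration, not fitted on the paper's data.

```python
import math

# With a fitted logistic model P(accept | score) = sigmoid(a + b*score),
# the score threshold where the predicted probability reaches p solves
# a + b*x = logit(p), i.e. x = (logit(p) - a) / b  (assuming b > 0).
def score_threshold(a, b, p=0.7):
    logit = math.log(p / (1 - p))
    return (logit - a) / b

# Illustrative coefficients (not fitted on real validation data):
a, b = -3.0, 5.0
x = score_threshold(a, b)
prob = 1 / (1 + math.exp(-(a + b * x)))
print(round(x, 3), round(prob, 2))  # threshold score, probability ≈ 0.7
```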
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Results on Crafts Heritage Datasets</title>
      <p>The aforementioned methodology and supporting tools have been applied to metadata records
describing crafts heritage items as mentioned in Section 1. The analysed metadata comes in the
following languages: Dutch, Italian, French, Greek, English and Croatian. In total, the SAGE
annotators were applied on 216,115 metadata records, giving rise to 915,472 total annotations
and 549,402 unique annotations. It should be noted that numbers calculated based on unique
annotations are considered more reliable, since numbers counting item-level impact are skewed
towards textual values that are repeated in multiple items. In total, 12 experts from 8 CH
organisations took part in the validation campaigns. Overall, 30,910 unique annotations referring to
more than 15K records were reviewed via SAGE (i.e. 5.6% of the automatically produced unique
annotations), with the sample being selected following the criteria outlined in Section 3.2. Of
those annotations, 23,426 were accepted and 7,474 were rejected.
17https://pro.europeana.eu/post/methodology-for-validating-enrichments</p>
      <p>The overall precision, defined as the number of all accepted automatic annotations produced
by SAGE over the number of reviewed automatic annotations, is 0.76, considering unique
annotations. If all annotations are counted, the overall precision is 0.82. Precision varied
considerably depending on the analysed metadata field, the type of annotator that was used, and the
datasets that were analysed. Table 1 provides an overview of the results achieved by diferent
annotators. The minimum and maximum precision reported in the table refer to a per metadata
ifeld level.</p>
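<p>
As a quick sanity check, the reported precision over unique annotations follows directly from the accept/reject counts (the ten reviewed annotations covered by neither count presumably received no clear verdict):

```python
# Reviewed unique annotations reported in this section.
accepted = 23_426
rejected = 7_474
reviewed = accepted + rejected  # 30,900 with a clear verdict, of 30,910 reviewed

precision = accepted / reviewed
print(round(precision, 2))  # 0.76, matching the reported figure
```
</p>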
      <p>The choice of the vocabulary used by the thesaurus annotators depended on the respective
dataset characteristics and providers’ objectives. The following vocabularies were used: the
Europeana fashion thesaurus18; AAT; the EUScreen vocabulary on audiovisual heritage19; and
a SKOS vocabulary on Greek crafts heritage20. Thesaurus annotators were applied to both
longer (e.g. dc:description, dc:title) and shorter fields (e.g. dc:format, dc:type), often
after case-appropriate regex pre-processing, giving generally satisfactory results in both
cases. SPARQL queries on Wikidata were used to retrieve creators for the dc:creator and
locations for the dc:spatial fields. Although this approach did not produce a high number of
annotations in most cases, it scored a high precision. mGENRE and AIDA were applied to dc:description and
dc:title fields as well as shorter fields (including dc:creator, dc:spatial, and dc:rights).
Both produced similar results, performing well on short fields but poorly on longer ones.
In the latter case, both struggled to disambiguate between multiple candidate entities
and, even when they produced matches that were in principle correct, those were often too generic
and were considered irrelevant by validators.</p>
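<p>
The paper does not list the actual SPARQL used; the sketch below is one plausible shape for a dc:creator lookup against Wikidata, matching a name against rdfs:label and restricting results to humans (wd:Q5). The function name and query structure are assumptions for illustration:

```python
def creator_query(name: str, lang: str = "en") -> str:
    # Build a SPARQL query resolving a creator name to Wikidata entities.
    # wdt:P31 wd:Q5 restricts matches to instances of "human".
    return (
        "SELECT ?person WHERE {\n"
        "  ?person wdt:P31 wd:Q5 ;\n"
        f'          rdfs:label "{name}"@{lang} .\n'
        "}\n"
        "LIMIT 5"
    )

# The resulting string can be posted to the public endpoint at
# https://query.wikidata.org/sparql
print(creator_query("Mariano Fortuny"))
```

An analogous query with a geographic class in place of wd:Q5 would serve dc:spatial values.
</p>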
      <p>
        For annotators that produced an automatic score, we also attempted to fit a
logistic regression between the automatic score and the human judgments. However, no
correlation was found between the two variables, and therefore automatic scores were not used
as factors in the filtering rules. A possible explanation is that, in the case of thesaurus
annotators, scores are usually quite high for all annotations: they reflect the string difference
(1 − Levenshtein distance [
        <xref ref-type="bibr" rid="ref15">17</xref>
        ]) between word endings (since the matching is based on the
lemmatised versions of the textual metadata and the thesaurus terms). For the generic NERD tools,
the scores turn out to be quite unreliable: they are inversely proportional to the number of
candidate URIs and do not sufficiently account for disambiguation.
      </p>
      <p>Annotations have been filtered by discarding all annotations rejected by humans, while
including all explicitly accepted ones (considering a majority vote). For non-reviewed
annotations, a precision threshold between 0.75 and 0.8 (considering unique annotations)
was considered acceptable by the data providers. In total, 549,460 annotations have been regarded as acceptable,
leading to the enrichment of 133,405 out of the 216,115 analysed records. All enriched records
have been published to Europeana. Enrichments have been indexed to become searchable and
are visible in the item view via distinct tags, thus contributing to making the
respective items more discoverable, contextual, and multilingual. Figure 5 shows an example of how
automatic annotations appear on the Europeana platform.</p>
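<p>
The filtering rule described above can be summarised in a small helper. This is a hypothetical reconstruction, not the platform's actual code; the precision threshold corresponds to the 0.75-0.8 range accepted by the data providers:

```python
from collections import Counter

def keep_annotation(votes, estimated_precision, threshold=0.75):
    """Decide whether an automatic annotation is kept.

    votes: human verdicts ('accept'/'reject'), empty if unreviewed.
    estimated_precision: precision measured for the annotator/field combination.
    """
    if votes:
        # Reviewed annotations: the majority verdict decides.
        tally = Counter(votes)
        return tally["accept"] > tally["reject"]
    # Unreviewed annotations: fall back to the measured precision.
    return estimated_precision >= threshold

print(keep_annotation(["accept", "accept", "reject"], 0.5))  # True
print(keep_annotation([], 0.81))                             # True
print(keep_annotation([], 0.60))                             # False
```
</p>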
      <p>
        Although domain adaptation is beyond the scope of the current case study, the dataset that
resulted from the validation process can be valuable for the training and fine-tuning of NERD
tools in the field of CH. To this end, a curated selection of annotated metadata enriched and
validated via SAGE has been made openly available21 under a CC0 license, so that it can be
freely reused as data amenable for computational purposes. The dataset includes more than
10K unique annotations (pairs of analysed textual values and URIs). The in-domain adaptation
of NERD tools so that they can more effectively deal with the particular characteristics of CH
metadata [
        <xref ref-type="bibr" rid="ref11">12</xref>
        ], such as short text and specialised terminology, remains part of future work.
      </p>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusions</title>
      <p>In the current paper, we present a generic and reusable methodology and supporting digital
platform that combines automatic annotators with human expertise in order to enrich cultural
heritage metadata with terms from various linked data sources. The methodology has been applied and evaluated
on a case study involving crafts heritage datasets, leading to measurable improvements in the
quality of metadata and enhancing the discoverability and usability of the respective resources
on Europeana. Building on the practical experience we gained, the current case study allows
us to draw some lessons learned, which can prove useful for interested stakeholders who may
wish to follow a similar process to enrich their datasets.</p>
      <p>Before proceeding to the actual enrichment, it is crucial to scrutinise the data to be analysed,
gain a deep understanding of its characteristics and define feasible and meaningful enrichment
objectives. One should define the expected benefit of possible enrichments and how they will
bring value to the collection. In this respect, one should ask questions such as: What kind of
concepts are useful to detect (e.g. persons, locations, domain-specific concepts etc)? Which
metadata fields contain relevant information (e.g. descriptions make frequent references to
techniques and materials used)? In what languages are the metadata? It should also be noted
that the quality of the original metadata affects the quality of the automatic enrichment. If
the text contains many typos or is misaligned with the intended semantics of the respective
metadata field, then the outputs of the automatic enrichment tools will be less accurate. This
step is also crucial for detecting patterns in data that can be exploited in order to produce
annotations.</p>
      <p>The next step involves the selection and set-up of the semantic annotators that are most
appropriate for the specific use case, considering the advantages and disadvantages of each
approach as presented in Section 3.1. The selection of knowledge bases and vocabularies that
have the case-appropriate granularity and coverage is crucial. Generally, the more focused
the automatic enrichment, considering the terminology used (e.g. linking with a domain-specific
vocabulary versus general-purpose NERD) and the metadata property that is parsed (e.g.
topic-specific fields such as dc:creator versus longer ones such as dc:description), the lower the
risk of producing too many irrelevant or too generic enrichments and the more accurate the
disambiguation. One should opt for knowledge bases that are accessible on the Web
via an open license, well-documented, and compliant with Linked Data best practices. Their
multilingual coverage (also in relation to the language of your metadata) is also an important
aspect that should be taken into consideration.</p>
      <p>After the production of the automatic annotations, the validation process should be carefully
organised. The background of the validators is crucial: some tasks may require expert skills
(e.g., knowledge of a particular language, domain expertise etc.), while others can be performed
by appealing to a general audience. In the former case, it is wiser to keep the validation process
closed within a team of experts, while in the latter, organising an open crowdsourcing
campaign will mobilise more people and thus speed up the process. The selection of the sample to
be validated is equally important: it does not need to be large, but it should be well-balanced, following
the criteria outlined in Section 3.2.
21See https://github.com/ails-lab/ai4culture-datasets for the actual dataset and the process that was used for the data curation.</p>
      <p>The final step involves the filtering of the automatic annotations in light of the acquired
human feedback. For annotations reviewed by humans, a majority vote can typically be used to
determine acceptability. Depending on the annotation type, additional criteria might be
enforced (e.g. for public validation campaigns where untrustworthy feedback is suspected, we
may require that an annotation is reviewed by multiple users). Automatic annotations that
have not been reviewed by humans or lack a reliable confidence score should be filtered using
automatic evaluation metrics. The appropriate metrics depend on the nature of the task, but
precision is a typical choice when correctness is at stake. Thresholds should be established
depending on what is considered acceptable given the specific use case requirements.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgments</title>
      <p>The work is co-funded by the European Union, under the projects “CRAFTED: Enrich and
promote traditional and contemporary crafts” and “AI4Culture: An AI platform for the cultural
heritage data space”. We would like to thank all partners of the CRAFTED project, and
particularly Panagiotis Tzortzis, for their valuable contributions to this work.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>M.</given-names>
            <surname>Andresel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Gordea</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Stevanetic</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Schütz</surname>
          </string-name>
          . “
          <article-title>An Approach for Curating Collections of Historical Documents with the Use of Topic Detection Technologies”</article-title>
          .
          <source>In: Int. J. Digit. Curation 17.1</source>
          (
          <year>2022</year>
          ), p.
          <fpage>12</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>J.</given-names>
            <surname>Baas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. M.</given-names>
            <surname>Dastani</surname>
          </string-name>
          , and
          <string-name>
            <given-names>A.</given-names>
            <surname>Feelders</surname>
          </string-name>
          . “
          <article-title>Entity Matching in Digital Humanities Knowledge Graphs”</article-title>
          .
          <source>In: Proc. of the Conf. on Computational Humanities Research, CHR2021</source>
          . Vol.
          <volume>2989</volume>
          . CEUR Workshop Proceedings.
          <year>2021</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>15</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Benkhedda</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Skapars</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Schlegel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Nenadic</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R.</given-names>
            <surname>Batista-Navarro</surname>
          </string-name>
          .
          <article-title>“Enriching the Metadata of Community-Generated Digital Content through Entity Linking: An Evaluative Comparison of State-of-the-Art Models”</article-title>
          .
          <source>In: Proc. of the 8th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage</source>
          ,
          <source>Social Sciences, Humanities and Literature. St. Julians</source>
          , Malta: Association for Computational Linguistics,
          <year>2024</year>
          , pp.
          <fpage>213</fpage>
          -
          <lpage>220</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>V.</given-names>
            <surname>Charles</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Isaac</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Tzouvaras</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Hennicke</surname>
          </string-name>
          . “
          <article-title>Mapping Cross-Domain Metadata to the Europeana Data Model (EDM)”</article-title>
          .
          <source>In: Research and Advanced Technology for Digital Libraries</source>
          . Springer Berlin Heidelberg,
          <year>2013</year>
          , pp.
          <fpage>484</fpage>
          -
          <lpage>485</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>A.</given-names>
            <surname>Chortaras</surname>
          </string-name>
          and
          <string-name>
            <given-names>G.</given-names>
            <surname>Stamou</surname>
          </string-name>
          . “
          <article-title>D2RML: Integrating Heterogeneous Data and Web Services into Custom RDF Graphs”</article-title>
          .
          <source>In: Workshop on Linked Data on the Web co-located with The Web Conference</source>
          . Vol.
          <volume>2073</volume>
          . CEUR Workshop Proceedings.
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [6]
          <string-name><given-names>N.</given-names> <surname>De Cao</surname></string-name>,
          <string-name><given-names>L.</given-names> <surname>Wu</surname></string-name>,
          <string-name><given-names>K.</given-names> <surname>Popat</surname></string-name>,
          <string-name><given-names>M.</given-names> <surname>Artetxe</surname></string-name>,
          <string-name><given-names>N.</given-names> <surname>Goyal</surname></string-name>,
          <string-name><given-names>M.</given-names> <surname>Plekhanov</surname></string-name>,
          <string-name><given-names>L.</given-names> <surname>Zettlemoyer</surname></string-name>,
          <string-name><given-names>N.</given-names> <surname>Cancedda</surname></string-name>,
          <string-name><given-names>S.</given-names> <surname>Riedel</surname></string-name>, and
          <string-name><given-names>F.</given-names> <surname>Petroni</surname></string-name>
          . “
          <article-title>Multilingual Autoregressive Entity Linking”</article-title>
          .
          <source>In: Transactions of the Association for Computational Linguistics 10</source>
          (
          <year>2022</year>
          ), pp.
          <fpage>274</fpage>
          -
          <lpage>290</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>C.</given-names>
            <surname>Dijkshoorn</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>de Boer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Aroyo</surname>
          </string-name>
          , and
          <string-name>
            <given-names>G.</given-names>
            <surname>Schreiber</surname>
          </string-name>
          . “
          <article-title>Accurator: Nichesourcing for Cultural Heritage”</article-title>
          .
          <source>In: Hum. Comput</source>
          .
          <volume>6</volume>
          (
          <issue>2019</issue>
          ), pp.
          <fpage>12</fpage>
          -
          <lpage>41</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <string-name>
            <given-names>M.</given-names>
            <surname>Ehrmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Hamdi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E. L.</given-names>
            <surname>Pontes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Romanello</surname>
          </string-name>
          , and
          <string-name>
            <given-names>A.</given-names>
            <surname>Doucet</surname>
          </string-name>
          . “
          <article-title>Named Entity Recognition and Classification in Historical Documents: A Survey”</article-title>
          .
          <source>In: ACM Computing Surveys 56.2</source>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <string-name>
            <given-names>N.</given-names>
            <surname>Freire</surname>
          </string-name>
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Isaac</surname>
          </string-name>
          . “
          <article-title>Technical Usability of Wikidata's Linked Data”</article-title>
          .
          <source>In: Business Information Systems Workshops</source>
          . Ed. by
          <string-name>
            <given-names>W.</given-names>
            <surname>Abramowicz</surname>
          </string-name>
          and
          <string-name>
            <given-names>R.</given-names>
            <surname>Corchuelo</surname>
          </string-name>
          . Springer International Publishing,
          <year>2019</year>
          , pp.
          <fpage>556</fpage>
          -
          <lpage>567</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>S.</given-names>
            <surname>Gordea</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. L.</given-names>
            <surname>Paramita</surname>
          </string-name>
          , and
          <string-name>
            <given-names>A.</given-names>
            <surname>Isaac</surname>
          </string-name>
          . “
          <article-title>Named Entity Recommendations to Enhance Multilingual Retrieval in Europeana.eu”</article-title>
          .
          <source>In: Foundations of Intelligent Systems</source>
          . Springer International Publishing,
          <year>2020</year>
          , pp.
          <fpage>102</fpage>
          -
          <lpage>112</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>S.</given-names>
            <surname>Hooland</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Wilde</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Verborgh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Steiner</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R.</given-names>
            <surname>Van de Walle</surname>
          </string-name>
          . “
          <article-title>Exploring Entity Recognition and Disambiguation for Cultural Heritage Collections”</article-title>
          .
          <source>In: Literary and Linguistic Computing</source>
          (
          <year>2013</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>E.</given-names>
            <surname>Kaldeli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>García-Martínez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Isaac</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. S.</given-names>
            <surname>Scalia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Stabenau</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I. L.</given-names>
            <surname>Almor</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. G.</given-names>
            <surname>Lacal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. B.</given-names>
            <surname>Ordóñez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Estela</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Herranz</surname>
          </string-name>
          . “
          <article-title>Europeana Translate: Providing multilingual access to digital cultural heritage”</article-title>
          .
          <source>In: Proc. of the 23rd Annual Conference of the European Association for Machine Translation, EAMT. European Association for Machine Translation</source>
          ,
          <year>2022</year>
          , pp.
          <fpage>297</fpage>
          -
          <lpage>298</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>E.</given-names>
            <surname>Kaldeli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Menis-Mastromichalakis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Bekiaris</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ralli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Tzouvaras</surname>
          </string-name>
          , and
          <string-name>
            <given-names>G.</given-names>
            <surname>Stamou</surname>
          </string-name>
          . “
          <article-title>CrowdHeritage: Crowdsourcing for Improving the Quality of Cultural Heritage Metadata”</article-title>
          .
          <source>In: Information 12.2</source>
          (
          <year>2021</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>N.</given-names>
            <surname>Lai</surname>
          </string-name>
          . “
          <article-title>LMN at SemEval-2022 Task 11: A Transformer-based System for English Named Entity Recognition”</article-title>
          .
          <source>In: Proc. of the 16th International Workshop on Semantic Evaluation (SemEval-</source>
          <year>2022</year>
          ). Seattle, United States: Association for Computational Linguistics,
          <year>2022</year>
          , pp.
          <fpage>1438</fpage>
          -
          <lpage>1443</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>V.</given-names>
            <surname>Lyberatos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Kantarelis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Kaldeli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Bekiaris</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Tzortzis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Menis-Mastromichalakis</surname>
          </string-name>
          , and
          <string-name>
            <given-names>G.</given-names>
            <surname>Stamou</surname>
          </string-name>
          .
          “
          <article-title>Employing Crowdsourcing for Enriching a Music Knowledge Base in Higher Education”</article-title>
          .
          <source>In: Artificial Intelligence in Education Technologies: New Development and Innovative Practices</source>
          . Springer Nature,
          <year>2023</year>
          , pp.
          <fpage>224</fpage>
          -
          <lpage>240</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>F. P.</given-names>
            <surname>Miller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. F.</given-names>
            <surname>Vandome</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>McBrewster</surname>
          </string-name>
          .
          <source>Levenshtein Distance: Information Theory, Computer Science, String (Computer Science), String Metric, Damerau–Levenshtein Distance, Spell Checker, Hamming Distance</source>
          . Alpha Press,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          <string-name>
            <given-names>D. B.</given-names>
            <surname>Nguyen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Hoffart</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Theobald</surname>
          </string-name>
          , and
          <string-name>
            <given-names>G.</given-names>
            <surname>Weikum</surname>
          </string-name>
          . “
          <article-title>AIDA-light: High-Throughput Named-Entity Disambiguation”</article-title>
          .
          <source>In: Proc. of the Workshop on Linked Data on the Web co-located with the 23rd Int. World Wide Web Conf. (WWW)</source>
          . Vol.
          <volume>1184</volume>
          . CEUR Workshop Proceedings.
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          <string-name>
            <surname>Atsidis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Hildebrand</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Brinkerink</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Gordea</surname>
          </string-name>
          . “
          <article-title>Linking Subject Labels in Cultural Heritage Metadata to MIMO Vocabulary using CultuurLink”</article-title>
          .
          <source>In: Proc. of the 15th European Networked Knowledge Organization Systems Workshop (NKOS) co-located with the 20th Int. Conf. on Theory and Practice of Digital Libraries (TPDL)</source>
          .
          <source>Vol. 1676. CEUR Workshop Proceedings</source>
          .
          <year>2016</year>
          , pp.
          <fpage>32</fpage>
          -
          <lpage>35</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          <string-name>
            <given-names>O.</given-names>
            <surname>Suissa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Elmalech</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Zhitomirsky-Geffet</surname>
          </string-name>
          .
          <article-title>“Text analysis using deep neural networks in digital humanities and information science”</article-title>
          .
          <source>In: Journal of the Association for Information Science and Technology</source>
          <volume>73</volume>
          (
          <year>2021</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          <string-name>
            <given-names>O.</given-names>
            <surname>Suominen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Inkinen</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Lehtinen</surname>
          </string-name>
          . “
          <article-title>Annif and Finto AI: Developing and Implementing Automated Subject Indexing”</article-title>
          .
          <source>In: Italian Journal of Library, Archives and Information Science 13.1</source>
          (
          <issue>2022</issue>
          ), pp.
          <fpage>265</fpage>
          -
          <lpage>282</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>S.</given-names>
            <surname>Silvey</surname>
          </string-name>
          .
          <source>Statistical Inference. Monographs on Statistics and Applied Probability. Chapman &amp; Hall</source>
          ,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>J.</given-names>
            <surname>Stiller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Petras</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Gäde</surname>
          </string-name>
          , and
          <string-name>
            <given-names>A.</given-names>
            <surname>Isaac</surname>
          </string-name>
          . “
          <article-title>Automatic Enrichments with Controlled Vocabularies in Europeana: Challenges and Consequences”</article-title>
          . In: Digital Heritage. Progress in Cultural Heritage: Documentation, Preservation, and Protection. Springer International Publishing,
          <year>2014</year>
          , pp.
          <fpage>238</fpage>
          -
          <lpage>247</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>I.</given-names>
            <surname>Tanasijević</surname>
          </string-name>
          and
          <string-name>
            <given-names>G.</given-names>
            <surname>Pavlović-Lažetić</surname>
          </string-name>
          .
          <article-title>“HerCulB: content-based information extraction and retrieval for cultural heritage of the Balkans”</article-title>
          .
          <source>In: The electronic library 38.5/6</source>
          (
          <year>2020</year>
          ), pp.
          <fpage>905</fpage>
          -
          <lpage>918</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>L.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Petroni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Josifoski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Riedel</surname>
          </string-name>
          , and
          <string-name>
            <given-names>L.</given-names>
            <surname>Zettlemoyer</surname>
          </string-name>
          . “
          <article-title>Scalable Zero-shot Entity Linking with Dense Entity Retrieval”</article-title>
          .
          <source>In: Proc. of the Conf. on Empirical Methods in Natural Language Processing (EMNLP)</source>
          .
          <year>2020</year>
          , pp.
          <fpage>6397</fpage>
          -
          <lpage>6407</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>M.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Brandhorst</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.-C.</given-names>
            <surname>Marinescu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. M.</given-names>
            <surname>Lopez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Hlava</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Busch</surname>
          </string-name>
          . “
          <article-title>Automated metadata annotation: What is and is not possible with machine learning”</article-title>
          .
          <source>In: Data Intelligence</source>
          <volume>5</volume>
          .1 (
          <issue>2023</issue>
          ), pp.
          <fpage>122</fpage>
          -
          <lpage>138</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>