<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Combining Automatic Annotation with Human Validation for the Semantic Enrichment of Cultural Heritage Metadata</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author role="corresp">
							<persName><forename type="first">Eirini</forename><surname>Kaldeli</surname></persName>
							<email>ekaldeli@image.ece.ntua.gr</email>
							<affiliation key="aff0">
								<orgName type="department">School of Electrical and Computer Engineering</orgName>
								<orgName type="laboratory">AI and Learning Systems Lab</orgName>
								<orgName type="institution">National Technical University of Athens</orgName>
								<address>
									<country key="GR">Greece</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Alexandros</forename><surname>Chortaras</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">School of Electrical and Computer Engineering</orgName>
								<orgName type="laboratory">AI and Learning Systems Lab</orgName>
								<orgName type="institution">National Technical University of Athens</orgName>
								<address>
									<country key="GR">Greece</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Vassilis</forename><surname>Lyberatos</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">School of Electrical and Computer Engineering</orgName>
								<orgName type="laboratory">AI and Learning Systems Lab</orgName>
								<orgName type="institution">National Technical University of Athens</orgName>
								<address>
									<country key="GR">Greece</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Jason</forename><surname>Liartis</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">School of Electrical and Computer Engineering</orgName>
								<orgName type="laboratory">AI and Learning Systems Lab</orgName>
								<orgName type="institution">National Technical University of Athens</orgName>
								<address>
									<country key="GR">Greece</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Spyridon</forename><surname>Kantarelis</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">School of Electrical and Computer Engineering</orgName>
								<orgName type="laboratory">AI and Learning Systems Lab</orgName>
								<orgName type="institution">National Technical University of Athens</orgName>
								<address>
									<country key="GR">Greece</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Giorgos</forename><surname>Stamou</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">School of Electrical and Computer Engineering</orgName>
								<orgName type="laboratory">AI and Learning Systems Lab</orgName>
								<orgName type="institution">National Technical University of Athens</orgName>
								<address>
									<country key="GR">Greece</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Combining Automatic Annotation with Human Validation for the Semantic Enrichment of Cultural Heritage Metadata</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">CAC68FEA3EE078C7AF0E35FC213B2973</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2025-04-23T19:49+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>semantic enrichment</term>
					<term>cultural heritage metadata</term>
					<term>named entity recognition and disambiguation</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>The addition of controlled terms from linked open datasets and vocabularies to metadata can increase the discoverability and accessibility of digital collections. However, the task of semantic enrichment requires a lot of effort and resources that cultural heritage organizations often lack. State-of-the-art AI technologies can be employed to analyse textual metadata and match it with external semantic resources. Depending on the data characteristics and the objective of the enrichment, different approaches may need to be combined to achieve high-quality results. What is more, human inspection and validation of the automatic annotations should be an integral part of the overall enrichment methodology. In the current paper, we present a methodology and supporting digital platform, which combines a suite of automatic annotation tools with human validation for the enrichment of cultural heritage metadata within the European data space for cultural heritage. The methodology and platform have been applied and evaluated on a set of datasets on crafts heritage, leading to the publication of more than 133K enriched records to the Europeana platform. A statistical analysis of the achieved results is performed, which allows us to draw some interesting insights as to the appropriateness of annotation approaches in different contexts. The process also led to the creation of an openly available annotated dataset, which can be useful for the in-domain adaptation of ML-based enrichment tools.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>Semantic enrichment is the process of adding new semantics to unstructured data, such as free text, so that machines can make sense of it and build connections to it. In the case of the metadata that describes Cultural Heritage (CH) items, unstructured data comes in the form of free text that details several aspects of the item, for example its main characteristics, its location, its creator, etc. Through the process of semantic enrichment, those textual descriptions are analyzed and augmented with controlled terms from Linked Open datasets, such as Wikidata and Geonames, or controlled vocabularies, such as the Getty Art &amp; Architecture Thesaurus (AAT). Those terms represent concepts and attributes (e.g. "costume", "Renaissance", colors), named entities, such as persons, locations, and organisations, or chronological periods. For example, the strings "Leonardo da Vinci" and "da Vinci, Leonardo" can both be linked to the Wikidata term representing the Italian Renaissance polymath. This additional piece of information associated with a CH resource is commonly referred to as an annotation, which links the CH object with some URI (Uniform Resource Identifier) derived from vocabularies or open data sources.</p><p>Semantic enrichment adds meaning and context to digital collections and makes them more easily discoverable. Given its importance, it has been a main concern and focus of efforts by the Europeana digital library <ref type="foot" target="#foot_0">4</ref> as well as individual data aggregators and providers. Firstly, linked data makes the meaning of textual metadata unambiguous <ref type="bibr" target="#b24">[25]</ref>. For example, the string "Leonardo da Vinci" may refer, depending on the context, to the Italian Renaissance polymath or the homonymous airport in Fiumicino, Italy, or a battleship with the same name. 
By linking the text with the correct URI, it becomes clear what the text refers to. Secondly, linked data allows us to retrieve additional information about a certain entity in an automated way, build connections between different resources and contextualize them <ref type="bibr" target="#b8">[9]</ref>. For example, it allows us to link items tagged with the term "ring" with the broader concept of "jewelry" and, thus, interconnect them with items enriched with the term "bracelet", which is also an instance of "jewelry". Moreover, linked data usually comes with translated labels, thus improving the capabilities for multilingual search <ref type="bibr" target="#b9">[10,</ref><ref type="bibr" target="#b11">12]</ref>.</p><p>Semantic enrichment is a labour-intensive process, which requires effort and resources that CH institutions often lack. State-of-the-art AI technologies can be employed to automate the time-consuming and often mundane process of manual metadata enrichment. Natural language processing (NLP) tools can be used to analyse textual metadata and detect and classify concepts or named entities mentioned in unstructured text. Machine Learning (ML) approaches are extensively used for the task of disambiguation, which is responsible for deciding if the reference to 'Leonardo da Vinci' in the text refers to the Italian polymath or to the battleship. However, the accuracy of the automatic results hinges on the specific task at hand vis-à-vis the algorithm applied. For example, short textual descriptions, which are common in CH metadata, lack context, and thus ML algorithms trained on Wikipedia articles may produce many incorrect matches. For similar reasons, they may often miss domain-specific matches that are relevant in the specific CH context. What's more, even if the automatically detected links are correct, they may be considered undesirable for a certain case study. 
For example, linking metadata records with terms representing colours may be important for a fashion collection, but it may be undesirable for describing a manuscript that happens to mention a certain colour.</p><p>As a result, depending on a number of factors, such as the text characteristics (e.g. its length and language), the vocabulary that we wish to link it to, and the type of entities to detect (e.g. do we wish to identify a broad variety of concepts or to limit ourselves to certain domain-specific terms?), a different combination of tools and steps is required to achieve the best possible results for each specific task. For example, for certain tasks with a well-defined restricted context, a simple lemmatisation and string matching approach may be more appropriate than complex ML-based algorithms. Besides the need for flexibility in combining and experimenting with different approaches and tools, another crucial aspect that needs to be considered is the need to make human inspection and validation an integral part of the end-to-end semantic enrichment workflow <ref type="bibr" target="#b12">[13]</ref>. Given that manual validation is a resource-consuming task, in practice evaluation focuses on an appropriately selected sample of all the automatic annotations; depending on the collected feedback and the objective, appropriate filtering criteria are then applied.</p><p>To address the aforementioned challenges, in this paper we define, implement, and test a methodology and associated digital platform, called SAGE <ref type="foot" target="#foot_1">5</ref>, which combines automatic annotation tools with human validation for the enrichment of CH items at scale. SAGE is an open source tool <ref type="foot" target="#foot_2">6</ref> that streamlines and facilitates the whole workflow of semantic enrichment, from data import and the automatic production of semantic annotations to human validation and data publication. 
The platform has been configured to serve the needs of the cultural sector and supports seamless interoperability with the common European data space for CH <ref type="foot" target="#foot_3">7</ref> and in particular with Europeana.</p><p>The methodology and platform have been applied to enrich the metadata records from datasets on various aspects of crafts heritage (from furniture and jewelry to costumes and clocks) coming from 8 different CH organisations, including the Fashion Museum Antwerp, the Netherlands Institute for Sound and Vision, the Open University of the Netherlands, the Greek National Documentation Centre, the Museum of Arts and Crafts in Zagreb, the Palais Galliera and Mobilier National in Paris, and the Textile Museum of Prato. The rest of the paper is structured as follows. After discussing related work, we present the steps of the methodology to semantic enrichment that we followed, along with the technical architecture and the supporting SAGE platform, the evaluation performed, and the results achieved. Finally, we conclude the paper with some general lessons learned.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Related Work</head><p>State-of-the-art Natural Language Processing and Machine Learning technologies have been extensively used in the CH domain to analyze unstructured text and extract structured information from it. For automated subject indexing, Annif <ref type="bibr" target="#b21">[22]</ref>, an open-source multilingual toolkit by the National Library of Finland, automatically assigns documents subjects from a controlled vocabulary. In <ref type="bibr" target="#b0">[1]</ref>, a topic detection approach is applied to group historical documents into thematic collections. Additionally, the HerCulB system <ref type="bibr" target="#b22">[23]</ref> has been developed to automatically annotate the Balkans' intangible CH. Other approaches propose the use of semi-automatic tools to assist humans in the task of manual annotation by identifying alignments between vocabularies, such as CultuurLINK <ref type="bibr" target="#b15">[16]</ref>.</p><p>Among information retrieval approaches, there have been several attempts to apply Named Entity Recognition (NER) as well as Disambiguation (NED) in the CH and digital humanities sectors, considering different types of data. In <ref type="bibr" target="#b10">[11]</ref>, NERD is applied to enrich metadata for the exhibits of the Smithsonian Cooper-Hewitt National Design Museum in New York. In <ref type="bibr" target="#b7">[8]</ref>, an overview of NER approaches applied to historical documents is provided. An entity matching approach that works at the level of structured knowledge graphs, aiming to identify duplicate entities in data sources containing historical data, is presented in <ref type="bibr" target="#b1">[2]</ref>. In <ref type="bibr" target="#b2">[3]</ref>, the authors conduct a comparative study of different NERD tools on digital archive collections in order to link English textual metadata to Wikidata entities. 
In their study, the multilingual NERD tool mGENRE <ref type="bibr" target="#b5">[6]</ref>, which we employ in the current study, outperforms other approaches including BLINK <ref type="bibr" target="#b23">[24]</ref> and EDGEL <ref type="bibr" target="#b13">[14]</ref>. The need to deal with multilingual text is another important concern in the CH domain; e.g. named entity recommendation has been explored as a means to enhance multilingual retrieval on Europeana <ref type="bibr" target="#b9">[10]</ref>. In this respect, the multilingual autoregressive entity linking approach employed by mGENRE is another advantage of the particular tool.</p><p>It should also be noted that NERD tools are trained on generic corpora <ref type="bibr" target="#b5">[6,</ref><ref type="bibr" target="#b23">24]</ref>, which have limited overlap with CH-related textual metadata <ref type="bibr" target="#b11">[12]</ref>. Adapting these tools to new domains by fine-tuning them requires large amounts of well-annotated data, with labels that need to be generated or validated by domain experts, as well as considerable computational power, time, and funds. These challenges are extensively discussed in <ref type="bibr" target="#b20">[21]</ref> for the domain of Digital Humanities. Although domain adaptation of ML models is beyond the scope of the current paper, the methodology we advocate can lead to the production of high-quality ground truth data at reduced cost: validators are provided with datasets that have already been automatically annotated, an approach that greatly facilitates their manual task, which becomes more focused and less cumbersome. 
This process allows us to make openly available a selection of appropriately processed annotated metadata from the CH domain (see Section 4), thus contributing to increasing the availability of annotated metadata that can be used for the in-domain tuning of NERD tools.</p><p>As the uptake of AI tools is expanding, there is an increasing need for validation and moderation by humans to overcome the errors of the machine and achieve higher quality results <ref type="bibr" target="#b19">[20]</ref>. Crowdsourcing methods and tools have been employed by CH organisations in this respect <ref type="bibr" target="#b12">[13]</ref> as a means to mobilise human participants in the evaluation and correction of AI algorithm outcomes, also leading to the preparation of ground-truth data <ref type="bibr" target="#b14">[15,</ref><ref type="bibr" target="#b11">12]</ref>. For tasks that require specialised expertise, a niche-sourcing methodology and tool for the annotation of CH metadata is proposed in <ref type="bibr" target="#b6">[7]</ref>, which, similar to our approach, uses an RDF triple store to store the results. However, as opposed to the current work, that methodology relies solely on manual selections by experts, with no use of automatic annotation tools.</p><p>Overall, our work distinguishes itself from previous work on semantic enrichment mainly in that it is based on a generic data management approach, which allows the combination of various annotation tools with flexible parameterisation capabilities (such as the definition of string matching and filtering rules); in that it includes human validation as an integral part of its workflow; and in that it supports integrations with other CH-specific data representations and platforms, making it readily reusable in the CH data space. 
It should be noted that the integration with external annotation tools and CH-related platforms is loosely coupled, via interactions with the APIs (Application Programming Interfaces) and SPARQL endpoints exposed by the third-party components.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Methodology and Technical Architecture</head><p>The methodology we followed for the semantic enrichment of CH metadata consists of the following high-level steps:</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">A: Data aggregation and requirements analysis</head><p>The first step concerns the preparatory tasks of aggregating the data and specifying the requirements for the enrichment (e.g. which metadata fields to analyse, which vocabularies to link to, etc.).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">B: Automatic metadata enrichment</head><p>The second step involves the automatic analysis of the textual metadata, with the aim to derive useful annotations in line with the identified requirements.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">C: Human validation</head><p>Humans are solicited to review and validate the automatically generated annotations, as well as to manually add new annotations that the automatic algorithm has not been able to detect.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">D: Filtering and data publication</head><p>The outcomes of the human validation are analysed to establish appropriate thresholds for filtering, and the filtered annotations are embedded as enrichments to the metadata records. The enriched metadata records are ultimately published to the Europeana platform.</p><p>Figure <ref type="figure" target="#fig_0">1</ref> provides an overview of the main digital components that support the above methodology. MINT is a metadata management tool <ref type="foot" target="#foot_4">8</ref> that is part of the data space for CH and is used by several aggregators to prepare and publish their data to Europeana. It acts as the link between SAGE and Europeana and supports steps A and D of the aforementioned methodology, serving the following purposes: (i) aggregating the metadata records from the data providers and mapping them to the Europeana Data Model (EDM) <ref type="bibr" target="#b3">[4]</ref>, which is then passed to SAGE; and (ii) embedding the annotations produced by SAGE, after filtering in light of the human feedback, into the original metadata records, in line with the EDM extension that accommodates enrichments <ref type="foot" target="#foot_5">9</ref>, and ultimately publishing the results to Europeana. It should be noted that data already published on the Europeana platform can also be sourced directly by SAGE for annotation, via a direct interconnection with the Europeana search API <ref type="foot" target="#foot_6">10</ref>.</p></div>
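As an illustration of how already-published records could be sourced via the Europeana search API, the sketch below builds a request URL with Python's standard library. The endpoint path and the `wskey`, `query`, `rows`, and `profile` parameters follow the public Europeana Search API, but the collection-based query and the key value are hypothetical placeholders; this is not SAGE's actual integration code.

```python
from urllib.parse import urlencode

# Public Europeana Search API endpoint
EUROPEANA_SEARCH = "https://api.europeana.eu/record/v2/search.json"

def build_search_url(dataset_id: str, api_key: str, rows: int = 100) -> str:
    """Build a search request for the records of one collection.
    The europeana_collectionName query is an illustrative choice."""
    params = {
        "wskey": api_key,                 # API key issued by Europeana
        "query": f"europeana_collectionName:{dataset_id}*",
        "rows": rows,                     # page size
        "profile": "rich",                # request full metadata
    }
    return f"{EUROPEANA_SEARCH}?{urlencode(params)}"

url = build_search_url("2048128", "DEMO_KEY")
```

Paging through the results (via the API's cursor mechanism) and mapping the returned JSON to RDF would follow, but is omitted here for brevity.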
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Semantic analysis and enrichment</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.">The SAGE tool for automatic enrichment and validation</head><p>SAGE is a web-based platform for generating, enriching, validating, publishing, and searching RDF data. In the context of our methodology, it is responsible for the core steps B and C. The RDF data can be produced from heterogeneous data sources and data formats using the D2RML mapping language <ref type="bibr" target="#b4">[5]</ref>, and enriched using annotators that wrap web-based or other third-party services. The enrichments can then be manually validated, and finally, the entire data can be published in an RDF store and indexed. The SAGE platform has been configured to facilitate the semantic enrichment of CH metadata. In this respect, it offers a suite of already set-up annotators, i.e. parameterisable enrichment templates, that are connected with relevant in-domain vocabularies and knowledge bases. It also facilitates the direct import/publication of metadata from/to platforms of the European data space for CH, including Europeana and MINT, making use of established APIs and formats.</p><p>A dataset is annotated per property, i.e. the user can select from the schema preview a property that links entities to values, and execute an annotator on the values of that property. An annotator in SAGE is a mediator that retrieves all desired values from the triple store where the dataset content is published, generates the appropriate calls to the web or other service, and transforms the results to the RDF annotation specification. As in the case of datasets, the results of an annotator execution are Terse RDF Triple Language (Turtle)<ref type="foot" target="#foot_7">11</ref> files stored in the file system of SAGE. 
In the framework of the data space of CH, annotations are also expressed in an equivalent JSON-LD representation model <ref type="foot" target="#foot_8">12</ref>, which is based on the W3C Web Annotation Data Model <ref type="foot" target="#foot_9">13</ref> supported by Europeana. The annotation model is generic enough to accommodate various enrichment types (e.g. annotations resulting from automatic translation tools, from image analysis, etc.) and provides sufficient provenance information, including information about the annotations' confidence scores and the validation feedback provided by humans. For metadata records that are compatible with EDM, the annotations are ultimately embedded in the metadata in line with the EDM extension that instructs the representation of metadata statements resulting from semantic enrichment <ref type="foot" target="#foot_10">14</ref>. This way, the enrichments can be appropriately handled and presented to the end-user by the Europeana platform.</p><p>SAGE supports three main types of annotators, which can be parameterised with respect to different aspects (e.g. vocabulary, language, preprocessing functions, etc.) to serve different case studies:</p><p>• Thesaurus annotators: They link texts to URIs from thesauri that can be imported to the platform by performing smart string matching on the thesaurus labels, using lemmatizers (such as the ones provided by the Stanza library <ref type="foot" target="#foot_11">15</ref>) and other functions to produce improved results (e.g. applying dedicated regex rules). They are appropriate for application both on generic textual fields and on focused short fields. By selecting thesauri that represent concepts referring to specific domains (e.g. fashion), it is more likely that the extracted terms are relevant to the object in question. Moreover, such annotators can perform massive enrichments in a very short time compared to the other annotators, since they rely on locally stored data. 
Figure <ref type="figure" target="#fig_2">3</ref> provides an overview of how a Thesaurus Annotator works on a specific example. • Generic NERD annotators: They employ pre-trained NERD tools to detect named entities and link them to respective entities from Wikidata. SAGE supports two different pipelines for generic NERD. The first pipeline makes use of the AIDA tool <ref type="bibr" target="#b17">[18]</ref> for entity detection and disambiguation. The second pipeline makes use of the spaCy library <ref type="foot" target="#foot_12">16</ref> for performing the NER part for different languages, i.e. for recognising entities and their string boundaries within a sentence, and then of the multilingual mGENRE model <ref type="bibr" target="#b5">[6]</ref> for the disambiguation stage and for linking with a URI from Wikidata. Such annotators can be used as they are, with minimal or no configuration, and are appropriate for general-purpose enrichments. They conduct disambiguation by using the context contained in longer texts (e.g. description), since they are trained on textual corpora such as Wikipedia articles. At the same time, this process is more likely than the other annotators to link with terms that are too generic or irrelevant in the context of a specific case study, while it is hard to infer with sufficient accuracy the type of the extracted entity and its relation to the object in question (e.g. whether it represents the item's creator, a place of display, etc.). As a result, in practice, they often produce more accurate results when applied to fields with pre-specified focused semantics. • SPARQL annotators: SPARQL annotators communicate with external knowledge bases (such as Wikidata and Geonames) through SPARQL endpoints. Thus, they are the best fit when dealing with large knowledge bases that cannot be downloaded locally. They can be applied to focused fields that refer to a single entity. The values of such fields often follow certain patterns (e.g. 
"surname, name", "city/region/country", etc.) and, thus, pre-processing with regex is key to the success of the method, so that a normal form of the entity name can be extracted. An example of a query that matches Wikidata entities with the occupation of a painter is presented in Figure <ref type="figure" target="#fig_1">2</ref>.</p></div>
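To make the thesaurus-annotator idea above concrete, the following minimal sketch performs lemmatised string matching against an in-memory label-to-URI table. The toy lemmatiser and the example.org URIs are illustrative stand-ins for the Stanza lemmatizers and the imported domain vocabularies (such as AAT) that the platform actually uses.

```python
# Toy lemma table standing in for a real lemmatizer such as Stanza's.
TOY_LEMMAS = {"rings": "ring", "bracelets": "bracelet", "costumes": "costume"}

def lemmatise(token: str) -> str:
    return TOY_LEMMAS.get(token.lower(), token.lower())

# Thesaurus as a map from lemmatised label to URI.
# The example.org URIs are placeholders, not real vocabulary terms.
THESAURUS = {
    "ring": "http://example.org/vocab/ring",
    "bracelet": "http://example.org/vocab/bracelet",
}

def annotate(text: str):
    """Return (matched substring, URI) pairs found in `text`."""
    annotations = []
    for token in text.replace(",", " ").split():
        uri = THESAURUS.get(lemmatise(token))
        if uri:
            annotations.append((token, uri))
    return annotations

annotate("Two gold rings and a silver bracelet")
```

A real annotator would additionally handle multi-word labels, apply the configured regex preprocessing, and emit the matches as RDF annotations rather than tuples.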
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.">Human Validation</head><p>Human validation was conducted via a dedicated environment provided by SAGE (see Figure <ref type="figure" target="#fig_3">4</ref>). Humans are invited to inspect the automatic annotations produced by the AI tools and accept or reject them. Moreover, they can add missed annotations, i.e. relevant annotations that the automatic algorithm failed to identify. During the validation of the results of the semantic analysis, validators are also able to edit the predefined target metadata field in which the URI will end up. It should be noted that SAGE groups together annotations repeated across many records in a dataset and flags annotations referring to URIs that are already included in the metadata. In total, 14 CH professionals with specialized knowledge about the considered collections participated in the validation process, with two to three validators per collection.</p><p>Participants were instructed to accept or reject annotations based on what they consider desirable for inclusion in the final metadata. That is, they evaluated annotations not only for correctness but also for relevance (e.g. matches with the term "human" may be considered too generic).</p><p>The appropriate size and characteristics of the sample to be validated depend on the available resources that can be invested in the validation process and the nature of the use case. What is considered a "sufficient" amount hinges on many factors, including the total number of automatically produced annotations, their characteristics (e.g. what metadata fields they refer to, their granularity, etc.), the characteristics of the automated algorithm that produced them (e.g. its accuracy, the reliability of the automatic confidence scores it assigned to them), the number of participants, and the amount of time they can devote to the task. 
The following criteria were used to guide the selection of the annotations sample to be validated, so as to ensure representativeness across various parameters:</p><p>• Inspect annotations that appear in a high number of records and thus will have a high impact. • Ensure a balanced representation of metadata fields, including fields with varying semantics and expected text length. • Take into consideration the confidence levels assigned by automatic algorithms, if available: inspect annotations with a rather low confidence score but also a sufficient number of annotations with a rather high one.</p></div>
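The selection criteria above can be sketched as a simple sampling routine. The field names, the 0.5 confidence-band split, and the per-field budget below are assumptions for illustration, not the exact procedure used in the validation campaigns.

```python
def select_sample(annotations, per_field=10):
    """Pick a validation sample: balance metadata fields, mix low- and
    high-confidence bands, and prefer annotations with high record impact.
    Each annotation is a dict with 'field', 'record_count', and 'score'.
    The 0.5 band split and the per-field budget are illustrative choices."""
    sample = []
    for field in sorted({a["field"] for a in annotations}):
        group = [a for a in annotations if a["field"] == field]
        # Highest-impact annotations (appearing in the most records) first.
        by_impact = sorted(group, key=lambda a: -a["record_count"])
        low = [a for a in by_impact if a["score"] < 0.5]
        high = [a for a in by_impact if a["score"] >= 0.5]
        # Half the per-field budget from each confidence band.
        sample += low[: per_field // 2] + high[: per_field // 2]
    return sample
```

In practice the budget per field would be tuned to the validators' available time, and fields with no automatic confidence scores would be sampled by impact alone.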
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.3.">Analysis and Filtering of Annotations</head><p>Validation feedback was analysed with the aim of establishing thresholds for annotations that are considered acceptable for publication. To this end, the following metrics have been calculated per dataset, per annotator, and per analysed metadata field:</p><p>• Precision considering only unique annotations, that is, unique triples of field textual value, matched sub-string, and identified URI. • Precision considering all annotations, that is, without grouping together identical field textual values (in other words, counting all the times the same annotation, defined as a triple, may appear in different items).</p><p>In both cases, precision was calculated as TP/(TP + FP), where TP is the number of accepted annotations and FP the number of rejected annotations. Precision was used as a threshold for filtering out unreviewed annotations on a per-field or per-annotator basis. What is considered a sufficiently high precision depends on the requirements of each case study and the expectations of the data provider.</p><p>For the use cases we considered, most human experts did not focus on the manual insertion of new annotations, and the few manually added annotations we collected do not allow us to sufficiently estimate false negatives and thus compute recall and the F-score. It should also be noted that, for publication to Europeana, a threshold based on precision is considered the most appropriate metric to be used <ref type="foot" target="#foot_13">17</ref>.</p><p>Human judgments can also be used as a means to assess the trustworthiness of the automatic confidence scores assigned by the AI algorithms. For example, if humans tend to accept all sample annotations above a certain score, then we may conclude that all annotations above that score can be regarded as acceptable. 
In this vein, we explored whether there is a correlation between the automatic confidence scores, when available, and the human judgments. We therefore fitted a logistic regression between the two variables, considering the following metrics:</p><p>• The p-value <ref type="bibr" target="#b18">[19]</ref>: a value greater than 0.05 means that no statistically significant relationship between the automatic scores and the human judgments was observed.
• The automatic score for which the predicted probability of acceptance exceeds 0.7, that is, annotations above this score have a probability greater than 0.7 of being accepted by humans, based on the sample data.</p></div>
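The two precision variants and the score cut-off described above can be sketched as follows. The tuple layout for reviewed annotations, the tie-breaking convention for duplicated triples, and the logistic coefficients in the usage note are assumptions for illustration only.

```python
import math

def precision(reviews, unique=True):
    """Precision = TP / (TP + FP), with TP = accepted and FP = rejected.
    reviews: (field_text, substring, uri, accepted) tuples, one per reviewed
    occurrence. With unique=True, identical triples are counted once; here a
    triple counts as accepted only if none of its occurrences was rejected
    (one possible convention)."""
    if unique:
        verdicts = {}
        for text, substring, uri, accepted in reviews:
            key = (text, substring, uri)
            verdicts[key] = verdicts.get(key, True) and accepted
        tp, total = sum(verdicts.values()), len(verdicts)
    else:
        tp, total = sum(1 for *_, a in reviews if a), len(reviews)
    return tp / total if total else 0.0

def score_cutoff(b0, b1, p=0.7):
    """Invert a fitted logistic curve P(accept | score) = 1/(1+exp(-(b0+b1*s)))
    to find the score above which the predicted acceptance probability
    exceeds p (assumes a positive slope b1)."""
    return (math.log(p / (1 - p)) - b0) / b1
```

For example, with hypothetical fitted coefficients b0 = -3 and b1 = 5, `score_cutoff(-3, 5)` is roughly 0.77, so only annotations scoring above that value would be predicted acceptable with probability above 0.7.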
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Results on Crafts Heritage Datasets</head><p>The aforementioned methodology and supporting tools have been applied to metadata records describing crafts heritage items, as mentioned in Section 1. The analysed metadata comes in the following languages: Dutch, Italian, French, Greek, English, and Croatian. In total, the SAGE annotators were applied to 216,115 metadata records, giving rise to 915,472 total annotations and 549,402 unique annotations. It should be noted that figures calculated on unique annotations are considered more reliable, since counts that factor in item impact are skewed towards textual values that are repeated in multiple items. In total, 12 experts from 8 CH organisations took part in the validation campaigns. Overall, 30,910 unique annotations referring to more than 15K records were reviewed via SAGE (i.e. 5.6% of the automatically produced unique annotations), with the sample selected following the criteria outlined in Section 3.2. Of those annotations, 23,426 were accepted and 7,474 were rejected. The overall precision, defined as the number of accepted automatic annotations produced by SAGE over the number of reviewed automatic annotations, is 0.76 considering unique annotations; if all annotations are counted, the overall precision is 0.82. Precision varied considerably depending on the analysed metadata field, the type of annotator used, and the datasets analysed. Table <ref type="table" target="#tab_0">1</ref> provides an overview of the results achieved by the different annotators; the minimum and maximum precision reported in the table refer to the per-metadata-field level. The choice of the vocabulary used by the thesaurus annotators depended on the characteristics of the respective dataset and the providers' objectives. 
The following vocabularies were used: the Europeana fashion thesaurus <ref type="foot" target="#foot_14">18</ref>; AAT; the EUScreen vocabulary on audiovisual heritage <ref type="foot" target="#foot_15">19</ref>; and a SKOS vocabulary on Greek crafts heritage <ref type="foot" target="#foot_16">20</ref>. Thesaurus annotators were applied to both longer fields (e.g. dc:description, dc:title) and shorter ones (e.g. dc:format, dc:type), often after case-appropriate regex pre-processing, yielding generally satisfactory results in both cases. SPARQL queries on Wikidata were used to retrieve creators for the dc:creator field and locations for the dc:spatial field. Although in most cases this approach did not produce a high number of annotations, it achieved high precision. mGENRE and AIDA were applied to the dc:description and dc:title fields as well as to shorter fields (including dc:creator, dc:spatial, and dc:rights). Both produced similar results, performing well on short fields but poorly on longer ones. In the latter case, both struggled to disambiguate between multiple candidate entities, and even matches that were in principle correct were often too generic and considered irrelevant by the validators.</p><p>For annotators that produced an automatic score, we also attempted to fit a logistic regression between the automatic score and the human judgments. However, no correlation was found between the two variables, and automatic scores were therefore not used as factors in the filtering rules. A possible explanation is that, in the case of thesaurus annotators, scores are usually quite high for all annotations: they reflect the string difference (1 minus the Levenshtein distance <ref type="bibr" target="#b16">[17]</ref>) between word endings (since the matching is based on the lemmatised versions of the textual metadata and the thesaurus terms). 
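As an illustration of such a score, a normalised Levenshtein similarity between a lemmatised metadata token and a thesaurus term might look as follows; normalising by the longer string's length is an assumption, since the exact formula used by the annotators is not specified here.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]

def match_score(metadata_lemma: str, thesaurus_lemma: str) -> float:
    """Similarity in [0, 1]: 1.0 for an exact match, decreasing as the
    (mostly ending) characters of the two lemmas diverge."""
    dist = levenshtein(metadata_lemma, thesaurus_lemma)
    longest = max(len(metadata_lemma), len(thesaurus_lemma), 1)
    return 1 - dist / longest
```

Because lemmatisation already collapses most inflectional variation, such scores cluster near 1.0, which is consistent with the lack of correlation observed above.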
For the generic NERD tools, the scores turned out to be quite unreliable: they are inversely proportional to the number of candidate URIs and do not sufficiently account for disambiguation.</p><p>Annotations were filtered by discarding all annotations rejected by humans, while including all explicitly accepted ones (considering a majority vote). For non-reviewed annotations, a precision threshold between 0.75 and 0.8 (computed on unique annotations) was considered acceptable by the data providers. In total, 549,460 annotations were regarded as acceptable, leading to the enrichment of 133,405 out of the 216,115 analysed records. All enriched records have been published to Europeana. The enrichments have been indexed to become searchable and are visible as distinct tags in the item view, thus contributing to making the respective items more discoverable, contextual, and multilingual. Figure <ref type="figure" target="#fig_4">5</ref> shows an example of how automatic annotations appear on the Europeana platform.</p><p>Although domain adaptation is beyond the scope of the current case study, the dataset that resulted from the validation process can be valuable for the training and fine-tuning of NERD tools in the field of CH. To this end, a curated selection of annotated metadata enriched and validated via SAGE has been made openly available <ref type="foot" target="#foot_17">21</ref> under a CC0 license, so that it can be freely reused for computational purposes. The dataset includes more than 10K unique annotations (pairs of analysed textual values and URIs). The in-domain adaptation of NERD tools so that they can deal more effectively with the particular characteristics of CH metadata <ref type="bibr" target="#b11">[12]</ref>, such as short texts and specialised terminology, remains part of future work.</p></div>
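A minimal sketch of this filtering logic: human verdicts decide reviewed annotations by majority vote, while non-reviewed annotations pass only if the precision measured for their annotator/field group on the validated sample clears the threshold. The data layout and function names are hypothetical.

```python
def filter_annotations(annotations, votes, group_precision, threshold=0.75):
    """annotations: dicts with 'id', 'annotator', 'field'.
    votes: dict id -> list of True/False human judgments for that annotation.
    group_precision: dict (annotator, field) -> precision measured on the
    validated sample for that group."""
    accepted = []
    for ann in annotations:
        judgments = votes.get(ann["id"])
        if judgments:
            # Reviewed: a strict majority of accept votes wins.
            if sum(judgments) * 2 > len(judgments):
                accepted.append(ann)
        else:
            # Not reviewed: fall back to the group's sampled precision.
            if group_precision.get((ann["annotator"], ann["field"]), 0.0) >= threshold:
                accepted.append(ann)
    return accepted
```

With a threshold of 0.75, whole annotator/field groups whose sampled precision falls below it are discarded wholesale, which matches the per-field, per-Annotator filtering described in Section 3.3.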
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Conclusions</head><p>In this paper, we presented a generic and reusable methodology, together with a supporting digital platform, that combines automatic annotators with human expertise in order to enrich cultural heritage metadata with terms from various linked data sources. The methodology has been applied and evaluated in a case study involving crafts heritage datasets, leading to measurable improvements in the quality of the metadata and enhancing the discoverability and usability of the respective resources on Europeana. Building on the practical experience we gained, this case study allows us to draw some lessons learned, which can prove useful for interested stakeholders who may wish to follow a similar process to enrich their own datasets.</p><p>Before proceeding to the actual enrichment, it is crucial to scrutinise the data to be analysed, gain a deep understanding of its characteristics, and define feasible and meaningful enrichment objectives. One should define the expected benefit of possible enrichments and how they will bring value to the collection. In this respect, one should ask questions such as: What kinds of concepts are useful to detect (e.g. persons, locations, domain-specific concepts, etc.)? Which metadata fields contain relevant information (e.g. do descriptions make frequent references to the techniques and materials used)? In what languages are the metadata? It should also be noted that the quality of the original metadata affects the quality of the automatic enrichment: if the text contains many typos or is misaligned with the intended semantics of the respective metadata field, then the outputs of the automatic enrichment tools will be less accurate. 
This step is also crucial for detecting patterns in the data that can be exploited in order to produce annotations.</p><p>The next step involves the selection and set-up of the semantic annotators that are most appropriate for the specific use case, considering the advantages and disadvantages of each approach as presented in Section 3.1. The selection of knowledge bases and vocabularies with case-appropriate granularity and coverage is crucial. In general, the more focused the automatic enrichment, in terms of both the terminology used (e.g. linking with a domain-specific vocabulary versus general-purpose NERD) and the metadata property that is parsed (e.g. topic-specific fields such as dc:creator versus longer ones such as dc:description), the lower the risk of producing too many irrelevant or overly generic enrichments, and the more accurate the disambiguation. One should opt for knowledge bases that are accessible on the Web under an open license, well documented, and compliant with Linked Data best practices. Their multilingual coverage (also in relation to the language of the metadata) is another important aspect to take into consideration.</p><p>After the production of the automatic annotations, the validation process should be carefully organised. The background of the validators is crucial: some tasks may require expert skills (e.g. knowledge of a particular language, domain expertise, etc.), while others can be performed by appealing to a general audience. In the former case, it is wiser to keep the validation process closed within a team of experts, while in the latter, organising an open crowdsourcing campaign will mobilise more people and thus speed up the process. The selection of the sample to be validated is also crucial: it does not need to be large, but it should be well balanced, following the criteria outlined in Section 3.2. The final step involves the filtering of the automatic annotations in light of the acquired human feedback. 
For annotations reviewed by humans, a majority vote can typically be used to determine acceptability. Depending on the annotation type, additional criteria may be enforced (e.g. for public validation campaigns where untrustworthy feedback is suspected, we may require that an annotation be reviewed by a minimum number of users). Automatic annotations that have not been reviewed by humans, or that lack a reliable confidence score, should be filtered using automatic evaluation metrics. The appropriate metrics depend on the nature of the task, but precision is a typical choice when correctness is at stake. Thresholds should be established depending on what is considered acceptable given the specific use case requirements.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: Architectural Overview</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Figure 2 :</head><label>2</label><figDesc>Figure 2: An example of a SPARQL query searching Wikidata. It matches labels with English Wikidata labels of items having an occupation (wdt:P106) painter (wd:Q1028181). It also estimates a confidence score as 1/(𝑛𝑢𝑚𝑏𝑒𝑟_𝑜𝑓 _𝑚𝑎𝑡𝑐ℎ𝑒𝑠).</figDesc><graphic coords="8,139.91,84.17,315.46,102.60" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head>Figure 3 :</head><label>3</label><figDesc>Figure 3: Overview of the SAGE Thesaurus Annotator workflow on a metadata description.</figDesc><graphic coords="8,89.28,455.99,416.72,191.22" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_3"><head>Figure 4 :</head><label>4</label><figDesc>Figure 4: Screenshot from the SAGE validation environment. Strings of metadata that have been matched are shown on the left and the URI(s) they have been linked to on the right.</figDesc><graphic coords="9,126.33,346.13,342.62,273.32" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_4"><head>Figure 5 :</head><label>5</label><figDesc>Figure 5: View on Europeana of an item provided by the Museum of Arts and Crafts in Zagreb. The 'Reed', 'Wood', and 'Beech Wood' terms are all automatic enrichments added by SAGE that are visible on the item page.</figDesc><graphic coords="12,109.45,84.17,376.38,280.18" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1</head><label>1</label><figDesc>Precision of used SAGE annotators</figDesc><table><row><cell>Annotator</cell><cell cols="3">Min Precision Max Precision Avg Precision</cell><cell></cell></row><row><cell></cell><cell>(all/unique)</cell><cell>(all/unique)</cell><cell>(all/unique)</cell><cell></cell></row><row><cell>Fashion Thesaurus Annotator</cell><cell>0.436/0.672</cell><cell>0.964/0.943</cell><cell>0.801/0.832</cell><cell></cell></row><row><cell>AAT Annotator</cell><cell>0.644/0.658</cell><cell>0.994/0.987</cell><cell>0.819/0.822</cell><cell></cell></row><row><cell>Greek Crafts Thesaurus Annotator</cell><cell>0.982/0.947</cell><cell>0.982/0.947</cell><cell>0.982/0.947</cell><cell></cell></row><row><cell>EUScreen Thesaurus Annotator</cell><cell>0.607/0.878</cell><cell>0.952/0.927</cell><cell>0.779/0.902</cell><cell></cell></row><row><cell>Wikidata SPARQL</cell><cell>0.894/0.817</cell><cell>1/1</cell><cell>0.981/0.963</cell><cell></cell></row><row><cell>Generic NERD with Wikidata -mGENRE</cell><cell>0.4/0.4</cell><cell>1/1</cell><cell>0.935/0.748</cell><cell></cell></row></table></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="4" xml:id="foot_0">https://www.europeana.eu</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="5" xml:id="foot_1">https://pro.europeana.eu/post/close-encounters-with-ai-an-interview-on-automatic-semantic-enrichment</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="6" xml:id="foot_2">Source code: https://github.com/ails-lab/sage-backend and https://github.com/ails-lab/sage-frontend Documentation: https://ails-lab.github.io/SAGE_Documentation/ and https://www.youtube.com/playlist?list=PL Zhh656xkjIsxMKShH7aV7aR8TAwmU508</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="7" xml:id="foot_3">https://dataspace-culturalheritage.eu/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="8" xml:id="foot_4">https://mint-wordpress.image.ntua.gr/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="9" xml:id="foot_5">https://pro.europeana.eu/files/Europeana_Professional/Share_your_data/Technical_requirements/EDM_profiles/ EDM_provenance_profile_external_202111.pdf</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="10" xml:id="foot_6">https://www.europeana.eu/en/apis</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="11" xml:id="foot_7">https://www.w3.org/TR/turtle/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="12" xml:id="foot_8">https://docs.google.com/document/d/1Cq1Qqx0ji7Vw8iwLVis1CfpYKtv-72ojkcvjnQzrKjs/edit?usp=sharing</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="13" xml:id="foot_9">https://www.w3.org/TR/annotation-model/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="14" xml:id="foot_10">https://pro.europeana.eu/files/Europeana_Professional/Share_your_data/Technical_requirements/EDM_profile s/EDM_provenance_profile_external_202111.pdf</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="15" xml:id="foot_11">https://stanfordnlp.github.io/stanza/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="16" xml:id="foot_12">https://spacy.io/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="17" xml:id="foot_13">https://pro.europeana.eu/post/methodology-for-validating-enrichments</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="18" xml:id="foot_14">http://thesaurus.europeanafashion.eu/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="19" xml:id="foot_15">http://thesaurus.euscreen.eu/EUscreenXL/v1</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="20" xml:id="foot_16">https://www.semantics.gr/authorities/vocabularies/craft-item-types/vocabulary-entries</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="21" xml:id="foot_17">See https://github.com/ails-lab/ai4culture-datasets for the actual dataset and the process that was used for the data curation.</note>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Acknowledgments</head><p>The work is co-funded by the European Union, under the projects "CRAFTED: Enrich and promote traditional and contemporary crafts" and "AI4Culture: An AI platform for the cultural heritage data space". We would like to thank all partners of the CRAFTED project, and particularly Panagiotis Tzortzis, for their valuable contributions to this work.</p></div>
			</div>

			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">An Approach for Curating Collections of Historical Documents with the Use of Topic Detection Technologies</title>
		<author>
			<persName><forename type="first">M</forename><surname>Andresel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Gordea</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Stevanetic</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Schütz</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Int. J. Digit. Curation</title>
		<imprint>
			<biblScope unit="volume">17</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page">12</biblScope>
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Entity Matching in Digital Humanities Knowledge Graphs</title>
		<author>
			<persName><forename type="first">J</forename><surname>Baas</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">M</forename><surname>Dastani</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Feelders</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. of the Conf. on Computational Humanities Research</title>
				<meeting>of the Conf. on Computational Humanities Research</meeting>
		<imprint>
			<date type="published" when="2021">CHR2021. 2989. 2021</date>
			<biblScope unit="page" from="1" to="15" />
		</imprint>
	</monogr>
	<note>CEUR Workshop Proceedings</note>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Enriching the Metadata of Community-Generated Digital Content through Entity Linking: An Evaluative Comparison of State-of-the-Art Models</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Benkhedda</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Skapars</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Schlegel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Nenadic</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Batista-Navarro</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. of the 8th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature</title>
				<meeting>of the 8th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature<address><addrLine>St. Julians, Malta</addrLine></address></meeting>
		<imprint>
			<publisher>Association for Computational Linguistics</publisher>
			<date type="published" when="2024">2024</date>
			<biblScope unit="page" from="213" to="220" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Mapping Cross-Domain Metadata to the Europeana Data Model (EDM)</title>
		<author>
			<persName><forename type="first">V</forename><surname>Charles</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Isaac</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Tzouvaras</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Hennicke</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Research and Advanced Technology for Digital Libraries</title>
				<meeting><address><addrLine>Berlin Heidelberg</addrLine></address></meeting>
		<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2013">2013</date>
			<biblScope unit="page" from="484" to="485" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">D2RML: Integrating Heterogeneous Data and Web Services into Custom RDF Graphs</title>
		<author>
			<persName><forename type="first">A</forename><surname>Chortaras</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Stamou</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Workshop on Linked Data on the Web co-located with The Web Conference</title>
				<imprint>
			<date type="published" when="2018">2018</date>
			<biblScope unit="volume">2073</biblScope>
		</imprint>
	</monogr>
	<note>CEUR Workshop Proceedings</note>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Multilingual Autoregressive Entity Linking</title>
		<author>
			<persName><forename type="first">N</forename><forename type="middle">De</forename><surname>Cao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Wu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Popat</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Artetxe</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Goyal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Plekhanov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Zettlemoyer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Cancedda</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Riedel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Petroni</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Transactions of the Association for Computational Linguistics</title>
				<imprint>
			<date type="published" when="2022">2022</date>
			<biblScope unit="volume">10</biblScope>
			<biblScope unit="page" from="274" to="290" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Accurator: Nichesourcing for Cultural Heritage</title>
		<author>
			<persName><forename type="first">C</forename><surname>Dijkshoorn</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>De Boer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Aroyo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Schreiber</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Hum. Comput</title>
		<imprint>
			<biblScope unit="volume">6</biblScope>
			<biblScope unit="page" from="12" to="41" />
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Named Entity Recognition and Classification in Historical Documents: A Survey</title>
		<author>
			<persName><forename type="first">M</forename><surname>Ehrmann</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Hamdi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><forename type="middle">L</forename><surname>Pontes</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Romanello</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Doucet</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">ACM Computing Surveys</title>
		<imprint>
			<biblScope unit="volume">56</biblScope>
			<biblScope unit="issue">2</biblScope>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Technical Usability of Wikidata&apos;s Linked Data</title>
		<author>
			<persName><forename type="first">N</forename><surname>Freire</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Isaac</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Business Information Systems Workshops</title>
				<editor>
			<persName><forename type="first">W</forename><surname>Abramowicz</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">R</forename><surname>Corchuelo</surname></persName>
		</editor>
		<imprint>
			<publisher>Springer International Publishing</publisher>
			<date type="published" when="2019">2019</date>
			<biblScope unit="page" from="556" to="567" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">Named Entity Recommendations to Enhance Multilingual Retrieval in Europeana</title>
		<author>
			<persName><forename type="first">S</forename><surname>Gordea</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">L</forename><surname>Paramita</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Isaac</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Foundations of Intelligent Systems</title>
				<imprint>
			<publisher>Springer International Publishing</publisher>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page" from="102" to="112" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">Exploring Entity Recognition and Disambiguation for Cultural Heritage Collections</title>
		<author>
			<persName><forename type="first">S</forename><surname>Hooland</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Wilde</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Verborgh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Steiner</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Van De Walle</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Literary and Linguistic Computing</title>
				<imprint>
			<date type="published" when="2013">2013</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">Europeana Translate: Providing multilingual access to digital cultural heritage</title>
		<author>
			<persName><forename type="first">E</forename><surname>Kaldeli</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>García-Martínez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Isaac</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><forename type="middle">S</forename><surname>Scalia</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Stabenau</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><forename type="middle">L</forename><surname>Almor</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">G</forename><surname>Lacal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">B</forename><surname>Ordóñez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Estela</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Herranz</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. of the 23rd Annual Conference of the European Association for Machine Translation, EAMT. European Association for Machine Translation</title>
				<meeting>of the 23rd Annual Conference of the European Association for Machine Translation, EAMT. European Association for Machine Translation</meeting>
		<imprint>
			<date type="published" when="2022">2022</date>
			<biblScope unit="page" from="297" to="298" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">CrowdHeritage: Crowdsourcing for Improving the Quality of Cultural Heritage Metadata</title>
		<author>
			<persName><forename type="first">E</forename><surname>Kaldeli</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><surname>Menis-Mastromichalakis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Bekiaris</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Ralli</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Tzouvaras</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Stamou</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Information</title>
		<imprint>
			<biblScope unit="volume">12</biblScope>
			<biblScope unit="issue">2</biblScope>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">LMN at SemEval-2022 Task 11: A Transformer-based System for English Named Entity Recognition</title>
		<author>
			<persName><forename type="first">N</forename><surname>Lai</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. of the 16th International Workshop on Semantic Evaluation (SemEval-2022)</title>
				<meeting>of the 16th International Workshop on Semantic Evaluation (SemEval-2022)<address><addrLine>Seattle, United States</addrLine></address></meeting>
		<imprint>
			<publisher>Association for Computational Linguistics</publisher>
			<date type="published" when="2022">2022</date>
			<biblScope unit="page" from="1438" to="1443" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">Employing Crowdsourcing for Enriching a Music Knowledge Base in Higher Education</title>
		<author>
			<persName><forename type="first">V</forename><surname>Lyberatos</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Kantarelis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Kaldeli</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Bekiaris</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Tzortzis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><surname>Menis-Mastromichalakis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Stamou</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Artificial Intelligence in Education Technologies: New Development and Innovative Practices</title>
				<imprint>
			<publisher>Springer Nature</publisher>
			<date type="published" when="2023">2023</date>
			<biblScope unit="page" from="224" to="240" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">Linking Subject Labels in Cultural Heritage Metadata to MIMO Vocabulary using CultuurLink</title>
		<author>
			<persName><forename type="first">H</forename><surname>Manguinhas</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Charles</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Isaac</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Miles</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Lima</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Neroulidis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Ginouvès</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Atsidis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Hildebrand</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Brinkerink</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Gordea</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. of the 15th European Networked Knowledge Organization Systems Workshop (NKOS) co-located with the 20th Int. Conf. on Theory and Practice of Digital Libraries (TPDL)</title>
				<meeting>of the 15th European Networked Knowledge Organization Systems Workshop (NKOS) co-located with the 20th Int. Conf. on Theory and Practice of Digital Libraries (TPDL)</meeting>
		<imprint>
			<date type="published" when="2016">2016</date>
			<biblScope unit="volume">1676</biblScope>
			<biblScope unit="page" from="32" to="35" />
		</imprint>
	</monogr>
	<note>CEUR Workshop Proceedings</note>
</biblStruct>

<biblStruct xml:id="b16">
	<analytic>
		<title level="a" type="main">Levenshtein Distance: Information theory, Computer science, String (computer science)</title>
		<author>
			<persName><forename type="first">F</forename><forename type="middle">P</forename><surname>Miller</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">F</forename><surname>Vandome</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>McBrewster</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">String metric, Damerau-Levenshtein distance, Spell checker, Hamming distance</title>
				<imprint>
			<publisher>Alpha Press</publisher>
			<date type="published" when="2009">2009</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<analytic>
		<title level="a" type="main">AIDA-light: High-Throughput Named-Entity Disambiguation</title>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">B</forename><surname>Nguyen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Hoffart</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Theobald</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Weikum</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. of the Workshop on Linked Data on the Web co-located with the 23rd Int. World Wide Web Conf. (WWW)</title>
				<meeting>of the Workshop on Linked Data on the Web co-located with the 23rd Int. World Wide Web Conf. (WWW)</meeting>
		<imprint>
			<date type="published" when="2014">2014</date>
			<biblScope unit="volume">1184</biblScope>
		</imprint>
	</monogr>
	<note>CEUR Workshop Proceedings</note>
</biblStruct>

<biblStruct xml:id="b18">
	<analytic>
		<title level="a" type="main">Statistical Inference</title>
		<author>
			<persName><forename type="first">S</forename><surname>Silvey</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Monographs on statistics and applied probability</title>
				<imprint>
			<publisher>Chapman &amp; Hall</publisher>
			<date type="published" when="2003">2003</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b19">
	<monogr>
		<author>
			<persName><forename type="first">J</forename><surname>Stiller</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Petras</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Gäde</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Isaac</surname></persName>
		</author>
		<title level="m">Automatic Enrichments with Controlled Vocabularies in Europeana: Challenges and Consequences</title>
				<imprint>
			<publisher>Springer International Publishing</publisher>
			<date type="published" when="2014">2014</date>
			<biblScope unit="page" from="238" to="247" />
		</imprint>
	</monogr>
	<note>Digital Heritage. Progress in Cultural Heritage: Documentation, Preservation, and Protection</note>
</biblStruct>

<biblStruct xml:id="b20">
	<analytic>
		<title level="a" type="main">Text analysis using deep neural networks in digital humanities and information science</title>
		<author>
			<persName><forename type="first">O</forename><surname>Suissa</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Elmalech</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Zhitomirsky-Geffet</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of the Association for Information Science and Technology</title>
		<imprint>
			<biblScope unit="volume">73</biblScope>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b21">
	<analytic>
		<title level="a" type="main">Annif and Finto AI: Developing and Implementing Automated Subject Indexing</title>
		<author>
			<persName><forename type="first">O</forename><surname>Suominen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Inkinen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Lehtinen</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Italian Journal of Library, Archives and Information Science</title>
		<imprint>
			<biblScope unit="volume">13</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="265" to="282" />
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b22">
	<analytic>
		<title level="a" type="main">HerCulB: content-based information extraction and retrieval for cultural heritage of the Balkans</title>
		<author>
			<persName><forename type="first">I</forename><surname>Tanasijević</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Pavlović-Lažetić</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">The electronic library</title>
		<imprint>
			<biblScope unit="volume">38</biblScope>
			<biblScope unit="issue">5/6</biblScope>
			<biblScope unit="page" from="905" to="918" />
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b23">
	<analytic>
		<title level="a" type="main">Scalable Zero-shot Entity Linking with Dense Entity Retrieval</title>
		<author>
			<persName><forename type="first">L</forename><surname>Wu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Petroni</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Josifoski</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Riedel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Zettlemoyer</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. of the Conf. on Empirical Methods in Natural Language Processing (EMNLP)</title>
				<meeting>of the Conf. on Empirical Methods in Natural Language Processing (EMNLP)<address><addrLine>Online</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page" from="6397" to="6407" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b24">
	<analytic>
		<title level="a" type="main">Automated metadata annotation: What is and is not possible with machine learning</title>
		<author>
			<persName><forename type="first">M</forename><surname>Wu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Brandhorst</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M.-C</forename><surname>Marinescu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">M</forename><surname>Lopez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Hlava</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Busch</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Data Intelligence</title>
		<imprint>
			<biblScope unit="volume">5</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="122" to="138" />
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
