<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">INVENiT: Exploring cultural heritage collections while adding annotations</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Chris</forename><surname>Dijkshoorn</surname></persName>
							<email>c.r.dijkshoorn@vu.nl</email>
							<affiliation key="aff0">
								<orgName type="department">Computer Science</orgName>
								<orgName type="institution" key="instit1">The Network Institute</orgName>
								<orgName type="institution" key="instit2">VU University Amsterdam</orgName>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Jacco</forename><surname>Van Ossenbruggen</surname></persName>
							<email>jacco.van.ossenbruggen@cwi.nl</email>
							<affiliation key="aff1">
								<orgName type="institution">Centrum Wiskunde en Informatica</orgName>
								<address>
									<settlement>Amsterdam</settlement>
									<country key="NL">the Netherlands</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Lora</forename><surname>Aroyo</surname></persName>
							<email>lora.aroyo@vu.nl</email>
							<affiliation key="aff0">
								<orgName type="department">Computer Science</orgName>
								<orgName type="institution" key="instit1">The Network Institute</orgName>
								<orgName type="institution" key="instit2">VU University Amsterdam</orgName>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Guus</forename><surname>Schreiber</surname></persName>
							<email>guus.schreiber@vu.nl</email>
							<affiliation key="aff0">
								<orgName type="department">Computer Science</orgName>
								<orgName type="institution" key="instit1">The Network Institute</orgName>
								<orgName type="institution" key="instit2">VU University Amsterdam</orgName>
							</affiliation>
						</author>
						<title level="a" type="main">INVENiT: Exploring cultural heritage collections while adding annotations</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">E0E9638678D13E769D200F5E17E54F34</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-24T11:25+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>The growing number of cultural heritage collections published as Linked Data has given rise to a vast source of collection objects to explore. To provide an experience which goes beyond traditional search, the links from objects to terms from structured vocabularies can be used to create new paths to explore. We present INVENiT, a semantic search system which leverages these paths for result diversification and clustering. Users can freely explore the collection, but are also able to contribute their knowledge by annotating collection objects. The added information is directly incorporated in the search results. The demo can be found at http://sealinc.ops.few.vu.nl/invenit/.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Introduction</head><p>The increasing number of cultural heritage collections published as Linked Data promises to be an incredible source of rich content for end users to explore <ref type="bibr" target="#b2">[3,</ref><ref type="bibr" target="#b1">2]</ref>. The explorability of these collections depends heavily on the quality of the metadata describing the objects <ref type="bibr" target="#b2">[3]</ref>. The ability to explore collections increases when a dense network of links between objects is created. These relations can be realised by linking objects to other collection objects and to entities from structured vocabularies.</p><p>For example, the Rijksmuseum Amsterdam publishes its collection online and for this purpose employs catalogers to register, annotate and digitise collection objects. They use a limited set of structured vocabularies to annotate the subject matter, materials, techniques and artists. However, there is a multitude of Linked Open Data (LOD) vocabularies that can be used in addition to support the desired exploration of the collection.</p><p>Many catalogers have a background in art history, which allows them to provide only basic information about the different subject matter domains. To fill in the missing domain expertise, and to provide annotations in all the domains represented in the Rijksmuseum collection, we involve people from outside the museum who have expert knowledge in each of those domains in the curation process. In this paper we discuss a use case demonstrator, which allows external experts to annotate parts of images with terms from structured vocabularies. The contribution of this work is threefold. First, we align the new vocabularies to the existing annotation vocabulary structure of the Rijksmuseum, following standardised data models, e.g. the Europeana Data Model.
Second, we explore linked data patterns to optimise the use of these aligned vocabularies in the presentation and exploration of search results. Finally, we integrate the annotation results of the external annotators in a common semantic search system (http://sealinc.ops.few.vu.nl/invenit/).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">Data</head><p>The Rijksmuseum collection comprises around 1,000,000 artworks, of which 159,661 have a digital representation. The RDF data is modelled according to the Europeana Data Model <ref type="bibr" target="#b1">[2]</ref>. Objects are linked to multiple vocabularies: the Iconclass vocabulary<ref type="foot" target="#foot_0">3</ref> for describing subject matter, the Art and Architecture Thesaurus (AAT) for materials and techniques, and the Union List of Artist Names<ref type="foot" target="#foot_1">4</ref> (ULAN) for artists.</p><p>The Rijksmuseum collection contains links to 11,945 of the 39,578 concepts in Iconclass. These concepts are hierarchically structured, with more specific resources further down the hierarchy. While there are many links from collection objects to AAT, these concern a limited number of materials and techniques. In contrast, many distinct links are made to ULAN, since the Rijksmuseum has a diverse collection with works made by many different artists. In addition, ULAN defines interesting relations between concepts, for example teacher of and uncle of. For the current demo we take a subset of 1,598 objects from the Rijksmuseum collection: artworks with depictions of birds. The catalogers might not have enough knowledge to identify which species of bird is depicted, whereas many bird enthusiasts do. To test the use of additional structured vocabularies, we converted the IOC World Bird List<ref type="foot" target="#foot_2">5</ref>, which includes 31,644 species and subspecies. Figure <ref type="figure" target="#fig_0">1</ref> shows an example of an artwork with an added annotation. The INVENiT demonstrator has also been instantiated with other Rijksmuseum sub-collections, e.g. prints related to biblical topics and books (http://invenit.wmprojects.nl/).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">System</head><p>INVENiT is based on the ClioPatria semantic web server <ref type="bibr" target="#b3">[4]</ref>, extended with an annotation module and a cluster search module, the interfaces of which are depicted in Figure <ref type="figure" target="#fig_2">2</ref>. The annotation module provides functionality to add annotations to images. The annotation fields can be tailored to the use case, and autocompletion is based on a specified vocabulary. Relevant objects in the image can be identified by drawing bounding boxes, and all of the provided information is stored in a triple store. Users can use the cluster search module to explore the collection and find artworks to annotate. We adapted the search algorithm to interpret added annotations as subject matter metadata, which allows users to directly inspect the result of their efforts in the search results. In its initial state (without user-contributed content), the demo supports exploration based on the metadata provided by the Rijksmuseum Amsterdam.</p><p>The presented clusters are generated based on paths in the graph. These paths can be based on a direct link between a literal and artworks, but longer paths are also used. When possible, properties are abstracted to their (SKOS) root properties. The resources used in paths are abstracted to their class. Three examples of paths are given below:</p><formula xml:id="formula_0">1) Literal → title → Artworks 2) Literal → subject → Owls → broader → Birds → subject → Artworks 3) Literal → prefLabel → Artist → teacherOf → Artist → creator → Artworks</formula><p>These examples illustrate the characteristics of the dataset and vocabularies. The first example includes results based on metadata in the collection. The second example generalises the results based on links in Iconclass. The third example uses links within the ULAN vocabulary to cluster results.</p></div>
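<div xmlns="http://www.tei-c.org/ns/1.0"><p>The path abstraction described above can be illustrated with a minimal Python sketch. It is not the ClioPatria-based implementation used in the demo: the lookup tables, the identifiers (e.g. ulan:rembrandt, artwork:1) and the function names are all assumptions made for illustration. The sketch assumes that concrete paths from a matched literal to artworks have already been found, and only shows how such paths could be abstracted and used as cluster keys.</p>

```python
from collections import defaultdict

# Hypothetical lookup tables, invented for this sketch (not taken from the demo).
ROOT_PROPERTY = {
    "dc:title": "title",
    "skos:prefLabel": "prefLabel",
    "ulan:teacherOf": "teacherOf",
    "dc:creator": "creator",
}
RESOURCE_CLASS = {"ulan:rembrandt": "Artist", "ulan:dou": "Artist"}

# Each concrete path is a list of (property, resource) steps that starts at
# the matched literal and ends at an artwork. All identifiers are made up.
FOUND_PATHS = [
    [("dc:title", "artwork:1")],
    [("dc:title", "artwork:2")],
    [
        ("skos:prefLabel", "ulan:rembrandt"),
        ("ulan:teacherOf", "ulan:dou"),
        ("dc:creator", "artwork:3"),
    ],
]


def abstract_path(steps):
    """Abstract properties to root names and intermediate resources to classes."""
    parts = ["Literal"]
    for prop, resource in steps[:-1]:
        parts.append(ROOT_PROPERTY.get(prop, prop))
        parts.append(RESOURCE_CLASS.get(resource, resource))
    last_prop, _artwork = steps[-1]
    parts.append(ROOT_PROPERTY.get(last_prop, last_prop))
    parts.append("Artworks")
    return " → ".join(parts)


def cluster_by_path(found_paths):
    """Group the artwork at the end of each path under its abstracted path."""
    clusters = defaultdict(list)
    for steps in found_paths:
        artwork = steps[-1][1]
        clusters[abstract_path(steps)].append(artwork)
    return dict(clusters)


clusters = cluster_by_path(FOUND_PATHS)
for path, artworks in clusters.items():
    print(path, artworks)
```

<p>In this sketch the two title matches share the key Literal → title → Artworks and therefore end up in one cluster, while the artwork reached through the teacherOf relation forms a cluster of its own.</p></div>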
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">Discussion and Future Work</head><p>The INVENiT demo uses the semantics in structured vocabularies to diversify and cluster results. Users can make new connections by annotating collection objects with terms from structured vocabularies. We believe that giving users the possibility to find the objects they would like to annotate, and to directly inspect the results of their efforts, will have a positive effect on their motivation.</p><p>Currently all annotations are accepted and incorporated in the system. This is not something a museum would allow, since unknowledgeable or malicious users might add incorrect information. We therefore plan to incorporate trust assessment into the current work, providing an indication of whether an annotation is trustworthy or not, for example based on the assessed expertise of an annotator <ref type="bibr" target="#b0">[1]</ref>.</p><p>There are still open issues to address regarding the clustering of search results based on paths in the graph. The relevance of the results in a cluster is influenced by the path used to retrieve them. At the moment, the clusters are ranked according to the number of results within the cluster. We plan to improve this, since the meaningfulness of a path depends on the perception of the user.</p><p>The clusters of results are named after the paths used to create them. These paths are hard for a user to interpret. Take for example Literal → title → Artworks. This could be translated into the name "works titled". Longer paths in particular are difficult to describe concisely. Automatically generating more user-friendly names is a problem we want to address in future work.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Fig. 1 :</head><label>1</label><figDesc>Fig. 1: The print "Eagle owl in magnolia" modelled according to the Europeana Data Model, with an added annotation specifying the species of the depicted bird.
The annotation resource has two targets: the cultural heritage object and a generated target resource that links the digital representation with the area specified by the annotator.</figDesc></figure>
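<div xmlns="http://www.tei-c.org/ns/1.0"><p>The two-target annotation structure described in the caption of Fig. 1 can be sketched as a small Python data structure. This is a simplified illustration only: the class names, field names and URIs below are hypothetical and do not reproduce the RDF model or the triple store representation used by the demo.</p>

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoundingBox:
    """Area drawn by the annotator, in image coordinates (assumed units: pixels)."""
    x: float
    y: float
    width: float
    height: float


@dataclass(frozen=True)
class RegionTarget:
    """Generated target: links the digital representation with the selected area."""
    image_uri: str
    region: BoundingBox


@dataclass(frozen=True)
class Annotation:
    """Annotation whose body is a vocabulary term and which has two targets."""
    body_uri: str                # term from a structured vocabulary
    object_uri: str              # first target: the cultural heritage object
    region_target: RegionTarget  # second target: image plus selected area

    def targets(self):
        return [self.object_uri, self.region_target]


# All URIs below are placeholders invented for this sketch.
annotation = Annotation(
    body_uri="http://example.org/birds/eagle-owl",
    object_uri="http://example.org/collection/eagle-owl-in-magnolia",
    region_target=RegionTarget(
        image_uri="http://example.org/images/eagle-owl-in-magnolia.jpg",
        region=BoundingBox(x=120, y=80, width=200, height=260),
    ),
)
print(len(annotation.targets()))
```

<p>Keeping the object and the image region as separate targets means that, in this sketch, the species term is attached to the artwork as a whole while the bounding box stays tied to one specific digital representation.</p></div>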
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head></head><label></label><figDesc>(a) Annotation interface showing a bounding box and autocompletion. (b) Search interface showing clustered search results.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head>Fig. 2 :</head><label>2</label><figDesc>Fig. 2: Annotation and search interface of the INVENiT demo system.</figDesc><graphic coords="3,140.15,347.41,165.99,131.64" type="bitmap" /></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="3" xml:id="foot_0">http://www.iconclass.org/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="4" xml:id="foot_1">http://www.getty.edu/research/tools/vocabularies/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="5" xml:id="foot_2">http://github.com/rasvaan/naturalis</note>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Acknowledgements. This publication is supported by the Dutch national program COMMIT/. We would like to thank the members of the SEALINCMedia worktable, and in particular the Rijksmuseum, for their support.</p></div>
			</div>

			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Efficient semi-automated assessment of annotations trustworthiness</title>
		<author>
			<persName><forename type="first">D</forename><surname>Ceolin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Nottamkandath</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Fokkink</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of Trust Management</title>
		<imprint>
			<biblScope unit="volume">1</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page">3</biblScope>
			<date type="published" when="2014">2014</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Europeana linked open data -data</title>
		<author>
			<persName><forename type="first">A</forename><surname>Isaac</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Haslhofer</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">europeana</title>
				<imprint>
			<date type="published" when="2013">2013</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Connecting the Smithsonian American Art Museum to the linked data cloud</title>
		<author>
			<persName><forename type="first">P</forename><surname>Szekely</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Knoblock</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Yang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Zhu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Fink</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Allen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Goodlander</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">The Semantic Web: Semantics and Big Data</title>
				<imprint>
			<date type="published" when="2013">2013</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<monogr>
		<title level="m" type="main">Thesaurus-Based Search in Large Heterogeneous Collections</title>
		<author>
			<persName><forename type="first">J</forename><surname>Wielemaker</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Hildebrand</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Van Ossenbruggen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Schreiber</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2008">2008</date>
			<biblScope unit="page">C2008</biblScope>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
