<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>INVENiT: Exploring cultural heritage collections while adding annotations</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Chris Dijkshoorn</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jacco van Ossenbruggen</string-name>
          <email>jacco.van.ossenbruggen@cwi.nl</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Lora Aroyo</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Guus Schreiber</string-name>
          <email>guus.schreiberg@vu.nl</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>"Eagle owl in</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Centrum Wiskunde en Informatica</institution>
          ,
          <addr-line>Amsterdam</addr-line>
          ,
          <country country="NL">the Netherlands</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Computer Science, The Network Institute, VU University Amsterdam</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>The growing number of cultural heritage collections published as Linked Data has given rise to a vast source of collection objects to explore. To provide an experience that goes beyond traditional search, the links from objects to terms from structured vocabularies can be used to create new paths to explore. We present INVENiT, a semantic search system that leverages these paths for result diversification and clustering. Users can freely explore the collection, but are also able to contribute their knowledge by annotating collection objects. The added information is directly incorporated in the search results. The demo can be found at http://sealinc.ops.few.vu.nl/invenit/.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        The increasing number of cultural heritage collections published as Linked Data
promises to be an incredible source of rich content for end users to explore [
        <xref ref-type="bibr" rid="ref2 ref3">3,2</xref>
        ].
Explorability of the collections heavily depends on the quality of the metadata
describing the objects [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. The ability to explore collections is increased when
a dense network of links between objects is created. These relations can be
realised by linking objects to other collection objects and entities from structured
vocabularies.
      </p>
      <p>For example, the Rijksmuseum Amsterdam publishes its collection online and
for this purpose employs catalogers to register, annotate and digitise collection
objects. They use a limited set of structured vocabularies to annotate the subject
matter, materials, techniques and artists. However, a multitude of additional
LOD vocabularies could be used to support the desired exploration
of the collection.</p>
      <p>Many catalogers have a background in art history, which allows them to
provide only basic information about different subject matter domains. To fill in the missing
domain expertise, and to provide annotations in all the domains represented in the
Rijksmuseum collection, we involve people from outside the museum who have
expert knowledge in each of those domains in the curation process. In this paper
we discuss a use case demonstrator that allows external experts to annotate
parts of images with terms from structured vocabularies. The contribution of this
work is threefold. First, we align the new vocabularies to the existing annotation
vocabulary structure of the Rijksmuseum, following standardised data models,
e.g. the Europeana Data Model. Second, we explore linked data patterns to optimise
the use of these aligned vocabularies in the presentation and exploration of search
results. Finally, we integrate the annotation results of the external annotators in
a common semantic search system: http://sealinc.ops.few.vu.nl/invenit/.</p>
    </sec>
    <sec id="sec-2">
      <title>Data</title>
      <p>
        The Rijksmuseum collection comprises around 1,000,000 artworks, of which
159,661 have a digital representation. The RDF data is modelled according to
the Europeana Data Model [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. Objects are linked to multiple vocabularies: the
Iconclass vocabulary (http://www.iconclass.org/) for describing subject matter, the Art and Architecture
Thesaurus (AAT) for materials and techniques, and the Union List of Artist
Names (ULAN, http://www.getty.edu/research/tools/vocabularies/) for artists.
      </p>
      <p>The Rijksmuseum collection contains links to 11,945 of the 39,578 concepts
in Iconclass. These concepts are hierarchically structured, with more specific
resources further down the hierarchy. While there are many links from collection
objects to AAT, these concern a limited number of materials and techniques. In
contrast, many distinct links are made to ULAN, since the Rijksmuseum has a diverse
collection with works made by many different artists. In addition, ULAN defines
interesting relations between the concepts, for example teacher of and uncle of.</p>
      <p>For the current demo we take a subset of 1,598 objects from the
Rijksmuseum collection: artworks with depictions of birds. The catalogers might not
have enough knowledge to classify which species of bird is depicted, while there
are many bird enthusiasts who do. To test the use of additional structured
vocabularies we made a conversion of the IOC World Bird List (http://github.com/rasvaan/naturalis), including 31,644
species and subspecies. Figure 1 shows an example of an artwork with an added
annotation. The INVENiT demonstrator has also been instantiated with other
Rijksmuseum sub-collections, e.g. prints related to biblical topics and books
(http://invenit.wmprojects.nl/).</p>
      <p>[Figure 1: RDF graph of an artwork with an added annotation; graphic not recovered. Labels recoverable from the figure: the resources agg:COLLECT.504055, hdl:504055, ico:25F34, birds:speciesbubo_bubo, ann:id_6111fedf002e9 and genid:target_44478b9eb; the properties dc:title, dc:subject, skos:prefLabel, edm:isShownBy, edm:aggregatedCHO, oa:hasTarget, oa:hasSource and rdf:value; the fragment #xywh=percent:19,40,39,27; and the bounding box values ann:x "0.18", ann:y "0.40", ann:w "0.38" and ann:h "0.27", all typed xsd:float.]</p>
    </sec>
    <sec id="sec-3">
      <title>System</title>
      <p>
        INVENiT is based on the ClioPatria semantic web server [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], extended with
an annotation module and a cluster search module, of which the corresponding
interfaces are depicted in Figure 2. The annotation module provides
functionality to add annotations to images. The annotation fields can be tailored to
the use case and autocompletion is based on a specified vocabulary. Relevant
objects in the image can be identified by drawing bounding boxes and all of the
provided information is stored in a triple store.
      </p>
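      <p>As an illustration of how a drawn bounding box could end up as triples, the following minimal Python sketch converts fractional box coordinates into a percent-based xywh media fragment and lists the statements for one annotation. It is a sketch under stated assumptions, not the INVENiT implementation: the class BoundingBox, the function annotation_triples, the use of oa:hasBody, the blank-node naming and all example identifiers are hypothetical; only oa:hasTarget, oa:hasSource, rdf:value and the xywh=percent syntax are taken from the labels of Figure 1.</p>
      <preformat><![CDATA[
```python
from dataclasses import dataclass
from typing import List, Tuple

Triple = Tuple[str, str, str]


@dataclass(frozen=True)
class BoundingBox:
    """A box given as fractions (0.0 to 1.0) of the image width and height."""

    x: float
    y: float
    w: float
    h: float

    def __post_init__(self) -> None:
        for name in ("x", "y", "w", "h"):
            value = getattr(self, name)
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{name} must lie between 0 and 1, got {value}")
        # Small tolerance to absorb floating-point rounding in the sums.
        if self.x + self.w > 1.0 + 1e-9 or self.y + self.h > 1.0 + 1e-9:
            raise ValueError("bounding box extends beyond the image")

    def to_media_fragment(self) -> str:
        """Format the box as a percent-based xywh media fragment."""
        percents = [round(value * 100) for value in (self.x, self.y, self.w, self.h)]
        return "xywh=percent:" + ",".join(str(p) for p in percents)


def annotation_triples(annotation_id: str, image_uri: str,
                       concept_uri: str, box: BoundingBox) -> List[Triple]:
    """Build the triples for one annotation (hypothetical layout)."""
    annotation = f"ann:{annotation_id}"
    target = f"_:target_{annotation_id}"
    return [
        (annotation, "oa:hasBody", concept_uri),  # assumed property for the term
        (annotation, "oa:hasTarget", target),
        (target, "oa:hasSource", image_uri),
        (target, "rdf:value", "#" + box.to_media_fragment()),
    ]


if __name__ == "__main__":
    # Made-up identifiers, for illustration only.
    example_box = BoundingBox(x=0.25, y=0.5, w=0.5, h=0.25)
    for triple in annotation_triples("example1", "img:example.jpg",
                                     "birds:example_species", example_box):
        print(triple)
```
]]></preformat>
      <p>Storing the box as fractions of the image size keeps the annotation independent of the resolution at which the image is displayed, which is presumably why a percent-based fragment is used.</p>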
      <p>Figure 2: (a) Annotation interface showing a bounding box and autocompletion. (b) Search interface showing clustered search results.</p>
      <p>
        The cluster search module utilises a graph search algorithm, which matches
keyword queries with literals, uses the graph structure to find connected artworks
and clusters similar objects together [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. Literals in the database are assigned a
matching score according to a ranking function. Literals with a score above a
specified threshold are used as starting points for backward graph traversal. The
graph is traversed in a backward fashion until a specified class of resource is
reached, in this case edm:ProvidedCHO. Objects with similar paths are clustered
together.
      </p>
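      <p>The steps of the cluster search can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the algorithm of [4]: the token-overlap function score stands in for the unspecified ranking function, the in-memory lists replace the triple store, and the names cluster_search, max_depth and all toy identifiers are hypothetical. Each matching literal starts a breadth-first traversal along incoming links, the traversal stops at resources typed edm:ProvidedCHO, and results are grouped by the sequence of properties that led to them.</p>
      <preformat><![CDATA[
```python
from collections import defaultdict, deque
from typing import Dict, List, Set, Tuple

TARGET_CLASS = "edm:ProvidedCHO"


def score(query: str, text: str) -> float:
    """Stand-in ranking function: share of query tokens found in the literal."""
    query_tokens = set(query.lower().split())
    text_tokens = set(text.lower().split())
    if not query_tokens:
        return 0.0
    return len(query_tokens & text_tokens) / len(query_tokens)


def cluster_search(query: str,
                   literals: List[Tuple[str, str, str]],
                   triples: List[Tuple[str, str, str]],
                   types: Dict[str, str],
                   threshold: float = 0.5,
                   max_depth: int = 4) -> Dict[Tuple[str, ...], Set[str]]:
    """Return target objects grouped by the property path used to reach them."""
    # Index resources by the object they point to, for backward traversal.
    incoming = defaultdict(list)
    for subject, prop, obj in triples:
        incoming[obj].append((prop, subject))

    clusters: Dict[Tuple[str, ...], Set[str]] = defaultdict(set)
    for subject, prop, text in literals:
        if score(query, text) <= threshold:
            continue  # only literals scoring above the threshold start a search
        queue = deque([(subject, (prop,))])
        seen = {subject}  # simplification: visit each resource once per literal
        while queue:
            node, path = queue.popleft()
            if types.get(node) == TARGET_CLASS:
                clusters[path].add(node)
                continue
            if len(path) >= max_depth:
                continue
            for link_prop, source in incoming[node]:
                if source not in seen:
                    seen.add(source)
                    queue.append((source, path + (link_prop,)))
    return dict(clusters)


# Toy data, made up for illustration.
LITERALS = [
    ("rm:object1", "dc:title", "Eagle owl in a landscape"),
    ("ic:owls", "skos:prefLabel", "owl"),
    ("ic:birds", "skos:prefLabel", "birds"),
]
TRIPLES = [
    ("rm:object2", "dc:subject", "ic:owls"),
    ("ic:owls", "skos:broader", "ic:birds"),
    ("rm:object3", "dc:subject", "ic:birds"),
]
TYPES = {
    "rm:object1": TARGET_CLASS,
    "rm:object2": TARGET_CLASS,
    "rm:object3": TARGET_CLASS,
    "ic:owls": "skos:Concept",
    "ic:birds": "skos:Concept",
}

if __name__ == "__main__":
    for path, objects in cluster_search("birds", LITERALS, TRIPLES, TYPES).items():
        print(" <- ".join(path), sorted(objects))
```
]]></preformat>
      <p>With the toy data, the query "birds" yields one cluster for objects linked directly to the matching concept and a second cluster for objects reached through a narrower concept, which mirrors the distinction between direct and longer paths.</p>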
      <p>Users can use this search functionality to explore the collection and find
artworks to annotate. We adapted the algorithm to interpret the added annotation
as subject matter metadata, which allows the user to directly inspect the result
of their efforts in the search results. The demo in its initial state (without
user-contributed content) supports exploration based on the metadata provided by
the Rijksmuseum Amsterdam.</p>
      <p>The presented clusters are generated based on paths in the graph. These
paths can be based on a direct link between a literal and artworks, but
longer paths are also used. Where possible, properties are abstracted to their (SKOS)
root properties. The resources used in paths are abstracted to their class. Three
examples of paths are given below:
1) Literal → title → Artworks
2) Literal → subject → Owls → broader → Birds → subject → Artworks
3) Literal → prefLabel → Artist → teacherOf → Artist → creator → Artworks
These examples illustrate the characteristics of the dataset and vocabularies. The
first example includes results based on metadata in the collection. The second
example generalises the results based on links in Iconclass. The third example
uses links within the ULAN vocabulary to cluster results.</p>
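      <p>The abstraction of paths described above can be illustrated with a short Python sketch. It assumes that a concrete path is available as a list of (property, resource) steps starting at the matched literal, that sub-property links are given as a simple child-to-parent mapping, and that every resource has a single class. The function names, the property ex:mainSubject and the example resources are hypothetical and only serve to show that two different concrete paths can collapse into the same cluster key.</p>
      <preformat><![CDATA[
```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Step = Tuple[str, str]  # (property, resource reached through that property)


def root_property(prop: str, sub_property_of: Dict[str, str]) -> str:
    """Follow sub-property links upwards until a root property is reached."""
    visited = set()
    while prop in sub_property_of and prop not in visited:
        visited.add(prop)  # guards against cycles in the mapping
        prop = sub_property_of[prop]
    return prop


def abstract_path(steps: Sequence[Step],
                  sub_property_of: Dict[str, str],
                  class_of: Dict[str, str]) -> str:
    """Replace each property by its root and each resource by its class."""
    parts = ["Literal"]
    for prop, resource in steps:
        parts.append(root_property(prop, sub_property_of))
        parts.append(class_of.get(resource, resource))
    return " -> ".join(parts)


def group_by_abstract_path(paths: Sequence[Sequence[Step]],
                           sub_property_of: Dict[str, str],
                           class_of: Dict[str, str]) -> Dict[str, List[str]]:
    """Cluster the final resource of each path under its abstracted path."""
    clusters: Dict[str, List[str]] = defaultdict(list)
    for steps in paths:
        artwork = steps[-1][1]
        clusters[abstract_path(steps, sub_property_of, class_of)].append(artwork)
    return dict(clusters)


# Hypothetical example data.
SUB_PROPERTY_OF = {"ex:mainSubject": "dc:subject"}
CLASS_OF = {
    "ic:owls": "Concept",
    "ic:eagles": "Concept",
    "rm:object1": "Artwork",
    "rm:object2": "Artwork",
}
PATHS = [
    [("skos:prefLabel", "ic:owls"), ("dc:subject", "rm:object1")],
    [("skos:prefLabel", "ic:eagles"), ("ex:mainSubject", "rm:object2")],
]

if __name__ == "__main__":
    for name, artworks in group_by_abstract_path(PATHS, SUB_PROPERTY_OF, CLASS_OF).items():
        print(name, artworks)
```
]]></preformat>
      <p>Both example paths end up under the same key, because ex:mainSubject is abstracted to dc:subject and both concepts are abstracted to their class.</p>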
    </sec>
    <sec id="sec-4">
      <title>Discussion and Future Work</title>
      <p>The INVENiT demo uses the semantics in structured vocabularies to diversify
and cluster results. Users can make new connections by annotating collection
objects with terms from structured vocabularies. We believe that providing users
with the possibility to find the objects they would like to annotate and directly inspect
the results of their efforts will have a positive effect on their motivation.</p>
      <p>
        Currently all annotations are accepted and incorporated in the system. This
is not something a museum would allow, since unknowledgeable or malicious
users might add incorrect information. We therefore plan to incorporate trust
assessment in current work, providing an indication whether an annotation is
trustworthy or not, for example based on the assessed expertise of an
annotator [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
      </p>
      <p>There are still open issues to address regarding the clustering of search
results based on paths in the graph. The relevance of the results in a cluster is
influenced by the path used to retrieve them. At this moment the clusters are
ranked according to the number of results within the cluster. We plan on
improving this, since the meaningfulness of a path depends on the perception of
the user.</p>
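      <p>The current ranking by cluster size amounts to a single sort. The Python sketch below illustrates it, assuming clusters are held as a mapping from a path name to a set of results; the function name rank_clusters and the alphabetical tie-break are assumptions added here to make the order deterministic.</p>
      <preformat><![CDATA[
```python
from typing import Dict, List, Set, Tuple


def rank_clusters(clusters: Dict[str, Set[str]]) -> List[Tuple[str, Set[str]]]:
    """Order clusters by descending number of results, then by path name."""
    return sorted(clusters.items(), key=lambda item: (-len(item[1]), item[0]))


if __name__ == "__main__":
    # Made-up clusters for illustration.
    example = {
        "Literal -> title -> Artworks": {"rm:object1"},
        "Literal -> subject -> Artworks": {"rm:object2", "rm:object3"},
    }
    for name, results in rank_clusters(example):
        print(len(results), name)
```
]]></preformat>
      <p>A ranking that reflects the meaningfulness of a path would replace the sort key with a path-dependent weight.</p>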
      <p>The clusters of results are named by the paths used to create them. These
paths are hard for a user to interpret. Take for example Literal → title →
Artworks. This could be translated into the name "works titled". Longer paths
are especially difficult to describe concisely. Automatically generating more
user-friendly names is a problem we want to address in future work.</p>
      <p>Acknowledgements. This publication is supported by the Dutch national
program COMMIT/. We would like to thank the members of the SEALINCMedia
worktable and in particular the Rijksmuseum for their support.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Ceolin</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nottamkandath</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fokkink</surname>
          </string-name>
          , W.:
          <article-title>Efficient semi-automated assessment of annotations trustworthiness</article-title>
          .
          <source>Journal of Trust Management</source>
          <volume>1</volume>
          (
          <issue>1</issue>
          ),
          <volume>3</volume>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Isaac</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Haslhofer</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>Europeana linked open data – data.europeana.eu</article-title>
          .
          <source>Semantic Web Journal</source>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Szekely</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Knoblock</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yang</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhu</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fink</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Allen</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Goodlander</surname>
          </string-name>
          , G.:
          <article-title>Connecting the Smithsonian American Art Museum to the linked data cloud</article-title>
          .
          <source>In: The Semantic Web: Semantics and Big Data</source>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Wielemaker</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hildebrand</surname>
          </string-name>
          , M.,
          <string-name>
            <surname>van Ossenbruggen</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schreiber</surname>
          </string-name>
          , G.:
          <article-title>Thesaurus-Based Search in Large Heterogeneous Collections</article-title>
          .
          <source>In: ISWC2008</source>
          (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>