<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Linking subject labels in Cultural Heritage Metadata to MIMO vocabulary using CultuurLink</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Hugo Manguinhas</string-name>
          <email>hugo.manguinhas@europeana.eu</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Valentine Charles</string-name>
          <email>valentine.charles@europeana.eu</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Antoine Isaac</string-name>
          <email>antoine.isaac@europeana.eu</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Tom Miles</string-name>
          <email>tom.miles@bl.uk</email>
          <xref ref-type="aff" rid="aff4">4</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Aude Lima</string-name>
          <email>aude.da-cruz-lima@mae.u-paris10.fr</email>
          <xref ref-type="aff" rid="aff5">5</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ariane Néroulidis</string-name>
          <email>ariane.neroulidis@gmail.com</email>
          <xref ref-type="aff" rid="aff6">6</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Véronique Ginouvès</string-name>
          <xref ref-type="aff" rid="aff6">6</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Dimitra Atsidis</string-name>
          <email>datsidis@beeldengeluid.nl</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Michiel Hildebrand</string-name>
          <email>michiel@spinque.com</email>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Maarten Brinkerink</string-name>
          <email>mbrinkerink@beeldengeluid.nl</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sergiu Gordea</string-name>
          <email>sergiu.gordea@ait.ac.at</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Austrian Institute of Technology</institution>
          ,
          <addr-line>Vienna</addr-line>
          ,
          <country country="AT">Austria</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Europeana Foundation</institution>
          ,
          <addr-line>The Hague</addr-line>
          ,
          <country country="NL">The Netherlands</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Netherlands Institute for Sound and Vision</institution>
          ,
          <addr-line>Hilversum</addr-line>
          ,
          <country country="NL">The Netherlands</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>Spinque B.V.</institution>
          ,
          <addr-line>Utrecht</addr-line>
          ,
          <country country="NL">The Netherlands</country>
        </aff>
        <aff id="aff4">
          <label>4</label>
          <institution>The British Library</institution>
          ,
          <addr-line>London</addr-line>
          ,
          <country country="GB">United Kingdom</country>
        </aff>
        <aff id="aff5">
          <label>5</label>
          <institution>The Centre de Recherche en Ethnomusicologie</institution>
          ,
          <addr-line>Paris</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
        <aff id="aff6">
          <label>6</label>
          <institution>The Maison Méditerranéenne des Sciences de l'Homme</institution>
          ,
          <addr-line>Aix-en-Provence</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>The Europeana Sounds project aims to increase the amount of cultural audio content in Europeana. It also strongly focuses on enriching the metadata records that are aggregated by Europeana. To provide metadata to Europeana, Data Providers are asked to convert their records from the format and model they use internally to a specific profile of the Europeana Data Model (EDM) for sound resources. These metadata include subjects, which typically use a vocabulary internal to each partner. The problem is that the values in subject fields too often come as simple literals (strings) that are specific to one or two languages, namely those of the Data Provider. For Europeana to take full advantage of subjects from these vocabularies for purposes such as cross-lingual search, they must be connected with richer, multilingual data. A first solution to this problem is to semantically enrich the metadata of individual cultural objects with links to concepts from a (multilingual) vocabulary (say, 'vocM'). Such new object-vocM links can later be used to provide more semantics and labels in multiple languages for search indexes or display functions. A second option is to perform alignment at the level of vocabularies, linking the elements of an original</p>
      </abstract>
      <kwd-group>
        <kwd>Vocabulary Alignment</kwd>
        <kwd>Metadata</kwd>
        <kwd>Cultural Heritage</kwd>
        <kwd>Europeana</kwd>
        <kwd>MIMO</kwd>
        <kwd>CultuurLink</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <p>
        (local) vocabulary (say, 'vocL') to semantically related elements from a richer vocabulary, i.e., creating new vocL-vocM links that can be used to enhance the value of existing object-vocL links. Both solutions present many challenges<sup>3</sup> [
        <xref ref-type="bibr" rid="ref1 ref2">1,2</xref>
        ]. In the cultural sector, more experience is needed to determine how feasible they are for obtaining 'good enough' results, and to answer basic questions such as: (1) can we find good vocabularies? (2) can we identify suitable processes and tools? (3) how much manual effort is needed, and how much can be automated?
      </p>
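      <p>To make the second option concrete, the Python sketch below chains the two kinds of links: each local subject label attached to a record (an object-vocL link) is looked up in a table of vocL-vocM links, and the multilingual labels of the matched vocM concept become available, for instance for a search index. It is a minimal illustration only; the record identifiers, labels and example.org URIs are hypothetical placeholders, not Europeana Sounds or MIMO data.</p>
      <preformat preformat-type="code">
```python
"""Sketch: enrich object-vocL links through vocL-vocM links (hypothetical data)."""

# object-vocL links: record id -> local subject labels (plain literals)
OBJECT_SUBJECTS = {
    "record-1": ["viool"],
    "record-2": ["viool", "zang"],
}

# vocL-vocM links: local label -> concept URI in the richer vocabulary
VOCL_TO_VOCM = {
    "viool": "http://example.org/vocM/violin",
}

# multilingual labels of the vocM concepts, keyed by language tag
VOCM_LABELS = {
    "http://example.org/vocM/violin": {
        "en": "violin", "fr": "violon", "de": "Violine", "nl": "viool",
    },
}


def enrich(record_id: str) -> dict[str, set[str]]:
    """Return, per language, the labels reachable from a record's local subjects."""
    labels: dict[str, set[str]] = {}
    for local_label in OBJECT_SUBJECTS.get(record_id, []):
        concept = VOCL_TO_VOCM.get(local_label)
        if concept is None:
            continue  # no vocL-vocM link: the literal stays monolingual
        for lang, label in VOCM_LABELS.get(concept, {}).items():
            labels.setdefault(lang, set()).add(label)
    return labels


if __name__ == "__main__":
    print(enrich("record-2"))
```
      </preformat>
      <p>Because the alignment is made once per vocabulary term rather than once per record, a single vocL-vocM link benefits every record that uses that term, while unaligned literals (such as "zang" here) simply stay monolingual.</p>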
      <p>This paper explores the feasibility of vocabulary alignment. This task is often deemed more successful when done manually by domain experts; however, it becomes too labour-intensive for large vocabularies. Some tools propose a semi-automatic approach that makes the task less labour-intensive while still benefiting from the user expertise required to assert the right alignments. We conducted an experiment in which several Europeana Sounds Data Providers used a vocabulary alignment tool, CultuurLink<sup>4</sup>, to identify alignments between the subject terms from their local vocabularies and a semantically richer target vocabulary.</p>
      <p>
        CultuurLink<sup>5</sup> is an agile vocabulary alignment tool developed by Spinque. It helps the user identify alignments between vocabularies by seamlessly combining automatic and manual approaches. It is the successor of the Amalgame framework [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] developed in the EuropeanaConnect project [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
      </p>
      <p>We asked the Data Providers contributing data to Europeana as part of WP1 of the Europeana Sounds project to select a collection of metadata records (about sound recordings, interviews and radio programmes) likely to contain musical instrument terms. The data were mapped using the Europeana Sounds EDM profile<sup>6</sup>, an extension of the Europeana Data Model, which is itself heavily based on Dublin Core. As instructed by the collection owners, we considered only musical instrument terms within subject fields (dc:subject). In total, 6 datasets containing 10,406 metadata records were obtained from the providers and evaluated in this experiment:
        <list list-type="bullet">
          <list-item>
            <p>The British Library (BL) participated with 3 collections: a selection of Asian instruments (1,099 records) from the “Colin Huehns Asia Collection”<sup>7</sup>; a selection from the “Peter Cooke Uganda Collection”<sup>8</sup> (1,312 records); and the “Keith Summers English Folk Music Collection”<sup>9</sup> (1,326 records). All three collections were chosen for the rich variety of musical instruments they document from each region.</p>
          </list-item>
          <list-item>
            <p>The Centre de Recherche en Ethnomusicologie (CREM) participated with a test collection<sup>10</sup> of 25 records published on the CD “Musical Instruments of the World”, which features a great variety of traditional instruments described with generic terms in French (from the 4 organological families of the Hornbostel-Sachs classification) and the corresponding vernacular terms.</p>
          </list-item>
          <list-item>
            <p>The Maison Méditerranéenne des Sciences de l'Homme (MMSH) participated with a collection of 25 records about folk music.</p>
          </list-item>
          <list-item>
            <p>The Netherlands Institute for Sound and Vision (NISV) participated with a collection of 6,608 records (not available online) describing commercial 78 rpm records (Handelsplaten) from genres such as light music, classical music and opera.</p>
          </list-item>
        </list>
      </p>
      <p><sup>3</sup> See the special session of the 13th European NKOS Workshop: https://atweb1.comp.glam.ac.uk/pages/research/hypermedia/nkos/nkos2014/programme.html</p>
      <p><sup>4</sup> http://cultuurlink.beeldengeluid.nl/app/#</p>
      <p><sup>5</sup> http://2015.semantics.cc/michiel-hildebrand</p>
      <p><sup>6</sup> http://pro.europeana.eu/files/Europeana_Professional/EuropeanaTech/EuropeanaTech_taskforces/EDMSound//TF_Report_EDM_Profile_Sound_301214.pdf</p>
      <p><sup>7</sup> http://sounds.bl.uk/World-and-traditional-music/Colin-Huehns-Pakistan</p>
      <p><sup>8</sup> http://sounds.bl.uk/World-and-traditional-music/Peter-Cooke-Uganda</p>
      <p><sup>9</sup> http://sounds.bl.uk/World-and-traditional-music/Keith-Summers-Collection</p>
      <p>
        As a significant number of terms within subject fields of the Europeana Sounds data are related to musical instruments, we chose the Musical Instrument Museums Online<sup>11</sup> (MIMO) vocabulary, a reference vocabulary used in a previous Europeana-related project<sup>12</sup>, as the target vocabulary for our experiment, following the recommendations made in [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. The MIMO vocabulary is a multilingual controlled vocabulary of musical instruments, built to ensure consistent classification of musical instruments<sup>13</sup>. It results from aligning a vernacular classification with the professional Hornbostel-Sachs classification<sup>14</sup>. The vocabulary was built with English as the pivot language, and translations into seven other languages were added afterwards.
      </p>
      <p>The goals of our experiments were to:
        <list list-type="bullet">
          <list-item>
            <p>evaluate the use of a semi-automatic tool such as CultuurLink for a concrete vocabulary alignment case; and</p>
          </list-item>
          <list-item>
            <p>assess the coverage of the MIMO vocabulary for enriching Europeana Sounds datasets.</p>
          </list-item>
        </list>
      </p>
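      <p>One possible way to quantify this coverage, offered only as an illustrative assumption, is to compute two simple ratios: the share of distinct subject terms that received at least one validated MIMO link, and the share of records that would gain at least one link. The Python sketch below implements both; the function names and the toy data are hypothetical and do not reflect the experiment's results.</p>
      <preformat preformat-type="code">
```python
"""Sketch of two coverage ratios (hypothetical measure and data)."""


def term_coverage(terms: set[str], aligned: set[str]) -> float:
    """Share of distinct subject terms that have at least one validated alignment."""
    if not terms:
        return 0.0
    return len(terms.intersection(aligned)) / len(terms)


def record_coverage(records: dict[str, list[str]], aligned: set[str]) -> float:
    """Share of records with at least one subject term that has a validated alignment."""
    if not records:
        return 0.0
    enriched = sum(1 for subjects in records.values() if any(s in aligned for s in subjects))
    return enriched / len(records)


if __name__ == "__main__":
    # Hypothetical records and validated terms, for illustration only.
    records = {"r1": ["violin", "song"], "r2": ["song"], "r3": ["sitar"], "r4": ["violin"]}
    terms = {t for subjects in records.values() for t in subjects}
    aligned = {"violin", "sitar"}
    print(term_coverage(terms, aligned), record_coverage(records, aligned))
```
      </preformat>
      <p>The two ratios can diverge: a few frequent, well-aligned terms may enrich many records even when many rare terms remain unaligned.</p>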
      <p>We decided to focus on the vocabulary terms as they are used, i.e., as present within the subject fields of the metadata sent to Europeana, rather than aligning the full vocabulary used by each providing institution, because:
        <list list-type="bullet">
          <list-item>
            <p>the full vocabularies were not available for use outside the organization and/or not in a data structure suitable for a vocabulary alignment tool (e.g., SKOS), and we did not have the opportunity or the resources to develop a SKOS export for each vocabulary; and</p>
          </list-item>
          <list-item>
            <p>we preferred to report on alignments for the subjects actually used in the source datasets rather than on all possible subjects.</p>
          </list-item>
        </list>
      </p>
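      <p>Collecting the terms "as they are used" amounts to gathering the distinct dc:subject literals from the EDM records. The Python sketch below assumes the records are available as RDF/XML and walks them with the standard-library xml.etree.ElementTree module, skipping dc:subject values that are already links (rdf:resource) and counting each (label, language) pair. The in-memory demo record and its labels are hypothetical, and this is not presented as the procedure used in the experiment.</p>
      <preformat preformat-type="code">
```python
"""Sketch: collect the subject literals actually used in EDM records."""
import xml.etree.ElementTree as ET
from collections import Counter

DC = "http://purl.org/dc/elements/1.1/"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def subject_literals(root: ET.Element) -> Counter:
    """Count (label, language) pairs of dc:subject literals under an RDF/XML root.

    dc:subject elements carrying an rdf:resource attribute are already links,
    so only plain-text values are kept as candidate terms to align.
    """
    counts: Counter = Counter()
    for elem in root.iter(f"{{{DC}}}subject"):
        if elem.get(f"{{{RDF}}}resource") is not None:
            continue
        text = (elem.text or "").strip()
        if text:
            counts[(text, elem.get(XML_LANG, ""))] += 1
    return counts


def demo_record() -> ET.Element:
    """Build a tiny hypothetical record in memory (not real provider data)."""
    root = ET.Element(f"{{{RDF}}}RDF")
    cho = ET.SubElement(root, "{http://www.europeana.eu/schemas/edm/}ProvidedCHO")
    for label in ("sitar", "sitar", "song"):
        subject = ET.SubElement(cho, f"{{{DC}}}subject", {XML_LANG: "en"})
        subject.text = label
    # A subject that is already a link is ignored by subject_literals().
    ET.SubElement(cho, f"{{{DC}}}subject", {f"{{{RDF}}}resource": "http://example.org/concept/1"})
    return root


if __name__ == "__main__":
    print(subject_literals(demo_record()).most_common())
```
      </preformat>
      <p>For records stored as files, ET.parse(path).getroot() would supply the root element; the frequency counts can also help decide which terms to review first.</p>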
      <p><sup>10</sup> http://archives.crem-cnrs.fr/archives/collections/CNRSMH_E_1990_014_001/</p>
      <p><sup>11</sup> http://www.mimo-international.com/MIMO/</p>
      <p><sup>12</sup> http://pro.europeana.eu/project/mimo</p>
      <p><sup>13</sup> http://www.mimo-db.eu/InstrumentsKeywords/</p>
      <p><sup>14</sup> http://www.mimo-international.com/documents/Hornbostel%2520Sachs.pdf</p>
      <p>We asked the providing institutions to design and apply alignment strategies in CultuurLink and then evaluate the alignments (i.e., validate the links and assign each one a type of SKOS mapping link). Once all the participants had finished their task, we collected the alignment results and summarized the findings.</p>
      <p>In general, the Data Providers found that a simple matching technique, using only exact string matching (an equality comparison) of the preferred labels of source and target (i.e., the initial strategy we had created), was enough to identify more than 50% (reaching 80% in some cases) of all possible alignments for musical instruments. This strategy also produced incorrect alignments due to polysemy (e.g., the subject “ban” or “zang”, meaning singing or song, was a candidate match for the instrument “zang”, a kind of cymbals or clapper bells). The Data Providers were able to discard such matches in the tool by manually confirming only the ones they were interested in. All Data Providers also tried more elaborate strategies to discover the remaining possible alignments, typically by using a less restrictive string matching function.</p>
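      <p>The difference between the initial exact strategy and a less restrictive one can be sketched in Python as follows. The exact key compares preferred labels as-is; the normalized key, an assumption standing in for whatever looser function a provider might configure, case-folds the labels and strips accents and extra whitespace. Recall is measured against a set of manually validated pairs. CultuurLink's own matchers are not reproduced here, and all labels are hypothetical apart from "zang", taken from the polysemy example in the text.</p>
      <preformat preformat-type="code">
```python
"""Sketch: exact vs. normalized label matching, with recall against validated links."""
import unicodedata


def normalize(label: str) -> str:
    """Case-fold, strip accents and collapse whitespace (a looser comparison key)."""
    decomposed = unicodedata.normalize("NFKD", label.casefold())
    no_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(no_marks.split())


def exact(label: str) -> str:
    """Identity key: labels match only if the strings are equal."""
    return label


def match(source: dict[str, str], target: dict[str, str], key=exact) -> list[tuple[str, str]]:
    """Pair source and target ids whose preferred labels have equal keys."""
    index: dict[str, list[str]] = {}
    for tid, label in target.items():
        index.setdefault(key(label), []).append(tid)
    return [(sid, tid) for sid, label in source.items() for tid in index.get(key(label), [])]


def recall(found: list[tuple[str, str]], validated: set[tuple[str, str]]) -> float:
    """Share of manually validated alignments that a strategy proposed."""
    return len(validated.intersection(found)) / len(validated) if validated else 0.0


if __name__ == "__main__":
    # Hypothetical labels: "zang" in the source means singing, not the instrument.
    source = {"s1": "sitar", "s2": "zang", "s3": "Viola da gamba", "s4": "tarogato"}
    target = {"t1": "sitar", "t2": "zang", "t3": "viola da gamba", "t4": "tárogató"}
    validated = {("s1", "t1"), ("s3", "t3"), ("s4", "t4")}  # outcome of manual review
    for name, key in (("exact", exact), ("normalized", normalize)):
        found = match(source, target, key)
        to_reject = [pair for pair in found if pair not in validated]
        print(name, f"recall={recall(found, validated):.2f}", "to reject:", to_reject)
```
      </preformat>
      <p>In this toy run the normalized key raises recall from 0.33 to 1.00, yet both strategies propose the false "zang" pair, which is why every candidate still needs manual confirmation.</p>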
      <p>There was a consensus among the Data Providers that the experiment was successful and that they were able to understand and use the vocabulary alignment tool effectively.</p>
      <p>In our presentation, we will report in more detail on the most notable achievements and findings of our experiment.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Isaac</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Manguinhas</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stiller</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Charles</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          :
          <source>Report on Enrichment and Evaluation</source>
          . The Hague, Netherlands (
          <year>2015</year>
          ). http://pro.europeana.eu/taskforce/evaluation-and-enrichments
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Isaac</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Manguinhas</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Charles</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stiller</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          , et al:
          <article-title>Comparative evaluation of semantic enrichments</article-title>
          .
          <source>Technical report</source>
          (
          <year>2015</year>
          ). Report available at http://pro.europeana.eu/taskforce/evaluation-and-enrichments. Data archive available at: https://www.assembla.com/spaces/europeana-r-d/documents?folder=58725383
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>van Ossenbruggen</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hildebrand</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>de Boer</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          :
          <article-title>Interactive vocabulary alignment</article-title>
          . In:
          <source>Proc. 15th International Conference on Theory and Practice of Digital Libraries</source>
          , pp.
          <fpage>296</fpage>
          -
          <lpage>307</lpage>
          . Springer (
          <year>2011</year>
          ). http://semanticweb.cs.vu.nl/lod/tpdl2011/paper.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Wielemaker</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>de Boer</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Isaac</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>van Ossenbruggen</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hildebrand</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schreiber</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hennicke</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Semantic workflow tool available</article-title>
          .
          <source>EuropeanaConnect Deliverable D1.3.1</source>
          (October
          <year>2011</year>
          ). http://pro.europeana.eu/files/Europeana_Professional/Projects/Project_list/EuropeanaConnect/Deliverables/ECONNECT-D1.3.1-Semantic%20Workflow%20Automation%20Method%20Implementation.pdf
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Isaac</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Manguinhas</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Charles</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stiller</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          , et al:
          <article-title>Selecting target datasets for semantic enrichment</article-title>
          .
          <source>Technical report</source>
          (
          <year>2015</year>
          ). Report available at http://pro.europeana.eu/taskforce/evaluation-and-enrichments.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>