<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Semantic Excavation of the City of Books</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Anna Tordai</string-name>
          <email>atordai@cs.vu.nl</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Borys Omelayenko</string-name>
          <email>b.omelayenko@cs.vu.nl</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Guus Schreiber</string-name>
          <email>schreiber@cs.vu.nl</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>VU University</institution>
          ,
          <addr-line>De Boelelaan 1081a, Amsterdam</addr-line>
          ,
          <country country="NL">Netherlands</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>As the Semantic Web gains momentum, so does the interest in making the knowledge held in various repositories available on it. In this paper we describe a case study that applies a methodological approach for porting cultural repositories to the Semantic Web. The approach consists of thesaurus conversion, metadata schema mapping, metadata value mapping, and thesaurus alignment. It is derived from the experience we gained in a number of conversions performed for the E-Culture project; in this paper we apply it to a collection of data about images related to book printing.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. INTRODUCTION</title>
      <p>For the representation of thesauri the project uses the SKOS Core Schema (http://www.w3.org/TR/swbp-skos-core-guide/). SKOS was designed to support vocabulary interoperability and is currently undergoing standardization by the World Wide Web Consortium (W3C). It has already been adopted by large organizations such as NASA.</p>
      <p>This paper is organized as follows. We discuss related work in Section 2. We present our approach in Section 3, followed by a short presentation of the Bibliopolis data in Section 4. Next, we devote four sections to the case study, one for each of the following activities: thesaurus conversion, metadata schema mapping, metadata value conversion and thesaurus alignment. Finally, we conclude the paper with a discussion in Section 9.</p>
      <p>(5) Acronym for Stichting Volkenkundige Collectie Nederland, http://www.svcn.nl/thesaurus.asp; (6) http://www.bibliopolis.nl/; (7) http://www.vraweb.org/.</p>
    </sec>
    <sec id="sec-2">
      <title>2. RELATED WORK</title>
      <p>In the area of thesaurus conversion, Miles et al. [3] propose guidelines for migrating thesauri to the Semantic Web using the SKOS Core schema. They distinguish between standard and non-standard thesauri, and propose to preserve all information in the thesaurus by using sub-class and sub-property statements where necessary.</p>
      <p>The work of Van Assem et al. [6] builds on these guidelines and proposes a three-step method: analysis of the thesaurus, mapping to the SKOS schema, and creation of the conversion program. Their case studies show, however, that non-standard thesauri are more difficult to convert completely, as some of their features cannot be mapped to the SKOS schema.</p>
      <p>The problem of interoperability between two collections has been discussed in [1]. Within the SIMILE project, Butler et al. report on the conversion and linkage of a visual works dataset and a learning object dataset using XSLT. The first dataset was converted using the VRA schema and the second using Dublin Core, although non-standard properties were created as extensions. The issues discussed range from the creation of URIs to dealing with hierarchical terms.</p>
      <p>In [2] Hyvönen et al. describe the MuseumFinland project, which encompasses multiple collections and ontologies. The collections of various Finnish museums and additional ontologies were converted into RDF/OWL. The metadata of the collections was transformed using a common term ontology, while the additional ontologies provide a further semantic link between the collections and were enhanced by manual editing and enrichment.</p>
    </sec>
    <sec id="sec-3">
      <title>3. APPROACH</title>
      <p>The process developed within the E-Culture project for converting datasets to an interoperable Semantic Web format was presented in [5]. As in that work, our goal is the syntactic and semantic integration of data. In pursuing this goal we are driven by the practical needs of the E-Culture project: the need to integrate multiple collections. Accordingly, we follow a practical bottom-up approach in which we enrich real-world data with a thin layer of semantics to achieve interoperability. This approach may be seen as an alternative to the top-down approach that is very common in the Semantic Web community. With a top-down approach we would first need to develop a conceptual model of the cultural heritage world before we could perform semantic enrichment of the data. Such an ontology development effort has not yet been started, and it would take several years to complete. However, a number of thesauri are already available and widely used by the cultural heritage communities. In our approach we perform syntactic integration and take the first step towards semantic integration by performing terminological integration. The task of integrating collections and vocabularies from both a structural and a terminological perspective has evolved into four activities, which are summarized in Fig. 1:</p>
      <p>Fig. 1: The four activities (thesaurus schema mapping, metadata schema mapping, metadata mapping, thesaurus alignment).</p>
      <p>• Thesaurus conversion, including thesaurus schema mapping. This step is a relatively well-researched area, e.g. [6], with SKOS being the default option for the thesaurus schema.
• Metadata schema mapping. Here we look at generic schemas such as Dublin Core and its specializations to the cultural domain, such as VRA.
• Metadata conversion. In this step the data values are converted and looked up in the local thesaurus or in external vocabularies using information extraction techniques. Data interpretation is also common here, especially for data that does not directly fit the standard vocabularies.
• Thesaurus alignment. Here we align the thesaurus to external (standard) vocabularies using ontology alignment techniques.</p>
      <p>Structural integration is performed during thesaurus schema mapping for vocabularies and during metadata schema mapping for collections. The terminological integration performed during metadata mapping and thesaurus alignment depends on the schema mapping activities, which we denote with vertical arrows in Fig. 1. As vocabularies tend to be used in collection metadata, making this link explicit is part of the semantic enrichment process. Collection metadata in turn may contain implicit vocabularies, hidden in data values, that are candidates for thesaurus alignment.</p>
    </sec>
    <sec id="sec-4">
      <title>4. BIBLIOPOLIS DATA</title>
      <p>The Bibliopolis data from the Koninklijke Bibliotheek (KB), the National Library of the Netherlands, consists of two XML files: a collection file and a thesaurus file. The collection file contains the metadata of 1,645 images related to the printing of books and book illustrations. The thesaurus contains 1,033 terms used as keywords for indexing the images. Both files are part of the Bibliopolis website, and both the thesaurus and the metadata are bilingual (English and Dutch).</p>
      <p>Thesaurus. The thesaurus contains core terms, augmented with their synonyms in plural and variants of these terms in singular, along with a descriptive note. Each record may also contain related, broader and narrower terms. Additionally, it contains some administrative data: the initials of the record creator, the date of entry, and the date of modification. A sample XML element for the term university printer is shown in Fig. 2.</p>
      <p>Fig. 2: Thesaurus record for the term university printer.</p>
      <preformat>&lt;inm:Record&gt;
  &lt;inm:NUM&gt;2&lt;/inm:NUM&gt;
  &lt;inm:TWOND&gt;academiedrukkers&lt;/inm:TWOND&gt;
  &lt;inm:TWVAR&gt;academiedrukker&lt;/inm:TWVAR&gt;
  &lt;inm:TWVAR&gt;universiteitsdrukker&lt;/inm:TWVAR&gt;
  &lt;inm:DEF&gt;aan een universiteit verbonden...&lt;/inm:DEF&gt;
  &lt;inm:TWRT&gt;academische geschriften&lt;/inm:TWRT&gt;
  &lt;inm:TWRT&gt;overheidsdrukkers&lt;/inm:TWRT&gt;
  &lt;inm:ENG&gt;university printer&lt;/inm:ENG&gt;
  &lt;inm:INVOERDER&gt;emo&lt;/inm:INVOERDER&gt;
  &lt;inm:INVDAT&gt;12/13/01&lt;/inm:INVDAT&gt;
  &lt;inm:TWSYN&gt;universiteitsdrukkers&lt;/inm:TWSYN&gt;
  &lt;inm:TWBT&gt;drukkers&lt;/inm:TWBT&gt;
  &lt;inm:TWNT/&gt;
  &lt;inm:TWOND_EN&gt;university printers&lt;/inm:TWOND_EN&gt;
  &lt;inm:TWVAR_EN&gt;university printer&lt;/inm:TWVAR_EN&gt;
  &lt;inm:TWVAR_EN&gt;academy printer&lt;/inm:TWVAR_EN&gt;
  &lt;inm:TWVAR_EN&gt;academic printer&lt;/inm:TWVAR_EN&gt;
  &lt;inm:DEF_EN&gt;a printer appointed by...&lt;/inm:DEF_EN&gt;
  &lt;inm:TWSYN_EN&gt;academy printers&lt;/inm:TWSYN_EN&gt;
  &lt;inm:TWSYN_EN&gt;academic printers&lt;/inm:TWSYN_EN&gt;
&lt;/inm:Record&gt;</preformat>
      <p>Metadata. The metadata describes images related to book printing. It consists of titles and descriptions of the objects, and the names of their creator(s) with a code for their role, such as "a" for author. The works are also classified according to the technique used, their type, and a library classification of the subject matter. The metadata further includes copyright information, measurements and other administrative information. An example collection object with its metadata is shown in Fig. 3.</p>
    </sec>
    <sec id="sec-5">
      <title>5. THESAURUS CONVERSION</title>
      <p>Thesaurus schema mapping and conversion is a relatively well-researched area. In our work we used the method for thesaurus conversion proposed by Van Assem et al. [6]. As the thesaurus schema, we use SKOS throughout the E-Culture project.</p>
      <p>Mapping the Bibliopolis thesaurus turned out to be relatively straightforward, as it fit the SKOS template. Table 1 shows the details of the mapping of the thesaurus representation in Fig. 2 to SKOS. Two XML elements were not converted, as they contained bookkeeping information that is not meant for public consumption. One XML element (see the last column in the table) turned out to duplicate other information and was therefore omitted. Note that this conversion was guided by the requirements of the project, which do not call for a complete conversion of the data.</p>
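      <p>To make the conversion concrete, the Python sketch below turns a record like the one in Fig. 2 into SKOS triples. It is an illustration under assumptions, not the project's conversion program: the field-to-property table is our guess at a Table 1-style mapping, treating NUM, INVOERDER, INVDAT and ENG as the skipped index, bookkeeping and duplicate fields is an assumption, and concept_uri is a hypothetical naming helper. Triples are plain tuples, so no RDF library is needed.</p>
      <preformat>
```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]

# ASSUMED mapping: record field -> (SKOS property, language tag).
FIELD_MAP: Dict[str, Tuple[str, str]] = {
    "TWOND": ("skos:prefLabel", "nl"),
    "TWOND_EN": ("skos:prefLabel", "en"),
    "TWVAR": ("skos:altLabel", "nl"),
    "TWVAR_EN": ("skos:altLabel", "en"),
    "TWSYN": ("skos:altLabel", "nl"),
    "TWSYN_EN": ("skos:altLabel", "en"),
    "DEF": ("skos:scopeNote", "nl"),
    "DEF_EN": ("skos:scopeNote", "en"),
}
# ASSUMED: index, bookkeeping and duplicate fields that are not converted.
SKIPPED = {"NUM", "INVOERDER", "INVDAT", "ENG"}
# Relation fields whose values are Dutch labels of other concepts.
RELATIONS = {"TWBT": "skos:broader", "TWNT": "skos:narrower", "TWRT": "skos:related"}


def concept_uri(label: str) -> str:
    """Hypothetical URI scheme: bp: plus the unique Dutch label, spaces as '_'."""
    return "bp:" + label.replace(" ", "_")


def record_to_skos(record: Dict[str, List[str]]) -> List[Triple]:
    """Turn one thesaurus record (field -> list of values) into SKOS triples."""
    subject = concept_uri(record["TWOND"][0])
    triples: List[Triple] = [(subject, "rdf:type", "skos:Concept")]
    for field, values in record.items():
        if field in SKIPPED:
            continue
        for value in values:
            if field in FIELD_MAP:
                prop, lang = FIELD_MAP[field]
                triples.append((subject, prop, f'"{value}"@{lang}'))
            elif field in RELATIONS:
                triples.append((subject, RELATIONS[field], concept_uri(value)))
            # Any other field is ignored in this sketch.
    return triples


if __name__ == "__main__":
    # Abridged version of the Fig. 2 record.
    record = {
        "NUM": ["2"],
        "TWOND": ["academiedrukkers"],
        "TWVAR": ["academiedrukker", "universiteitsdrukker"],
        "TWRT": ["academische geschriften"],
        "TWBT": ["drukkers"],
        "TWNT": [],
        "TWOND_EN": ["university printers"],
        "INVOERDER": ["emo"],
    }
    for s, p, o in record_to_skos(record):
        print(s, p, o, ".")
```
</preformat>
      <p>Keeping the mapping in a table rather than in control flow makes it easy to check against Table 1 and to drop a field once it turns out to be bookkeeping or a duplicate.</p>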
      <p>Fig. 3: Example collection object and its metadata record. Image: © Koninklijke Bibliotheek (http://www.kb.nl/); Den Haag, Koninklijke Bibliotheek, 169 E 56.</p>
      <preformat>&lt;inm:Record&gt;
  &lt;inm:NUMMER&gt;6&lt;/inm:NUMMER&gt;
  &lt;inm:TITEL&gt;Delftse Bijbel...&lt;/inm:TITEL&gt;
  &lt;inm:TITEL_EN&gt;Delft Bible...&lt;/inm:TITEL_EN&gt;
  &lt;inm:MAKER&gt;Yemantszoon, Mauricius : d&lt;/inm:MAKER&gt;
  &lt;inm:OBJECT&gt;tekstbladzijde&lt;/inm:OBJECT&gt;
  &lt;inm:TECHNIEK&gt;boekdruk&lt;/inm:TECHNIEK&gt;
  &lt;inm:DATERING&gt;10 jan. 1477&lt;/inm:DATERING&gt;
  &lt;inm:CLASSIFICATIE&gt;D&lt;/inm:CLASSIFICATIE&gt;
  &lt;inm:ORIGINEEL&gt;Bijbel. Oude Testament...&lt;/inm:ORIGINEEL&gt;
  &lt;inm:REPRODUCTIE&gt;...&lt;/inm:REPRODUCTIE&gt;
  &lt;inm:TWNAAM/&gt;
  &lt;inm:TWOND&gt;typografische vormgeving&lt;/inm:TWOND&gt;
  &lt;inm:TWOND&gt;bijbels&lt;/inm:TWOND&gt;
  &lt;inm:TWGEO&gt;Delft&lt;/inm:TWGEO&gt;
  &lt;inm:OMSCHRIJVING&gt;Eerste bijbel die in het Nederlands verscheen...&lt;/inm:OMSCHRIJVING&gt;
  &lt;inm:OMSCHRIJVING_EN&gt;The first Bible to appear in the Dutch language...&lt;/inm:OMSCHRIJVING_EN&gt;
  &lt;inm:AFMETINGEN&gt;27 x 20 cm&lt;/inm:AFMETINGEN&gt;
  ...
&lt;/inm:Record&gt;</preformat>
      <p>The creation of URIs deserves special mention. We derive a URI from the real term identifier, followed by a disambiguation signature and the thesaurus version. In the Bibliopolis case the real identifiers are stored in the field TWOND (and not in NUM, which contains a file-specific index rather than the real term identifier); they are unambiguous, and there is a single version.</p>
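      <p>The naming rule can be sketched in a few lines of Python. The underscore separators, the "_v" version prefix, the regular-expression cleanup and the example namespace are our assumptions; the text only fixes the order of the three parts.</p>
      <preformat>
```python
import re
from typing import Optional


def _slug(text: str) -> str:
    """Replace runs of non-word characters (spaces, punctuation) with '_'."""
    return re.sub(r"\W+", "_", text.strip()).strip("_")


def make_concept_uri(namespace: str, term_id: str,
                     disambiguation: Optional[str] = None,
                     version: Optional[str] = None) -> str:
    """Concatenate term identifier, disambiguation signature and version."""
    local = _slug(term_id)
    if disambiguation:
        local += "_" + _slug(disambiguation)
    if version:
        local += "_v" + _slug(version)
    return namespace + local


# Bibliopolis: TWOND labels are unambiguous and there is one version,
# so only the identifier part is used.
NS = "http://example.org/bibliopolis/"  # hypothetical namespace
print(make_concept_uri(NS, "typografische vormgeving"))
```
</preformat>
      <p>Because \W is Unicode-aware for Python str patterns, letters with diacritics survive, which yields IRIs rather than pure-ASCII URIs; an ASCII-only scheme would need an extra transliteration step.</p>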
    </sec>
    <sec id="sec-6">
      <title>6. METADATA SCHEMA MAPPING</title>
      <p>In this activity we map the original record fields (see Fig. 3) to a metadata schema. In the E-Culture project we use the VRA Core schema, a specialization of Dublin Core (http://dublincore.org/) for visual resources, which are our target type of resources. Before mapping to the schema we analyze the metadata, including any additional documentation, websites, and interviews with experts. The meaning of the fields needs to be understood to find the correct correspondence within the target schema, and a first impression of the meaning of a field can be misleading. For example, the TWGEO field was initially mapped to vra:location, i.e., the DC/VRA element indicating where the work was created. However, the documentation showed that the field actually gives the location related to the subject, not the place of creation. We finally used the VRA Core 4 element vra:subject.geographicPlace, which gives the correct interpretation. This element is a subproperty of the DC/VRA element subject.</p>
      <p>An important additional consideration is that certain records or fields may contain confidential or administrative information, such as acquisition or bookkeeping details. For example, the amount for which an object is insured should not be publicly visible. This situation did not occur with the Bibliopolis data.</p>
      <p>Table 2 gives an overview of the mapping from the XML record fields to the VRA metadata schema, with examples. Here we face two situations. In the simplest case there is an exact semantic match between an original field and a VRA element. Otherwise, the field is specified as a specialization of an existing VRA element. In the Bibliopolis case this occurs with the ORIGINAL, REPRODUCTION and CLASSIFICATION fields (for readability we use the English field names in the text where they are close to the Dutch equivalent, e.g. "original" vs. "origineel"). The first two are specific "titles"; the third is a specific "subject" description. In Table 2 we see that the RDF/OWL specification contains property definitions in the Bibliopolis namespace (bp:), each paired with a statement about its subproperty relationship with a VRA element.</p>
      <p>One field requires deeper study. The MAKER field contains not only the creator of the work, but also a character indicating the role that the person played in creating the work. As shown in the example record in Fig. 3, the MAKER field has the value Yemantszoon, Mauricius : d, where "d" stands for "drukker", Dutch for "printer". To preserve the roles of the creators we specialize the VRA property vra:creator with properties that correspond to the roles found in the Bibliopolis data. Together with the specializations above, this resulted in a set of RDF/OWL definitions such as the following (in RDF N3 notation):</p>
      <preformat>bp:drukker       rdfs:subPropertyOf vra:creator .
bp:origineel     rdfs:subPropertyOf vra:title .
bp:reproductie   rdfs:subPropertyOf vra:title .
bp:classificatie rdfs:subPropertyOf vra:subject .</preformat>
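      <p>A minimal Python sketch of this field-to-property step, including the MAKER role split, is shown below. The TITEL entry, the role property bp:auteur and the fallback to vra:creator for unknown role codes are assumptions; the other property names come from the text.</p>
      <preformat>
```python
from typing import Dict, List, Tuple

# ASSUMED excerpt of a Table 2-style mapping; bp: properties are local
# specializations (rdfs:subPropertyOf) of VRA elements.
FIELD_TO_PROPERTY: Dict[str, str] = {
    "TITEL": "vra:title",
    "ORIGINEEL": "bp:origineel",
    "REPRODUCTIE": "bp:reproductie",
    "CLASSIFICATIE": "bp:classificatie",
    "TWGEO": "vra:subject.geographicPlace",
}
# Role codes after the colon in MAKER values; "d" and "a" are named in the
# text, the property name bp:auteur is hypothetical.
ROLE_TO_PROPERTY: Dict[str, str] = {"d": "bp:drukker", "a": "bp:auteur"}


def split_maker(value: str) -> Tuple[str, str]:
    """Split 'Yemantszoon, Mauricius : d' into name and role code."""
    name, sep, role = value.rpartition(":")
    if not sep:  # no role code present
        return value.strip(), ""
    return name.strip(), role.strip()


def map_fields(record: Dict[str, List[str]]) -> List[Tuple[str, str]]:
    """Return (property, value) pairs for the fields covered by the mapping."""
    pairs: List[Tuple[str, str]] = []
    for field, values in record.items():
        for value in values:
            if field == "MAKER":
                name, role = split_maker(value)
                # Unknown role codes fall back to the generic vra:creator.
                pairs.append((ROLE_TO_PROPERTY.get(role, "vra:creator"), name))
            elif field in FIELD_TO_PROPERTY:
                pairs.append((FIELD_TO_PROPERTY[field], value))
    return pairs


print(map_fields({"MAKER": ["Yemantszoon, Mauricius : d"], "TWGEO": ["Delft"]}))
```
</preformat>
      <p>Fields without a mapping entry (such as NUMMER here) are silently dropped, which mirrors the decision not to publish administrative data.</p>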
      <p>Dublin Core has excellent general coverage. In all collections we have tackled so far, we were able to find for each field a Dublin Core/VRA element that was either an equivalent or could act as a superproperty of a local specialization. This characteristic makes Dublin Core a powerful tool for metadata interoperability.</p>
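      <p>Why this pays off can be sketched with the RDFS subproperty rules (rdfs5 for transitivity, rdfs7 for inheritance): a client that only knows generic Dublin Core/VRA properties still finds values stored under local specializations such as bp:drukker. The Python below is a minimal, library-free forward-chaining sketch; the resource bp:work6 and the declaration that vra:creator specializes dc:creator are assumptions for illustration.</p>
      <preformat>
```python
from typing import Dict, Set, Tuple

Triple = Tuple[str, str, str]


def superproperties(prop: str, sub_of: Dict[str, Set[str]]) -> Set[str]:
    """prop plus all properties reachable via rdfs:subPropertyOf (transitively)."""
    seen, stack = {prop}, [prop]
    while stack:
        for parent in sub_of.get(stack.pop(), set()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def entail(triples: Set[Triple], sub_of: Dict[str, Set[str]]) -> Set[Triple]:
    """Subproperty rule: (s p o) and p subPropertyOf q entail (s q o)."""
    return {(s, q, o) for (s, p, o) in triples for q in superproperties(p, sub_of)}


# Declarations from the text; vra:creator -> dc:creator is an assumption.
SUB_OF = {
    "bp:drukker": {"vra:creator"},
    "bp:origineel": {"vra:title"},
    "vra:creator": {"dc:creator"},
}
data = {("bp:work6", "bp:drukker", "Yemantszoon, Mauricius")}
creators = sorted(o for (s, p, o) in entail(data, SUB_OF) if p == "vra:creator")
print(creators)
```
</preformat>
      <p>The visited-set check keeps the closure finite even if a subproperty hierarchy accidentally contains a cycle.</p>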
    </sec>
    <sec id="sec-7">
      <title>7. METADATA VALUE CONVERSION</title>
      <p>After the schema is created, the data values of the fields have to be converted. As discussed in [5], there are two kinds of fields: those that contain free-text literal values, such as a description field, and those that contain values from (implicit) vocabularies, such as the fields for keywords or geographic places. In the latter case we distinguish between three kinds of vocabularies to which a field value can be converted:</p>
      <p>1. A local vocabulary.
2. A vocabulary that is implicitly present in the field values.
3. Terms that may belong to a vocabulary.</p>
      <sec id="sec-7-2">
        <p>In the Bibliopolis dataset we had the following situations for metadata value mappings:</p>
        <p>Converting to a local vocabulary concept. Option 1 is exemplified by the values of the field TWOND, which represent thesaurus concepts. This relationship is explicitly present in the source data and is preserved during the metadata value conversion. We create the RDF/OWL representations and use the URIs of the corresponding entries in the Bibliopolis thesaurus. Again, these URIs are composed of text, as the records refer to the (unique) Dutch text label of a concept rather than to its identifier. This is relevant for the choice of the URI naming scheme for vocabulary concepts (cf. Section 5).</p>
        <p>Converting to an implied vocabulary concept. In this
case we map field values to resources which form new
vocabularies implicitly present in the data. In the Bibliopolis
data there were two fields whose values formed an implicit
vocabulary.</p>
        <p>In Table 2 we see the value "D" in the field CLASSIFICATIE. Further analysis revealed that these letter codes actually represent a small vocabulary for library-style classification of the subject. This vocabulary is not part of the XML data; it is only shown on the Bibliopolis website. The classification vocabulary also has some broader/narrower relations. We represented it using the SKOS template and mapped the field values to concepts from this vocabulary.</p>
        <p>The RDF example in Fig. 4 shows the SKOS specification of a subset of these classification subjects, including the concept D. The concept M ("Secondary subjects") has a hierarchical substructure.</p>
        <p>Fig. 4: SKOS specification of a subset of the classification subjects (N3 notation).</p>
        <preformat>@prefix rdf:  &lt;http://www.w3.org/1999/02/22-rdf-syntax-ns#&gt; .
@prefix skos: &lt;http://www.w3.org/2004/02/skos/core#&gt; .
@prefix bp:   &lt;http://example.org/bibliopolis#&gt; .  # hypothetical namespace URI

bp:A rdf:type skos:Concept ;
    skos:prefLabel "General works"@en .
bp:D rdf:type skos:Concept ;
    skos:prefLabel "History of the art of printing"@en .
bp:M rdf:type skos:Concept ;
    skos:prefLabel "Secondary subjects"@en .
bp:M1 rdf:type skos:Concept ;
    skos:prefLabel "Philosophy, psychology"@en ;
    skos:broader bp:M .
bp:M4 rdf:type skos:Concept ;
    skos:prefLabel "language and literature"@en ;
    skos:broader bp:M .
bp:M41 rdf:type skos:Concept ;
    skos:prefLabel "English"@en ;
    skos:broader bp:M4 .
bp:M42 rdf:type skos:Concept ;
    skos:prefLabel "German"@en ;
    skos:broader bp:M4 .</preformat>
        <p>The other implicit vocabulary present in the data is that of roles. The field MAKER contains the name of the creator along with their role (e.g., Yemantszoon, Mauricius : d, where d stands for printer), which is one of 14 roles. We create RDF representations of these roles as SKOS concepts.</p>
        <p>Converting into a typed resource. Again, we create new RDF resources from field values that are potentially part of some vocabulary. We create a unique URI by prefixing the field value with the field name. For example, for values of the field TECHNIQUE this results in &amp;bp;techniek_boekdruk, which is part of the bp: namespace. The reason for this is that the values of TECHNIQUE and OBJECT sometimes coincide; for example, foto is a technique as well as an object type. The vocabulary to which such values belong may be an existing standard vocabulary such as the AAT, in which case an alignment between the new resource and the vocabulary has to be performed. In the Bibliopolis data a number of values of the fields TECHNIQUE, OBJECT and TWGEO can be aligned to the AAT and TGN. A small number of values of the fields TECHNIQUE (13) and OBJECT (5) remained unmapped, as can be seen in Table 3. These terms can be added to the AAT by extending it. The alignment and extension are further discussed in Section 8.</p>
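        <p>A minimal Python sketch of this naming rule, plus an occurrence count per resource (the kind of instance figure reported in Table 3), follows. The lower-casing, the underscore separator and the regular-expression cleanup are our assumptions; only the resulting name bp:techniek_boekdruk is taken from the text.</p>
        <preformat>
```python
import re
from collections import Counter
from typing import Dict, Iterable, List


def typed_resource(field: str, value: str) -> str:
    """Prefix the normalized value with the lower-cased field name, so equal
    strings in different fields (e.g. 'foto') get different resources."""
    local = re.sub(r"\W+", "_", value.strip().lower()).strip("_")
    return f"bp:{field.lower()}_{local}"


def collect_vocabulary(records: Iterable[Dict[str, List[str]]],
                       fields: List[str]) -> Counter:
    """Count how often each typed resource occurs across the records."""
    counts: Counter = Counter()
    for record in records:
        for field in fields:
            for value in record.get(field, []):
                counts[typed_resource(field, value)] += 1
    return counts


records = [
    {"TECHNIEK": ["boekdruk"], "OBJECT": ["tekstbladzijde"]},
    {"TECHNIEK": ["foto"], "OBJECT": ["foto"]},
]
print(collect_vocabulary(records, ["TECHNIEK", "OBJECT"]).most_common())
```
</preformat>
        <p>Counting occurrences this way also indicates which unmapped terms are worth aligning by hand: a term used in many records gives more coverage per manual mapping.</p>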
        <p>We also create resources from field values when the vocabulary to which the values belong is unknown or the mapping has not been performed. This leaves room for future semantic extensions, although as a result we have a number of resources we do not use. In general, these may be names of organizations or persons, places, cultures or historical periods. In Bibliopolis the values of MAKER and TWNAAM contain person names. These names can possibly be linked to the ULAN vocabulary. We create resources from these names with URIs in the bp: namespace, removing invalid characters and spaces. The resources are of type ulan:person, and their human-readable label contains the name.</p>
        <p>Converting to a literal. Finally, pieces of text such as titles and descriptions are converted to literals. In Bibliopolis the values of the TITLE and DESCRIPTION fields were converted into literals with language tags, as the titles and descriptions of works are given both in English and in Dutch.</p>
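        <p>Both conversions can be sketched in Python as follows. Reading "removing invalid characters and spaces" as dropping diacritics and keeping only ASCII letters and digits is our assumption, as is the use of vra:title for both TITEL (Dutch) and TITEL_EN (English); bp:work6 is a hypothetical resource.</p>
        <preformat>
```python
import re
import unicodedata
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]


def quote(text: str) -> str:
    """Turtle-style string literal with backslash escaping."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def language_literal(text: str, lang: str) -> str:
    return quote(text) + "@" + lang


def person_resource(name: str) -> List[Triple]:
    """Mint a bp: resource for a person name (assumed cleanup rule)."""
    ascii_name = (unicodedata.normalize("NFKD", name)
                  .encode("ascii", "ignore").decode("ascii"))
    uri = "bp:" + re.sub(r"[^A-Za-z0-9]", "", ascii_name)
    return [(uri, "rdf:type", "ulan:person"), (uri, "rdfs:label", quote(name))]


def title_triples(work: str, record: Dict[str, List[str]]) -> List[Triple]:
    """Dutch and English titles become language-tagged literals."""
    out: List[Triple] = []
    for field, lang in (("TITEL", "nl"), ("TITEL_EN", "en")):
        for value in record.get(field, []):
            out.append((work, "vra:title", language_literal(value, lang)))
    return out


print(person_resource("Yemantszoon, Mauricius"))
print(title_triples("bp:work6", {"TITEL": ["Delftse Bijbel"], "TITEL_EN": ["Delft Bible"]}))
```
</preformat>
        <p>Stripping characters this aggressively can make two distinct names collide, so a real conversion would need a collision check before minting URIs.</p>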
      </sec>
    </sec>
    <sec id="sec-8">
      <title>8. THESAURUS ALIGNMENT</title>
      <p>The local thesaurus and the resources containing techniques,
object types and locations extracted from the data during
the metadata conversion process need to be aligned with
standard vocabularies.</p>
      <p>We aligned the Bibliopolis thesaurus to AAT by syntactically matching the Dutch skos:prefLabel values to the Dutch translations of AAT preferred terms, and mapped 209 of the 1,033 concepts, as presented in Table 3.</p>
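      <p>The label-matching step can be illustrated with a short Python sketch under stated assumptions: the normalization (Unicode normalization, case folding, whitespace collapsing) is our choice, candidates with several matches are returned for manual review in line with the policy described below for TGN, and the aat: identifiers are placeholders rather than real AAT IDs.</p>
      <preformat>
```python
import unicodedata
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]


def normalize(label: str) -> str:
    """Unicode-normalize, case-fold and collapse whitespace."""
    return " ".join(unicodedata.normalize("NFC", label).casefold().split())


def align_exact(local: Dict[str, str], target: Dict[str, List[str]]):
    """Match local prefLabels against target labels.

    local:  local concept URI -> Dutch prefLabel
    target: target concept URI -> Dutch labels
    Returns skos:exactMatch triples for unambiguous matches, plus the
    ambiguous cases (several candidates) left for manual mapping.
    """
    index: Dict[str, Set[str]] = defaultdict(set)
    for uri, labels in target.items():
        for label in labels:
            index[normalize(label)].add(uri)
    matches: List[Triple] = []
    ambiguous: Dict[str, Set[str]] = {}
    for uri, label in local.items():
        candidates = index.get(normalize(label), set())
        if len(candidates) == 1:
            matches.append((uri, "skos:exactMatch", next(iter(candidates))))
        elif len(candidates) > 1:
            ambiguous[uri] = candidates
    return matches, ambiguous


local = {"bp:drukkers": "drukkers", "bp:academiedrukkers": "academiedrukkers"}
aat = {"aat:0001": ["drukkers"], "aat:0002": ["boekdrukkunst"]}  # placeholders
print(align_exact(local, aat))
```
</preformat>
      <p>Building an index over the target labels once keeps the matching linear in the number of labels, which matters for a vocabulary the size of the AAT.</p>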
      <p>Next, we need to identify the relation between the matched terms. The OWL owl:sameAs relation is typically an overstatement that we try to avoid, as ambiguity is quite common. The SKOS Mapping Vocabulary specification (http://www.w3.org/2004/02/skos/mapping/spec/) was created for the purpose of linking thesauri to each other. It specifies relationships such as skos:exactMatch, skos:broadMatch and skos:narrowMatch for aligning vocabularies. In this alignment the mappings are still based on the lexical match of term labels, which corresponds to the relation skos:exactMatch.</p>
      <table-wrap id="tab3">
        <label>Table 3</label>
        <table>
          <thead>
            <tr><th>Source data</th><th>Vocabulary</th><th>Terms mapped</th><th>Terms total</th><th>Instances mapped</th><th>Instances total</th></tr>
          </thead>
          <tbody>
            <tr><td>Thesaurus</td><td>AAT</td><td>209</td><td>1033</td><td>–</td><td>–</td></tr>
            <tr><td>Metadata: technique</td><td>AAT</td><td>15</td><td>28</td><td>1332</td><td>1468</td></tr>
            <tr><td>Metadata: object type</td><td>AAT</td><td>14</td><td>19</td><td>978</td><td>1507</td></tr>
            <tr><td>Metadata: subject place</td><td>TGN</td><td>32</td><td>69</td><td>349</td><td>480</td></tr>
          </tbody>
        </table>
      </table-wrap>
      <p>The field TWGEO contains geographic names, which were mapped to TGN. As the values of this field are in Dutch, we extended TGN by adding the Dutch labels to the proper concepts. For example, we added Parijs as the Dutch label of the TGN concept for Paris. Such extensions had to be performed manually, whereas the mapping of values to cities in the Netherlands could be performed automatically, as the TGN labels for those places include the Dutch-language version. We used syntactic matching to find appropriate mappings, along with some additional techniques to reduce ambiguity, such as restricting the search to cities instead of provinces and using background knowledge such as the vernacular names of cities. We mapped only unambiguous terms automatically and mapped ambiguous terms manually. Background knowledge about the collection data helped in resolving ambiguity, as it restricted the places the data could be associated with.</p>
      <p>The values of the fields TECHNIQUE and OBJECT were also aligned with AAT using syntactic matching, once more using the skos:exactMatch relation. As can be seen in Table 3, a number of terms were not mapped. We extend the AAT by adding these leftover terms to an appropriate part of the vocabulary where possible. For instance, the technique boekdruk (book printing) is not part of AAT but is a special kind of printing technique; therefore the AAT concept printing is selected as its broader term. We use the SKOS template to represent the extension.</p>
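      <p>The disambiguation heuristics used for TGN can be illustrated with a small Python sketch. The Place record, the two-valued place type, the preferred-country filter (standing in for background knowledge about the collection) and the tgn: identifiers are hypothetical simplifications; real TGN records carry richer place types and hierarchy.</p>
      <preformat>
```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Place:
    uri: str
    labels: List[str]
    place_type: str  # simplified: "city" or "province"
    country: str


def match_place(value: str, gazetteer: List[Place],
                preferred_country: Optional[str] = None) -> Optional[str]:
    """Return a place URI only if the match is unambiguous, else None."""
    key = value.strip().casefold()
    # Restrict the search to cities, as described in the text.
    candidates = [p for p in gazetteer
                  if p.place_type == "city"
                  and key in (lab.casefold() for lab in p.labels)]
    # Stand-in for background knowledge about the collection.
    if len(candidates) > 1 and preferred_country:
        candidates = [p for p in candidates if p.country == preferred_country]
    return candidates[0].uri if len(candidates) == 1 else None


gazetteer = [
    Place("tgn:delft", ["Delft"], "city", "NL"),
    Place("tgn:utrecht-prov", ["Utrecht"], "province", "NL"),
    Place("tgn:utrecht", ["Utrecht"], "city", "NL"),
    Place("tgn:paris", ["Paris", "Parijs"], "city", "FR"),  # Dutch label added
    Place("tgn:paris-tx", ["Paris"], "city", "US"),
]
for value in ["Utrecht", "Parijs", "Paris"]:
    print(value, match_place(value, gazetteer))
```
</preformat>
      <p>Returning None instead of a best guess keeps the automatic step conservative: every ambiguous value ends up in the manual queue rather than in a possibly wrong skos:exactMatch.</p>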
      <p>From Table 3 we can see that a large number of resources are created without being linked to vocabularies. Such resources might be seen as unnecessary overhead, but they can be used in the future when new vocabularies are added or when mappings are made manually. Almost 80 percent of the thesaurus terms were not mapped to AAT; while a number of these terms could be linked with skos:broadMatch, this would require additional manual work that could take a significant amount of time while yielding few matches. This is not the case for the values of the TECHNIQUE, OBJECT and TWGEO fields, where manually aligning 13, 5 and 37 terms, respectively, would yield complete alignments. For OBJECT, linking 5 terms would align another 500 occurrences of these terms in the metadata, which is one third of the total occurrences and well worth the manual effort.</p>
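      <p>The figures quoted in this paragraph follow from Table 3; a short Python check makes the arithmetic explicit. It assumes the object-type instance counts are 978 mapped out of 1,507, which is our reading of the flattened table.</p>
      <preformat>
```python
# (mapped, total) term counts per source, taken from Table 3.
TERMS = {
    "thesaurus": (209, 1033),
    "technique": (15, 28),
    "object type": (14, 19),
    "place": (32, 69),
}


def unmapped(mapped: int, total: int) -> int:
    return total - mapped


def share(part: int, whole: int) -> float:
    return part / whole


for source, (mapped, total) in TERMS.items():
    left = unmapped(mapped, total)
    print(f"{source}: {left} of {total} terms unmapped ({share(left, total):.1%})")

# Object-type instances (mapped, total), as read from Table 3 (assumption).
obj_mapped, obj_total = 978, 1507
gain = unmapped(obj_mapped, obj_total)
print(f"object type: {gain} more instances aligned ({share(gain, obj_total):.1%})")
```
</preformat>
      <p>This reproduces the 13, 5 and 37 unmapped terms, 824 of 1,033 thesaurus terms (79.8%) left unmapped, and 529 further object-type instances (35.1%) gained by aligning the last 5 object terms.</p>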
    </sec>
    <sec id="sec-9">
      <title>9. DISCUSSION</title>
      <p>Interoperability is becoming one of the key issues in the open Web world. Many research programs, such as the IST program of the EU, have interoperability high on the agenda. However, real interoperability between collections is still scarce. Until now, many approaches have treated interoperability as a problem between two collections.</p>
      <p>In this paper we take a different approach. We assume that a multitude of collections will become part of the interoperable space; the activities we present can to a large extent be carried out by studying an individual collection. Mapping to other existing vocabularies requires knowledge of those components, but there is no need for them to be complete. For vocabulary alignment the adage "a little semantics goes a long way" (a quote from J. Hendler) holds. Also, one should not view this as a one-shot activity. Metadata and vocabularies change, so extensions will take place at regular intervals. This also means that tool support should be in place for this process, allowing updates to be generated semi-automatically, similar to the AnnoCultor tool (http://annocultor.sourceforge.net/) that is currently being developed within the E-Culture project.</p>
      <p>For the E-Culture virtual collection we have now carried out this process a number of times. This paper should be viewed as a post-hoc rationalization of this work. Our goal is to provide a set of methods and tools that allow collection owners (museums, archives) to carry out this process themselves. Cultural heritage institutions are often still bound to closed content management systems; the "three-O" paradigm (open access, open data, open standards) is gaining support, but we have to provide collection owners with the necessary support facilities.</p>
      <p>We see two potential weaknesses of this work. Firstly, our process still requires much more tool support. In particular, for vocabulary alignment we need to explore how existing tools, such as the ones participating in the OAEI contest, perform on this data set. Our current work still relies too much on manual effort and uses only simple syntactic tools. Secondly, the use of Dublin Core as a "top-level ontology" for the structure of metadata can also be perceived as a risk. What if a collection has metadata fields that fit none of the DC elements? However, this was not a problem in any of these six collections. For the moment, Dublin Core does seem to be a key resource for information interoperability. However, it is a challenge to construct reasoners that make use of the collection-specific specializations.</p>
      <p>This article does not show the actual added value of the converted collection content. For this, readers are encouraged to visit the E-Culture online demonstrator, which contains the Bibliopolis data.</p>
    </sec>
    <sec id="sec-10">
      <title>10. ACKNOWLEDGMENTS</title>
      <p>We are grateful to our colleagues from the MultimediaN E-Culture team: Alia Amin, Lora Aroyo, Victor de Boer, Lynda Hardman, Michiel Hildebrand, Marco de Niet, Annelies van Nispen, Marie France van Orsouw, Jacco van Ossenbruggen, Annemiek Teesing, Jan Wielemaker and Bob Wielinga. We would also like to thank Mark van Assem for his input. The project is a collaboration between the Free University Amsterdam, the Centre of Mathematics and Computer Science (CWI), the University of Amsterdam, Digital Heritage Netherlands (DEN) and the Netherlands Institute for Cultural Heritage (ICN). The MultimediaN project is funded through the BSIK programme of the Dutch government.</p>
      <p>We are especially thankful to Marieke van Delft of the Koninklijke Bibliotheek (National Library of the Netherlands) for her cooperation in the Bibliopolis case.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <title>11. REFERENCES</title>
      <ref id="ref1">
        <mixed-citation><string-name><given-names>M. H.</given-names> <surname>Butler</surname></string-name>, <string-name><given-names>J.</given-names> <surname>Gilbert</surname></string-name>, <string-name><given-names>A.</given-names> <surname>Seaborne</surname></string-name>, and <string-name><given-names>K.</given-names> <surname>Smathers</surname></string-name>. <article-title>Data conversion, extraction and record linkage using XML and RDF tools in Project SIMILE</article-title>. ... and HP Laboratories, August <year>2004</year>.</mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation><string-name><given-names>E.</given-names> <surname>Hyvönen</surname></string-name>, <string-name><given-names>E.</given-names> <surname>Mäkelä</surname></string-name>, <string-name><given-names>M.</given-names> <surname>Salminen</surname></string-name>, <string-name><given-names>A.</given-names> <surname>Valo</surname></string-name>, ... World Wide Web, <volume>3</volume>(<issue>2-3</issue>):<fpage>224</fpage>–<lpage>241</lpage>, October <year>2005</year>.</mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation><string-name><given-names>A. J.</given-names> <surname>Miles</surname></string-name>, <string-name><given-names>N.</given-names> <surname>Rogers</surname></string-name>, and <string-name><given-names>D.</given-names> <surname>Beckett</surname></string-name>. Migrating ... studies for generating RDF encodings of existing thesauri.</mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation><string-name><given-names>G.</given-names> <surname>Schreiber</surname></string-name>, <string-name><given-names>A.</given-names> <surname>Amin</surname></string-name>, <string-name><given-names>M.</given-names> <surname>van Assem</surname></string-name>, <string-name><given-names>V.</given-names> <surname>de Boer</surname></string-name>, ... Conference, volume <volume>4273</volume> of Lecture Notes in Computer Science, pages <fpage>951</fpage>–<lpage>958</lpage>. Springer, <year>2006</year>.</mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation><string-name><given-names>A.</given-names> <surname>Tordai</surname></string-name>, <string-name><given-names>B.</given-names> <surname>Omelayenko</surname></string-name>, and <string-name><given-names>G.</given-names> <surname>Schreiber</surname></string-name>. ... e-culture application. <year>2007</year>.</mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation><string-name><given-names>M.</given-names> <surname>van Assem</surname></string-name>, <string-name><given-names>V.</given-names> <surname>Malaisé</surname></string-name>, <string-name><given-names>A.</given-names> <surname>Miles</surname></string-name>, and <string-name><given-names>G.</given-names> <surname>Schreiber</surname></string-name>. ... J. Domingue, editors, <source>ESWC</source>, volume <volume>4011</volume> of Lecture Notes in Computer Science, pages <fpage>95</fpage>–<lpage>109</lpage>. Springer, ...</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>