=Paper=
{{Paper
|id=Vol-233/paper-12
|storemode=property
|title=Semantically Exposing Existing Knowledge Repositories: A Case Study in Cultural Heritage
|pdfUrl=https://ceur-ws.org/Vol-233/p25.pdf
|volume=Vol-233
|dblpUrl=https://dblp.org/rec/conf/samt/PitzalisSLALHLMSPARS06
}}
==Semantically Exposing Existing Knowledge Repositories: A Case Study in Cultural Heritage==
Denis Pitzalis, Patrick Sinclair, Christian Lahanier, Matthew Addis, Richard Lowe, Shahbaz Hafeez, Paul Lewis,
Kirk Martinez, mc schraefel, Ruven Pillay, Geneviève Aitken, Alistair Russell and Daniel A. Smith
D. Pitzalis, C. Lahanier, R. Pillay and G. Aitken are with the Centre de Recherche et de Restauration des Musées de France, Palais du Louvre, Paris, France. Email: {name.surname}@culture.fr
P. Sinclair, P. Lewis, K. Martinez, mc schraefel, A. Russell and D.A. Smith are with Electronics and Computer Science, University of Southampton, UK. Email: {pass,phl,km,mc,ar5,das05}@ecs.soton.ac.uk
M. Addis, R. Lowe and S. Hafeez are with the IT Innovation Centre, Southampton, UK. Email: {mja,rl,szh}@it-innovation.soton.ac.uk

Abstract— In this paper we describe the practical implications of semantically exposing a cultural heritage multimedia collection system (EROS) through a Search and Retrieve Web Service (SRW).

Index Terms— multimedia system, semantic web, cultural heritage

I. INTRODUCTION

Semantic web technologies have the potential to greatly benefit the Cultural Heritage (CH) domain. CH institutions, such as museums and photographic archives, are rich resources of heterogeneous multimedia content depicting people, objects, events, places, etc. This material, along with any supporting metadata, tends to be locked away in internal legacy systems, and open interfaces to the collections are rarely provided.

The use of semantic web technologies could make an impact on several levels. Richer semantics can greatly improve the information systems used by conservators, curators and historians by enhancing their retrieval and browsing facilities. Making the data available through semantic web services could provide opportunities for tackling complex research problems in the CH domain.

However, there are still barriers to applying semantic web technologies directly. Many CH institutions are tied to their commercial content management systems. There are also high costs in converting and mapping all of their existing material to semantic representations such as RDF. Although some technical issues, such as triple store scalability, are being overcome, many still doubt the applicability of semantic web technologies in practice. Alternatives that bring semantics to traditional content management systems are therefore desirable in this context.

II. CASE STUDY

The C2RMF is the Research and Restoration Centre of the Museums of France, located in the Louvre. Its mission is to analyse, restore and document the works of art kept within all museums in France. This work requires the management of huge quantities of different kinds of data. To organise our digital library we developed the EROS system [1][2].

This system consists of a relational multilingual database that allows us to organise different media. At the moment over 250,000 photographic and radiographic images, 10,000 technical reports, 1,000 3D objects and 200,000 quantitative chemical and physical analyses, related to more than 60,000 works of art, are accessible in digital form. This kind of heterogeneous data is common in real world applications.

Semantic interoperability of CH digital libraries has been investigated in the SCULPTEUR [5] and eCHASE [6] projects by using a Z39.50 search and retrieve web service (SRW [3]) and by mapping legacy metadata schemas to the CIDOC Conceptual Reference Model (CRM [4]), an ontology for describing the semantics of CH documentation. Additional semantics are attached to the legacy database attributes in order to define their meaning more fully in the context of the CRM framework. The CRM-mapped attributes are exposed through the SRW as a flat list that can be queried using Common Query Language (CQL) expressions. The SRW publishes the mapping information in XML through the SRW explain operation. The SRW is able to dynamically map CQL queries expressed in terms of the CRM mappings to the relevant legacy database fields (in our case using SQL against a relational database) and return the results as XML structured according to the CRM mappings.

Our SRW implementation is available as open source in the form of OpenMKS (http://openmks.sourceforge.net), which provides an SRW implementation that allows relational data to be mapped to an XML representation. It also provides a web-based user interface to the SRW that allows end users to search and browse the content. Through its configuration system we were able to adapt OpenMKS to the EROS content and metadata within C2RMF.

mSpace [7] is an interaction model and software framework to help people access and explore information. mSpace helps people build knowledge by exploring relationships in data. It does this by offering several tools for organising an information space to suit a person's interests: slicing, sorting, swapping, information views and multimedia preview cues. When we access a subset of the EROS data set through the mSpace interface, each category in the information space is displayed in a separate column, and the selection in each column narrows down the results presented in the next column.
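The column narrowing that mSpace performs over the EROS subset can be pictured as a simple faceted filter. The following is a minimal sketch under stated assumptions, not mSpace's implementation or its data access protocol: the records, the field names and the `column_values` function are all hypothetical, and only the left-to-right narrowing and column swapping are modelled.

```python
from typing import Dict, List, Optional, Sequence

# Hypothetical flat records standing in for a small subset of EROS.
# Field names and values are invented for illustration only.
Record = Dict[str, str]

RECORDS: List[Record] = [
    {"artist": "Artist X", "technique": "oil on panel", "analysis": "X-ray"},
    {"artist": "Artist X", "technique": "oil on panel", "analysis": "infrared"},
    {"artist": "Artist X", "technique": "drawing", "analysis": "UV"},
    {"artist": "Artist Y", "technique": "fresco", "analysis": "X-ray"},
]


def column_values(records: Sequence[Record], columns: Sequence[str],
                  selections: Dict[str, str]) -> List[List[str]]:
    """Return the distinct values displayed in each column, left to right.

    A selection made in one column filters the records used to populate
    every column to its right, so each choice narrows the next column.
    """
    remaining = list(records)
    shown: List[List[str]] = []
    for column in columns:
        # Values still reachable given the selections made further left.
        shown.append(sorted({r[column] for r in remaining}))
        chosen: Optional[str] = selections.get(column)
        if chosen is not None:
            remaining = [r for r in remaining if r[column] == chosen]
    return shown


if __name__ == "__main__":
    order = ["artist", "technique", "analysis"]
    print(column_values(RECORDS, order, {"artist": "Artist X"}))
    # "Swapping" columns is simply a different column order.
    swapped = ["analysis", "artist", "technique"]
    print(column_values(RECORDS, swapped, {"analysis": "X-ray"}))
```

Keeping the column order as a plain list makes swapping trivial. A deployment over the SRW would presumably issue a query per column rather than filtering in memory, which is why interactive use over large data sets raises performance questions.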
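The SRW's dynamic translation of CQL queries, expressed against CRM-mapped indexes, into SQL over the legacy relational database can be illustrated with a minimal sketch. It is not the OpenMKS code. The following are all assumptions made for illustration:

* the index names, which borrow real CIDOC CRM property identifiers (P102, P108i, P14, P45) but are not the EROS mapping;
* the toy `artwork` table;
* the supported CQL subset, which allows only `index = "term"` clauses joined by `and`;
* the simplified XML layouts, including the stand-in for the explain record (a real SRW explain response carries a ZeeRex record).

```python
import re
import sqlite3
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple

# Hypothetical mapping from CRM-based CQL index names to columns of a toy
# legacy table. This is NOT the actual EROS or OpenMKS configuration.
CRM_MAPPING: Dict[str, str] = {
    "crm.P102_has_title": "title",
    "crm.P108i_was_produced_by.P14_carried_out_by": "artist",
    "crm.P45_consists_of": "material",
}

# A tiny CQL subset: index = "term" clauses joined by "and".
# Terms that themselves contain the word "and" are not supported.
CLAUSE = re.compile(r'\s*([\w.]+)\s*=\s*"([^"]*)"\s*')


def explain() -> str:
    """Publish the indexes as XML (a simplified stand-in for SRW explain)."""
    root = ET.Element("explain")
    for index in CRM_MAPPING:
        ET.SubElement(root, "index", name=index)
    return ET.tostring(root, encoding="unicode")


def cql_to_sql(cql: str) -> Tuple[str, List[str]]:
    """Translate the CQL subset into a parameterised SQL query."""
    conditions: List[str] = []
    params: List[str] = []
    for clause in re.split(r"\s+and\s+", cql.strip(), flags=re.IGNORECASE):
        match = CLAUSE.fullmatch(clause)
        if match is None or match.group(1) not in CRM_MAPPING:
            raise ValueError(f"unsupported CQL clause: {clause!r}")
        # Column names come only from the trusted mapping; terms are bound
        # as parameters, so user input never enters the SQL text.
        conditions.append(f"{CRM_MAPPING[match.group(1)]} = ?")
        params.append(match.group(2))
    select = "SELECT id, " + ", ".join(CRM_MAPPING.values()) + " FROM artwork"
    return select + " WHERE " + " AND ".join(conditions), params


def search(conn: sqlite3.Connection, cql: str) -> str:
    """Run a CQL query and return the matches as CRM-labelled XML."""
    sql, params = cql_to_sql(cql)
    root = ET.Element("records")
    for row in conn.execute(sql, params):
        record = ET.SubElement(root, "record", id=str(row[0]))
        # Fields are labelled with CRM index names, not legacy column names.
        for index, value in zip(CRM_MAPPING, row[1:]):
            ET.SubElement(record, "field", index=index).text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE artwork (id INTEGER, title TEXT, artist TEXT, material TEXT)")
    db.executemany(
        "INSERT INTO artwork VALUES (?, ?, ?, ?)",
        [(1, "Portrait A", "Artist X", "wood"), (2, "Portrait B", "Artist Y", "wood")],
    )
    print(explain())
    print(search(db, 'crm.P45_consists_of = "wood" and '
                     'crm.P108i_was_produced_by.P14_carried_out_by = "Artist Y"'))
```

Because the mapping is the single point of contact between CQL indexes and legacy columns, the same table can drive both the explain output and the query translation. A fuller implementation would also need `or`, `not`, other CQL relations and proper tokenisation of quoted terms.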
mSpace has been designed to be independent of the backend database. While the original mSpace server relied on an RDF triplestore, the flexibility of mSpace's data access protocol has been used in this project to provide an mSpace over a relational database exposed through the SRW.

Fig. 1. Subset of the EROS data set displayed through the mSpace interface.

III. DISCUSSION

The user can explore the CRM ontology and then use SRW/CQL to retrieve the corresponding instances. In this way we leverage Semantic Web techniques to describe the complex space of CH information, whilst using XML and Web Service standards to provide an easy-to-use search and retrieval service for accessing this information. This is a trade-off between the complexity of the queries that can be formulated and the need for a simple query language that makes it easy for third parties to develop their own client applications. Whilst the SRW/CRM solution is relatively easy for both content providers and end-user application developers to understand and use, this comes at the expense of the expressivity of semantic query languages and the ability to use server-side reasoning.

Whilst the use of SRW on top of relational legacy data sources is scalable to the large datasets often held by CH institutions, it does not necessarily provide the performance needed for highly interactive user querying of this data. In other words, our use of the SRW and CRM is geared towards semantic interoperability of multiple heterogeneous datasets, not the high-performance retrieval needed for interactive exploration of these datasets. If a high degree of user interactivity is required for large datasets, for example when using mSpace to explore the EROS database, then specific additional optimisations are typically necessary. The need for, and the choice of, a suitable performance optimisation strategy is not a result of our decision to use SRW, CRM mapping or CQL per se, but rather a reflection of the way the underlying legacy data is structured, stored and searched.

IV. CONCLUSION AND FUTURE WORK

We have described how we semantically exposed a CH multimedia repository, EROS, through the SRW and how we integrated the mSpace interaction framework. There are still barriers to the practical use of semantic web technologies in the CH domain, and this approach enables some of their benefits to be explored whilst still supporting the existing infrastructure.

Many of the issues we have encountered are due to the scale of real world collections such as the EROS system. To overcome part of these problems we implemented a simple caching mechanism on the mSpace SRW server, which improved overall performance once a query had been made. Unfortunately, due to the vast size of the EROS data set, some queries still take a long time for the SRW to complete. We will therefore investigate optimisations of the SRW, and study how the underlying database schema could be optimised and improved without causing a major impact.

We believe that the integration of semantically-based interaction paradigms, such as the mSpace framework, with legacy data management systems is extremely valuable. Not only does this provide rich browsing and navigation functionality that tends to be overlooked in many traditional systems, it also showcases the benefits of semantically marked-up information in a tangible way. This allows users to serendipitously discover artefacts and media that they would never have found through a traditional search box. It is also an effective way of revealing the data quality issues present in many metadata systems, as errors and inconsistencies are highlighted when the data is presented in an interface such as mSpace.

As part of our future work, we are investigating the integration of the EROS system with the bibliographic records in the C2RMF library. This will draw on the work of the CIDOC CRM working group on the alignment of the UNIMARC standard to the CIDOC CRM. Our longer-term goal is to provide cross-collection searching and browsing of disparate multimedia sources in the CH domain. In this context, we are working on the harmonisation of data from different collections. In the eCHASE project, we are integrating the collections of several large CH institutions, including picture libraries, television archives and publishers, and we hope to attract museums and galleries over the coming months. This requires aligning the different data representations, ranging from times, dates and places, through the identification of people across collections, to categorisation schemes such as controlled lists and thesauri.

ACKNOWLEDGEMENT

This research has been supported by the eCHASE project, which is co-funded by the European Commission, DG Information Society, under contract EDC 11262. We would also like to acknowledge the EPOCH network of excellence (IST-2002-507382).

REFERENCES

[1] Aitken, G., Lahanier, C., Pillay, R., Pitzalis, D.: “Database Management and Innovative Applications for Imaging within Museum Laboratories”, 7th European Commission Conference “SAUVEUR”, June 2006, Prague, Czech Republic.
[2] Aitken, G., Lahanier, C., Pillay, R., Pitzalis, D.: “EROS: An Open Source Database for Museum Conservation Restoration”, Preprints for the 14th Triennial Meeting ICOM-CC, The Hague, Netherlands, J&J, London, 2005.
[3] Z39.50 SRW: http://www.loc.gov/z3950/agency/zing/srw/ (2005).
[4] Doerr, M.: “The CIDOC Conceptual Reference Model: An Ontological Approach to Semantic Interoperability of Metadata”, AI Magazine 24 (2003) 75–92.
[5] Addis, M. J., Martinez, K., Lewis, P., Stevenson, J., Giorgini, F.: “New Ways to Search, Navigate and Use Multimedia Museum Collections over the Web”, In Proceedings of Museums and the Web 2005, Vancouver, Canada. Trant, J. and Bearman, D., Eds.
[6] “eCHASE project”: 2004-2006, eContent no. 11262. www.echase.org.
[7] schraefel, m. c., Smith, D. A., Owens, A., Russell, A., Harris, C., Wilson, M.: “The Evolving mSpace Platform: Leveraging the Semantic Web on the Trail of the Memex”, Proceedings of the Sixteenth ACM Conference on Hypertext and Hypermedia, ACM Press, Salzburg, Austria, 2005.