=Paper=
{{Paper
|id=Vol-233/paper-12
|storemode=property
|title=Semantically Exposing Existing Knowledge Repositories: A Case Study in Cultural Heritage
|pdfUrl=https://ceur-ws.org/Vol-233/p25.pdf
|volume=Vol-233
|dblpUrl=https://dblp.org/rec/conf/samt/PitzalisSLALHLMSPARS06
}}
==Semantically Exposing Existing Knowledge Repositories: A Case Study in Cultural Heritage==
Denis Pitzalis, Patrick Sinclair, Christian Lahanier, Matthew Addis, Richard Lowe, Shahbaz Hafeez, Paul Lewis, Kirk Martinez, mc schraefel, Ruven Pillay, Geneviève Aitken, Alistair Russell and Daniel A. Smith
D. Pitzalis, C. Lahanier, R. Pillay and G. Aitken are with the Centre de Recherche et de Restauration des Musées de France, Palais du Louvre, Paris, France. Email: {name.surname}@culture.fr

P. Sinclair, P. Lewis, K. Martinez, mc schraefel, A. Russell and D.A. Smith are with Electronics and Computer Science, University of Southampton, UK. Email: {pass,phl,km,mc,ar5,das05}@ecs.soton.ac.uk

M. Addis, R. Lowe and S. Hafeez are with the IT Innovation Centre, Southampton, UK. Email: {mja,rl,szh}@it-innovation.soton.ac.uk

Abstract— In this paper we describe the practical implications of semantically exposing a cultural heritage multimedia collection system (EROS) through a Search and Retrieve Web Service (SRW).

Index Terms— multimedia system, semantic web, cultural heritage

I. INTRODUCTION

Semantic web technologies have the potential to greatly benefit the Cultural Heritage (CH) domain. CH institutions, such as museums and photographic archives, are rich resources of heterogeneous multimedia content depicting people, objects, events, places, etc. This material, along with any supporting metadata, tends to be locked away in internal legacy systems, and open interfaces to the collections are rarely provided.

The use of semantic web technologies could have an impact on several levels. Richer semantics can greatly improve the information systems used by conservators, curators and historians by enhancing their retrieval and browsing facilities. Making the data available through semantic web services could also open up opportunities for tackling complex research problems in the CH domain.

However, there are still barriers to applying semantic web technologies directly. Many CH institutions are tied to their commercial content management systems, and converting and mapping all of their existing material to semantic representations such as RDF is costly. Although some technical issues, such as triple store scalability, are being overcome, many still doubt the applicability of semantic web technologies in practice. Alternatives that bring semantics to traditional content management systems are therefore desirable.

II. CASE STUDY

The C2RMF is the Research and Restoration Centre of the Museums of France, located in the Louvre. Its mission is to analyse, restore and document the works of art kept within all museums in France. This work requires the management of huge quantities of different kinds of data. To organise our digital library we developed the EROS system [1][2].

This system consists of a relational multilingual database that allows us to organise different media. At the moment over 250,000 photographic and radiographic images, 10,000 technical reports, 1,000 3D objects and 200,000 quantitative chemical and physical analyses, related to more than 60,000 works of art, are accessible in digital form. Heterogeneous data of this kind is common in real-world applications.

Semantic interoperability of CH digital libraries has been investigated in the SCULPTEUR [5] and eCHASE [6] projects by using a Z39.50 Search and Retrieve Web Service (SRW [3]) and by mapping legacy metadata schemas to the CIDOC Conceptual Reference Model (CRM [4]), an ontology for describing the semantics of CH documentation. Additional semantics are attached to the legacy database attributes in order to define their meaning more fully in the context of the CRM framework. The CRM-mapped attributes are exposed through the SRW as a flat list that can be queried using Common Query Language (CQL) expressions, and the SRW publishes this mapping information in XML through its explain operation. The SRW dynamically maps CQL queries expressed in terms of the CRM mappings to the relevant legacy database fields (in our case, SQL against a relational database) and returns the results as XML structured according to the CRM mappings.

Our SRW implementation is available as open source in the form of OpenMKS (http://openmks.sourceforge.net), which allows relational data to be mapped to an XML representation. It also provides a web-based user interface to the SRW that allows end users to search and browse the content. Through its configuration system we were able to adapt OpenMKS to the EROS content and metadata within C2RMF.

mSpace [7] is an interaction model and software framework that helps people access and explore information and build knowledge by exploring relationships in data. It does this by offering several powerful tools for organising an information space to suit a person's interests: slicing, sorting, swapping, information views and multimedia preview cues. When we access a subset of the EROS data set through the mSpace interface (Fig. 1), each category in the information space is displayed in a separate column, and the selection in each column narrows down the results presented in the next
column. mSpace has been designed to be independent of the backend database. While the original mSpace server relied on an RDF triplestore, the flexibility of the mSpace data access protocol has been exploited in this project to provide an mSpace over a relational database exposed through the SRW.

Fig. 1. Subset of the EROS data set displayed through the mSpace interface

III. DISCUSSION

The user can explore the CRM ontology and then use SRW/CQL to retrieve the corresponding instances. In this way we leverage Semantic Web techniques to describe the complex space of CH information, whilst using XML and Web Service standards to provide an easy-to-use search and retrieval service for accessing it. This involves a trade-off between the complexity of the queries that can be formulated and the need for a simple query language that makes it easy for third parties to develop their own client applications. Whilst the SRW/CRM solution is relatively easy for both content providers and end-user application developers to understand and use, this comes at the expense of the expressivity of semantic query languages and of the ability to use server-side reasoning.

Whilst the use of SRW on top of relational legacy data sources scales to the large datasets often held by CH institutions, it does not necessarily provide the performance needed for highly interactive querying of this data. In other words, our use of the SRW and CRM is geared towards semantic interoperability of multiple heterogeneous datasets, not towards the high-performance retrieval needed for interactive exploration of these datasets. If a high degree of user interactivity is required for large datasets, for example when using mSpace to explore the EROS database, then specific additional optimisations are typically necessary. The need for, and the choice of, a suitable performance optimisation strategy is not a consequence of our decision to use SRW, CRM mapping or CQL per se; it reflects the way the underlying legacy data is structured, stored and searched.

IV. CONCLUSION AND FUTURE WORK

We have described how we semantically exposed a CH multimedia repository, EROS, through the SRW and how we integrated the mSpace interaction framework. There are still barriers to the practical use of semantic web technologies in the CH domain; this approach enables some of their benefits to be explored whilst still supporting the existing infrastructure.

Many of the issues we encountered are due to the scale of real-world collections such as the EROS system. To overcome part of these problems we implemented a simple caching mechanism on the mSpace SRW server, which improved overall performance for queries that had already been made. Unfortunately, due to the size of the EROS data set, some queries still take a long time for the SRW to complete. We will therefore investigate optimisations of the SRW and study how the underlying database schema could be optimised and improved without a major impact on the existing system.

We believe that integrating semantically-based interaction paradigms, such as the mSpace framework, with legacy data management systems is extremely valuable. Not only does this provide rich browsing and navigation functionality that tends to be overlooked in many traditional systems, it also showcases the benefits of semantically marked-up information in a tangible way. Users can serendipitously discover artefacts and media that they would never have found through a traditional search box. It is also a good way of illustrating the data quality issues present in many metadata systems, as errors and inconsistencies are highlighted when the data is presented in an interface such as mSpace.

As part of our future work, we are investigating the integration of the EROS system with the bibliographic records in the C2RMF library. This will draw on the work of the CIDOC CRM working group on aligning the UNIMARC standard with the CIDOC CRM. In the context of our longer-term goal, providing cross-collection searching and browsing of disparate multimedia sources in the CH domain, we are working on the harmonization of data from different collections. In the eCHASE project, we are integrating the collections of several large CH institutions, including picture libraries, television archives and publishers, and we hope to attract museums and galleries over the coming months. This requires aligning the different data representations used for times and dates, places, the identification of people across collections, and categorization schemes such as controlled lists and thesauri.

ACKNOWLEDGEMENT

This research has been supported by the eCHASE project, which is co-funded by the European Commission, DG Information Society, under contract EDC 11262. We would also like to acknowledge the EPOCH network of excellence (IST-2002-507382).

REFERENCES

[1] Aitken, G., Lahanier, C., Pillay, R., Pitzalis, D.: "Database Management and Innovative Applications for Imaging within Museum Laboratories". 7th European Commission Conference "SAUVEUR", June 2006, Prague, Czech Republic.

[2] Aitken, G., Lahanier, C., Pillay, R., Pitzalis, D.: "EROS: An Open Source Database for Museum Conservation Restoration". Preprints for the 14th Triennial Meeting ICOM-CC, J&J London, 2005, The Hague, Netherlands.

[3] Z39.50 SRW: http://www.loc.gov/z3950/agency/zing/srw/ (2005).

[4] Doerr, M.: "The CIDOC Conceptual Reference Model: An Ontological Approach to Semantic Interoperability of Metadata". AI Magazine 24 (2003) 75–92.

[5] Addis, M. J., Martinez, K., Lewis, P., Stevenson, J., Giorgini, F.: "New Ways to Search, Navigate and Use Multimedia Museum Collections over the Web". In: Proceedings of Museums and the Web 2005, Vancouver, Canada. Trant, J., Bearman, D. (eds.).

[6] "eCHASE project": 2004-2006, eContent no. 11262. www.echase.org.

[7] schraefel, m. c., Smith, D. A., Owens, A., Russell, A., Harris, C., Wilson, M.: "The evolving mSpace platform: leveraging the semantic web on the trail of the memex". Proceedings of the Sixteenth ACM Conference on Hypertext and Hypermedia, ACM Press, Salzburg, Austria, 2005.