Semantically Exposing Existing Knowledge Repositories: A Case Study in Cultural Heritage

Denis Pitzalis, Patrick Sinclair, Christian Lahanier, Matthew Addis, Richard Lowe, Shahbaz Hafeez, Paul Lewis, Kirk Martinez, mc schraefel, Ruven Pillay, Geneviève Aitken, Alistair Russell and Daniel A. Smith

D. Pitzalis, C. Lahanier, R. Pillay and G. Aitken are with Centre de Recherche et de Restauration des Musées de France, Palais du Louvre, Paris, France. Email: {name.surname}@culture.fr
P. Sinclair, P. Lewis, K. Martinez, mc schraefel, A. Russell and D.A. Smith are with Electronics and Computer Science, University of Southampton, UK. Email: {pass,phl,km,mc,ar5,das05}@ecs.soton.ac.uk
M. Addis, R. Lowe and S. Hafeez are with IT Innovation Centre, Southampton, UK. Email: {mja,rl,szh}@it-innovation.soton.ac.uk

Abstract: In this paper we describe the practical implications of semantically exposing a cultural heritage multimedia system (EROS) through a Search and Retrieve Web Service (SRW).

Index Terms: multimedia system, semantic web, cultural heritage

I. INTRODUCTION

Semantic web technologies have the potential to greatly benefit the Cultural Heritage (CH) domain. CH institutions, such as museums and photographic archives, are rich resources of heterogeneous multimedia content, depicting people, objects, events, places, etc. This material, along with any supporting metadata, tends to be locked away in internal legacy systems, and open interfaces to the collections are rarely provided.

The use of semantic web technologies could make an impact on several levels. Richer semantics can greatly improve the information systems used by conservators, curators and historians by enhancing the retrieval and browsing facilities. Making the data available through semantic web services could provide opportunities for tackling complex research problems in the CH domain.

However, there are still barriers to applying semantic web technologies directly. Many CH institutions are tied in to their commercial content management systems. There are also high costs in converting and mapping all of their existing material to semantic representations such as RDF. Although some of the technical issues, such as triple store scalability, are being overcome, many still have doubts about the applicability of semantic web technologies in practice. Alternatives that bring semantics to traditional content management systems are desirable in this context.

II. CASE STUDY

The C2RMF is the Research and Restoration Centre of the Museums of France, located in the Louvre. Its mission is to analyse, restore and document the works of art kept within all museums in France. This work requires the management of huge quantities of different kinds of data. To organise our digital library we developed the EROS system[1][2]. This system consists of a relational multilingual database that allows us to organise different media: at the moment over 250,000 photographic and radiographic images, 10,000 technical reports, 1,000 3D objects and 200,000 quantitative chemical and physical analyses, related to more than 60,000 works of art, are accessible in digital form. Such heterogeneous data is common in real-world applications.

Semantic interoperability of CH digital libraries has been investigated in the SCULPTEUR[5] and eCHASE[6] projects by using a Z39.50 search and retrieve web service (SRW[3]) and by mapping legacy metadata schemas to the CIDOC Conceptual Reference Model (CRM[4]), an ontology for describing the semantics of CH documentation. Additional semantics are attached to the legacy database attributes in order to define their meaning more fully in the context of the CRM framework. The CRM-mapped attributes are exposed through the SRW as a flat list that can be queried using Common Query Language (CQL) expressions. The SRW publishes the mapping information in XML through the SRW explain operation. The SRW dynamically maps CQL queries expressed in terms of the CRM mappings to the relevant legacy database fields (in our case using SQL against a relational database) and returns the results as XML structured according to the CRM mappings.

Our SRW implementation is available as open source in the form of OpenMKS (http://openmks.sourceforge.net), which allows relational data to be mapped to an XML representation. It also provides a web-based user interface to the SRW that allows end users to search and browse the content. Through the configuration system we were able to adapt the system to the EROS content and metadata within C2RMF.

mSpace[7] is an interaction model and software framework to help people access and explore information. mSpace helps people build knowledge from exploring relationships in data. It does this by offering several powerful tools for organising an information space to suit a person's interest: slicing, sorting, swapping, information views and multimedia preview cues. When we access a subset of the EROS data set through the mSpace interface, each category in the information space is displayed in a separate column, and the selection in each column narrows down the results presented in the next column. mSpace has been designed to be independent of the backend database, and while the original mSpace server relied on an RDF triplestore, the flexibility of mSpace's data access protocol has been utilised in this project to provide an mSpace interface to a relational database exposed through the SRW.

Fig. 1. Subset of the EROS data set displayed through the mSpace interface

III. DISCUSSION

The user can explore the CRM ontology and then use the SRW/CQL to retrieve corresponding instances. In this way we leverage Semantic Web techniques to describe the complex space of CH information, whilst using XML and Web Service standards to provide an easy-to-use search and retrieval service to access this information. This is a trade-off between the complexity of queries that can be formulated and the need for a simple query language that makes it easy for third parties to develop their own client applications. Whilst the SRW/CRM solution is relatively easy for both content providers and end-user application developers to understand and use, this is at the expense of the expressivity of semantic query languages and the ability to use server-side reasoning.

Whilst the use of SRW on top of relational legacy data sources is scalable to the large datasets often held by CH institutions, it does not necessarily provide the performance needed for highly interactive user querying of this data. In other words, our use of the SRW and CRM is geared towards semantic interoperability of multiple heterogeneous datasets, not the high-performance retrieval needed for interactive data exploration of these datasets. If a high degree of user interactivity is required for large datasets, for example by using mSpace to explore the EROS database, then specific additional optimisations are typically necessary. The need for, and the choice of, a suitable performance optimisation strategy is not a result of our decision to use SRW, CRM mapping or CQL per se, but is more a reflection of the way that the underlying legacy data is structured, stored and searched.

IV. CONCLUSION AND FUTURE WORK

We have described how we have semantically exposed a CH multimedia repository, EROS, through the SRW and how we integrated the mSpace interaction framework. There are still barriers to the practical use of semantic web technologies in the CH domain, and this approach enables some of the benefits to be explored whilst still supporting the existing infrastructure.

Many of the issues we have encountered are due to the scale of real-world collections, such as the EROS system. To overcome part of these problems we decided to implement a simple caching mechanism on the mSpace SRW server, which improved overall performance once a query had been made. Unfortunately, due to the vast size of the EROS data set, some queries still take a long time for the SRW to complete. We will therefore investigate optimisations of the SRW and study how the underlying database schema could be optimised and improved without causing a huge impact.

We believe that the integration of semantically-based interaction paradigms, such as the mSpace framework, with legacy data management systems is extremely valuable. Not only does this provide rich browsing and navigation functionality that tends to be overlooked in many traditional systems, it also showcases the benefits of semantically marked-up information in a tangible way. This allows users to serendipitously discover artefacts and media that they would never have found through a traditional search box. It is also a great way of illustrating the data quality issues present in many metadata systems, as errors and inconsistencies are highlighted when the data is presented in an interface such as mSpace.

As part of our future work, we are investigating the integration of the EROS system with the bibliographic records in the C2RMF library. This will draw on the work by the CIDOC CRM working group on the alignment of the UNIMARC standard to the CIDOC CRM. In the context of our longer term goals, that is, providing cross-collection searching and browsing of disparate multimedia sources in the CH domain, we are working on the harmonization of the data from different collections. In the eCHASE project, we are integrating the collections of several large CH institutions, including picture libraries, television archives and publishers, and we hope to attract museums and galleries over the coming months. This requires aligning the different data representations, ranging from times, dates and places to the identification of people across collections and categorization schemes such as controlled lists and thesauri.

ACKNOWLEDGEMENT

This research has been supported by the eCHASE project, which is co-funded by the European Commission, DG Information Society, under the contract EDC 11262. We would also like to acknowledge the EPOCH network of excellence (IST-2002-507382).

REFERENCES

[1] Aitken, G., Lahanier, C., Pillay, R., Pitzalis, D.: “Database Management and Innovative Applications for Imaging within Museum Laboratories” 7th European Commission Conference “SAUVEUR”, June 2006, Prague, Czech Republic
[2] Aitken, G., Lahanier, C., Pillay, R., Pitzalis, D.: “EROS: An Open Source Database For Museum Conservation Restoration” Preprints for the 14th Triennial Meeting ICOM-CC, J&J London, 2005, The Hague, Netherlands
[3] Z39.50 SRW: http://www.loc.gov/z3950/agency/zing/srw/ (2005)
[4] Doerr, M.: “The CIDOC Conceptual Reference Model: An ontological approach to semantic interoperability of metadata” AI Magazine 24 (2003) 75–92
[5] Addis, M. J., Martinez, K., Lewis, P., Stevenson, J. and Giorgini, F.: “New Ways to Search, Navigate and Use Multimedia Museum Collections over the Web” In Proceedings of Museums and the Web 2005, Vancouver, Canada. Trant, J. and Bearman, D., Eds.
[6] “eCHASE project”: 2004-2006 eContent no. 11262. www.echase.org.
[7] m. c. schraefel, D. A. Smith, A. Owens, A. Russell, C. Harris and M. Wilson: “The evolving mSpace platform: leveraging the semantic web on the trail of the memex” Proceedings of the sixteenth ACM conference on Hypertext and Hypermedia, ACM Press, Salzburg, Austria, 2005
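
As a rough illustration of the CQL-to-SQL mapping described in the case study, the following Python sketch translates a very small subset of CQL into a parameterised SQL query. It is not the OpenMKS implementation. The index names (crm.title, crm.creator), the artwork table with its columns, and the sample rows are all hypothetical, and only exact-match clauses joined by "and" are handled.

```python
import re
import sqlite3

# Hypothetical mapping from CRM-style index names to legacy columns.
# These names are illustrative assumptions, not taken from EROS or OpenMKS.
CRM_TO_COLUMN = {
    "crm.title": "title",
    "crm.creator": "artist",
}

# A token is a double-quoted term, an equals sign, or a bare word.
TOKEN = re.compile(r'"[^"]*"|=|[^\s"=]+')


def cql_to_sql(cql):
    """Translate 'index = "term" (and index = "term")*' into SQL.

    Returns the SQL text and the list of bound parameters.
    """
    tokens = TOKEN.findall(cql)
    # Valid queries have 3, 7, 11, ... tokens: three per clause plus
    # one "and" between consecutive clauses.
    if len(tokens) % 4 != 3:
        raise ValueError(f"unsupported CQL query: {cql!r}")
    conditions, params = [], []
    for start in range(0, len(tokens), 4):
        index, operator, term = tokens[start:start + 3]
        if operator != "=" or not term.startswith('"'):
            raise ValueError(f"unsupported CQL clause near {index!r}")
        if start + 3 < len(tokens) and tokens[start + 3].lower() != "and":
            raise ValueError("only 'and' is supported between clauses")
        column = CRM_TO_COLUMN.get(index)
        if column is None:
            raise ValueError(f"unknown index: {index!r}")
        # Column names come from the fixed mapping; terms are bound as
        # parameters so user input never becomes part of the SQL text.
        conditions.append(f"{column} = ?")
        params.append(term[1:-1])
    sql = (
        "SELECT id, title, artist FROM artwork WHERE "
        + " AND ".join(conditions)
        + " ORDER BY id"
    )
    return sql, params


def search(conn, cql):
    """Run a CQL query against the legacy table and return matching rows."""
    sql, params = cql_to_sql(cql)
    return conn.execute(sql, params).fetchall()


def demo_connection():
    """Build an in-memory database with invented sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE artwork (id INTEGER, title TEXT, artist TEXT)")
    conn.executemany(
        "INSERT INTO artwork VALUES (?, ?, ?)",
        [
            (1, "Virgin and Child", "Anonymous"),
            (2, "Portrait of a Lady", "Anonymous"),
            (3, "Virgin and Child", "Unknown workshop"),
        ],
    )
    return conn


if __name__ == "__main__":
    connection = demo_connection()
    query = 'crm.creator = "Anonymous" and crm.title = "Virgin and Child"'
    print(cql_to_sql(query))
    print(search(connection, query))
```

Keeping the mapping in a single table mirrors the idea that clients query in CRM terms while the legacy schema stays unchanged. A fuller service would also need the remaining CQL operators and would serialise the rows as XML structured according to the CRM mappings.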