Historical Event-based Access to Museum Collections

 Chiel van den Akker1 , Lora Aroyo2 , Agata Cybulska1 , Marieke van Erp2 , Peter Gorgels3 , Laura
   Hollink4 , Cathy Jager3 , Susan Legêne1 , Lourens van der Meij2 , Johan Oomen5 , Jacco van
     Ossenbruggen2,6 , Guus Schreiber2 , Roxane Segers2 , Piek Vossen1 and Bob Wielinga2,7
                    1 Faculty of Arts/VU University Amsterdam
                    2 Department of Computer Science/VU University Amsterdam
                    3 Rijksmuseum Amsterdam
                    4 Web Information Systems Group/Delft University of Technology
                    5 Netherlands Institute for Sound and Vision
                    6 Centrum Wiskunde & Informatica (CWI), Amsterdam
                    7 Human-Computer Studies Laboratory/University of Amsterdam
                     {C.vandenAkker,AK.Cybulska,S.Legene,P.Vossen}@let.vu.nl
                    {L.M.Aroyo,Marieke,Lourens,Schreiber,RH.Segers}@cs.vu.nl
                                 {P.Gorgels,C.Jager}@rijksmuseum.nl
                                         L.Hollink@tudelft.nl
                                       joomen@beeldengeluid.nl
                                   Jacco.van.Ossenbruggen@cwi.nl
                                         B.J.Wielinga@uva.nl.


      Abstract. This paper presents research in the context of two multidisciplinary projects
      aimed at facilitating the history domain with an automatic approach for event extraction and
      modelling. To realise this, the Semantics of History project is providing a historical ontology
      and a lexicon to support the detection of historical events in textual data whilst the Agora
      project focusses on exploring the modelling aspects of historical events and employing the
      combined results in an event-driven browse and search approach. Furthermore, the historical
      events are used as a flexible model to identify semantically relevant relationships between
      objects in highly diverse museum collections, creating meaningful ‘cause’ and ‘effect’ links
      along the key event dimensions ‘who’, ‘what’, ‘where’ and ‘when’. This should finally support
      the (re)interpretation process of history research, by allowing end-users to create their own
      personal narratives, leading to theoretical reflection on the meaning of digitally mediated
      public history in contemporary society. In this paper, we give a high-level overview of the
      research challenges in the realisation of a desired search and browse scenario. Finally, we
      outline the open issues and future research.


1   Introduction

There is a vast amount of historical knowledge locked in museum collections. This knowledge is
often explicitly present in textual descriptions accompanying museum objects or implicitly present
through the fact that an object belongs to a particular collection and was collected for a particular
purpose. In this sense, objects from one museum collection only tell part of the story, as they present
a view on the past from only one perspective, limited to their collection. Through combining objects
from different collections, a more comprehensive view of a certain historical period can be given.
When unlocked, this knowledge can help casual users understand the significance of museum objects
and historical events better and aid experts (curators, art historians and historians) in their search
for objects relevant to a topic.
    The Agora project started in October 2009 with the aim of facilitating context-driven browsing
and search in heterogeneous museum collections. The context that unites these collections is pro-
vided by historical events that can be linked to the collection objects, as historical event-descriptions
are comprised of causal language, locations, the actors involved and the time of the event. Agora
is a four year project with a team made up of experts from the computer science and the history
departments at the VU University in Amsterdam8,9, as well as from the Netherlands Institute
for Sound and Vision (henceforth: S&V)10 and Rijksmuseum Amsterdam (henceforth: RMA)11 .
The goal of Agora is threefold: (1) a historical event thesaurus linked to museum artefacts, (2) a
semi-automatic event modelling approach that satisfies both the needs of experts and the general
public, and (3) an online social platform in which both the general public and expert historians can
explore various perspectives on events, build their own narratives, and contribute to the evolution
of the event thesaurus.
    The Semantics of History project started in August 2009 with the aim of modelling changes of
historical reality over time and through different writer perspectives which are revealed in historical
text archives. For this purpose, Semantics of History aims at developing a historical ontology and a
lexicon which will facilitate detection of historical events in textual data. The resulting models and
the event extraction from text will be implemented and tested in the context of the Agora project.
Semantics of History is funded by the VUA Interfaculty Research Institute CAMeRA and is carried
out in cooperation between two departments at the VU University in Amsterdam: the linguistics
department and the computer science department.
    The results of both Agora and Semantics of History will be deployed in a social cultural heritage
platform that will allow different users (e.g., experts, interested laypersons and secondary school
students) to have event-based access to the S&V and RMA collections. As Agora and Semantics of
History are tightly interwoven, we will from this point on report on the combined results of these
projects.
    This paper is structured as follows. In Section 2, the motivation for this project is given by
discussing the shortcomings of current collection access and modelling and by describing the needs
from different user groups in two use-cases. In Section 3, the technical challenges of Agora and Se-
mantics of History are presented, along with the approaches that we are investigating. To conclude,
we will present open issues in Section 4. In Figure 1, the different parties, goals and domains that
play a role in the two projects are summarised.


2   Motivation

In the humanities domain, there are different user groups with a great need for advanced cross-
collection access to resources. In our case, we are dealing with the question of how museums can
present the specific information that belongs to objects in their collection in a way that strengthens
users’ historical understanding and involvement in relevant historical debates. We are also asking
ourselves how we can prevent users from ending up perusing the collection in a zapping-like,
incidence-based viewing that will only lead to a confirmation of their preconceived views and
insights, as no relations between objects are presented that are novel or surprising to them. The
answer of museums to this challenge so far has been to prepare thematic ‘Web-specials’: portals and
other Web-presentations that present historical narratives as an extension of the regular exhibition
and education practice. With an approach that is centred around historical events for collection
access we aim to strengthen the role of museum collections in the public discourse about the past.

8 http://www.cs.vu.nl/
9 http://www.let.vu.nl/
10 http://www.beeldengeluid.nl
11 http://www.rijksmuseum.nl/

[Figure 1: diagram with the History domain on one side (Experts, Secondary School Students;
Event-driven Exploration, Object-driven Exploration) and the Computer Science domain on the other
(Text Mining; Information Extraction & Enrichment; Schemas & Vocabularies; Information Integration;
User Interaction). In the centre: Historical Event Thesaurus, Event Modelling & Semantics of
History, Event Model, and Personal Mediated History Social Platform.]

                                Fig. 1. Overview of domains and goals involved in Agora

    Although museum collections and their accompanying information are becoming available in
digital forms, search is often limited to keyword search and browsing through predefined facets[1].
These access methods are not optimal; in keyword search, for example, it is not clear how the
retrieved results are related to each other as they are simply presented in a list. Specifying a search
query through facets resolves this problem, as it enables users to specify which relations they find
relevant. However, facet browsing is often limited in that there is usually one set of facets that may
not provide sufficiently fine-grained access to all artefacts. Most museum collections, for example,
are searchable through facets that describe meta-data that is available for every object such as title,
year, artist, technique, dimensions and object id, but users may also want to search via locations
that play a role in the object (for example because a location is depicted in the artwork, or because
the birthplace of the artist might be relevant). Combined access to heterogeneous collections from
different institutions only aggravates the shortcomings of keyword and facet access strategies as
keyword search will often return more unorganised results when more collections are searched, and
facets from different collections are often incompatible.
    In addition to the general technical shortcomings of current access methods, there are short-
comings that stem from the cultural heritage domain. Users in this domain have a strong desire to
explore collections through a personal narrative or from a personal perspective. Currently, cultural
heritage collections lack event-based annotations that would provide the context to facilitate such
explorations, but cultural heritage institutions have expressed the wish to have a formal definition
of events to include in their annotation of collections.

    We argue that collection access through events can remedy these bottlenecks as events provide
the context that can link a variety of objects together, providing a more comprehensive overview
than facets. To facilitate cross-collection access, we aim to develop an event thesaurus with which
different collections can align their internal thesauri. Furthermore, our approach will combine both
searching and browsing as this ensures maximum flexibility for the user to explore the collection
whilst keeping track of relations between objects. We speculate that by providing a social platform
for history, laypersons and experts will complement each other in the process of creating a digitally
mediated public history. We believe in an open and social environment in which lay and expert users
can together explore and contribute to the evolving collections of objects, events and thesaurus
terms. In this way, we will research and develop ways to support a dynamic (community-based and
event-centred) creation of narratives of digitally presented material objects as well as multimedia
objects. Events are central to narrative and perspective. For a narrative is a sequence of events
with a beginning, middle and end, and different sequences of events provide different perspectives
on those events.

    We envision two types of exploration: object-driven and event-driven exploration. Object-driven
exploration involves a search or browsing activity where the user starts by selecting an object
from the collections and subsequently finds new objects and events through the relations with the
first object. In the event-driven exploration, the user starts by selecting an event and builds a
sequence of related events and objects. As a user may hop between events and objects during his or
her search through the collections, the object-driven and event-driven explorations alternate. An
example of a cross-collection exploration scenario in the Agora platform is presented in Figure 2.
In this scenario, the object-driven and event-driven exploration is presented as an alternative to
the small, fixed set of relations that is typically imposed by the owner of the collection. It
enables the user to wander through the RMA and S&V collections via event-based relations between
collection objects that are most relevant to the user. In the example illustrated in Figure 2, the
user starts by selecting the RMA print “Arrival of Van Spilbergen in Kandy”. This object is related
to an event that has VOC as actor and Batavia as place. In this way, the user can explore these
facets and discover in the results another RMA painting “The Castle of Batavia, seen from West
Kali Besar” depicting the trade port of Batavia. Via this object, the user can find another object
from the RMA collection that depicts an event taking place there, such as “A Tea Visit in Batavia”.
This object is related to a set of events, such as acts of colonialism, which can also have
sub-events, e.g. Police Actions. The user can choose any of these events or sub-events to explore
the collections further and, for example, arrive at one of the S&V videos that reports on the Police
Actions. The user then continues the sequence along the facet of another sub-event, Indonesian War
for Independence, which offers the S&V video “Suriname and the Netherlands Antilles”, annotated
with the sub-event Suriname’s Independence. Finally, the user is recommended (from an external
resource) the sculpture “Slavery” located in the Oosterpark in Amsterdam, which is likewise
annotated with Suriname.

[Figure 2: boxes and connecting labels. RMA Object, Title: Arrival of Van Spilbergen in Kandy;
RMA Object, Title: The Castle of Batavia, seen from West Kali Besar; RMA Object, Title: A Tea Visit
in Batavia; S&V Object, Title: News from Indonesia; S&V Object, Title: Suriname and the Netherlands
Antilles; Title: Slavery Monument. Connecting labels: Actor: VOC, Place: Batavia; Place: Batavia;
Event: Police Action; Event: Suriname's Independence; External material: Police Action; External
material: Suriname Netherlands.]

                                                        Fig. 2. Agora Exploration Scenario
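
The hopping between objects in this scenario can be pictured as following shared annotations. The sketch below is only an illustration under stated assumptions: the annotation sets are a simplified reading of Figure 2, and the names `OBJECTS`, `build_index` and `related` are hypothetical, not part of the Agora platform.

```python
from collections import defaultdict

# Toy annotations, simplified from the Figure 2 scenario (illustrative only).
# Each object carries a set of (dimension, value) facets.
OBJECTS = {
    "Arrival of Van Spilbergen in Kandy": {("actor", "VOC"), ("place", "Batavia")},
    "The Castle of Batavia, seen from West Kali Besar": {("place", "Batavia")},
    "A Tea Visit in Batavia": {("place", "Batavia"), ("event", "Police Actions")},
    "News from Indonesia": {("event", "Police Actions")},
}


def build_index(objects):
    """Map every facet to the set of object titles annotated with it."""
    index = defaultdict(set)
    for title, facets in objects.items():
        for facet in facets:
            index[facet].add(title)
    return index


def related(title, objects, index):
    """Return {other_title: shared_facets} for objects sharing a facet with `title`."""
    links = defaultdict(set)
    for facet in objects[title]:
        for other in index[facet]:
            if other != title:
                links[other].add(facet)
    return dict(links)


index = build_index(OBJECTS)
for other, shared in sorted(related("A Tea Visit in Batavia", OBJECTS, index).items()):
    print(other, sorted(shared))
```

Starting from “A Tea Visit in Batavia”, the two other RMA objects are reachable via the place Batavia and the S&V item via the event, which mirrors the alternation of object-driven and event-driven steps described above.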


2.1   Use Cases
We have defined two use cases that illustrate the need to present the same collections in different
ways to different user groups. The exploration scenario presented above fits in our first use-case:
assisting secondary school students for their Culture and Society end assignment. Since 1998, every
student in the two higher tiers of secondary education in the Netherlands is required to write a piece
on a particular topic. To facilitate the search for relevant objects and references for (art)historic
topics, we will build a use-case that is focussing on event-driven browsing for different museum
collections. By combining different objects, students are enabled to present their own view on the
events they write about. Different relations between objects and events can result in different views
on events.
    Our second use case is aimed at facilitating experts in (art)historic research. For historical
research, these experts want access to all objects that are related to certain events. To them, the
Agora platform may present an overview of the objects and events a certain actor may be related to,
thus aiding the researcher in his or her information gathering task. We may also help curators
to document new objects and add data to old ones, providing structured data by means of the
event-model.


3     Challenges
Three domain challenges lie at the basis of our work: (1) sharing terminology, (2) understanding
the meaning of events, and (3) building a historical event thesaurus. Each of these challenges is
detailed below.

Sharing terminology
A clear definition of the shared terminology is a prerequisite for successful multidisciplinary col-
laboration. This need is particularly pressing within collaborative projects in the field of cultural
heritage and computer science such as ours as each field has different definitions and theories about
shared concepts. Even our central concept, event, is treated differently by the parties involved in our
project; in the history domain, the notion of an event has been defined as “what agents make happen
or undergo”[2]. This has the implication that actions are a species of events. Furthermore, events
are concrete particulars. They are unrepeatable entities with a location in space and time[3]. Within
computational linguistics, the notion of event is often not defined, and if defined, the definition is
mostly pragmatic and broad to ensure reusability across different domains. Another difference is
that in computer science ‘event extraction’ often does not stretch beyond the literal task of iden-
tifying event labels, participants, locations and time stamps, whereas historians are interested in
the interpretation of events. Computer science also considers events mostly as separate entities,
whereas historians consider events in their connection with other events. The significance of an
event depends on this connection and is usually expressed in the form of a narrative[4]. Through
continual dialogue we are acknowledging the differences in each other’s dealing with shared concepts
and ensure that we have a stable middle-ground to work from.

Understanding the meaning of events
We are in the process of finding out what the notion of an event means for computer science and
for history, and how to incorporate both views in an event model. Although there is no consensus
on the definition of event in computer science, most event modelling approaches share the
characteristic that they want to model who does what, where and when. We take this as the
minimal requirement an event modelling approach should be able to express. Once a minimal event
definition has been developed, we can start to think about modelling additional aspects that play a
role in our domain and are closely related to events, namely: granularity, interpretation, perspective,
and causality. We are currently investigating the use of the simple event model (SEM) to model
historical events[5]. SEM aims to provide the minimal set of classes to describe events, minimising
possible clashes between different domain-specific event definitions. It is designed to use external
type definitions and has mappings to other models such as DOLCE12 and LODE13 . We may also
borrow from other models such as F[6] for modelling causation and interpretation to extend SEM.
However, we do not make any commitments to a particular event model as we aim for a flexible
approach in which we can switch to more specific or general models when the need arises.
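
As a rough illustration of the minimal "who does what, where and when" requirement, the following sketch defines a plain record with those four dimensions. It is not the SEM vocabulary itself; the class name `Event`, its field names and the `is_complete` helper are assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, Optional, Tuple


@dataclass(frozen=True)
class Event:
    """Minimal event record with the four key dimensions (illustrative only)."""
    what: str                                               # event type or label
    who: FrozenSet[str] = field(default_factory=frozenset)  # actors involved
    where: Optional[str] = None                             # place name
    when: Optional[Tuple[int, int]] = None                  # (start_year, end_year)

    def is_complete(self) -> bool:
        """True when all four dimensions are filled in."""
        return bool(self.what and self.who and self.where and self.when)


# Actor and place follow the Figure 2 scenario; the time is left unset
# because the text does not state one.
arrival = Event(what="arrival", who=frozenset({"VOC"}), where="Batavia")
print(arrival.is_complete())
```

Aspects such as granularity, interpretation, perspective and causality are deliberately left out here; they would be layered on top of such a minimal core.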

Building a historical event thesaurus
In order to build a historical event thesaurus, it is important to investigate how historical events are
referred to and how they relate to each other in the museum collections. This provides insights into
how museum collections are annotated. Next, we will try to model these events in such a way that
they can be used to provide better cross-collection access for diverse groups of users. We therefore
want the model to be rich enough to capture the intricacies of historical events, but flexible enough
to ensure its understandability. The first two steps should result in a thesaurus that can be used to
better support users in searching and browsing rich and heterogeneous museum collections.
    In addition, we identify three different types of technical challenges that come into play when one
wants to disclose museum collections: (1) information extraction and enrichment, (2) information
integration, and (3) user interface design. Fortunately, the information extraction, enrichment and
integration will not have to start from scratch, as the Rijksmuseum Amsterdam and Netherlands
Institute for Sound and Vision have linked their collections to existing domain-specific vocabularies
and thesauri such as AAT14 and Iconclass15 but also to general vocabularies, such as WordNet[7].
For the user interface, we can build on previous work from the MultimediaN E-Culture project[8].
In the following subsections, we discuss the first two technical challenges and our approach
to dealing with them. The issues that the event modelling and the object- and event-driven collection
exploration requirements raise for the interface design depend on the outcomes of resolving the first
two technical challenges and will be addressed at a later stage of the project.

12 http://www.loa-cnr.it/DOLCE.html
13 http://linkedevents.org/ontology/

3.1     Information Extraction and Enrichment
One of the biggest challenges is information extraction on historical events from different textual
data. We will start by identifying the ‘bigger’ events that have been deemed important enough to
receive a proper name (e.g., French Revolution, Second World War). As references to this type of
event behave similarly to other named entities, we are recasting the identification of
these event labels as a named entity recognition task[9]. In order to detect accompanying actors,
locations and temporal information and the relations between these, as well as smaller events that
do not have a proper name, we will first employ state-of-the-art named entity and term recognition
techniques[10], followed by relation finding[11].
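
To make the idea of spotting named events concrete, the sketch below uses a plain dictionary (gazetteer) lookup. This is a deliberately simple stand-in and not the statistical named entity recognition referred to above; the list `EVENT_NAMES` and the function `find_event_labels` are hypothetical names chosen for this example.

```python
import re

# Example event names taken from the text; a real gazetteer would be far larger.
EVENT_NAMES = ["French Revolution", "Second World War"]


def find_event_labels(text, names=EVENT_NAMES):
    """Return (start, end, label) for every gazetteer entry found in `text`."""
    # Longest names first, so a longer name wins over a name it contains.
    alternatives = "|".join(re.escape(n) for n in sorted(names, key=len, reverse=True))
    pattern = re.compile(r"\b(?:" + alternatives + r")\b", re.IGNORECASE)
    return [(m.start(), m.end(), m.group(0)) for m in pattern.finditer(text)]


sentence = "After the French Revolution, and long before the Second World War."
for start, end, label in find_event_labels(sentence):
    print(start, end, label)
```

A lookup of this kind only finds names that are already listed and cannot resolve ambiguous mentions, which is one reason to treat the task as learned named entity recognition instead.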
    Once we are able to detect references to events, we will need to identify which references belong
to distinct events, and which are a variation on the description of the same event. From manual
extraction from a small number of newspaper texts, we observed that by identifying time, locations
and participants of events and by defining their relations with those of other events, we are able
to relate historical events to each other. We are developing different matching functions to
determine which descriptions refer to the same historical events and are testing these on larger and
more heterogeneous corpora (consisting of e.g., collection catalogues, historical resources, and secondary literature). For
example, every description that involves the same type of event, the same participants, the same
location and the same time period is likely to refer to the same event. Another heuristic we are
investigating involves descriptions that abstract from any of those elements but that we can identify
as being semantically compatible, which may indicate that these descriptions also refer to the same
event. Typically, we see that different descriptions add or leave out details of what happened or
group events into large happenings with bigger impact[12].
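
The two heuristics just described can be sketched as follows. This is a minimal illustration, not the matching functions under development: "semantically compatible" is reduced here to the assumption that a dimension is either missing in one description or agrees with the other, and all names (`Description`, `strict_match`, `compatible`) and example values are hypothetical.

```python
from typing import FrozenSet, NamedTuple, Optional, Tuple


class Description(NamedTuple):
    """One textual description of an event; missing elements stay empty."""
    event_type: Optional[str] = None
    participants: FrozenSet[str] = frozenset()
    location: Optional[str] = None
    period: Optional[Tuple[int, int]] = None  # (start_year, end_year)


def strict_match(a: Description, b: Description) -> bool:
    """First heuristic: same type, participants, location and time period."""
    return (
        a.event_type is not None and a.event_type == b.event_type
        and bool(a.participants) and a.participants == b.participants
        and a.location is not None and a.location == b.location
        and a.period is not None and a.period == b.period
    )


def compatible(a: Description, b: Description) -> bool:
    """Second heuristic (simplified): no stated element contradicts the other."""
    checks = []
    if a.event_type and b.event_type:
        checks.append(a.event_type == b.event_type)
    if a.participants and b.participants:
        checks.append(a.participants <= b.participants
                      or b.participants <= a.participants)
    if a.location and b.location:
        checks.append(a.location == b.location)
    if a.period and b.period:
        # Periods are compatible when they overlap.
        checks.append(a.period[0] <= b.period[1] and b.period[0] <= a.period[1])
    # Require at least one dimension that both descriptions actually state.
    return bool(checks) and all(checks)


# Placeholder values, not taken from any corpus.
full = Description("siege", frozenset({"A", "B"}), "X", (1600, 1601))
partial = Description("siege", frozenset({"A"}), None, (1601, 1601))
print(strict_match(full, partial), compatible(full, partial))
```

A description that leaves out the location, as `partial` does, fails the strict test but remains a candidate under the looser one, which matches the observation that descriptions add or omit details of what happened.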
    By means of the event model, relations between specific event descriptions and the more gen-
eral ones can be represented; also event presentations influenced by writers’ perspective towards
historical events can be recognised and captured. Eventually, we want to be able to maximise the
recall for finding events in text regardless of the description and secondly be able to infer added
subjectivity and interpretation layers to the events.

3.2     Information Integration
The RMA and S&V collections represent different components of the Dutch cultural heritage. The
Rijksmuseum Amsterdam focusses on art, crafts and history. A large part of its one million object
collection consists of 17th century Dutch paintings. The Netherlands Institute for Sound and Vision
aims at preserving the Dutch audiovisual heritage. Its collection contains about 700,000 hours of
radio, television, movie and music material, of which most is less than fifty years old. Although the
two collections are different in age and focus, there is a fair overlap in the topics they deal with
as the S&V collection contains, for example, documentaries about events that are also depicted or
play a role in the RMA collection.

14 http://www.getty.edu/research/conducting_research/vocabularies/aat
15 http://www.iconclass.nl/

    In order to access these two collections simultaneously we first need to align the collection meta-
data schemas with each other (e.g., artist in the RMA collection database may correspond to creator
in the S&V collection database). From previous experience in the MultimediaN E-Culture project
we have learnt that a good way to do this is to map both to an accepted metadata schema such as
Visual Resources Association core categories (VRA)16[8]. To consolidate the collection integration,
we also aim to map the values of the fields in the collection databases to a shared vocabulary[13]
or to other relevant external resources such as the Dutch Biography Portal17.
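
The schema alignment step can be illustrated with a small field mapping. The sketch below is an assumption-laden example: the shared field names (`shared:creator`, `shared:title`) are placeholders rather than actual VRA element names, and the source field lists are invented beyond the artist/creator correspondence mentioned above.

```python
# Per-collection mappings from local field names to a shared schema.
# Only the artist/creator correspondence comes from the text; the rest is hypothetical.
RMA_TO_SHARED = {"artist": "shared:creator", "title": "shared:title"}
SV_TO_SHARED = {"creator": "shared:creator", "title": "shared:title"}


def to_shared(record, field_map):
    """Rename the fields of `record` to the shared schema, dropping unmapped ones."""
    return {field_map[key]: value for key, value in record.items() if key in field_map}


rma_record = {"artist": "(artist name)", "title": "A Tea Visit in Batavia"}
sv_record = {"creator": "(creator name)", "title": "Suriname and the Netherlands Antilles"}

print(to_shared(rma_record, RMA_TO_SHARED))
print(to_shared(sv_record, SV_TO_SHARED))
```

Once both records use the same field names, they can be queried together; aligning the field values themselves with a shared vocabulary is a separate step.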
    The backbone of our event-driven and object-driven exploration method will be based on
ClioPatria, which provides a basis for collection exploration that combines searching and browsing[14].


4      Open Issues

One of the most central open questions in this work is which recurring historical
events can be traced in the RMA and S&V collections, based on documentation and attributions
of meaning and provenance, and how we can interpret the historical time-lines and narratives
that emerge from such a search, using semantics to derive and explain various views, biases,
contradictions, opinions and emotional reactions. Currently, in various types of historic documents
and collections, events are captured with a single interpretation or perspective. However, we aim
at allowing for multiple local, national, international, and personal perspectives on historical events
and their sequences. In this context we are in search of answers to the following questions:

 – how to place events in historical sequences, in the context of various collections that
   address different past-relationships;
 – how to include different perspectives in individual events and in event narratives;
 – how to extract and model causal relations between events;
 – how to involve and motivate the end users in the process of collaborative editing of historical
   event narratives.

    Critical for the success of this research is to build upon previous experiences and analyse the
implicit historical event model that cultural institutions created by constructing their collections
and collection description (i.e., what do events mean to them and how are they represented?). It
is interesting to explore the past selection and interpretation processes in order to facilitate new
access to (enriched) cultural heritage data and to ultimately investigate how this digitally mediated
public history is related to current history writing.
    Finally, to allow for effective deployment of the research results we need to specify the envisioned
role of a social cultural heritage platform for both the cultural heritage professionals and for the
well-informed or interested lay people; and to what extent this platform will be the main drive to
maintain the dynamics both in the shared historical thesaurus and the historical events descriptions
and their relationships.
16 http://www.vraweb.org/projects/vracore4/
17 http://www.biografischportaal.nl

Acknowledgements
Agora is funded by NWO in the CATCH programme and Semantics of History is funded by VU
University of Amsterdam’s Interfaculty research institute CAMeRA.


References
 1. Cohen, D.J.: History and the second decade of the web. Rethinking History 8(2) (2004) 293–301
 2. Ricoeur, P.: Time and Narrative. Volume 1. Chicago and London (1984)
 3. Davidson, D.: Essays on Action and Events. Oxford (1980)
 4. Danto, A.: Analytical Philosophy of History. Cambridge (1968)
 5. van Hage, W.R., Malaisé, V., de Vries, G., Schreiber, A.T., van Someren, M.: Combining ship trajec-
    tories and semantics with the simple event model (SEM). In: Proceedings of 1st ACM International
    Workshop on Events in Multimedia (EiMM09), Beijing, China, ACM (October 23 2009)
 6. Scherp, A., Franz, T., Saathoff, C., Staab, S.: F–a model of events based on the foundational ontology
    DOLCE+DnS ultralight. In: Proceedings The Fifth International Conference on Knowledge Capture
    (K-CAP 2009), Redondo Beach, CA, USA, ACM (2009)
 7. Fellbaum, C., ed.: WordNet: An Electronic Lexical Database. The MIT Press (1998)
 8. Schreiber, G., Amin, A., Aroyo, L., van Assem, M., de Boer, V., Hardman, L., Hildebrand, M.,
    Omelayenko, B., van Ossenbruggen, J., Tordai, A., Wielemaker, J., Wielinga, B.: Semantic annotation and
    search of cultural-heritage collections: The MultimediaN E-Culture demonstrator. Journal of Web
    Semantics 6(4) (2008) 243–249
 9. Sundheim, B.M.: Overview of results of the MUC-6 evaluation. In: Proceedings of the 6th Conference
    on Message Understanding. (1995) 13–31
10. Ratinov, L., Roth, D.: Design challenges and misconceptions in named entity recognition. In: Proceed-
    ings of Thirteenth Conference on Computational Natural Language Learning (CoNLL 2009), Boulder,
    CO, USA (2009) 147–155
11. Suchanek, F.M., Ifrim, G., Weikum, G.: Combining linguistic and statistical analysis to extract relations
    from web documents. In: Proceedings of the 12th ACM SIGKDD international conference on Knowledge
    discovery and data mining, Philadelphia, PA, USA (2006) 712–717
12. Cybulska, A., Vossen, P.: Event models for historical perspectives: determining relations between high
    and low level events in text, based on the classification of time, location and participants. To
    appear in: Proceedings of LREC 2010, Valletta, Malta (2010)
13. Tordai, A., Omelayenko, B., Schreiber, G.: Thesaurus and metadata alignment for a semantic e-culture
    application. In: Proceedings of the 4th international conference on Knowledge capture (K-CAP’07),
    Redondo Beach, CA, USA, ACM (2007) 199–200
14. Wielemaker, J., Hildebrand, M., van Ossenbruggen, J., Schreiber, G.: Thesaurus-based search in large
    heterogeneous collections. In: The Semantic Web - ISWC’08. Volume 5318 of LNCS., Tenerife, Spain,
    Springer-Verlag (May 2008) 695–708