Social Event Detection at MediaEval 2011: Challenges, Dataset and Evaluation

Symeon Papadopoulos
CERTH / Informatics and Telematics Institute, 57001 Thermi, Greece
papadop@iti.gr

Raphael Troncy
EURECOM, Sophia Antipolis, France
raphael.troncy@eurecom.fr

Vasileios Mezaris
CERTH / Informatics and Telematics Institute, 57001 Thermi, Greece
bmezaris@iti.gr

Benoit Huet
EURECOM, Sophia Antipolis, France
benoit.huet@eurecom.fr

Ioannis Kompatsiaris
CERTH / Informatics and Telematics Institute, 57001 Thermi, Greece
ikom@iti.gr


ABSTRACT
This paper provides an overview of the Social Event Detection (SED) task, which is organized as part of the MediaEval 2011 benchmarking activity. With the convergence between social networking and multimedia creation and distribution being experienced on a regular basis by hundreds of millions of people worldwide, this task examines how new or state-of-the-art techniques can cope with the need for detecting social events by automatically analyzing social multimedia content. This paper discusses the challenges set as part of the SED task, the dataset that was provided to the task participants, and the process of evaluating the submissions.

Categories and Subject Descriptors
H.3 [Information Storage and Retrieval]: Information Search and Retrieval

General Terms
Experimentation

1. INTRODUCTION
The modeling, detection, and processing of events is an area that has started to receive considerable attention from the multimedia community [2]. The Social Event Detection (SED) task of MediaEval 2011 requires participants to discover events and detect media items that are related to a specific social event of a given event class. By social events, we mean events that are planned by people, attended by people, and for which the social media are captured by people. A lot of multimedia content on the Internet was captured during such an event or is otherwise related to events. However, this content is often scattered, i.e., disassociated from the related events. This, together with the observation that humans often think in terms of events, generates the need for automatically establishing the event-media associations that will allow multimedia browsing and search in a way that is more natural to the users.

Copyright is held by the author/owner(s).
MediaEval 2011 Workshop, September 1-2, 2011, Pisa, Italy

2. CHALLENGES
The SED task is composed of two challenges with a common test dataset of images with their metadata (time-stamps, tags, and geotags for a small subset of them). Participants were invited to submit results to either one of the challenges, or to both of them. In both cases, participants may use only the image metadata provided to them as an XML file. The use of additional information (e.g. geotags) that may be available on the Internet for a given image of the dataset is not permitted. However, external resources (such as WordNet, Wikipedia, or even visual concept detectors trained on external collections) can be employed, provided that they do not relate to specific images of the test dataset (or any images given for specifying the sought events), and that their development and use did not benefit from any knowledge of the task's dataset and challenge definitions.

2.1 Challenge 1
The first challenge reads: Find all soccer events taking place in Barcelona (Spain) and Rome (Italy) in the test collection. For each event provide all photos associated with it.

Soccer events, for the purpose of this task, are soccer games and social events centered around soccer, such as the celebration of winning a cup. In contrast, a single person playing with a soccer ball out in the street is not a soccer event under the task's definition.

Finding the events, in this task, does not mean finding some textual descriptions or metadata of the events. What we are looking for is a set of photo clusters, each cluster comprising only photos associated with a single soccer event (thus, each cluster defining a retrieved soccer event). The "photos associated with a soccer event" that we are looking for are all photos of the test collection that directly relate (in content, and also in terms of place/time) to the event of interest. For example, photos of game x being played, photos of fans inside the stadium during, a bit before, or a bit after game x, and photos of fans leaving the stadium after the end of game x are all related to the "game x" soccer event. On the contrary, photos that lack the above relations to an actual soccer event (e.g. a photo showing part of the stadium where
the fans gather, with no fans visible or otherwise any relation to a specific game) are not considered relevant.

If all images were properly tagged and correctly geotagged and time-stamped, this would be a trivial task. However, since most images are not geotagged (both in our test set and on the Internet), participants also need to consider tag and/or visual information for finding the most complete set of relevant events and images. As a required (baseline) run, the participants are asked to use any combination of the available image metadata they see fit, but no visual information, for finding the relevant events and images. The use of visual information in addition to the various provided image metadata is encouraged in subsequent runs. Examples of images that are relevant to soccer events are given in Figure 1(a).

2.2 Challenge 2
The second challenge reads: Find all events that took place in May 2009 in the venue named Paradiso (in Amsterdam, NL) and in the Parc del Forum (in Barcelona, Spain). For each event provide all photos associated with it.

For both these venues, more than one event took place in May 2009. We consider that multiple bands playing the same evening are not distinct events, but a lineup of multiple artists (i.e. we consider that two different events cannot happen on the same day at the same location). Some events (e.g. a festival) can last several days with a lineup of artists and will be considered as a single event.

What we are looking for is again a set of photo clusters, each cluster comprising only photos associated with a single event. For specifying these events, besides the venue names, some exemplary images are provided. These, however, do not have time-stamps. Similarly to the first challenge, participants may need to consider different kinds of information for finding the most complete set of relevant images. A baseline run that uses no visual information is required, and the use of visual information in addition to the various image metadata is encouraged in subsequent runs. Examples of relevant images for the Paradiso and Parc del Forum venues are given in Figure 1(b)-(c).

Figure 1: Example images of (a) soccer events, (b) events in Paradiso, (c) events in Parc del Forum.

3. DATASET
A collection of 73,645 photos was created by issuing appropriate queries to the Flickr web service through its web-based API. The collected photos represent the complete set of geotagged photos that were available for five different cities (i.e., Amsterdam, Barcelona, London, Paris and Rome, based on the geotags) and were taken in May 2009, further augmented with a few non-geotagged photos for the same cities and time period [3]. However, before providing the XML photo metadata archive (including any tags, geotags, time-stamps, etc.) to the task participants, the geotags were removed for 80% of the photos in the collection (randomly selected). This was done to simulate the frequent lack of geotags in photo collections on the Internet (including the Flickr collection). The dataset and the ground truth will be made publicly available from the MediaEval website.

4. EVALUATION
The evaluation of the submissions to the SED task is performed with the use of the ground truth EventMedia associations [3]. As an aid, the cluster-based event detection framework of [1] was employed in generating this ground truth. Two evaluation measures are used:

  • Harmonic mean (F-score) of Precision and Recall for the retrieved images. This measures only the goodness of the retrieved photos, but neither the number of retrieved events nor how accurate the correspondence between retrieved images and events is.
  • Normalized Mutual Information (NMI). This compares two sets of photo clusters (where each cluster comprises the images of a single event), jointly considering the goodness of the retrieved photos and their assignment to different events.

Both evaluation measures take values in the range [0, 1], with higher values indicating a better agreement with the ground truth results.

5. CONCLUSIONS
The SED task gave its participants the opportunity to test and comparatively evaluate different approaches to the problem of social event detection in multimedia collections. The results of the submissions give rise to interesting conclusions. Details on the methods and results of each individual participant can be found in the working notes papers of the MediaEval 2011 Workshop Proceedings.

Acknowledgments
The work presented in this paper was partially supported by the project OpenSEM funded by EIT ICT Labs, and by the European Commission under contracts FP7-216444 Petamedia, FP7-248984 GLOCAL, FP7-215453 WeKnowIt and FP7-249008 CHORUS+.

6. REFERENCES
[1] S. Papadopoulos, C. Zigkolis, Y. Kompatsiaris, and A. Vakali. Cluster-based Landmark and Event Detection on Tagged Photo Collections. IEEE Multimedia, 18(1):52–63, February 2011.
[2] A. Scherp, R. Jain, M. Kankanhalli, and V. Mezaris. Modeling, Detecting, and Processing Events in Multimedia. In 18th ACM International Conference on Multimedia, pages 1739–1740, Firenze, Italy, 2010.
[3] R. Troncy, B. Malocha, and A. Fialho. Linking Events with Media. In 6th International Conference on Semantic Systems (I-SEMANTICS), Graz, Austria, 2010.
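
As an illustration of the two evaluation measures described in Section 4, the following Python sketch computes an F-score over sets of photo IDs and an NMI value over two event labelings. It is a minimal sketch under stated assumptions, not the official evaluation script of the task: the function names `f_score` and `nmi` are hypothetical, the NMI normalization 2·I(A;B) / (H(A) + H(B)) is one common choice that the paper does not specify, and the NMI function assumes that both labelings cover exactly the same set of photos.

```python
from collections import Counter
from math import log


def f_score(retrieved, relevant):
    """Harmonic mean of precision and recall over two sets of photo IDs."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    if hits == 0:
        return 0.0
    precision = hits / len(retrieved)
    recall = hits / len(relevant)
    return 2 * precision * recall / (precision + recall)


def nmi(labels_a, labels_b):
    """NMI = 2*I(A;B) / (H(A) + H(B)) for two labelings of the same photos.

    Each argument maps photo ID -> event label (assumed input format).
    """
    ids = list(labels_a)
    if not ids or set(ids) != set(labels_b):
        raise ValueError("both labelings must cover the same non-empty photo set")
    n = len(ids)
    count_a = Counter(labels_a[i] for i in ids)
    count_b = Counter(labels_b[i] for i in ids)
    joint = Counter((labels_a[i], labels_b[i]) for i in ids)

    def entropy(counts):
        return -sum(c / n * log(c / n) for c in counts.values())

    # Mutual information from joint and marginal cluster counts.
    mutual = sum(
        c / n * log(c * n / (count_a[a] * count_b[b]))
        for (a, b), c in joint.items()
    )
    denom = entropy(count_a) + entropy(count_b)
    # Convention (assumption): two single-cluster labelings agree perfectly.
    return 1.0 if denom == 0 else 2 * mutual / denom


if __name__ == "__main__":
    # Toy data: photo IDs and event labels are made up for illustration.
    truth = {"p1": "game_x", "p2": "game_x", "p3": "game_y", "p4": "game_y"}
    found = {"p1": 0, "p2": 0, "p3": 1, "p4": 0}
    print("F-score:", f_score({"p1", "p2", "p5"}, set(truth)))
    print("NMI:", nmi(truth, found))
```

Note that the F-score looks only at which photos were retrieved, whereas the NMI depends on how the photos are grouped into events, which mirrors the distinction drawn between the two measures in Section 4.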