Social Event Detection at MediaEval 2011: Challenges, Dataset and Evaluation

Symeon Papadopoulos, CERTH / Informatics and Telematics Institute, 57001 Thermi, Greece, papadop@iti.gr
Raphael Troncy, EURECOM, Sophia Antipolis, France, raphael.troncy@eurecom.fr
Vasileios Mezaris, CERTH / Informatics and Telematics Institute, 57001 Thermi, Greece, bmezaris@iti.gr
Benoit Huet, EURECOM, Sophia Antipolis, France, benoit.huet@eurecom.fr
Ioannis Kompatsiaris, CERTH / Informatics and Telematics Institute, 57001 Thermi, Greece, ikom@iti.gr

ABSTRACT
This paper provides an overview of the Social Event Detection (SED) task, which is organized as part of the MediaEval 2011 benchmarking activity. Hundreds of millions of people worldwide now regularly experience the convergence of social networking with multimedia creation and distribution. Against this background, the task examines how new and state-of-the-art techniques can detect social events by automatically analyzing social multimedia content. This paper discusses the challenges set as part of the SED task, the dataset that was provided to the task participants, and the process of evaluating the submissions.

Categories and Subject Descriptors
H.3 [Information Storage and Retrieval]: Information Search and Retrieval

General Terms
Experimentation

1. INTRODUCTION
The modeling, detection, and processing of events is an area that has started to receive considerable attention from the multimedia community [2]. The Social Event Detection (SED) task of MediaEval 2011 requires participants to discover events and detect media items that are related to a specific social event of a given event class. By social events, we mean events that are planned by people and attended by people, and whose social media are captured by people. Much of the multimedia content on the Internet was captured during such an event or is otherwise related to events. However, this content is often scattered, i.e., disassociated from the related events. This, together with the observation that humans often think in terms of events, creates the need for automatically establishing the event-media associations that will allow multimedia browsing and search in a way that is more natural to users.

Copyright is held by the author/owner(s).
MediaEval 2011 Workshop, September 1-2, 2011, Pisa, Italy

2. CHALLENGES
The SED task is composed of two challenges with a common test dataset of images and their metadata (time-stamps, tags, and, for a small subset of the images, geotags). Participants were invited to submit results to either one of the challenges or to both of them. In both cases, the only image metadata that participants may use for completing the task are those provided to them as an XML file. The use of additional information that may be available on the Internet for a given image of the dataset (e.g., geotags) is not permitted. However, external resources (such as WordNet, Wikipedia, or even visual concept detectors trained on external collections) can be employed, provided that they do not relate to specific images of the test dataset (or to any images given for specifying the sought events), and that their development and use did not benefit from any knowledge of the task's dataset and challenge definitions.

2.1 Challenge 1
The first challenge reads: Find all soccer events taking place in Barcelona (Spain) and Rome (Italy) in the test collection. For each event, provide all photos associated with it.

Soccer events, for the purpose of this task, are soccer games and social events centered around soccer, such as the celebration of winning a cup. In contrast, a single person playing with a soccer ball out in the street is not a soccer event under the task's definition.

Finding the events, in this task, does not mean finding textual descriptions or metadata of the events. What we are looking for is a set of photo clusters, each cluster comprising only photos associated with a single soccer event (thus, each cluster defines a retrieved soccer event). The "photos associated with a soccer event" that we are looking for are all photos of the test collection that relate directly to the event of interest, both in content and in terms of place and time. For example, photos of game x being played, photos of fans inside the stadium shortly before, during, or shortly after game x, and photos of fans leaving the stadium after the end of game x are all related to the "game x" soccer event. On the contrary, photos that lack such a relation to an actual soccer event (e.g., a photo showing part of the stadium where the fans gather, with no fans visible and no other relation to a specific game) are not considered relevant.

If all images were properly tagged, correctly geotagged and time-stamped, this would be a trivial task. But since most images are not geotagged (both in our test set and on the Internet), participants also need to consider tag and/or visual information to find the most complete set of relevant events and images. As a required (baseline) run, participants are asked to use any combination of the available image metadata they see fit, but no visual information, for finding the relevant events and images. The use of visual information in addition to the various provided image metadata is encouraged in subsequent runs. Examples of images that are relevant to soccer events are given in Figure 1(a).

2.2 Challenge 2
The second challenge reads: Find all events that took place in May 2009 in the venue named Paradiso (in Amsterdam, NL) and in the Parc del Forum (in Barcelona, Spain). For each event, provide all photos associated with it.

For both these venues, more than one event took place in May 2009. We consider that multiple bands playing the same evening are not distinct events, but a lineup of multiple artists (i.e., we consider that two different events cannot happen on the same day at the same location). Some events (e.g., a festival) can last several days with a lineup of artists; such an event is considered a single event.

What we are looking for is again a set of photo clusters, each cluster comprising only photos associated with a single event. For specifying these events, besides the venue names, some exemplary images are provided; these, however, do not have time-stamps. As in the first challenge, participants may need to consider different kinds of information for finding the most complete set of relevant images. A baseline run that uses no visual information is required, and the use of visual information in addition to the various image metadata is encouraged in subsequent runs. Examples of relevant images for the Paradiso and Parc del Forum venues are given in Figure 1(b)-(c).

3. DATASET
A collection of 73,645 photos was created by issuing appropriate queries to the Flickr web service through its web-based API. The collected photos represent the complete set of geotagged photos that were available for five different cities (Amsterdam, Barcelona, London, Paris and Rome, based on the geotags) and were taken in May 2009, further augmented with a few non-geotagged photos for the same cities and time period [3]. However, before the XML photo metadata archive (including any tags, geotags, time-stamps, etc.) was provided to the task participants, the geotags were removed from a randomly selected 80% of the photos in the collection. This was done to simulate the frequent lack of geotags in photo collections on the Internet (including the Flickr collection). The dataset and the ground truth will be made publicly available from the MediaEval website.

Figure 1: Example images of (a) soccer events, (b) events in Paradiso, (c) events in Parc del Forum.

4. EVALUATION
The evaluation of the submissions to the SED task is performed using the ground-truth event-media associations [3]. As an aid, the cluster-based event detection framework of [1] was employed in generating this ground truth. Two evaluation measures are used:

• Harmonic mean (F-score) of Precision and Recall for the retrieved images. This measures only the goodness of the retrieved photos, not the number of retrieved events, nor how accurate the correspondence between retrieved images and events is.
• Normalized Mutual Information (NMI). This compares two sets of photo clusters (where each cluster comprises the images of a single event), jointly considering the goodness of the retrieved photos and their assignment to different events.

Both evaluation measures take values in the range [0, 1], with higher values indicating better agreement with the ground truth results.

5. CONCLUSIONS
The SED task gave its participants the opportunity to test and comparatively evaluate different approaches to the problem of social event detection in multimedia collections. The results of the submissions give rise to interesting conclusions. Details on the methods and results of each individual participant can be found in the working notes papers of the MediaEval 2011 Workshop Proceedings.

Acknowledgments
The work presented in this paper was partially supported by the project OpenSEM funded by EIT ICT Labs, and by the European Commission under contracts FP7-216444 Petamedia, FP7-248984 GLOCAL, FP7-215453 WeKnowIt and FP7-249008 CHORUS+.

6. REFERENCES
[1] S. Papadopoulos, C. Zigkolis, Y. Kompatsiaris, and A. Vakali. Cluster-based Landmark and Event Detection on Tagged Photo Collections. IEEE Multimedia, 18(1):52–63, February 2011.
[2] A. Scherp, R. Jain, M. Kankanhalli, and V. Mezaris. Modeling, Detecting, and Processing Events in Multimedia. In 18th ACM International Conference on Multimedia, pages 1739–1740, Firenze, Italy, 2010.
[3] R. Troncy, B. Malocha, and A. Fialho. Linking Events with Media. In 6th International Conference on Semantic Systems (I-SEMANTICS), Graz, Austria, 2010.