=Paper=
{{Paper
|id=Vol-1263/paper5
|storemode=property
|title=Social Event Detection at MediaEval 2014: Challenges, Datasets, and Evaluation
|pdfUrl=https://ceur-ws.org/Vol-1263/mediaeval2014_submission_5.pdf
|volume=Vol-1263
|dblpUrl=https://dblp.org/rec/conf/mediaeval/PetkosPMK14
}}
==Social Event Detection at MediaEval 2014: Challenges, Datasets, and Evaluation==
Georgios Petkos, Symeon Papadopoulos, Vasileios Mezaris, Yiannis Kompatsiaris
Information Technologies Institute / CERTH, 6th Km. Charilaou-Thermis, Thessaloniki, Greece
{gpetkos,papadop,bmezaris,ikom}@iti.gr

Copyright is held by the author/owner(s). MediaEval 2014 Workshop, October 16-17, 2014, Barcelona, Spain.

ABSTRACT

This paper provides an overview of the Social Event Detection (SED) task that takes place as part of the 2014 MediaEval Benchmark. The task is motivated by the need to mine a common type of real-world activity, social events, in large collections of online multimedia. The task has two subtasks, each of which is related to a different aspect of such a mining procedure: detection of events (by means of clustering) and retrieval of events. It is performed on a large image collection of more than 470K Flickr images (development and test set). We examine the details of the subtasks, the datasets, as well as the evaluation process.

1. INTRODUCTION

The wealth of content uploaded by users on the Internet is often related to different aspects of real-world human activities. This presents an important mining opportunity and thus there have been many efforts to analyse such data. For instance, web content has been extensively used for applications such as detecting breaking news or monitoring ongoing stories. A very interesting field of work in this direction involves the detection of social events in multimedia collections retrieved from the web. By social events we mean events that are attended by people and are represented by multimedia content uploaded online by different people. Instances of such events are concerts, sports events, public celebrations and even protests. Mining such events may be of interest to, e.g., professional journalists who would like to discover new events or new material about known events, or to casual users that would like to organize their personal photo collections around attended events.

Indeed, during the last years, the SED problem has attracted significant interest from the research community. Indicative of this is the fact that the SED task has been part of the MediaEval benchmark for the last three years (2011-2013) [2]. In the following, we present the details of the subtasks, datasets and evaluation process for the fourth edition of the task.

2. TASK OVERVIEW

This year, the SED task is organized around two subtasks, the details of which will be provided in the following section. Participants are allowed to submit up to five runs for each of the subtasks. Additionally, participants may opt for submitting their runs in only the subtask that they would like to focus on; they are, however, encouraged to submit runs in both subtasks. As will be detailed in the next section, given a large collection of images, the two subtasks require participants to: a) perform a full clustering of the images around events, and b) retrieve sets of events according to specific search criteria.

3. CHALLENGES

3.1 Subtask 1: Full clustering

In the first subtask, a collection of images with their metadata is provided, and participants are asked to produce a full clustering of the images, so that each cluster corresponds to a social event. Participants that took part in the 2013 version of the task [5] should be familiar with this subtask, as it is a continuation of the first subtask from last year. This subtask may be treated as a typical clustering problem or tackled with the help of recently introduced “supervised clustering” approaches [4, 1, 3].

The main challenges of the first subtask are:
• The number of target clusters is not provided and will have to be inferred by the clustering methods of the participants.
• Each photo is accompanied by metadata, which are potentially helpful for the clustering; however, they are often missing or of inconsistent quality and therefore introduce a multimodal aspect to the problem.
• Some of the metadata is noisy. For example, if the date is incorrectly set on the device of a user, then the date information of his/her pictures will be incorrect.
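For illustration only (this is not an official baseline or part of the task definition), the following minimal Python sketch shows one naive way to group photos around events using their metadata: a photo joins an existing cluster when it is close in time and space to that cluster's first photo, and otherwise opens a new cluster. The record fields ("taken", "lat", "lon") and the thresholds are hypothetical assumptions; missing metadata is simply skipped, and the number of clusters is not fixed in advance.

<pre>
# Illustrative single-pass clustering over photo metadata (not an official baseline).
# Photo records are assumed to be dicts with optional "taken" (datetime),
# "lat" and "lon" fields; these field names are hypothetical.
from datetime import timedelta
from math import radians, sin, cos, asin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371 * asin(sqrt(a))

def cluster_photos(photos, max_hours=12, max_km=1.0):
    """Assign each photo to the first compatible cluster, else open a new one."""
    clusters = []  # each cluster is a list of photo dicts
    for p in photos:
        placed = False
        for c in clusters:
            ref = c[0]
            # Time check (skipped if either timestamp is missing).
            if p.get("taken") and ref.get("taken"):
                if abs(p["taken"] - ref["taken"]) > timedelta(hours=max_hours):
                    continue
            # Location check (only a fraction of photos carry geo-location).
            if None not in (p.get("lat"), p.get("lon"), ref.get("lat"), ref.get("lon")):
                if haversine_km(p["lat"], p["lon"], ref["lat"], ref["lon"]) > max_km:
                    continue
            c.append(p)
            placed = True
            break
        if not placed:
            clusters.append([p])
    return clusters
</pre>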
3.2 Subtask 2: Retrieval of events

In the second subtask, a collection of events is provided; each event is represented by a set of images with their metadata, and participants are asked to retrieve those events that meet some criteria. Please note that this is a new subtask, appearing for the first time this year.

The retrieval criteria will be related to the following:
• The location of the event (country, city, venue).
• The type of the event (concert, protest, etc.).
• Entities involved in the event (e.g. a band in a concert).

For instance, the first test query asks participants to find all music events that took place in Canada, whereas another asks for all conferences, exhibitions and technical events that took place in the U.K.
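As a rough illustration of the retrieval criteria listed above (again, not the official evaluation protocol, and the field names below are hypothetical rather than the dataset schema), a simple filter over event records could look as follows:

<pre>
# Illustrative criteria-based event retrieval; "type", "country", "city",
# "entities" and "id" are assumed, hypothetical event attributes.
def retrieve(events, event_type=None, country=None, city=None, entity=None):
    """Return ids of events matching all of the given (optional) criteria."""
    hits = []
    for e in events:
        if event_type and e.get("type") != event_type:
            continue
        if country and e.get("country") != country:
            continue
        if city and e.get("city") != city:
            continue
        if entity and entity not in e.get("entities", []):
            continue
        hits.append(e["id"])
    return hits

# e.g. the first test query ("all music events that took place in Canada"):
# retrieve(events, event_type="music", country="Canada")
</pre>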
4. DATASETS

Two datasets will be used in the task. Both consist of images collected from Flickr using the Flickr API, and all images are covered by a Creative Commons license. For both datasets, the actual image files and their metadata are made available. The metadata includes the following: username of the uploader, date taken, date uploaded, title, description, tags and geo-location. For both datasets, some of the metadata is not available; for instance, only roughly 20% of the images come with their geo-location.

The first dataset contains 362,578 images and, together with it, we also provide the grouping of these images into 17,834 clusters that represent social events. The second dataset contains 110,541 images and, contrary to the first set, we do not release the grouping of its images into clusters that represent social events (we plan to release it after the task is completed).

The first dataset is used as the development set for both subtasks and as the test set for the second subtask. We will refer to this dataset as the development set, although it is also used for testing in the second subtask. For the first subtask, the development dataset provides to the participants a large number of examples of correct/target image clusters corresponding to events. For the second subtask, the development dataset provides the set of events from which the participants must retrieve the relevant events for each query. A number of example queries, together with the ids of the relevant events, are also provided for development. For testing, participants are asked to find events in the development dataset using a different set of criteria. Importantly, whereas there are 8 development queries, there are 10 test queries. The 8 development queries have a direct correspondence to the first 8 test queries: they have similar criteria. For instance, whereas one development query asks for all music events that took place in Copenhagen, the corresponding test query asks for all music events that took place in Bucharest. However, there are two additional test queries for which a corresponding development query is not provided.

The second dataset is used only in the first subtask, for testing purposes. That is, participants are asked to find image-cluster associations in the second dataset, similar (in nature) to those in the development set.

5. EVALUATION

For the first subtask, the submissions will be evaluated against the ground truth using the following evaluation measures:
• F1-score calculated from precision and recall.
• Normalized Mutual Information (NMI).
• Divergence from a random baseline: all evaluation measures will also be reported in an adjusted form, called “divergence from a random baseline” [6], which indicates how much useful learning has occurred and helps detect problematic clustering submissions.

The ground truth for the first subtask has been obtained by taking advantage of machine tags with which users have labelled their pictures on Flickr. These machine tags associate the images to distinct events in Last.fm (http://www.last.fm/) and Upcoming (http://en.wikipedia.org/wiki/Upcoming).

For the second subtask, each event is labelled according to the search criteria that were listed above (type, location, etc.), and the correct query results are known by filtering according to the criteria of each query. The results of the second subtask will be evaluated using three different evaluation measures: precision, recall and F1-score. The ground truth has been obtained by taking into account both the metadata of events from Last.fm and Upcoming and manual labelling. In particular, for all events, whether they are Last.fm or Upcoming events, we know their time and location from the metadata obtained from the respective API. Additionally, we know that all Last.fm events are music events and that their metadata contains the name of the relevant artist. Events from Upcoming may belong to different categories (e.g. protest, sports, music, etc.) and were manually classified. Additionally, for Upcoming events that were classified as music events, the relevant artist was also manually defined.
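The following sketch illustrates, under stated assumptions, how measures of this kind could be computed with scikit-learn: NMI is available directly, the F1 shown here is a common pairwise formulation rather than necessarily the task's official definition, and the “divergence from a random baseline” is approximated by subtracting the score of a randomly shuffled assignment that keeps the submitted cluster sizes (cf. [6]).

<pre>
# Hedged sketch of clustering evaluation measures (illustrative only; the
# official task definitions may differ in detail).
import random
from itertools import combinations
from sklearn.metrics import normalized_mutual_info_score

def pairwise_f1(labels_true, labels_pred):
    """F1 over pairs of items that are placed in the same cluster."""
    n = len(labels_true)
    same_true = {(i, j) for i, j in combinations(range(n), 2)
                 if labels_true[i] == labels_true[j]}
    same_pred = {(i, j) for i, j in combinations(range(n), 2)
                 if labels_pred[i] == labels_pred[j]}
    tp = len(same_true & same_pred)
    if tp == 0:
        return 0.0
    precision = tp / len(same_pred)
    recall = tp / len(same_true)
    return 2 * precision * recall / (precision + recall)

def divergence_from_random(labels_true, labels_pred, measure=pairwise_f1, seed=0):
    """Submission score minus the score of a random baseline that keeps the
    submitted cluster-size distribution (cf. [6])."""
    shuffled = list(labels_pred)
    random.Random(seed).shuffle(shuffled)  # same cluster sizes, random assignment
    return measure(labels_true, labels_pred) - measure(labels_true, shuffled)

if __name__ == "__main__":
    truth = [0, 0, 1, 1, 2, 2]
    pred = [0, 0, 1, 2, 2, 2]
    print("NMI:", normalized_mutual_info_score(truth, pred))
    print("pairwise F1:", pairwise_f1(truth, pred))
    print("divergence from random F1:", divergence_from_random(truth, pred))
</pre>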
6. CONCLUSION

We presented the subtasks, datasets and evaluation process for the 2014 SED task. Interestingly, this year a new subtask is introduced: the event retrieval subtask. Thus, a new dimension is added to the overall SED problem this year.

7. ACKNOWLEDGMENTS

The work was supported by the European Commission under contracts FP7-287911 LinkedTV, FP7-318101 MediaMixer and FP7-287975 SocialSensor. We would also like to thank Timo Reuter for the ReSEED dataset, on which the development dataset was partly based.

8. REFERENCES

[1] G. Petkos, S. Papadopoulos, and Y. Kompatsiaris. Social event detection using multimodal clustering and integrating supervisory signals. In Proceedings of the 2nd ACM International Conference on Multimedia Retrieval, ICMR '12, pages 23:1–23:8, New York, NY, USA, 2012. ACM.
[2] G. Petkos, S. Papadopoulos, V. Mezaris, R. Troncy, P. Cimiano, T. Reuter, and Y. Kompatsiaris. Social event detection at MediaEval: a three-year retrospect of tasks and results. In Proceedings of the 2014 Workshop on Social Events in Web Multimedia (in conjunction with ICMR), 2014.
[3] G. Petkos, S. Papadopoulos, E. Schinas, and Y. Kompatsiaris. Graph-based multimodal clustering for social event detection in large collections of images. In MultiMedia Modeling International Conference, MMM 2014, Dublin, Ireland, January 6-10, 2014, Proceedings, Part I, pages 146–158, 2014.
[4] T. Reuter and P. Cimiano. Event-based classification of social media streams. In Proceedings of the 2nd ACM International Conference on Multimedia Retrieval, ICMR '12, pages 22:1–22:8, New York, NY, USA, 2012. ACM.
[5] T. Reuter, S. Papadopoulos, G. Petkos, V. Mezaris, Y. Kompatsiaris, P. Cimiano, C. de Vries, and S. Geva. Social event detection at MediaEval 2013: Challenges, datasets, and evaluation. In Proceedings of the MediaEval 2013 Multimedia Benchmark Workshop, Barcelona, Spain, October 18-19, 2013.
[6] C. M. D. Vries, S. Geva, and A. Trotman. Document clustering evaluation: Divergence from a random baseline. CoRR, abs/1208.5654, 2012.