<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Social Event Detection at MediaEval 2013: Challenges, Datasets, and Evaluation</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Timo Reuter</string-name>
          <email>treuter@cit-ec.uni-bielefeld.de</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Symeon Papadopoulos</string-name>
          <email>papadop@iti.gr</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Georgios Petkos</string-name>
          <email>gpetkos@iti.gr</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Christopher de Vries</string-name>
          <email>chris@de-vries.id.au</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Vasileios Mezaris</string-name>
          <email>bmezaris@iti.gr</email>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Yiannis Kompatsiaris</string-name>
          <email>ikom@iti.gr</email>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Shlomo Geva</string-name>
          <email>s.geva@qut.edu.au</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>CERTH-ITI</institution>
          ,
          <addr-line>Thermi</addr-line>
          ,
          <country country="GR">Greece</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Queensland University of Technology</institution>
          ,
          <addr-line>Brisbane</addr-line>
          ,
          <country country="AU">Australia</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Universität Bielefeld, CITEC</institution>
          ,
          <addr-line>Bielefeld</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>CERTH-ITI</institution>
          ,
          <addr-line>Thermi</addr-line>
          ,
          <country country="GR">Greece</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2013</year>
      </pub-date>
      <fpage>18</fpage>
      <lpage>19</lpage>
      <abstract>
        <p>In this paper, we provide an overview of the Social Event Detection (SED) task that is part of the MediaEval Benchmark for Multimedia Evaluation 2013. This task requires participants to discover social events and organize the related media items in event-specific clusters within a collection of Web multimedia. Social events are events that are planned by people, attended by people and for which the social multimedia are also captured by people. We describe the challenges, datasets, and the evaluation methodology.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>INTRODUCTION</title>
      <p>As social media applications proliferate, an ever-increasing
amount of multimedia content is being created and shared on the Web.
Much of this content is related to social
events, which we define as events that are organized and
attended by people and are illustrated by social media content
created by people.</p>
      <p>For users, finding digital content related to social events is
challenging: it requires searching large volumes of data,
possibly across different sources and sites. Algorithms that can
support humans in this task are clearly needed. The proposed
task thus consists of developing algorithms that can detect
event-related media and group them by the events they
illustrate or are related to. Such a grouping would provide
the basis for aggregation and search applications that foster
easier discovery, browsing and querying of social events.</p>
    </sec>
    <sec id="sec-2">
      <title>TASK OVERVIEW</title>
      <p>
        For this year's edition of the Social Event Detection task,
two challenges, C1 and C2, are defined, which differ
from those of SED2012 [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. For each challenge, a dedicated
dataset of images (and videos in the case of C1) together
with their metadata (e.g. timestamp, geographic
information, tags) is provided. Participants are allowed to submit
up to five runs per task, where each run contains a different
set of results, produced by either a different
approach or a variation of the same approach. Each run will
be evaluated separately.
      </p>
    </sec>
    <sec id="sec-3">
      <title>CHALLENGES</title>
    </sec>
    <sec id="sec-4">
      <title>C1: Full Clustering</title>
      <p>"Produce a complete clustering of the image dataset
according to events."</p>
      <p>Cluster the entire dataset for all images included in
the test set according to events they depict.</p>
      <p>As the target number of events is not given, a
subchallenge is to discover it.</p>
      <p>
        The first challenge is completely data-driven and
involves the analysis of a large-scale dataset, requiring the
production of a complete clustering of the image dataset
according to events (see Figure 1). The task is a supervised
clustering task [
        <xref ref-type="bibr" rid="ref3 ref4">4, 3</xref>
        ] where a set of training events is
provided. However, the events in the training and test sets are disjoint.
This challenge does not specify a particular event or event
class of interest but focuses on grouping images according
to the events they are associated with. The ground truth assigns a
single label to each item, so no image or video can belong to more than
one event. The challenge is that the number of events
is not known beforehand, and it is up to the participants to
decide which images are clustered together into one event.
      </p>
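      <p>The two constraints described above (every test image is assigned, and to exactly one event) can be checked mechanically. The following is a minimal sketch in Python, assuming a submission is held in memory as a mapping from item identifier to cluster identifier. The function name, the identifiers and this representation are hypothetical and do not reflect the task's actual submission format.</p>
      <preformat>
```python
def check_full_clustering(test_items, assignment):
    """Check a hard clustering given as a dict: item id -> cluster id.

    Returns (missing, unknown, n_clusters). A dict cannot map one item to
    two clusters, so the single-label constraint holds by construction.
    """
    test_set = set(test_items)
    assigned = set(assignment)
    missing = test_set - assigned   # test items left without an event
    unknown = assigned - test_set   # assigned items that are not in the test set
    # The number of events is not given; it is whatever the participant produced.
    n_clusters = len(set(assignment.values()))
    return missing, unknown, n_clusters


# Hypothetical item and cluster identifiers, for illustration only.
items = ["img1", "img2", "img3", "img4"]
submission = {"img1": "e1", "img2": "e1", "img3": "e2"}
missing, unknown, n_clusters = check_full_clustering(items, submission)
print(sorted(missing), sorted(unknown), n_clusters)
```
      </preformat>
      <p>In this sketch a clustering is complete when both returned sets are empty; the example submission is incomplete because one test image has no event.</p>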
      <p>[Figure 1: image documents grouped into events (Event 1, Event 2).]</p>
      <p>[...] the other runs, additional data can be used (including the
images). Generic external resources such as
Wikipedia, WordNet, or visual concepts trained on other
data may be used. However, external data that
directly relates to the individual images included in
the dataset, such as machine tags<sup>1</sup>, may not be used.</p>
      <p>Subtask: Full Clustering of Media using Videos</p>
      <p>"Assign all videos to the event set of the images
you have created in Challenge 1."</p>
      <p>This is an extension of Challenge 1. Participants should
use the event clusters they created and assign the videos to
them. As in the main task, a
complete assignment of the videos to events is sought.</p>
      <p>C2: Classification of Media into Event Types</p>
      <p>"Classify images into event and non-event and into
event types."</p>
      <p>For each image in the dataset, decide whether the image
depicts an event or not (in the latter case, assign the
no-event label to it).</p>
      <p>For each image in the dataset that is not labelled as
no-event, decide what type of event it depicts. The
available event types are the following: concert, conference,
exhibition, fashion, protest, sports, theatre/dance, other.</p>
      <p>The second challenge is a supervised classification task,
which requires learning what event-related media items look
like (both in terms of visual content and accompanying
metadata). More specifically, a set of eight event types is
defined, and methods should automatically decide to which
type (if any) an unknown media item belongs.</p>
      <p>C2 submissions are subject to the same limitations as
those of C1, except that visual
information from the images may be used in all runs.</p>
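      <p>The two decisions required by C2 (event or no-event, then event type) combine into a single label per image. The sketch below, in Python, illustrates one way to represent that label space. The label strings follow the list given in the text, but the exact strings used in the released files are an assumption, and the function name is hypothetical.</p>
      <preformat>
```python
# Label names follow the list in the text; the exact strings used in the
# released data files are an assumption.
EVENT_TYPES = (
    "concert", "conference", "exhibition", "fashion",
    "protest", "sports", "theatre/dance", "other",
)
NO_EVENT = "no-event"


def final_label(is_event, event_type=None):
    """Combine the event/no-event decision with the event-type decision."""
    if not is_event:
        # Images judged not to depict an event get the no-event label,
        # regardless of any event type that was proposed.
        return NO_EVENT
    if event_type not in EVENT_TYPES:
        raise ValueError(f"unknown event type: {event_type!r}")
    return event_type


print(final_label(False))
print(final_label(True, "protest"))
```
      </preformat>
      <p>A classifier that predicts one of nine classes directly (the eight types plus no-event) produces the same kind of output; the two-step form only mirrors how the challenge is worded.</p>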
    </sec>
    <sec id="sec-5">
      <title>DATASET</title>
      <p>
        The dataset for Challenge 1 consists of pictures from Flickr
and 1,327 videos from YouTube together with their
associated metadata. The pictures were downloaded using the
Flickr API. We considered pictures with an upload time
between January 2006 and December 2012, yielding a dataset
of 437,370 pictures assigned to 21,169 events. The events
were determined by people as described in Reuter et al. [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]
and include sports events, protest marches, BBQs, debates,
expositions, festivals and concerts. All of them are published
under a Creative Commons license allowing free
distribution. As it is a real-world dataset, some features
(capture/upload time and uploader information) are
available for every picture, while others
are available for only a subset of the images: geographic
information (45.9%), tags (95.6%), title (97.9%), and
description (37.9%). 70% of the dataset, including the ground truth, is provided for training.
The rest is used for evaluation
purposes.
      </p>
      <p>The dataset for Challenge 2 is comparable to that of
Challenge 1, except that the pictures were gathered
from Instagram using the respective API. The training set
was collected between the 27th and 29th of April 2013, based
on event-related keywords, and consisted of 27,754 pictures
(after cleaning). The classification of pictures into event types
was performed manually by multiple annotators, and
several borderline cases were removed entirely. The test set
was collected between the 7th and 13th of May 2013, was
processed using the same procedure as the training set, and
consisted of 29,411 pictures. There are eight event types
in the dataset: music (concert) events, conferences,
exhibitions, fashion shows, protests, sports events, theatrical/dance
events (considered as one category) and other events (e.g.
parades, gatherings). As in the dataset for Challenge 1,
some features are not present for all pictures: 27.9% of the
pictures have geographic information, 93.4% come with a
title and almost all pictures (99.5%) have at least one tag.</p>
      <p><sup>1</sup> A machine tag is a special triple tag that defines extra semantic information
for interpretation by computer systems.</p>
    </sec>
    <sec id="sec-6">
      <title>EVALUATION</title>
      <p>We evaluate the submissions with ground truth
information that has been created by human annotators. The results
of event-related media item detection will be evaluated using
three evaluation measures:</p>
      <p>
        F1-score, calculated from Precision and Recall
(applicable to both C1 and C2) [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
      </p>
      <p>Normalized Mutual Information (NMI), applicable only to C1.
Together with the F1-score, it is used to assess the overlap between clusters and classes.</p>
      <p>
        Divergence from a Random Baseline. All evaluation
measures will also be reported in an adjusted measure
called Divergence from a Random Baseline [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ],
indicating how much useful learning has occurred and helping
detect problematic clustering submissions (applicable
to both C1 and C2).
      </p>
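      <p>The three measures above can be illustrated with a short Python sketch. It is not the official evaluation script and rests on several assumptions: the F1-score is computed from per-item precision and recall averaged over all items, NMI is normalized by the mean of the two entropies, and the random baseline is obtained by shuffling the submitted labels so that cluster sizes are preserved. All function names are hypothetical.</p>
      <preformat>
```python
import math
import random
from collections import Counter


def item_f1(truth, pred):
    """F1 from per-item precision and recall, averaged over all items."""
    n = len(truth)
    t_counts, p_counts = Counter(truth), Counter(pred)
    joint = Counter(zip(truth, pred))
    # Each of the c items in cell (t, p) has precision c / |p| and recall c / |t|.
    precision = sum(c * c / p_counts[p] for (t, p), c in joint.items()) / n
    recall = sum(c * c / t_counts[t] for (t, p), c in joint.items()) / n
    return 2 * precision * recall / (precision + recall)


def nmi(truth, pred):
    """Mutual information divided by the mean entropy of the two labelings."""
    n = len(truth)
    t_counts, p_counts = Counter(truth), Counter(pred)
    joint = Counter(zip(truth, pred))

    def entropy(counts):
        return -sum((c / n) * math.log(c / n) for c in counts.values())

    mutual = sum(
        (c / n) * math.log(c * n / (t_counts[t] * p_counts[p]))
        for (t, p), c in joint.items()
    )
    denom = (entropy(t_counts) + entropy(p_counts)) / 2
    # Both labelings put everything in one group: treat them as identical.
    return mutual / denom if denom else 1.0


def divergence(metric, truth, pred, seed=0):
    """Score minus the score of a shuffled labeling with the same cluster sizes."""
    shuffled = list(pred)
    random.Random(seed).shuffle(shuffled)
    return metric(truth, pred) - metric(truth, shuffled)


# Toy example: two true events, everything submitted as a single cluster.
truth = ["a", "a", "b", "b"]
pred = [0, 0, 0, 0]
print(item_f1(truth, pred), nmi(truth, pred), divergence(item_f1, truth, pred))
```
      </preformat>
      <p>In the toy example the single-cluster submission still obtains a non-zero F1-score, but its divergence from the shuffled baseline is zero because shuffling a single cluster changes nothing. This is the kind of degenerate submission the adjusted measure is meant to expose.</p>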
    </sec>
    <sec id="sec-7">
      <title>CONCLUSION</title>
      <p>This year's SED edition decomposes the problem of social
event detection into two main components: (a) clustering of
media depicting certain social events, and (b) deciding whether
an image is event-related and, if so, what type of event it is
related to. Both the scale and the complexity of this year's
dataset make it more challenging and more representative
of real-world problems.</p>
    </sec>
    <sec id="sec-8">
      <title>Acknowledgments</title>
      <p>The work was supported by the European Commission
under contracts FP7-287911 LinkedTV, FP7-318101
MediaMixer, FP7-287975 SocialSensor and FP7-249008 CHORUS+.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>C. M.</given-names>
            <surname>De Vries</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Geva</surname>
          </string-name>
          , and
          <string-name>
            <given-names>A.</given-names>
            <surname>Trotman</surname>
          </string-name>
          .
          <article-title>Document clustering evaluation: Divergence from a random baseline</article-title>
          .
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>S.</given-names>
            <surname>Papadopoulos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Schinas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Mezaris</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Troncy</surname>
          </string-name>
          ,
          and
          <string-name>
            <given-names>I.</given-names>
            <surname>Kompatsiaris</surname>
          </string-name>
          .
          <article-title>Social event detection at MediaEval 2012: Challenges, dataset and evaluation</article-title>
          .
          <source>In Proceedings of MediaEval 2012 Workshop</source>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>G.</given-names>
            <surname>Petkos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Papadopoulos</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Y.</given-names>
            <surname>Kompatsiaris</surname>
          </string-name>
          .
          <article-title>Social event detection using multimodal clustering and integrating supervisory signals</article-title>
          .
          <source>In Proceedings of the 2nd ACM Intern. Conf. on Multimedia Retrieval, page 23. ACM</source>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>T.</given-names>
            <surname>Reuter</surname>
          </string-name>
          and
          <string-name>
            <given-names>P.</given-names>
            <surname>Cimiano</surname>
          </string-name>
          .
          <article-title>Event-based classification of social media streams</article-title>
          .
          <source>In Proceedings of the 2nd ACM Intern. Conf. on Multimedia Retrieval, page 22. ACM</source>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>