CERTH @ MediaEval 2011 Social Event Detection Task

Symeon Papadopoulos, Christos Zigkolis
(1) CERTH, Thessaloniki; (2) Aristotle University
{papadop,chzigkol}@iti.gr

Yiannis Kompatsiaris
Informatics & Telematics Institute, CERTH, Thessaloniki, Greece
ikom@iti.gr

Athena Vakali
Informatics Department, Aristotle University, Thessaloniki, Greece
avakali@csd.auth.gr


ABSTRACT
This paper describes the participation of CERTH in the “Social Event Detection Task @ MediaEval 2011”, which aims at discovering social events in a large photo collection. The task comprises two challenges: (i) identification of soccer events in the cities of Barcelona and Rome, and (ii) identification of events taking place in two specific venues. We adopt an approach that combines spatial and temporal filters with tag-based location classification models and an efficient photo clustering method. In our best runs, we achieve F-measure and NMI scores of 77.4% and 0.63 respectively for Challenge 1, and 64% and 0.38 for Challenge 2.

Categories and Subject Descriptors
H.3 [Information Search and Retrieval]

1. INTRODUCTION
In this paper we present our system, experiments, and conclusions in the context of the MediaEval 2011 Social Event Detection (SED) Task. The SED Task, which is described in detail in [1], provides a collection of 73,645 tagged photos from Flickr and requests the detection of two types of social events. Challenge 1 pertains to the detection of soccer events in the cities of Barcelona and Rome. Challenge 2 asks for events taking place in Paradiso (Amsterdam) and Parc del Forum (Barcelona). The task considers a social event as a group of photos capturing some aspect of a certain event. Formally, given the collection P = {p} of photos, the task asks for the detection of K events {E_i | E_i ⊂ P}, i = 1, ..., K.

2. SED APPROACH
We employed a common approach for tackling both challenges. Figure 1 illustrates its main steps: (a) photo filtering, (b) event partitioning, and (c) event expansion.

2.1 Photo filtering
This step is implemented through the cascaded combination of two classifiers. The first classifier is a city-level classifier that is focused on the five cities of the SED dataset. For those photos that have geotagging information associated with them (∼20% of the photos [1]), the classifier simply assigns the nearest city to the photo (geodesic distance is used for ranking). For the non-geotagged photos, the classifier employs a tag-based matching scheme to classify the photo to one of the cities: the classifier counts the number of city-specific tags in the textual metadata (title, description) of the photo for each dataset city and selects the city with which the photo shares the maximum number of city-specific tags; the city tags are automatically derived from statistical analysis of tags of city photos collected independently from Flickr. If the classifier assigns a photo to a city that is not of interest for the challenge at hand, then this photo is not further considered (but is not excluded from the event expansion step described in subsection 2.3).

Subsequently, a finer-grained classifier is employed for selecting only the photos that are related to the topic/entity of interest. For Challenge 1, a soccer classifier was created, while a venue classifier was employed for Challenge 2. Both classifiers rely on an approach similar to the one described for the city classification. In both cases, appropriate tag models were used (a soccer model and venue models, one for each of the venues specified by the task); these are described further in Section 3.
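The city-level classifier of Section 2.1 could be sketched roughly as follows. This is a minimal illustration and not the implementation used in our runs: the haversine formula stands in for the geodesic distance, the coordinates and tag sets are hypothetical placeholders (only the three cities named in the task description are listed), and returning None when no city tag matches is an assumption the text does not specify.

```python
import math
import re

# Approximate city-centre coordinates (latitude, longitude) in degrees.
# ASSUMPTION: illustrative values; only three of the dataset cities are listed.
CITY_CENTRES = {
    "Amsterdam": (52.3676, 4.9041),
    "Barcelona": (41.3851, 2.1734),
    "Rome": (41.9028, 12.4964),
}

# HYPOTHETICAL city-specific tag sets. In the paper these are derived
# automatically from independently collected Flickr photos.
CITY_TAGS = {
    "Amsterdam": {"amsterdam", "vondelpark", "paradiso"},
    "Barcelona": {"barcelona", "ramblas", "campnou"},
    "Rome": {"rome", "roma", "colosseo"},
}


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km (an approximation of geodesic distance)."""
    radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


def classify_city(photo):
    """Assign a city to a photo dict; geotag first, tag matching otherwise."""
    geo = photo.get("geo")
    if geo is not None:
        lat, lon = geo
        # Geotagged photo: pick the nearest city centre.
        return min(CITY_CENTRES, key=lambda c: haversine_km(lat, lon, *CITY_CENTRES[c]))
    # Non-geotagged photo: count city-specific tags in title and description.
    text = photo.get("title", "") + " " + photo.get("description", "")
    tokens = set(re.findall(r"\w+", text.lower()))
    counts = {city: len(tokens & tags) for city, tags in CITY_TAGS.items()}
    best = max(counts, key=counts.get)
    # ASSUMPTION: a photo with no matching city tag is left unclassified.
    return best if counts[best] > 0 else None
```

A photo is represented here as a plain dictionary with optional "geo", "title" and "description" keys; this layout is chosen only for the example.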


Figure 1: Proposed SED approach

Copyright is held by the author/owner(s).
MediaEval 2011 Workshop, September 1-2, 2011, Pisa, Italy
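The second stage of the photo filtering cascade (Section 2.1) can be illustrated in the same spirit. The sketch below assumes each photo already carries a "city" field produced by the city-level classifier; the tag model contents, the dictionary layout, and the rule that a single matching tag suffices are all assumptions made for the example, not details taken from our system. The use_description flag mirrors the choice between title/tags (tt) and title/tags/description (ttd) discussed in Section 3.

```python
import re

# HYPOTHETICAL baseline soccer tag model (generic soccer terms only).
BASELINE_SOCCER_TAGS = {"soccer", "football", "futbol", "calcio", "stadium"}


def tokens_of(photo, use_description):
    """Collect lower-cased word tokens from title, tags and optionally description."""
    parts = [photo.get("title", ""), " ".join(photo.get("tags", []))]
    if use_description:
        parts.append(photo.get("description", ""))
    return set(re.findall(r"\w+", " ".join(parts).lower()))


def filter_photos(photos, cities_of_interest, tag_model, use_description=False):
    """Keep photos assigned to a city of interest that match the tag model."""
    kept = []
    for photo in photos:
        # Stage 1 result: drop photos assigned to other cities.
        if photo.get("city") not in cities_of_interest:
            continue
        # Stage 2: ASSUMPTION, one shared tag is enough to keep the photo.
        if tokens_of(photo, use_description) & tag_model:
            kept.append(photo)
    return kept


photos = [
    {"id": 1, "city": "Barcelona", "title": "Match night", "tags": ["futbol"]},
    {"id": 2, "city": "Paris", "title": "Football", "tags": []},
    {"id": 3, "city": "Rome", "title": "Dinner", "tags": [],
     "description": "before the calcio derby"},
]
cities = {"Barcelona", "Rome"}
ids_tt = [p["id"] for p in filter_photos(photos, cities, BASELINE_SOCCER_TAGS)]
ids_ttd = [p["id"] for p in filter_photos(photos, cities, BASELINE_SOCCER_TAGS, True)]
```

In this toy input, photo 3 is only retained when the description is taken into account, which is the kind of effect the tt versus ttd setting is meant to capture.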
2.2 Event partitioning
We define a single event by a date-place combination. For that reason, we first enumerate all unique dates that appear in the set of photos collected from the photo filtering step described above. For each unique date, we consider a distinct event, except for the dates for which there are photos classified to more than one city of interest. For these dates, one distinct event is considered for each different city with which at least one photo is associated. At the end of this step, a list of events is available and each event of this list is associated with a set of photos.

2.3 Event expansion
Each event produced by the event partitioning step is enriched by making use of the metadata of the photos associated with it. A first expansion is carried out by adding photos taken by the same user on the same day as the event. Next, photos with geotagging information that are located in the vicinity of the event (within a radius of 200m) are also added to the event under consideration. Finally, an additional list of photos related to the event is discovered by clustering the photo collection and selecting the photos of the same cluster, under the constraint that their creators/owners are already associated with the event through at least one photo. The photo collection is clustered by means of a community detection scheme that is applied on a visual, tag or hybrid similarity graph [2].

3. EXPERIMENTS
We present a set of 10 experiments that evaluate the performance of the system under a variety of configurations. Tables 1 and 2 summarize the results obtained from the official submission of the five runs to Challenges 1 and 2 respectively. In both tables, the run number and the selected parameters are listed together with the achieved performance scores, Normalized Mutual Information (NMI), Precision (P), Recall (R) and F-measure (F), which are described in [1]. For Challenge 1, all three parameters appearing in Figure 1 are studied, while for Challenge 2, only the first and third parameters are studied.

The first parameter (p_1) pertains to the tag model used for filtering out irrelevant photos. Two different soccer tag models were used for Challenge 1, i.e. p_1 ∈ {m_{1,b}, m_{1,+}}, where m_{1,b} is the baseline soccer tag model containing generic soccer tags as well as tags consisting of Spanish and Italian football club names. The extended soccer tag model (m_{1,+}) additionally contains alternative team names (e.g. “Blaugrana” for Barcelona FC) and stadium names. For Challenge 2, a similar selection was available: the baseline venue tag model (m_{2,b}) consisted of a few tags with generic music event terms (e.g. “concert”, “gig”) as well as the names of the two venues of interest. The extended model (m_{2,+}) was enriched with the names of the bands playing in these venues in May 2009, which were retrieved by use of the last.fm API.

The second parameter (p_2) regards the use of the description along with the photo title/tags (p_2 ∈ {tt, ttd}), with ttd denoting the use of the description in addition to title/tags (tt). The third parameter (p_3) regards the use of clustering for event expansion (p_3 ∈ {∅, T, V, H}), where the available options were tag-based (T), visual (V) or hybrid (H) clusters (the latter produced by graph-based clustering on graphs comprising both tag-based and visual similarities), or no clusters at all (∅).

run   p_1       p_2   p_3   NMI      P (%)   R (%)   F (%)
1     m_{1,b}   ttd   T     0.3742   57.66   62.50   59.98
2     m_{1,+}   tt    ∅     0.5707   90.58   67.58   77.40
3     m_{1,+}   tt    T     0.6180   90.58   67.58   77.40
4     m_{1,+}   tt    V     0.5748   89.18   67.58   76.89
5     m_{1,+}   ttd   T     0.6301   94.63   65.43   77.37

Table 1: Results for Challenge 1

run   p_1       p_3   NMI      P (%)   R (%)   F (%)
1     m_{2,b}   ∅     0.2516   51.36   48.85   50.08
2     m_{2,b}   T     0.2629   50.58   48.85   49.70
3     m_{2,b}   V     0.2527   51.27   48.85   50.03
4     m_{2,b}   H     0.2646   50.58   48.85   49.70
5     m_{2,+}   H     0.3796   54.31   77.90   64.00

Table 2: Results for Challenge 2

4. DISCUSSION
The first important observation from the results in Tables 1 and 2 is the importance of using an appropriate tag model for photo classification. A significant improvement in all evaluation measures is achieved by use of an enriched tag model. For instance, in Challenge 1, this is clearly visible by comparing runs 1 and 5, while in Challenge 2 it is demonstrated by comparing runs 4 and 5. This highlights the value of rich domain knowledge in the reliable detection of social events in photo collections. Furthermore, using the description (ttd) in addition to the title and tags of photos appears to improve NMI and precision when the extended soccer tag model is used, although recall drops slightly and the F-measure is essentially unchanged (compare runs 3 and 5 in Table 1).

Finally, the use of image clustering appears to be of limited value to the system. In Challenge 1, there is a modest improvement in the obtained NMI when using the tag-based clusters (run 3 versus run 2), and a marginal improvement when using the visual clusters. However, a slight drop in precision is observed when using the visual clusters (run 4 versus run 2). Similar results are obtained for Challenge 2. We attribute this result to the fact that most of the potential gains of clustering are already captured by the user-based event expansion (used in all runs).

In conclusion, the experiments indicate the importance of textual metadata of photos in combination with rich domain knowledge for the effective detection of events in large photo collections. In the future, we plan a more comprehensive analysis of false positives and false negatives in order to further improve the system performance.

Acknowledgments
This work has been supported by the GLOCAL EC project under contract number FP7-248984.

5. REFERENCES
[1] S. Papadopoulos, R. Troncy, V. Mezaris, B. Huet, and I. Kompatsiaris. Social event detection at MediaEval 2011: Challenges, dataset and evaluation. In MediaEval 2011 Workshop, Pisa, Italy, September 1-2 2011.
[2] S. Papadopoulos, C. Zigkolis, Y. Kompatsiaris, and A. Vakali. Cluster-based landmark and event detection for tagged photo collections. IEEE Multimedia, 18(1):52-63, Jan. 2011.