=Paper=
{{Paper
|id=None
|storemode=property
|title=A Data-Driven Approach for Social Event Detection
|pdfUrl=https://ceur-ws.org/Vol-1043/mediaeval2013_submission_8.pdf
|volume=Vol-1043
|dblpUrl=https://dblp.org/rec/conf/mediaeval/RafailidisSLSD13
}}
==A Data-Driven Approach for Social Event Detection==
Dimitrios Rafailidis, Theodoros Semertzidis*, Michalis Lazaridis, Michael G. Strintzis*, Petros Daras
CERTH - Information Technologies Institute, 6th Km. Charilaou-Thermi, Thessaloniki, Greece
{drafail,theosem,lazar,daras}@iti.gr, strintzi@eng.auth.gr

*T. Semertzidis and M. G. Strintzis are also with the Information Processing Laboratory, Electrical and Computer Engineering Dept., Aristotle University of Thessaloniki, Greece.

ABSTRACT

In this paper, we present a data-driven approach for challenge 1 of the MediaEval 2013 Social Event Detection Task. Our proposed approach consists of the following steps: (a) initialization based on the images' spatio-temporal information; (b) computation of the clusters' intercorrelations; and (c) generation of the final clusters. In the initialization step, the images that have both geolocation and time information are clustered accordingly, so that a few "anchored" clusters are generated, while the remaining images with no geolocation or time information are considered as singleton (one-image) clusters. In the second step, all pairwise intercorrelations between the "anchored" and the singleton clusters are calculated with the help of an aggregated similarity measure based on the user, title, description, tag, and visual information of the images. In the final step, the "anchored" and singleton clusters derived from the initialization step are merged based on the calculated intercorrelations of the second step, generating the final clusters. Our best run achieves scores of 0.5701, 0.8739, and 0.5592 for F1-Measure, NMI, and Divergence F1, respectively.

1. INTRODUCTION

We hereby present the data-driven approach followed by the Visual Computing Lab (http://vcl.iti.gr) of CERTH at the MediaEval 2013 Social Event Detection Task for challenge 1. Details of the task are provided in the paper by Reuter et al. [3]. Previous works in the field use techniques such as the LDA of Vavliakis et al. [5] or the spectral clustering of Petkos et al. [1], which perform well on small sets but have high preprocessing requirements.

Our initial motivation was to design an approach that exploits as much of the available information as possible while avoiding complex training algorithms and classification schemes that do not scale well. Towards this end, our goal was to process the dataset in the shortest possible time by parallelizing the processes and performing a hierarchical-like clustering with single-check merging of clusters and without iterative procedures. We therefore followed the proposed data-driven approach, in which photos of the same time and place correspond to the same event. Moreover, in case of missing time or geolocation information, the remaining image information is used.

2. PROPOSED APPROACH

In this Section, each step of our proposed approach is described in more detail.

Initialization Step: Given N images in the database, we retrieve the A ≤ N images that have both geolocation and time information, while the remaining N − A images are considered as singleton clusters C_sing. The geolocation information is provided as longitude and latitude coordinates, whereas in our approach we consider the date on which the photo was taken as the time information. The A images are then clustered in two steps. In the first step, two images I_1, I_2 ∈ A are clustered together on the condition that both |I_1^{long} − I_2^{long}| ≤ l and |I_1^{lat} − I_2^{lat}| ≤ l hold true for a predefined threshold l, where I_i^{long} and I_i^{lat} are the longitude and latitude coordinates of the i-th image. In doing so, clusters C_1, ..., C_r are generated. In the second step, the r generated clusters are split based on the time condition that the images of a cluster must lie within a predefined time window w. All clusters generated from these two steps over the A images are called "anchored" clusters C_anc, in the sense that these clusters must not be merged together, since the time condition is never satisfied between them. The final outcome of the initialization step is thus (a) the singleton clusters C_sing and (b) the "anchored" clusters C_anc of the A images, where each cluster C_i is associated with a minimum date T_min^{C_i} and a maximum date T_max^{C_i} respecting the time window w, i.e., T_max^{C_i} − T_min^{C_i} ≤ w, with T_max^{C_i}/T_min^{C_i} being the maximum/minimum date of an image within cluster C_i.
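As a concrete illustration, the following minimal Python sketch mirrors the initialization step under stated assumptions, not the authors' implementation: images are taken to be dicts with "lon", "lat", and "date" (datetime) fields, and the pairwise coordinate condition is applied greedily against the first image of each cluster, since the paper does not prescribe a concrete grouping procedure.

<pre>
from datetime import timedelta

def initialize_clusters(images, l=0.05, w=timedelta(hours=24)):
    """Sketch of the initialization step (assumed data layout)."""
    spatial_clusters, singletons = [], []
    for img in images:
        # Images missing geolocation or time information become singletons.
        if img.get("lon") is None or img.get("lat") is None or img.get("date") is None:
            singletons.append([img])
            continue
        placed = False
        for cluster in spatial_clusters:
            ref = cluster[0]
            # Pairwise condition |lon_1 - lon_2| <= l and |lat_1 - lat_2| <= l,
            # checked greedily against the cluster's first image.
            if abs(img["lon"] - ref["lon"]) <= l and abs(img["lat"] - ref["lat"]) <= l:
                cluster.append(img)
                placed = True
                break
        if not placed:
            spatial_clusters.append([img])
    # Second step: split each spatial cluster so that all dates of a
    # cluster fall within the time window w ("anchored" clusters).
    anchored = []
    for cluster in spatial_clusters:
        cluster.sort(key=lambda im: im["date"])
        current = [cluster[0]]
        for img in cluster[1:]:
            if img["date"] - current[0]["date"] <= w:
                current.append(img)
            else:
                anchored.append(current)
                current = [img]
        anchored.append(current)
    return anchored, singletons
</pre>

With this layout, each anchored cluster comes out sorted by date, so T_min^{C_i} and T_max^{C_i} are simply its first and last entries.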
Computation of Cluster Intercorrelations: In the second step of our approach, we calculate the intercorrelation of two examined clusters C_i and C_j only if they are (a) not both "anchored" clusters and (b) satisfy the time condition that at least one difference between the associated dates T_min^{C_i}, T_max^{C_i}, T_min^{C_j}, T_max^{C_j} is lower than or equal to the time window w. In case the time information of a singleton cluster C_sing is missing, the aforementioned time constraint is ignored. If the time condition is satisfied, the intercorrelation between the two clusters C_i and C_j is computed as follows. First, each cluster C_i is associated with three textual vocabularies of tags, titles, and descriptions, as well as with a list of users, who are the owners of the images of cluster C_i; the distinct tags, titles, descriptions, and users form the respective textual vocabularies and the user list. For the textual information of tags, titles, and descriptions we used a Jaccard similarity measure to calculate the textual similarity measures S^{tags}(C_i, C_j), S^{desc}(C_i, C_j), and S^{title}(C_i, C_j) for each pair C_i-C_j. In parallel, the cluster similarity S^{users}(C_i, C_j) based on the users is computed.

Using the visual information, we form for each cluster C_i the set I_vis(C_i) of the distinct visual neighbors of all images that belong to the cluster, by aggregating the k visual neighbors of each image. The visual similarity between two clusters C_i and C_j is then calculated as:

S^{visual}(C_i, C_j) = |I_vis(C_i) ∩ I_vis(C_j)| / |I_vis(C_i) ∪ I_vis(C_j)|

Finally, the intercorrelation between two clusters C_i and C_j is computed using the following aggregated similarity measure:

S^{agg}(C_i, C_j) = a_1 S^{users}(C_i, C_j) + a_2 S^{tags}(C_i, C_j) + a_3 S^{desc}(C_i, C_j) + a_4 S^{title}(C_i, C_j) + a_5 S^{visual}(C_i, C_j)

where each coefficient a_1, ..., a_5 expresses the weight of the respective similarity measure.
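The similarity computation can be sketched as follows, assuming each cluster is summarized as a dict of precomputed sets (users, tags, descriptions, titles, visual neighbors); this dict-of-sets layout is our assumption. The paper specifies Jaccard for the three textual measures and the set-overlap form of S^{visual}, but gives no formula for S^{users}, so Jaccard is assumed there as well; the default weights are the grid-selected values reported in Section 3.

<pre>
def jaccard(a, b):
    """Jaccard similarity of two sets; 0 when both are empty."""
    return len(a & b) / len(a | b) if (a or b) else 0.0

def aggregated_similarity(ci, cj, a=(0.5, 0.125, 0.125, 0.125, 0.125)):
    """Sketch of S^{agg}: weighted sum of the five similarity measures.
    Clusters are assumed to be dicts of sets; Jaccard for S^{users} is
    our assumption, since the paper leaves its form unspecified."""
    a1, a2, a3, a4, a5 = a
    return (a1 * jaccard(ci["users"], cj["users"])
            + a2 * jaccard(ci["tags"], cj["tags"])
            + a3 * jaccard(ci["descriptions"], cj["descriptions"])
            + a4 * jaccard(ci["titles"], cj["titles"])
            + a5 * jaccard(ci["visual"], cj["visual"]))
</pre>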
Final Cluster Generation: The final clusters are generated based on the calculated intercorrelations. For each singleton cluster C_sing, the maximum intercorrelation with an "anchored" cluster C_anc is computed. If the condition S^{agg}(C_sing, C_anc) ≥ M_thres is satisfied, where M_thres is a merging threshold, the two clusters are merged. After all pairwise comparisons between the singleton clusters and the "anchored" ones, the non-merged singleton clusters are compared with each other in the same way and merged analogously, which generates the final clusters.
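The merging logic can be sketched as below, under the same assumed dict-of-sets cluster representation; sim stands for the aggregated similarity sketched above, the merge helper that unions cluster fields is our own device rather than the authors', and the time-window gating of the second step is omitted for brevity.

<pre>
def merge(ci, cj):
    """Union the corresponding field sets of two clusters (assumed layout)."""
    return {key: ci[key] | cj[key] for key in ci}

def generate_final_clusters(singletons, anchored, sim, m_thres):
    """Sketch of the final step: merge each singleton into its most similar
    anchored cluster when S^{agg} >= M_thres, then merge the leftover
    singletons with each other analogously."""
    clusters = list(anchored)
    remaining = []
    for s in singletons:
        # Find the cluster with the maximum intercorrelation to s.
        best_idx, best_score = None, 0.0
        for idx, c in enumerate(clusters):
            score = sim(s, c)
            if best_idx is None or score > best_score:
                best_idx, best_score = idx, score
        if best_idx is not None and best_score >= m_thres:
            clusters[best_idx] = merge(clusters[best_idx], s)
        else:
            remaining.append(s)
    # Non-merged singletons are compared with each other and merged analogously.
    while remaining:
        base = remaining.pop(0)
        keep = []
        for other in remaining:
            if sim(base, other) >= m_thres:
                base = merge(base, other)
            else:
                keep.append(other)
        remaining = keep
        clusters.append(base)
    return clusters
</pre>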
3. EXPERIMENTS

A grid selection strategy was used to compute the optimal values l = 0.05, w = 24 hours, a_1 = 0.5, and a_2 = a_3 = a_4 = a_5 = 0.125. To retrieve the visual neighbors, we used Opponent SIFT [4] with a codebook of 1004 dimensions; the number of visual neighbors was set to k = 20. By varying the merging threshold M_thres, our best run on the training set, with M_thres = 0.4, achieves an F1-Measure of 0.8889, an NMI of 0.9771, and a DIV-F1 of 0.8076.

The experimental results for the test set are presented in Table 1. For the required run, the merging threshold M_thres is set to 0.003; for the general runs 1-4 it is set to 0.01, 0.005, 0.01, and 0.005, respectively. The remark "visual" denotes that visual information was used in addition. The reason for using low values of M_thres in the test-set runs is that the majority of the calculated intercorrelations were lower than 0.01, whereas the respective intercorrelations in the training set were lower than 0.4; higher values of M_thres on the test set generated many singleton clusters. The unknown number of clusters and the extremely low cluster intercorrelations can explain the large difference between the results on the training and test sets. All experiments were conducted on our distributed environment in the context of the EC-funded project CUBRIK (see Section 5).

Table 1: Results for challenge 1.

Run                      F1-Score   NMI      DIV-F1
Required run             0.5698     0.8743   0.5049
General run 1 (plain)    0.5631     0.8696   0.4921
General run 2 (plain)    0.5665     0.8727   0.5006
General run 3 (visual)   0.5662     0.8688   0.4917
General run 4 (visual)   0.5701     0.8739   0.5025

4. DISCUSSION

Based on the experimental results of Table 1, we observe that the visual information slightly improves the performance of the algorithm. However, it does not necessarily solve the sparsity problem, which is revealed by the many zero and low values of the cluster intercorrelations. This happens because the k visual neighbors of each image are not necessarily conceptually similar and thus add noise to the cluster intercorrelations. This is a very important challenge for many content-based tag propagation methods that try to solve the sparsity problem between sparsely annotated images. The issue has been termed "learning tag relevance": the semantic connection between an assigned tag (or any other textual information) and the content it represents must be revealed in order to perform accurate tag propagation. In the future, we plan to evaluate the proposed data-driven approach using our personalized content-based tag propagation method [2], in order to address the extreme sparsity that may occur between the cluster intercorrelations.

5. ACKNOWLEDGEMENTS

This work was partially supported by the EC FP7 funded project CUBRIK, ICT-287704 (www.cubrikproject.eu).

6. REFERENCES

[1] G. Petkos, S. Papadopoulos, and Y. Kompatsiaris. Social event detection using multimodal clustering and integrating supervisory signals. In ICMR. ACM, 2012.
[2] D. Rafailidis, A. Axenopoulos, J. Etzold, S. Manolopoulou, and P. Daras. Content-based tag propagation and tensor factorization for personalized item recommendation based on social tagging. ACM Trans. Interact. Intell. Syst., to appear.
[3] T. Reuter, S. Papadopoulos, V. Mezaris, P. Cimiano, C. de Vries, and S. Geva. Social Event Detection at MediaEval 2013: Challenges, datasets, and evaluation. In MediaEval 2013 Workshop, Barcelona, Spain, October 18-19, 2013.
[4] K. E. A. van de Sande, T. Gevers, and C. G. M. Snoek. Evaluating color descriptors for object and scene recognition. IEEE Trans. on PAMI, 32(9):1582-1596, 2010.
[5] K. N. Vavliakis, F. A. Tzima, and P. A. Mitkas. Event detection via LDA for the MediaEval 2012 SED task. In MediaEval Workshop, 2012.