<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>A Data-Driven Approach for Social Event Detection</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Dimitrios Rafailidis</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Theodoros Semertzidis</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Michalis Lazaridis</string-name>
          <email>lazar@iti.gr</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Michael G. Strintzis</string-name>
          <email>strintzi@eng.auth.gr</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Petros Daras</string-name>
          <email>daras@iti.gr</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>CERTH-Information Technologies Institute</institution>
          ,
          <addr-line>6th Km. Charilaou-Thermi, Thessaloniki</addr-line>
          ,
          <country country="GR">Greece</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>T. Semertzidis and M.G. Strintzis are also with the Information Processing Laboratory, Electrical and Computer Engineering Dept., Aristotle University of Thessaloniki</institution>
          ,
          <country country="GR">Greece</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2013</year>
      </pub-date>
      <fpage>18</fpage>
      <lpage>19</lpage>
      <abstract>
        <p>In this paper, we present a data-driven approach for challenge 1 of the MediaEval 2013 Social Event Detection Task. Our proposed approach consists of the following steps: (a) initialization based on the images' spatio-temporal information; (b) computation of cluster intercorrelations; and (c) final cluster generation. In the initialization step, the images that have both geolocation and time information are clustered accordingly, generating a few “anchored” clusters, while the rest of the images, with no geolocation or time information, are considered as singleton (one-image) clusters. In the second step, all pairwise intercorrelations between the “anchored” and the singleton clusters are calculated with the help of an aggregated similarity measure based on the user, title, description, tag, and visual information of the images. In the final step, the “anchored” and singleton clusters derived from the initialization step are merged based on the calculated intercorrelations of the second step to generate the final clusters. Our best run achieves a score of 0.5701, 0.8739 and 0.5592 for F1-Measure, NMI and Divergence F1, respectively.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. INTRODUCTION</title>
      <p>
        We hereby present the data-driven approach followed by the
Visual Computing Lab (http://vcl.iti.gr) of CERTH at the
MediaEval 2013 Social Event Detection Task for challenge
1. Details of the task are provided in the paper from Reuter
et al. [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Previous work in the field uses techniques such as the
LDA approach of Vavliakis et al. [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] or spectral clustering of Petkos et
al. [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] that perform well on small sets but have high
preprocessing requirements. Our motivation was to design
an approach that exploits as much of the available
information as possible, while avoiding complex training algorithms and
classification schemes that do not scale well. Towards this end,
our goal was to process the dataset in the shortest possible time.
      </p>
    </sec>
    <sec id="sec-1a">
      <title>2. PROPOSED APPROACH</title>
      <p>Initialization Step: Given the N images in the database, we
retrieve the A ≤ N images that have both geolocation and time
information, while the remaining Csing = N − A images are
considered as singleton clusters. The geolocation information is
provided as longitude and latitude coordinates, whereas in
our approach we consider the date on which a photo was
taken as its time information. The A images are then clustered
in two steps. In the first step, two images I1, I2 ∈ A that
do not differ by more than a predefined threshold l are
clustered together, on the condition that both |I1,long − I2,long| ≤ l
and |I1,lat − I2,lat| ≤ l hold, where Ii,long and Ii,lat are
the longitude and latitude coordinates of the i-th image. In
doing so, clusters C1, ..., Cr are generated. In the second
step the r generated clusters are split based on the time
condition that the images of a cluster should lie within a
predefined time window w. All clusters generated by the two
steps that contain images of A are called “anchored” clusters
Canc, in the sense that these clusters must not be merged
together, since the time condition is never satisfied between them. The
final outcome of the initialization step is (a) the Csing
singleton clusters and (b) the Canc “anchored” clusters of
the images in A, where each cluster Ci is associated with a
minimum date Tmin(Ci) and a maximum date Tmax(Ci) based on the
time window w, i.e. Tmax(Ci) − Tmin(Ci) ≤ w, where Tmax(Ci)/Tmin(Ci) is the
maximum/minimum date of an image within cluster Ci.</p>
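      <p>For illustration, the following minimal Python sketch (our own simplification, not the exact implementation; the image records with 'lat', 'lon' and 'date' fields and the greedy comparison against the first image of each cluster are assumptions) shows how the initialization step can be realized with the coordinate threshold l and the time window w:</p>
      <preformat>
# Minimal sketch of the initialization step (assumed data layout:
# each image is a dict with optional 'lat', 'lon' and 'date' fields).
from datetime import timedelta

L_THRES = 0.05                  # coordinate threshold l (degrees)
W_WINDOW = timedelta(hours=24)  # time window w

def initialize(images):
    anchored, singletons = [], []
    for img in images:
        if img.get('lat') is None or img.get('lon') is None or img.get('date') is None:
            singletons.append([img])        # singleton (one-image) cluster
            continue
        placed = False
        for cluster in anchored:
            ref = cluster[0]                # greedy: compare to first member
            if abs(img['lat'] - ref['lat']) > L_THRES:
                continue
            if abs(img['lon'] - ref['lon']) > L_THRES:
                continue
            cluster.append(img)
            placed = True
            break
        if not placed:
            anchored.append([img])
    # Second step: split every spatial cluster by the time window w.
    split = []
    for cluster in anchored:
        cluster.sort(key=lambda i: i['date'])
        current = [cluster[0]]
        for img in cluster[1:]:
            if img['date'] - current[0]['date'] > W_WINDOW:
                split.append(current)
                current = [img]
            else:
                current.append(img)
        split.append(current)
    return split, singletons                # "anchored" and singleton clusters
      </preformat>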
      <p>Computation of Cluster Intercorrelations: In the
second step of our approach, we calculate the
intercorrelation of two examined clusters Ci and Cj only if they are (a)
not both “anchored” clusters and (b) satisfy the time
condition that at least one difference between the associated
dates Tmin(Ci), Tmax(Ci), Tmin(Cj), Tmax(Cj) is lower than or equal to the
time window w. In case the time information of a
singleton cluster Csing is missing, this time
constraint is ignored. Then, if the time condition is
satisfied, the intercorrelations between two clusters Ci and Cj
are computed as follows. First, each cluster Ci is associated
with the three textual vocabularies of tags, titles,
descriptions as well as with a list of users, which are the owners
of the images of cluster Ci. For each cluster Ci, the
distinct tags, titles, descriptions, and users form the respective
textual vocabularies and the list of users. For the textual
information of tags, titles, descriptions we used a Jaccard
similarity measure to calculate the textual similarity
measures Stags(Ci, Cj), Sdesc(Ci, Cj) and Stitle(Ci, Cj) for each
pair Ci–Cj. In parallel, the cluster similarity Susers(Ci, Cj)
based on the users is computed.</p>
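      <p>As a small illustration of how these per-modality similarities can be computed, the sketch below applies a plain Jaccard coefficient to two cluster vocabularies (the example vocabularies are hypothetical; extracting them from the tags, titles, descriptions and owners of each cluster is assumed to have been done already):</p>
      <preformat>
def jaccard(vocab_a, vocab_b):
    """Jaccard coefficient between two vocabularies (sets of strings)."""
    if not vocab_a and not vocab_b:
        return 0.0
    inter = len(vocab_a.intersection(vocab_b))
    union = len(vocab_a.union(vocab_b))
    return inter / union

# Hypothetical tag vocabularies of two clusters Ci and Cj.
tags_ci = {'concert', 'rock', 'london'}
tags_cj = {'concert', 'london', 'stage'}
s_tags = jaccard(tags_ci, tags_cj)   # 2 shared / 4 distinct = 0.5
# The same function is applied to the title, description and user lists.
      </preformat>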
      <p>Using the visual information, each cluster Ci is associated
with the set Ivis(Ci) of the distinct visual neighbors of all images that belong to
cluster Ci, obtained by aggregating the k visual neighbors of each image.
The visual similarity Svisual(Ci, Cj) between two clusters Ci and Cj
is then calculated over these sets.
Finally, the intercorrelation between two clusters Ci and
Cj is computed using the following aggregated similarity
measure:</p>
      <p>Sagg(Ci, Cj) = a1·Susers(Ci, Cj) + a2·Stags(Ci, Cj) + a3·Sdesc(Ci, Cj) + a4·Stitle(Ci, Cj) + a5·Svisual(Ci, Cj),
where each coefficient a1, a2, a3, a4, a5 expresses the
respective weight of each similarity measure.</p>
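      <p>The corresponding aggregation is a single weighted sum; below is a minimal sketch with the weights reported in Section 3 (the five partial similarities are assumed to have been computed as described above):</p>
      <preformat>
# Weights a1..a5 as selected by the grid search in Section 3.
A_USERS, A_TAGS, A_DESC, A_TITLE, A_VISUAL = 0.5, 0.125, 0.125, 0.125, 0.125

def aggregated_similarity(s_users, s_tags, s_desc, s_title, s_visual):
    """Weighted combination Sagg(Ci, Cj) of the five partial similarities."""
    return (A_USERS * s_users + A_TAGS * s_tags + A_DESC * s_desc
            + A_TITLE * s_title + A_VISUAL * s_visual)
      </preformat>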
      <p>Final Cluster Generation: The final clusters are
generated based on the calculated intercorrelations. For each
singleton cluster Csing the maximum intercorrelation with
an “anchored” cluster Canc is computed. If the condition
Sagg(Csing, Canc) ≥ Mthres, where Mthres is a merging
threshold, is satisfied, then the two clusters are merged. After all
pair-wise comparisons between the singleton clusters and the
“anchored” ones, the non-merged singleton clusters are
compared with each other in the same way and merged
analogously, which generates the final clusters.</p>
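      <p>A minimal sketch of this merging pass (greedy and threshold-based; the cluster lists and the pairwise aggregated similarity function are assumed to be available from the previous steps):</p>
      <preformat>
def merge_singletons(singletons, anchored, sim, m_thres):
    """Attach each singleton cluster to its best-matching 'anchored' cluster
    when the maximum aggregated similarity reaches the threshold M_thres."""
    leftovers = []
    for s in singletons:
        best, best_sim = None, 0.0
        for a in anchored:
            value = sim(s, a)
            if value > best_sim:
                best, best_sim = a, value
        if best is not None and best_sim >= m_thres:
            best.extend(s)       # merge the singleton into the anchored cluster
        else:
            leftovers.append(s)  # compared with each other in a second pass
    return leftovers
      </preformat>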
    </sec>
    <sec id="sec-2">
      <title>3. EXPERIMENTS</title>
      <p>
        A grid selection strategy was used to compute the optimal
values l = 0.05, w = 24 hours, a1 = 0.5, and a2 = a3 = a4 = a5 = 0.125.
In order to retrieve the k visual neighbors, we used
Opponent SIFT [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] with a codebook of 1004 dimensions. The
number of visual neighbors was set to k = 20. By varying
the merging threshold Mthres, our best run on the training
set, with Mthres = 0.4, achieves an F1-Measure of 0.8889, an NMI
of 0.9771, and a DIV-F1 of 0.8076.
      </p>
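      <p>The grid selection itself can be sketched as a simple exhaustive loop over candidate parameter values (the candidate grids and the evaluate() function computing the F1-Measure on the training ground truth are illustrative placeholders, not the ranges actually searched):</p>
      <preformat>
from itertools import product

# Illustrative candidate grids; the real search ranges are not reported here.
L_GRID = [0.01, 0.05, 0.1]        # coordinate threshold l
W_GRID = [12, 24, 48]             # time window w, in hours
M_GRID = [0.2, 0.3, 0.4, 0.5]     # merging threshold Mthres

def grid_search(evaluate):
    """Return the parameter triple that maximizes the training F1-Measure."""
    best_score, best_params = 0.0, None
    for l, w, m in product(L_GRID, W_GRID, M_GRID):
        score = evaluate(l=l, w=w, m_thres=m)
        if score > best_score:
            best_score, best_params = score, (l, w, m)
    return best_params, best_score
      </preformat>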
      <p>The experimental results for the test set are presented in
Table 1. For the required run the merging threshold Mthres
is set to 0.003. For the general runs (1-4) it is set to 0.01,
0.005, 0.01 and 0.005, respectively. The remark “visual”
denotes that visual information was additionally used. The
reason for using low values of Mthres in our runs on the test
set is that the majority of the calculated intercorrelations
were lower than 0.01, whereas the respective
intercorrelations in the training set were lower than 0.4. Higher
values of Mthres on the test set generated many singleton
clusters. The unknown number of clusters and the extremely
low cluster intercorrelations can explain the large difference
between the results on the training and the test set. All
experiments were conducted on our distributed environment in the
context of the EC funded project CUBRIK (see Section 5).</p>
    </sec>
    <sec id="sec-3">
      <title>4. DISCUSSION</title>
      <p>
        Based on the experimental results of Table 1, we observe
that the visual information slightly improves the
performance of the algorithm. However, it does not necessarily solve the
sparsity problem, which is evident from the many zero and
low values of the cluster intercorrelations. This happens
because the k visual neighbors of each image are not necessarily
conceptually similar and thus add noise to the cluster
intercorrelations. This is a very important challenge for many
content-based tag propagation methods that try to solve the
sparsity problem between sparsely annotated images. This issue
has been termed “learning tag relevance”: the
semantic connections between an assigned tag (or any other
textual information) and the content it represents must
be revealed in order to perform accurate tag propagation. In the
future, we plan to evaluate the proposed data-driven
approach using our personalized content-based tag
propagation method [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], in order to address the extreme sparsity that
may occur in the cluster intercorrelations.
      </p>
    </sec>
    <sec id="sec-4">
      <title>5. ACKNOWLEDGEMENTS</title>
      <p>This work was partially supported by the EC FP7 funded
project CUBRIK, ICT- 287704 (www.cubrikproject.eu).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>G.</given-names>
            <surname>Petkos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Papadopoulos</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Y.</given-names>
            <surname>Kompatsiaris</surname>
          </string-name>
          .
          <article-title>Social event detection using multimodal clustering and integrating supervisory signals</article-title>
          .
          <source>In ICMR. ACM</source>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>D.</given-names>
            <surname>Rafailidis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Axenopoulos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Etzold</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Manolopoulou</surname>
          </string-name>
          , and
          <string-name>
            <given-names>P.</given-names>
            <surname>Daras</surname>
          </string-name>
          .
          <article-title>Content-based tag propagation and tensor factorization for personalized item recommendation based on social tagging</article-title>
          .
          <source>ACM Trans. Interact. Intell. Syst.</source>
          , to appear.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>T.</given-names>
            <surname>Reuter</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Papadopoulos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Mezaris</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Cimiano</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>de Vries</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Geva</surname>
          </string-name>
          .
          <article-title>Social Event Detection at MediaEval 2013: Challenges, datasets, and evaluation</article-title>
          . In MediaEval 2013 Workshop, Barcelona, Spain, October 18-19,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>K. E. A.</given-names>
            <surname>van de Sande</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Gevers</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C. G. M.</given-names>
            <surname>Snoek</surname>
          </string-name>
          .
          <article-title>Evaluating color descriptors for object and scene recognition</article-title>
          .
          <source>IEEE Trans. on PAMI</source>
          ,
          <volume>32</volume>
          (
          <issue>9</issue>
          ):
          <fpage>1582</fpage>
          -
          <lpage>1596</lpage>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>K. N.</given-names>
            <surname>Vavliakis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F. A.</given-names>
            <surname>Tzima</surname>
          </string-name>
          , and
          <string-name>
            <given-names>P. A.</given-names>
            <surname>Mitkas</surname>
          </string-name>
          .
          <article-title>Event detection via LDA for the MediaEval 2012 SED task</article-title>
          .
          <source>In MediaEval Workshop</source>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>