<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Camille Guinaudeau</string-name>
          <email>guinaudeau@limsi.fr</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Antoine Laurent</string-name>
          <email>laurent@limsi.fr</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Hervé Bredin</string-name>
          <email>bredin@limsi.fr</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>LIMSI-CNRS, Rue John Von Neumann</institution>
          ,
          <addr-line>91400 Orsay</addr-line>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University Paris-Sud</institution>
          ,
          <addr-line>Rue du Château, 91400 Orsay</addr-line>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2014</year>
      </pub-date>
      <fpage>16</fpage>
      <lpage>17</lpage>
      <abstract>
        <p>This paper provides an overview of the Social Event Detection (SED) system developed at LIMSI for the 2014 campaign. Our approach is based on a hierarchical agglomerative clustering that uses textual metadata, user-based knowledge and geographical information. These di erent sources of knowledge, either used separately or in cascade, reach good results for the full clustering subtask with a normalized mutual information equal to 0.95 and F1 scores greater than 0.82 for our best run.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. INTRODUCTION</title>
      <p>
        The Social Event Detection (SED) task aims at mining
social events (such as concerts, protest and so on) in large
collections of online multimedia [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. This challenge is divided
into two subtasks: detection by full clustering and retrieval
of events. In this work, we focus only on the rst subtask
which consists in clustering all images in the given dataset,
so that each cluster represents a social event. As the number
of the target clusters is not provided by the SED organizers,
the main di culty of this substak is to infer this number. To
overcome this di culty, our full clustering system relies on
a hierarchical agglomerative clustering approach that works
by continuously merging the most similar clusters (as long as
the similarity between them is higher than some threshold).
      </p>
      <p>In this work, our system is only based on the metadata
associated with images. The hierarchical clustering approach,
presented in Section 2.2, is based on distance matrices
accounting for textual metadata (Section 2.3.1) or
geographical information (Section 2.3.2). In order to make the
distance computation as robust as possible and avoid data
sparsity, a rst preliminary clustering is performed on the
dataset, so that each preliminary cluster is associated with
a set of metadata coming from all the images in the cluster.
This preliminary clustering is described in Section 2.1.</p>
    </sec>
    <sec id="sec-2">
      <title>FULL CLUSTERING</title>
      <p>The development dataset released by the SED task
organizers for the full clustering subtask is composed by 362,578
images collected from Flickr, associated with their
metadata. To lower the computation time and facilitate our
experimentation process, this development dataset was divided
into three smaller datasets Dev A, Dev B and Dev C so that
each dataset has approximately the same number of clusters
and the same distribution in terms of number of images per
cluster. Moreover, the number of images in each cluster is
also quite similar to the number of images contained in the
test set (110,541) allowing us to experiment on our approach
with comparable datasets. As explained in the introduction,
a preliminary clustering was rst applied on these datasets
{ to create user-based clusters of images { before the
hierarchical clustering step that makes use of textual metadata
and/or geographical information.
2.1</p>
    </sec>
    <sec id="sec-3">
      <title>User-based clustering</title>
      <p>The preliminary clustering is obtained by creating one
cluster per user, using the user name associated with each
image in the dataset. As these user-based clusters usually
contain several social events (one user is rarely associated
with only one social event) they are then divided into smaller
ones depending on date and time information also mentioned
in the pictures' metadata.</p>
      <p>For each cluster, we initialize a time reference with the
date of a randomly chosen picture, hereafter called picture
F , in the cluster. We then compare this time reference with
the date of all the other pictures in the cluster and we pick
the closest one, hereafter called picture C. If picture C is
taken less than 1 hours before or after the time reference,
then it belongs in the same cluster as picture F and the
time reference is recomputed to equal the mean between the
previous time reference and the date of picture C.
However, if picture C was taken too far in time from picture F ,
then another cluster is created with a new time reference
corresponding to the taken time of picture C.</p>
      <p>
        The objective of this step is to lower the number of
clusters to be processed in the hierarchical clustering step while
keeping the clusters as pure as possible. To evaluate the
purity of our clusters we use the homogeneity metric [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] which
equals one when each cluster contains only members of a
1All parameters were tuned on Dev A.
      </p>
      <p>F1 (Main Score)</p>
      <p>NMI
Divergence F1</p>
      <p>Dev A
0.7895
0.9479
0.6880</p>
      <p>Dev B
0.7869
0.9472
0.7258
single class. Table 1 summarizes the homogeneity scores for
the development datasets Dev A, Dev B and Dev C with
the parameter ranging from 1 hour to 100 hours. It can
be seen from this table that, rst, the values are similar for
all the datasets and, second, that the homogenity values are
high even if the parameter equals 100 meaning that users
tend not to participate to social events very frequently.
2.2</p>
    </sec>
    <sec id="sec-4">
      <title>Hierarchical clustering approach</title>
      <p>The hierarchical agglomerative clustering approach begins
with the set of clusters that have been de ned in the
preliminary clustering presented in the previous section and is
based on a single-linkage clustering method. When two
clusters u and v from this set are combined into a single cluster
w, u and v are removed from the set, and w is added to the
set. When only one cluster remains, the algorithm stops. To
decide whether two clusters have to be combined, a distance
matrix is maintained at each iteration where the d[u; v]
entry corresponds to the distance between cluster u and v.
At each iteration, the algorithm must update the distance
matrix to re ect the distance of the newly formed cluster w
with the remaining clusters in the set. The distance between
the newly formed cluster w and each v0 is computed with
the following equation
d(w; v0) = min(dist(w[i]; v0[j]))
(1)
for all images i in cluster w and j in cluster v0.</p>
      <p>The nal clustering is then obtained by forming at
clusters from the hierarchical clustering previously de ned. A
threshold is used so that observations in each at cluster
have no intergroup distance greater than .
2.3</p>
    </sec>
    <sec id="sec-5">
      <title>Distance matrices</title>
      <p>In this last part, we describe how the distance matrices
used in the hierarchical agglomerative clustering approach
are computed. Both use the metadata associated with the
pictures, namely textual metadata and geographical
information.
2.3.1</p>
      <sec id="sec-5-1">
        <title>Textual metadata distance matrix</title>
        <p>
          To compute the textual distance, each cluster is
represented by a vector composed by lemmas weighted with a
BM25 score. A cosine distance is then computed between
two vectors to estimate the distance between the two
corresponding clusters. To create the vectors, words are
extracted from the textual metadata associated with each
picture in the cluster. These words are then lemmatized and
only nouns, adjectives and non modal verbs are kept to
characterize the cluster. Each lemma in the vector is nally
associated with a score computed using the BM25 weighting
function [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ] that gives a weight close to 1 to lemmas that are
the most representative of the content of the cluster. In our
system, lemmas are extracted from title, description and,
when available, tags metadata.
2.3.2
        </p>
      </sec>
      <sec id="sec-5-2">
        <title>Geographic distance matrix</title>
        <p>We also compute the geographic distance between every
cluster that contains at least one picture with GPS
information. The distance between two clusters u and v corresponds
to the minimum geographic distance between any picture
from cluster u and any picture from cluster v. Moreover, in
order to avoid the merging of events that take place in the
same location but at di erent time (such as festivals that
take place at the same location every year), we also prevent
the merging of two clusters if their associated date is greater
than a certain threshold (48h) by arti cially increasing their
geographical distance.
3.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>EXPERIMENTS AND RESULTS</title>
      <p>For the full clustering subtask, each participant was
allowed to submit up to 5 runs. In this section we describe
our runs and discuss the results obtained. All the
submitted runs are based on the preliminary clustering, that uses
an parameter equal to 20 hours, 24 hours or 30 hours.
The hierarchical clustering is then obtained thanks to the
textual metadata only (Text), the geographical information
only (Geo) or both sources of knowledge (Geo-Text). In
this latter case, the combination is done in cascade, meaning
that a hierarchical clustering is rst performed thanks to the
geographical information and then a hierarchical clustering
based on text is applied on the result of the geographical
clustering. From Table 2 it can be seen that the
combination gives the best results with F1 scores greater than
0.8 and normalized mutual information greater than 0.95.
We can also see that combining both sources of information
(24h-Geo-Text) improves the metric values compared with
information used alone (24h-Text and 24h-Geo). Finally the
left part of the table presents the results obtained on our
three development datasets. We can notice from these
numbers that the proposed approach is robust and gives similar
results on both the development and test sets.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Georgios</given-names>
            <surname>Petkos</surname>
          </string-name>
          , Symeon Papadopoulos, Vasileios Mezaris, and
          <string-name>
            <given-names>Yiannis</given-names>
            <surname>Kompatsiaris</surname>
          </string-name>
          .
          <source>Social Event Detection at MediaEval</source>
          <year>2014</year>
          :
          <article-title>Challenges, Datasets, and Evaluation</article-title>
          .
          <source>In Working Notes Proceedings of the MediaEval 2014 Workshop</source>
          , Barcelona, Spain,
          <year>October 2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <surname>Stephen</surname>
            <given-names>E Robertson</given-names>
          </string-name>
          , Steve Walker, Susan Jones,
          <string-name>
            <surname>Micheline M Hancock-Beaulieu</surname>
            ,
            <given-names>Mike</given-names>
          </string-name>
          <string-name>
            <surname>Gatford</surname>
          </string-name>
          , et al.
          <article-title>Okapi at TREC-3</article-title>
          . NIST Special Publication, pages
          <volume>109</volume>
          {
          <fpage>109</fpage>
          ,
          <year>1995</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Andrew</given-names>
            <surname>Rosenberg</surname>
          </string-name>
          and
          <string-name>
            <given-names>Julia</given-names>
            <surname>Hirschberg</surname>
          </string-name>
          .
          <article-title>V-measure: A conditional entropy-based external cluster evaluation measure</article-title>
          .
          <source>In EMNLP-CoNLL</source>
          , volume
          <volume>7</volume>
          , pages
          <fpage>410</fpage>
          {
          <fpage>420</fpage>
          .
          <string-name>
            <surname>Citeseer</surname>
          </string-name>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>