=Paper= {{Paper |id=Vol-1263/paper45 |storemode=property |title=LIMSI @ MediaEval SED 2014 |pdfUrl=https://ceur-ws.org/Vol-1263/mediaeval2014_submission_45.pdf |volume=Vol-1263 |dblpUrl=https://dblp.org/rec/conf/mediaeval/GuinaudeauLB14 }} ==LIMSI @ MediaEval SED 2014== https://ceur-ws.org/Vol-1263/mediaeval2014_submission_45.pdf
                                    LIMSI @ MediaEval SED 2014

                             Camille Guinaudeau1,2 , Antoine Laurent2 , Hervé Bredin2
                                      1
                                          University Paris-Sud, Rue du Château, 91400 Orsay
                                     2
                                          LIMSI-CNRS, Rue John Von Neumann, 91400 Orsay
                                             {guinaudeau,laurent,bredin}@limsi.fr



ABSTRACT
                                                                         Table 1: Homogeneity for user-based clustering
This paper provides an overview of the Social Event Detec-                             Dev A Dev B Dev C
tion (SED) system developed at LIMSI for the 2014 cam-                             1h  0.9874   0.9872  0.9874
paign. Our approach is based on a hierarchical agglomera-                          10h 0.9813   0.9796  0.9798
tive clustering that uses textual metadata, user-based knowl-                      20h 0.9785   0.9766  0.9770
edge and geographical information. These different sources                         24h 0.9777   0.9755  0.9757
of knowledge, either used separately or in cascade, reach
                                                                                   30h 0.9763   0.9743  0.9749
good results for the full clustering subtask with a normal-
                                                                                  100h 0.9678   0.9673  0.9665
ized mutual information equal to 0.95 and F1 scores greater
than 0.82 for our best run.
                                                                    each dataset has approximately the same number of clusters
1.    INTRODUCTION                                                  and the same distribution in terms of number of images per
   The Social Event Detection (SED) task aims at mining             cluster. Moreover, the number of images in each cluster is
social events (such as concerts, protest and so on) in large        also quite similar to the number of images contained in the
collections of online multimedia [1]. This challenge is divided     test set (110,541) allowing us to experiment on our approach
into two subtasks: detection by full clustering and retrieval       with comparable datasets. As explained in the introduction,
of events. In this work, we focus only on the first subtask         a preliminary clustering was first applied on these datasets
which consists in clustering all images in the given dataset,       – to create user-based clusters of images – before the hier-
so that each cluster represents a social event. As the number       archical clustering step that makes use of textual metadata
of the target clusters is not provided by the SED organizers,       and/or geographical information.
the main difficulty of this substak is to infer this number. To
overcome this difficulty, our full clustering system relies on      2.1      User-based clustering
a hierarchical agglomerative clustering approach that works            The preliminary clustering is obtained by creating one
by continuously merging the most similar clusters (as long as       cluster per user, using the user name associated with each
the similarity between them is higher than some threshold).         image in the dataset. As these user-based clusters usually
   In this work, our system is only based on the metadata as-       contain several social events (one user is rarely associated
sociated with images. The hierarchical clustering approach,         with only one social event) they are then divided into smaller
presented in Section 2.2, is based on distance matrices ac-         ones depending on date and time information also mentioned
counting for textual metadata (Section 2.3.1) or geographi-         in the pictures’ metadata.
cal information (Section 2.3.2). In order to make the dis-             For each cluster, we initialize a time reference with the
tance computation as robust as possible and avoid data              date of a randomly chosen picture, hereafter called picture
sparsity, a first preliminary clustering is performed on the        F , in the cluster. We then compare this time reference with
dataset, so that each preliminary cluster is associated with        the date of all the other pictures in the cluster and we pick
a set of metadata coming from all the images in the cluster.        the closest one, hereafter called picture C. If picture C is
This preliminary clustering is described in Section 2.1.            taken less than α1 hours before or after the time reference,
                                                                    then it belongs in the same cluster as picture F and the
                                                                    time reference is recomputed to equal the mean between the
2.    FULL CLUSTERING                                               previous time reference and the date of picture C. How-
  The development dataset released by the SED task orga-            ever, if picture C was taken too far in time from picture F ,
nizers for the full clustering subtask is composed by 362,578       then another cluster is created with a new time reference
images collected from Flickr, associated with their meta-           corresponding to the taken time of picture C.
data. To lower the computation time and facilitate our ex-             The objective of this step is to lower the number of clus-
perimentation process, this development dataset was divided         ters to be processed in the hierarchical clustering step while
into three smaller datasets Dev A, Dev B and Dev C so that          keeping the clusters as pure as possible. To evaluate the pu-
                                                                    rity of our clusters we use the homogeneity metric [3] which
                                                                    equals one when each cluster contains only members of a
Copyright is held by the author/owner(s).                           1
MediaEval 2014 Workshop, October 16-17, 2014, Barcelona, Spain          All parameters were tuned on Dev A.
                                                    Table 2: Results on test set
                          Dev A      Dev B      Dev C 20h-Geo-Text 24h-Geo-Text      30h-Geo-Text      24h-Text    24h-Geo
      F1 (Main Score)     0.7895     0.7869     0.7912     0.8214         0.8140        0.8115          0.7563      0.7387
           NMI            0.9479     0.9472     0.9483     0.9554         0.9532        0.9526          0.9423      0.9359
       Divergence F1      0.6880     0.7258     0.7224     0.8207         0.8132        0.8107          0.7557      0.7380


single class. Table 1 summarizes the homogeneity scores for        2.3.2   Geographic distance matrix
the development datasets Dev A, Dev B and Dev C with                 We also compute the geographic distance between every
the α parameter ranging from 1 hour to 100 hours. It can          cluster that contains at least one picture with GPS informa-
be seen from this table that, first, the values are similar for   tion. The distance between two clusters u and v corresponds
all the datasets and, second, that the homogenity values are      to the minimum geographic distance between any picture
high even if the α parameter equals 100 meaning that users        from cluster u and any picture from cluster v. Moreover, in
tend not to participate to social events very frequently.         order to avoid the merging of events that take place in the
                                                                  same location but at different time (such as festivals that
2.2     Hierarchical clustering approach                          take place at the same location every year), we also prevent
   The hierarchical agglomerative clustering approach begins      the merging of two clusters if their associated date is greater
with the set of clusters that have been defined in the pre-       than a certain threshold (48h) by artificially increasing their
liminary clustering presented in the previous section and is      geographical distance.
based on a single-linkage clustering method. When two clus-
ters u and v from this set are combined into a single cluster
w, u and v are removed from the set, and w is added to the
                                                                  3.   EXPERIMENTS AND RESULTS
set. When only one cluster remains, the algorithm stops. To          For the full clustering subtask, each participant was al-
decide whether two clusters have to be combined, a distance       lowed to submit up to 5 runs. In this section we describe
matrix is maintained at each iteration where the d[u, v] en-      our runs and discuss the results obtained. All the submit-
try corresponds to the distance between cluster u and v.          ted runs are based on the preliminary clustering, that uses
At each iteration, the algorithm must update the distance         an α parameter equal to 20 hours, 24 hours or 30 hours.
matrix to reflect the distance of the newly formed cluster w      The hierarchical clustering is then obtained thanks to the
with the remaining clusters in the set. The distance between      textual metadata only (Text), the geographical information
the newly formed cluster w and each v 0 is computed with          only (Geo) or both sources of knowledge (Geo-Text). In
the following equation                                            this latter case, the combination is done in cascade, meaning
                                                                  that a hierarchical clustering is first performed thanks to the
               d(w, v 0 ) = min(dist(w[i], v 0 [j]))       (1)    geographical information and then a hierarchical clustering
                                                       0          based on text is applied on the result of the geographical
for all images i in cluster w and j in cluster v .
                                                                  clustering. From Table 2 it can be seen that the combi-
  The final clustering is then obtained by forming flat clus-
                                                                  nation gives the best results with F1 scores greater than
ters from the hierarchical clustering previously defined. A
                                                                  0.8 and normalized mutual information greater than 0.95.
threshold θ is used so that observations in each flat cluster
                                                                  We can also see that combining both sources of information
have no intergroup distance greater than θ.
                                                                  (24h-Geo-Text) improves the metric values compared with
2.3     Distance matrices                                         information used alone (24h-Text and 24h-Geo). Finally the
                                                                  left part of the table presents the results obtained on our
  In this last part, we describe how the distance matrices
                                                                  three development datasets. We can notice from these num-
used in the hierarchical agglomerative clustering approach
                                                                  bers that the proposed approach is robust and gives similar
are computed. Both use the metadata associated with the
                                                                  results on both the development and test sets.
pictures, namely textual metadata and geographical infor-
mation.
                                                                  4.   REFERENCES
2.3.1     Textual metadata distance matrix                        [1] Georgios Petkos, Symeon Papadopoulos, Vasileios
  To compute the textual distance, each cluster is repre-             Mezaris, and Yiannis Kompatsiaris. Social Event
sented by a vector composed by lemmas weighted with a                 Detection at MediaEval 2014: Challenges, Datasets,
BM25 score. A cosine distance is then computed between                and Evaluation. In Working Notes Proceedings of the
two vectors to estimate the distance between the two cor-             MediaEval 2014 Workshop, Barcelona, Spain, October
responding clusters. To create the vectors, words are ex-             2014.
tracted from the textual metadata associated with each pic-       [2] Stephen E Robertson, Steve Walker, Susan Jones,
ture in the cluster. These words are then lemmatized and              Micheline M Hancock-Beaulieu, Mike Gatford, et al.
only nouns, adjectives and non modal verbs are kept to char-          Okapi at TREC-3. NIST Special Publication, pages
acterize the cluster. Each lemma in the vector is finally as-         109–109, 1995.
sociated with a score computed using the BM25 weighting           [3] Andrew Rosenberg and Julia Hirschberg. V-measure: A
function [2] that gives a weight close to 1 to lemmas that are        conditional entropy-based external cluster evaluation
the most representative of the content of the cluster. In our         measure. In EMNLP-CoNLL, volume 7, pages 410–420.
system, lemmas are extracted from title, description and,             Citeseer, 2007.
when available, tags metadata.