=Paper=
{{Paper
|id=Vol-1263/paper58
|storemode=property
|title=UPC at MediaEval 2014 Social Event Detection Task
|pdfUrl=https://ceur-ws.org/Vol-1263/mediaeval2014_submission_58.pdf
|volume=Vol-1263
|dblpUrl=https://dblp.org/rec/conf/mediaeval/Manchon-VizueteGN14
}}
==UPC at MediaEval 2014 Social Event Detection Task==
<pdf width="1500px">https://ceur-ws.org/Vol-1263/mediaeval2014_submission_58.pdf</pdf>
<pre>
              UPC at MediaEval 2014 Social Event Detection Task

      Daniel Manchon-Vizuete, Irene Gris-Sarabia and Xavier Giro-i-Nieto
                     Universitat Politecnica de Catalunya
                             Barcelona, Catalonia
                             xavier.giro@upc.edu

Copyright is held by the author/owner(s).
MediaEval 2014 Workshop, October 16-17, 2014, Barcelona, Spain


ABSTRACT
This working notes paper presents the contribution of the UPC team to the
Social Event Detection (SED) Subtask 1 in MediaEval 2014. This contribution
extends the solution tested in the previous year with a better optimization
of the parameters that determine the clustering algorithm, and by introducing
an additional pass that considers the merges of all pairs of mini-clusters
generated during the first two passes. Our proposal also addresses the
problem of incomplete metadata by generating additional textual tags based on
geolocation and natural language processing techniques.


1.   MOTIVATION
   This document describes the algorithms tested by the UPC team in the
MediaEval 2014 Social Event Detection (SED) Subtask 1, which addressed the
problem of full clustering of a photo collection. The reader is referred to
[6] for the task description in MediaEval 2014 and further details about the
study case, dataset and metrics.
   The proposed approach extends our submission to the previous MediaEval SED
2013 [5]. That work solved the Full Clustering subtask by first generating a
temporal-based over-segmentation of the photo collection, as proposed by
PhotoTOC [7]. In a second pass, the mini-clusters in a close temporal
neighbourhood were clustered based on geolocation, textual tags and user ID
metadata. That solution presented difficulties when addressing the realistic
scenario where many photos had missing metadata and noisy tags.
   We observed that the images in the provided collection are heterogeneous
in terms of the metadata associated with them. Some of the photos in the
dataset contained geolocation data, while others did not. Users also present
diverse behaviour regarding photo tagging. While some users provide textual
tags for their pictures, others just share the photos with no tags at all.
   This situation generates an unbalanced description of the photos in the
dataset. In our previous approach [5], we applied different similarity
functions when assessing the merging of mini-clusters, depending on which
metadata was available from the mini-clusters.
   On the other hand, when tags are present, there is also a problem related
to natural language processing. Textual tags have a semantic dimension that
was not captured in our previous approach, where we compared tag similarity
solely based on the string of characters, with no semantic interpretation at
all. This approach fails to detect related tags and synonyms, as they are
treated as completely different.
   In this MediaEval SED 2014 edition, we addressed these problems by
assessing the merges between mini-clusters based on the textual tags only.
This way it was not necessary to define and train different merging
strategies depending on the available metadata. This approach required the
introduction of strategies to populate the tags associated with the photos
from the available data. In addition, in this year's submission, we also
optimized the merging of mini-clusters by adopting a three-pass approach,
instead of a two-pass one as in the previous year.
   This paper is structured as follows. Section 2 presents the novelties that
have been introduced this year compared to our previous submission. The
performance of the solution is assessed in Section 3 with the results
obtained on the MediaEval SED 2014 task. Finally, Section 4 provides the
insights gained from this work.


2.   APPROACH
   This section presents the new advances we have introduced in our algorithm
with respect to the solution submitted in MediaEval SED 2013 [5].

2.1   Text-based cluster merging
   While in the previous year mini-clusters were merged with a similarity
that combined geolocation, tags and user IDs, in 2014 we have used only the
textual tags of the mini-clusters to assess their merges. In addition,
instead of using the Jaccard Index on the tags to assess the similarity
between mini-clusters, this year we have switched to TF-IDF descriptors,
compared with the cosine distance. These changes have allowed a better
estimation of the relevance of each term with respect to the others.

2.2   Reverse geocoding
   A first solution to increase the number of textual tags was to use the
geolocation coordinates to obtain the names of the locations where the photos
were taken. This strategy is known as reverse geocoding, and it transforms
the numerical geolocation coordinates into a readable text description. Our
system has used the MapQuest Open Geocoding API Web Service, which provides
easy access to data stored under the OpenStreetMap (OSM) project [3].

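
   As an illustration of the comparison described in Section 2.1, the
following minimal Python sketch builds TF-IDF descriptors for the tags of a
few mini-clusters and compares them with the cosine similarity (the cosine
distance is one minus this value). It is not the implementation used in the
submission: the function names, the smoothed IDF formula and the toy tags are
assumptions made only for this example.

```python
import math
from collections import Counter


def tfidf_vectors(clusters):
    """Return one TF-IDF vector (dict: tag -> weight) per mini-cluster.

    `clusters` is a list of tag lists, one list per mini-cluster.
    """
    n = len(clusters)
    # Document frequency: number of mini-clusters containing each tag.
    df = Counter(tag for tags in clusters for tag in set(tags))
    vectors = []
    for tags in clusters:
        tf = Counter(tags)
        # Smoothed IDF (an assumption of this sketch, not from the paper).
        vectors.append(
            {t: c * (math.log((1 + n) / (1 + df[t])) + 1.0) for t, c in tf.items()}
        )
    return vectors


def cosine_similarity(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # a mini-cluster without tags matches nothing
    return dot / (norm_u * norm_v)


# Hypothetical tags for three mini-clusters.
clusters = [
    ["concert", "barcelona", "rock"],
    ["concert", "barcelona", "live"],
    ["wedding", "paris"],
]
vectors = tfidf_vectors(clusters)
sim_01 = cosine_similarity(vectors[0], vectors[1])  # two shared tags
sim_02 = cosine_similarity(vectors[0], vectors[2])  # no shared tags
print(sim_01, sim_02)
```

   Mini-clusters that share tags obtain a similarity strictly between 0 and
1, while mini-clusters with disjoint tags obtain 0, so a merging threshold
can be applied directly to this value.
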
2.3   Tag expansion with hypernyms
   One of the main challenges when dealing with textual tags is their
semantic interpretation. The case of synonyms cannot be captured with a
character-based descriptor. It requires the introduction of some semantic
information, typically coded in an ontology.
   Another novelty this year has been the enrichment of the tag set with
hypernyms, that is, those terms that provide a generalization of each
original term. This way we have tried to better capture synonymous terms. The
Natural Language Toolkit (NLTK) [4] has been used for this task, which
exploits the knowledge contained in WordNet [2].

2.4   Three clustering passes
   The creation of clusters begins with a sequential process in the temporal
dimension, as in our previous submission in 2013, which was inspired by the
original PhotoTOC algorithm [7]. This approach allows a fast convergence, as
clusters are initially only merged in a local time-based neighbourhood.
   In particular, our approach defines three passes. In the first pass, the
photos of each user are treated separately, comparing the time stamps of the
photos in a local neighbourhood. In a second pass, the mini-clusters
generated from the first pass are also compared in a small temporal
neighbourhood with a relaxed similarity threshold, assessed with the TF-IDF
descriptors of each mini-cluster. At this stage, the user ID associated with
each photo is also added as an additional textual tag. Finally, a third pass
considers all pairs of existing mini-clusters in a global neighbourhood.
However, in this case the similarity threshold for merging is much stricter.


3.   EXPERIMENTS AND RESULTS
   The UPC participated in Subtask 1 with the results shown in Table 1. The
runs are described as follows:

Run 1: No tags were added. The parameters that optimized the clustering
    algorithm were found with the hyperopt Python package [1].

Run 2: Configuration from Run 1 was modified by adding textual tags obtained
    with reverse geocoding, as described in Section 2.2.

Run 3: Configuration from Run 1 was modified by adding textual tags obtained
    with the hypernyms provided by the NLTK Python package [4], as described
    in Section 2.3.

Run 4: Configurations from Run 2 and Run 3 were combined.

          GeoTag    Hypernym      F1       NMI     Div. F1
 Run 1                          0.9240    0.9820   0.9231
 Run 2       •                  0.9165    0.9793   0.9155
 Run 3                 •        0.8141    0.9432   0.8127
 Run 4       •         •        0.8112    0.9393   0.8097

          Table 1: UPC results in Subtask 1.


4.   CONCLUSIONS
   The UPC contribution to the Social Event Detection task in MediaEval 2014
has been an extension of the approach previously tested in the 2013 edition
[5]. The implemented algorithm provides a fast solution for photo clustering
given its time-based first and second passes, which compare photos and
mini-clusters in a local neighbourhood. A first improvement from the previous
submission has been the optimization of the parameters which control the
creation of these mini-clusters. The main novelty has been introduced in the
merging of these mini-clusters which, instead of combining different
modalities (tags, geolocation and user ID), has been solely based on textual
tags. Nevertheless, these textual tags have been extended in two directions:
by reverse geocoding and with hypernyms. However, the results that have been
obtained on the test dataset do not show any improvement by the introduction
of the new tags. In fact, performance decreases, especially when using the
hypernyms. This behaviour on the test dataset was unexpected, as during the
development of our solution with the training dataset, an improvement had
been observed.


5.   ACKNOWLEDGMENTS
   This work has been developed in the framework of the project
TEC2013-43935-R, funded by the Spanish Ministerio de Economia y
Competitividad and the European Regional Development Fund (ERDF).


6.   REFERENCES
[1] J. Bergstra, D. Yamins, and D. D. Cox. Making a science of model search:
    Hyperparameter optimization in hundreds of dimensions for vision
    architectures. In ICML (1), volume 28 of JMLR Proceedings, pages 115-123.
    JMLR.org, 2013.
[2] C. Fellbaum. WordNet. Wiley Online Library, 1998.
[3] M. Haklay and P. Weber. OpenStreetMap: User-generated street maps.
    Pervasive Computing, IEEE, 7(4):12-18, 2008.
[4] E. Loper and S. Bird. NLTK: The natural language toolkit. In Proceedings
    of the ACL-02 Workshop on Effective Tools and Methodologies for Teaching
    Natural Language Processing and Computational Linguistics - Volume 1,
    ETMTNLP '02, pages 63-70, Stroudsburg, PA, USA, 2002. Association for
    Computational Linguistics.
[5] D. Manchon-Vizuete, I. Gris-Sarabia, and X. Giro-i-Nieto. Photo
    clustering of social events by extending PhotoTOC to a rich context. In
    Proc. Social Events in Web Multimedia (SEWN), 2014.
[6] G. Petkos, S. Papadopoulos, V. Mezaris, and Y. Kompatsiaris. Social Event
    Detection at MediaEval 2014: Challenges, datasets, and evaluation. In
    MediaEval 2014 Workshop, Barcelona, Spain, October 16-17, 2014.
[7] J. C. Platt, M. Czerwinski, and B. Field. PhotoTOC: automatic clustering
    for browsing personal photographs. In Proc. 4th Pacific Rim Conference on
    Multimedia, volume 1, pages 6-10, 2003.
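
   The three clustering passes described in Section 2.4 can be sketched in
Python as follows. This is a simplified illustration under stated
assumptions, not the submitted system: the first pass splits on a fixed time
gap, plain tag counts stand in for the TF-IDF descriptors, descriptors are
computed once per pass, the user ID is kept as a tag in both merging passes,
and all function names, thresholds and photos are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def cosine(a, b):
    """Cosine similarity between two tag-count Counters."""
    dot = sum(c * b[t] for t, c in a.items())
    na = math.sqrt(sum(c * c for c in a.values()))
    nb = math.sqrt(sum(c * c for c in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def first_pass(photos, max_gap):
    """Per user, split time-sorted photos wherever the gap exceeds max_gap."""
    by_user = {}
    for p in photos:
        by_user.setdefault(p["user"], []).append(p)
    clusters = []
    for user_photos in by_user.values():
        user_photos.sort(key=lambda p: p["time"])
        current = [user_photos[0]]
        for prev, cur in zip(user_photos, user_photos[1:]):
            if cur["time"] - prev["time"] > max_gap:
                clusters.append(current)
                current = []
            current.append(cur)
        clusters.append(current)
    return clusters


def descriptor(cluster):
    """Tag counts of a mini-cluster, with each user ID added as an extra tag."""
    bag = Counter()
    for p in cluster:
        bag.update(p["tags"])
        bag["user:" + p["user"]] += 1
    return bag


def merge_pass(clusters, threshold, window=None):
    """Merge mini-clusters whose similarity reaches `threshold`.

    `window` limits comparisons to clusters at most `window` positions apart
    in start-time order; None compares all pairs (global neighbourhood).
    """
    clusters = sorted(clusters, key=lambda c: min(p["time"] for p in c))
    descs = [descriptor(c) for c in clusters]
    parent = list(range(len(clusters)))  # union-find forest

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(clusters)), 2):
        if window is not None and j - i > window:
            continue
        if cosine(descs[i], descs[j]) >= threshold:
            parent[find(j)] = find(i)
    merged = {}
    for i, c in enumerate(clusters):
        merged.setdefault(find(i), []).extend(c)
    return list(merged.values())


# Hypothetical photos: time in seconds, tags as plain strings.
photos = [
    {"user": "a", "time": 0, "tags": ["concert", "rock"]},
    {"user": "a", "time": 60, "tags": ["concert"]},
    {"user": "a", "time": 100000, "tags": ["beach"]},
    {"user": "b", "time": 30, "tags": ["concert", "rock"]},
    {"user": "c", "time": 500000, "tags": ["concert", "rock"]},
]
mini = first_pass(photos, max_gap=3600)               # pass 1: per-user time gaps
second = merge_pass(mini, threshold=0.3, window=1)    # pass 2: local, relaxed
final = merge_pass(second, threshold=0.6)             # pass 3: global, stricter
print(len(mini), len(second), len(final))
```

   Restricting the second pass to a temporal window keeps the number of
comparisons small, while the stricter threshold of the third pass limits the
risk of wrongly merging distant mini-clusters when all pairs are considered.
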

</pre>