<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>UPC at MediaEval 2014 Social Event Detection Task</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Daniel Manchon-Vizuete</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Irene Gris-Sarabia</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Xavier Giro-i-Nieto</string-name>
          <email>xavier.giro@upc.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Universitat Politecnica de Catalunya</institution>
          ,
          <addr-line>Barcelona, Catalonia</addr-line>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2014</year>
      </pub-date>
      <fpage>16</fpage>
      <lpage>17</lpage>
      <abstract>
        <p>This working notes paper presents the contribution of the UPC team to the Social Event Detection (SED) Subtask 1 in MediaEval 2014. This contribution extends the solution tested in the previous year with a better optimization of the parameters that determine the clustering algorithm, and by introducing an additional pass that considers the merges of all pairs of mini-clusters generated during the first two passes. Our proposal also addresses the problem of incomplete metadata by generating additional textual tags based on geolocation and natural language processing techniques.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>MOTIVATION</title>
      <p>
        This document describes the algorithms tested by the
UPC team in the MediaEval 2014 Social Event Detection
(SED) Subtask 1, which addressed the problem of full
clustering of a photo collection. The reader is referred to [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] for
the task description in MediaEval 2014 and further details
about the study case, dataset and metrics.
      </p>
      <p>
        The proposed approach extends our submission in the
previous MediaEval SED 2013 [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. That work solved the Full
Clustering subtask by first generating a temporal-based
over-segmentation of the photo collection, as proposed by
PhotoTOC [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. In a second pass, the mini-clusters in a
close temporal neighbourhood were clustered based on
geolocation, textual tags and user ID metadata. That solution
presented difficulties when addressing the realistic scenario
where many photos had missing metadata information and
noisy tags.
      </p>
      <p>We observed that the images in the provided collection are
heterogeneous in terms of the metadata associated with them.
Some of the photos in the dataset contained geolocation data,
while others did not. Users also show diverse behaviour
regarding photo tagging: while some users provide textual tags
for their pictures, others just share the photos with no tags
at all.</p>
      <p>
        This situation generates an unbalanced description of the
photos in the dataset. In our previous approach [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], we
applied different similarity functions when assessing the
merging of mini-clusters, depending on which metadata was
available for the mini-clusters.
      </p>
      <p>When tags are present, there is also
a problem related to natural language processing. Textual
tags have a semantic dimension that was not captured in our
previous approach, where we compared tag similarity solely
based on the string of characters, with no semantic
interpretation at all. This approach fails to detect related
tags and synonyms, as they are treated as completely
different terms.</p>
      <p>In this MediaEval SED 2014 edition, we addressed these
problems by assessing the merges between mini-clusters
based on the textual tags only. This way it was not
necessary to define and train different merging strategies
depending on the available metadata. This approach required
the introduction of strategies to populate the tags associated
with the photos from the available data. In addition, in this
year's submission, we also optimized the merging of
mini-clusters by adopting a three-pass approach, instead of the
two-pass one used in the previous year.</p>
      <p>This paper is structured as follows. Section 2 presents the
novelties that have been introduced this year compared to
our previous submission. The performance of the solution
is assessed in Section 3 with the results obtained on the
MediaEval SED 2014 task. Finally, Section 4 provides the
insights learned from this work.</p>
    </sec>
    <sec id="sec-2">
      <title>APPROACH</title>
      <p>
        This section presents the new advances we have
introduced in our algorithm with respect to the solution
submitted in MediaEval SED 2013 [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
      </p>
    </sec>
    <sec id="sec-3">
      <title>Text-based cluster merging</title>
      <p>While in the previous year mini-clusters were merged with
a similarity that combined geolocation, tags and user IDs, in
2014 we have used only the textual tags of the mini-clusters
to assess their merges. In addition, instead of using the
Jaccard Index on the tags to assess the similarity between
mini-clusters, this year we have switched to TF-IDF
descriptors, compared with the cosine distance. These changes
have allowed a better estimation of the relevance of each
term with respect to the others.</p>
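      <p>The comparison described above can be sketched as follows. This is a minimal illustration and not the code used in the submission: the function names, the smoothed IDF formula and the toy tags are all assumptions. Each mini-cluster is treated as a "document" made of the pooled tags of its photos, and the cosine similarity (one minus the cosine distance) is computed between the resulting TF-IDF vectors.</p>
      <preformat preformat-type="code">
```python
import math
from collections import Counter


def tfidf_vectors(clusters):
    """Return one sparse TF-IDF vector (dict: term to weight) per mini-cluster.

    `clusters` is a list of tag lists; each list pools the tags of all the
    photos in one mini-cluster.
    """
    n = len(clusters)
    # Document frequency: number of mini-clusters in which each tag occurs.
    df = Counter(tag for tags in clusters for tag in set(tags))
    vectors = []
    for tags in clusters:
        tf = Counter(tags)
        # Smoothed IDF (an assumption of this sketch): keeps weights positive
        # even for tags that occur in every mini-cluster.
        vectors.append({t: c * (math.log((1 + n) / (1 + df[t])) + 1.0)
                        for t, c in tf.items()})
    return vectors


def cosine_similarity(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # a mini-cluster without tags matches nothing
    return dot / (norm_u * norm_v)


# Hypothetical mini-clusters, each given by its pooled tags.
clusters = [
    ["barcelona", "concert", "rock"],
    ["barcelona", "concert", "stage"],
    ["paris", "museum"],
]
vecs = tfidf_vectors(clusters)
sim_related = cosine_similarity(vecs[0], vecs[1])
sim_unrelated = cosine_similarity(vecs[0], vecs[2])
```
      </preformat>
      <p>In this toy example the first two mini-clusters share two of their three tags and obtain a higher similarity than the first and third, which share none. Unlike the Jaccard Index, the TF-IDF weights make rare tags count more than tags that appear in many mini-clusters.</p>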
    </sec>
    <sec id="sec-4">
      <title>Reverse geocoding</title>
      <p>
        A first solution to increase the number of textual tags was
to use the geolocation coordinates to obtain the names
of the locations where the photos were taken. This
strategy is known as reverse geocoding, and it transforms
numerical geolocation coordinates into a readable
text description. Our system has used the MapQuest Open
Geocoding API Web Service, which provides easy access
to data stored under the OpenStreetMap (OSM) project [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
      </p>
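      <p>A possible way to turn a reverse-geocoded address into extra tags is sketched below. The web service call itself is deliberately left out: the geocoder is passed in as a callable, and the stand-in used here returns a fixed answer. The address field names, the photo dictionary layout and all function names are hypothetical and do not describe the actual response format of the service used in the submission.</p>
      <preformat preformat-type="code">
```python
def address_to_tags(address,
                    fields=("road", "suburb", "city", "state", "country")):
    """Turn a reverse-geocoded address (a dict) into lower-case tags."""
    tags = []
    for field in fields:
        value = address.get(field)
        if value:
            # Split multi-word place names so each word can match user tags.
            tags.extend(value.lower().split())
    return tags


def enrich_photo(photo, reverse_geocode):
    """Return the photo's tags, extended with location names when possible.

    `reverse_geocode` is any callable mapping (lat, lon) to an address dict;
    in practice it would wrap a request to a geocoding web service.
    """
    tags = list(photo.get("tags", []))
    if photo.get("lat") is None or photo.get("lon") is None:
        return tags  # no geolocation: nothing to add
    address = reverse_geocode(photo["lat"], photo["lon"])
    for tag in address_to_tags(address):
        if tag not in tags:
            tags.append(tag)
    return tags


def fake_geocoder(lat, lon):
    # Stand-in for the web service, returning a fixed hypothetical answer.
    return {"city": "Barcelona", "state": "Catalonia", "country": "Spain"}


geotagged = {"tags": ["concert"], "lat": 41.39, "lon": 2.17}
not_geotagged = {"tags": ["concert"]}
print(enrich_photo(geotagged, fake_geocoder))
print(enrich_photo(not_geotagged, fake_geocoder))
```
      </preformat>
      <p>Photos without coordinates keep their original tags, so this step only helps the subset of the collection that carries geolocation data.</p>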
    </sec>
    <sec id="sec-5">
      <title>Tag expansion with hypernyms</title>
      <p>One of the main challenges when dealing with textual tags
is their semantic interpretation. The case of synonyms
cannot be captured with a character-based descriptor. It
requires the introduction of some semantic information,
typically coded in an ontology.</p>
      <p>
        Another novelty this year has been the enrichment of
the tag set with hypernyms, that is, terms that
provide a generalization of each original term. In this way
we have tried to relate synonyms and related terms through
their shared generalizations. The
Natural Language Processing Toolkit (NLTK) [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] has been
used for this task; it exploits the knowledge contained
in WordNet [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
      </p>
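      <p>The tag expansion can be illustrated with the following sketch. It assumes that only the first WordNet sense of each tag is considered and that only direct hypernyms are added; the paper does not specify either choice, and the function names are hypothetical. The WordNet lookup needs NLTK with the WordNet corpus installed, so the example run uses a small hand-made lookup table instead.</p>
      <preformat preformat-type="code">
```python
def wordnet_hypernyms(term):
    """Direct hypernym lemmas of `term` according to WordNet, through NLTK.

    Requires NLTK with the WordNet corpus installed
    (nltk.download("wordnet")). Only the first sense is used, which is a
    simplifying assumption of this sketch.
    """
    from nltk.corpus import wordnet as wn  # imported lazily on purpose

    synsets = wn.synsets(term)
    if not synsets:
        return []
    return [name.lower()
            for hypernym in synsets[0].hypernyms()
            for name in hypernym.lemma_names()]


def expand_tags(tags, lookup=wordnet_hypernyms):
    """Return `tags` followed by the hypernyms of each tag, without repeats."""
    expanded = list(tags)
    for tag in tags:  # only original tags are expanded, not the new ones
        for hypernym in lookup(tag):
            if hypernym not in expanded:
                expanded.append(hypernym)
    return expanded


# Toy lookup table so the example runs without the WordNet corpus.
toy = {"dog": ["canine"], "cat": ["feline"], "canine": ["carnivore"]}
result = expand_tags(["dog", "cat", "party"], lookup=lambda t: toy.get(t, []))
print(result)
```
      </preformat>
      <p>Tags that are unknown to the lookup, such as user-specific strings, are left untouched. Adding generalizations makes more mini-clusters share terms, which may also raise the similarity between unrelated events.</p>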
    </sec>
    <sec id="sec-6">
      <title>Three clustering passes</title>
      <p>
        The creation of clusters begins with a sequential process
in the temporal dimension, as in our previous submission
in 2013, which was inspired by the original PhotoTOC
algorithm [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. This approach allows a fast convergence of the
problem, as clusters are initially only merged in a local
time-based neighbourhood.
      </p>
      <p>In particular, our approach defines three passes. In the
first pass, the photos of each user are treated separately,
comparing the time stamps of the photos in a local
neighbourhood. In the second pass, the mini-clusters generated
by the first pass are compared, again within a small temporal
neighbourhood, using a relaxed similarity threshold assessed
with the TF-IDF descriptors of each mini-cluster. At this
stage, the user ID associated with each photo is also added
as an additional textual tag. Finally, a third pass considers
all pairs of existing mini-clusters in a global neighbourhood.
In this case, however, the similarity threshold for merging is
much stricter.</p>
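      <p>The three passes can be outlined with the sketch below. It is a simplified illustration under several assumptions: the first pass cuts each user's photo stream at a fixed time gap rather than reproducing PhotoTOC, the Jaccard Index on tag sets stands in for the TF-IDF comparison, merging is greedy, and all names, thresholds and sample photos are hypothetical.</p>
      <preformat preformat-type="code">
```python
def split_by_time(photos, max_gap):
    """Pass 1: per user, cut the time-sorted photo stream at large gaps."""
    by_user = {}
    for photo in photos:
        by_user.setdefault(photo["user"], []).append(photo)
    clusters = []
    for user in sorted(by_user):
        stream = sorted(by_user[user], key=lambda p: p["time"])
        current = [stream[0]]
        for prev, photo in zip(stream, stream[1:]):
            if photo["time"] - prev["time"] > max_gap:
                clusters.append(current)
                current = []
            current.append(photo)
        clusters.append(current)
    return clusters


def start_time(cluster):
    return min(photo["time"] for photo in cluster)


def cluster_tags(cluster):
    """Tags of a mini-cluster, with the user IDs added as extra tags."""
    tags = set()
    for photo in cluster:
        tags.update(photo["tags"])
        tags.add("user:" + photo["user"])
    return tags


def jaccard(a, b):
    """Placeholder similarity; the paper uses TF-IDF descriptors instead."""
    union = a.union(b)
    return len(a.intersection(b)) / len(union) if union else 0.0


def merge_pass(clusters, threshold, max_gap=None):
    """Greedily merge pairs whose tag similarity reaches `threshold`.

    With `max_gap` set, only pairs whose start times differ by at most
    `max_gap` are compared (pass 2); with None, all pairs are (pass 3).
    """
    clusters = sorted(clusters, key=start_time)
    merged = True
    while merged:
        merged = False
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                a, b = clusters[i], clusters[j]
                gap = start_time(b) - start_time(a)
                if max_gap is not None and gap > max_gap:
                    break  # sorted by time: later clusters are even farther
                if jaccard(cluster_tags(a), cluster_tags(b)) >= threshold:
                    clusters[i] = a + b
                    del clusters[j]
                    merged = True
                    break
            if merged:
                break  # restart the scan after every merge
    return clusters


photos = [
    {"id": "A", "user": "u1", "time": 0, "tags": ["concert", "rock"]},
    {"id": "B", "user": "u1", "time": 10, "tags": ["concert"]},
    {"id": "C", "user": "u2", "time": 20, "tags": ["concert", "rock"]},
    {"id": "D", "user": "u1", "time": 500, "tags": ["museum"]},
    {"id": "F", "user": "u1", "time": 2000, "tags": ["museum"]},
]
minis = split_by_time(photos, max_gap=60)                 # pass 1
local = merge_pass(minis, threshold=0.3, max_gap=120)     # pass 2
final = merge_pass(local, threshold=0.8)                  # pass 3
print([[p["id"] for p in c] for c in final])
```
      </preformat>
      <p>Restricting the second pass to a temporal window keeps the number of comparisons small, while the stricter threshold of the global third pass limits the risk of merging unrelated events that happen to share a few tags.</p>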
    </sec>
    <sec id="sec-7">
      <title>EXPERIMENTS AND RESULTS</title>
      <p>The UPC team participated in Subtask 1 with the results shown
in Table 1. The runs were configured as follows:</p>
      <p>
        Run 1: No tags were added. The parameters of the
clustering algorithm were optimized with the
hyperopt Python package [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
      </p>
      <p>Run 2: The configuration from Run 1 was modified by adding
textual tags obtained with reverse geocoding, as
described in Section 2.2.</p>
      <p>
        Run 3: The configuration from Run 1 was modified by adding
textual tags obtained with the hypernyms provided by
the NLTK Python package [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], as described in Section
2.3.
      </p>
      <p>Run 4: The configurations from Run 2 and Run 3 were
combined.</p>
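      <table-wrap id="tab-1">
        <label>Table 1</label>
        <table>
          <thead>
            <tr>
              <th />
              <th>Run 1</th>
              <th>Run 2</th>
              <th>Run 3</th>
              <th>Run 4</th>
            </tr>
          </thead>
          <tbody>
            <tr>
              <td>F1</td>
              <td>0.9240</td>
              <td>0.9165</td>
              <td>0.8141</td>
              <td>0.8112</td>
            </tr>
            <tr>
              <td>NMI</td>
              <td>0.9820</td>
              <td>0.9793</td>
              <td>0.9432</td>
              <td>0.9393</td>
            </tr>
            <tr>
              <td>Div. F1</td>
              <td>0.9231</td>
              <td>0.9155</td>
              <td>0.8127</td>
              <td>0.8097</td>
            </tr>
          </tbody>
        </table>
      </table-wrap>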
    </sec>
    <sec id="sec-8">
      <title>CONCLUSIONS</title>
      <p>
        The UPC contribution to the Social Event Detection task
in MediaEval 2014 has been an extension of the approach
previously tested in the 2013 edition [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. The implemented
algorithm provides a fast solution for photo clustering given
its time-based first and second passes, which compare
photos and mini-clusters in a local neighbourhood. A first
improvement over the previous submission has been the
optimization of the parameters which control the creation of
these mini-clusters. The main novelty has been introduced
in the merging of these mini-clusters which, instead of
combining different modalities (tags, geolocation and user ID),
has been based solely on textual tags. These
textual tags have been extended in two directions: by
reverse geocoding and with hypernyms. However, the results
obtained on the test dataset do not show
any improvement from the introduction of the new tags. In
fact, performance decreases, especially when using the
hypernyms. This behaviour on the test dataset was
unexpected, as an improvement had been observed during the
development of our solution with the training dataset.
      </p>
    </sec>
    <sec id="sec-9">
      <title>ACKNOWLEDGMENTS</title>
      <p>This work has been developed in the framework of the
project TEC2013-43935-R, funded by the Spanish Ministerio
de Economia y Competitividad and the European Regional
Development Fund (ERDF).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>J.</given-names>
            <surname>Bergstra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Yamins</surname>
          </string-name>
          , and
          <string-name>
            <given-names>D. D.</given-names>
            <surname>Cox</surname>
          </string-name>
          .
          <article-title>Making a science of model search: Hyperparameter optimization in hundreds of dimensions for vision architectures</article-title>
          .
          <source>In ICML (1)</source>
          , volume
          <volume>28</volume>
          <source>of JMLR Proceedings</source>
          , pages
          <fpage>115</fpage>
          –
          <lpage>123</lpage>
          . JMLR.org,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>C.</given-names>
            <surname>Fellbaum</surname>
          </string-name>
          . WordNet. Wiley Online Library,
          <year>1998</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>M.</given-names>
            <surname>Haklay</surname>
          </string-name>
          and
          <string-name>
            <given-names>P.</given-names>
            <surname>Weber</surname>
          </string-name>
          .
          <article-title>OpenStreetMap: User-generated street maps</article-title>
          .
          <source>Pervasive Computing</source>
          , IEEE,
          <volume>7</volume>
          (
          <issue>4</issue>
          ):
          <fpage>12</fpage>
          –
          <lpage>18</lpage>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>E.</given-names>
            <surname>Loper</surname>
          </string-name>
          and
          <string-name>
            <given-names>S.</given-names>
            <surname>Bird</surname>
          </string-name>
          .
          <article-title>NLTK: The natural language toolkit</article-title>
          .
          <source>In Proceedings of the ACL-02 Workshop on Effective Tools and Methodologies for Teaching Natural Language Processing and Computational Linguistics - Volume 1, ETMTNLP '02</source>
          , pages
          <fpage>63</fpage>
          –
          <lpage>70</lpage>
          , Stroudsburg, PA, USA,
          <year>2002</year>
          .
          <publisher-name>Association for Computational Linguistics</publisher-name>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>D.</given-names>
            <surname>Manchon-Vizuete</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Gris-Sarabia</surname>
          </string-name>
          , and
          <string-name>
            <given-names>X.</given-names>
            <surname>Giro-i-Nieto</surname>
          </string-name>
          .
          <article-title>Photo clustering of social events by extending PhotoTOC to a rich context</article-title>
          .
          <source>In Proc. Social Events in Web Multimedia (SEWM)</source>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>G.</given-names>
            <surname>Petkos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Papadopoulos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Mezaris</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Y.</given-names>
            <surname>Kompatsiaris</surname>
          </string-name>
          .
          <article-title>Social Event Detection at MediaEval 2014: Challenges, datasets, and evaluation</article-title>
          .
          <source>In MediaEval 2014 Workshop</source>
          , Barcelona, Spain, October 16-17,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>J. C.</given-names>
            <surname>Platt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Czerwinski</surname>
          </string-name>
          , and
          <string-name>
            <given-names>B.</given-names>
            <surname>Field</surname>
          </string-name>
          .
          <article-title>PhotoTOC: automatic clustering for browsing personal photographs</article-title>
          .
          <source>In Proc. 4th Pacific Rim Conference on Multimedia</source>
          , volume
          <volume>1</volume>
          , pages
          <fpage>6</fpage>
          –
          <lpage>10</lpage>
          ,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>