<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>João R. M. Palotti</string-name>
          <email>palotti@ifs.tuwien.ac.at</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mihai Lupu</string-name>
          <email>lupu@ifs.tuwien.ac.at</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Navid Rekabsaz</string-name>
          <email>rekabsaz@ifs.tuwien.ac.at</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Allan Hanbury</string-name>
          <email>hanbury@ifs.tuwien.ac.at</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Vienna University of</institution>
          ,
          <addr-line>Technology</addr-line>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2014</year>
      </pub-date>
      <fpage>16</fpage>
      <lpage>17</lpage>
      <abstract>
        <p>This paper describes the e orts of Vienna University of Technology (TUW) in the MediaEval 2014 Retrieving Diverse Social Images challenge. Our approach consisted of 3 steps: (1) a pre- ltering based on Machine Learning, (2) a re-ranking based on Word2Vec, and (3) a clustering part based on an ensemble of clusters. Our best run reached a F@20 of 0.564.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. INTRODUCTION</title>
      <p>
        Diversi cation is an interesting problem for the
information retrieval community, being a challenge for both text and
multimedia data. Focused on image retrieval, the MediaEval
2014 Retrieving Diverse Social Images Task [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] was proposed
to foster the development and evaluation of methods for
retrieving diverse images of di erent point of interest.
      </p>
    </sec>
    <sec id="sec-2">
      <title>METHODS</title>
      <p>2.1</p>
    </sec>
    <sec id="sec-3">
      <title>Pre-Filtering</title>
      <p>
        We employed a pre- ltering step to exclude likely
irrelevant pictures. The goal of this step is to increase the
percentage of relevant images. We studied two approaches: (1)
a ltering step based on a simpli ed version of Jain et al. [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]
experiments, removing images without any view, geotagged
more than 8 kilometers away from the point of interest (POI)
and with a description length longer than 2000 characters;
(2) we trained a Logistic Regression classi er on the whole
2013 and 2014 data, using as features the ones described
above and also the images' license, the time of the day
(morning, afternoon, night) and the number of times the
POI appeared in the title and descriptions of an image.
      </p>
    </sec>
    <sec id="sec-4">
      <title>Re-ranking</title>
      <p>
        For re-ordering the results, we used the title, tags and
description of the photos. For text pre-processing, we
decomposed the terms using a greedy dictionary based approach.
In the next step, we expand the query using the rst
sentence of Wikipedia which helped for place disambiguation.
We tested four document similarity methods based on Solr1,
Random Indexing2, Galago3 and Word2Vec4[
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. Among all,
we found the best result using a semantic similarity approach
based on Word2Vec.
      </p>
      <p>Word2Vec provides vector representation of words by
using deep learning. We trained a model on Wikipedia and
then used the vector representation of words to calculate
the text similarity of the query to each photo.</p>
      <p>
        Apart from the Word2Vec scores, we extracted binary
attributes based on Jain et al. [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], as we did in the pre- ltering
step, and we used a Linear Regression to re-rank the results
based on the development data.
2.3
      </p>
      <p>We worked on three methods for clustering, all based on
similarity measures. They share the idea of creating a
similarity graph (potentially complete) in which each vertex
represents an image for one point of interest, and each edge
represents the similarity between two images. Di erent
similarity metrics and di erent set of features can be used. Next,
we explain each algorithm and how we combined them.
2.3.1</p>
      <sec id="sec-4-1">
        <title>Metis</title>
        <p>
          The rst approach, called Metis [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ], tries to collapse
similar and neighbor vertices, reducing the initial graph to a
smaller one (known as coarsening step). Then, it divides the
coarsest graph into a pre-de ned number of graphs,
generating the clusters.
2.3.2
        </p>
      </sec>
      <sec id="sec-4-2">
        <title>Spectral</title>
        <p>
          Spectral clustering [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ] can also be seen as a graph
partitioning method, which measures both the total dissimilarity
between groups as well as the total similarity within a group.
We used the Scikit-learn5 implementation of this method.
2.3.3
        </p>
      </sec>
      <sec id="sec-4-3">
        <title>Hierarchical</title>
        <p>
          Hierarchical clustering [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ] is based on the idea of a
hierarchy of clusters. A tree is built in a way that the root
gathers all the samples and the leaves are clusters with only
one sample. This tree can be built bottom-up or top-down.
We used the bottom-up implementation from Scikit-learn.
1http://lucene.apache.org/solr/
2https://code.google.com/p/semanticvectors/
3http://sourceforge.net/p/lemur/galago/
4https://code.google.com/p/word2vec/
5http://scikit-learn.org/
1
2
3
4
5
        </p>
        <p>Pre-Filtering</p>
        <p>Re-Ranking
0.827
0.903
0.870
0.890
0.837
0.282
0.262
0.301
0.297
0.299
2.3.4</p>
      </sec>
      <sec id="sec-4-4">
        <title>Merging</title>
        <p>We found that the clustering methods were unstable as
modi cations in the ltering step caused a great variation
in the clustering step. Therefore, we decided to implement a
merging heuristic, which takes into account di erent points
of view from each clustering method and/or feature set,
being potentially more robust than using one single algorithm.</p>
        <p>
          Given c di erent clustering algorithms, f di erent feature
sets, and m distance measures, there are c f m possible
cluster sets. In our work, we used the 3 algorithms described,
3 features sets (HOG, CN, CN3x3, see [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ] for details about
these features) and 2 distance measures (cosine and
Chebyshev), giving us 18 di erent cluster sets for each POI. We
can then compute a frequency matrix that will hold for
every two documents the number of cluster sets in which they
occur in the same cluster. Next we create a re-ranked list
of the images from the original list (Flickr ranking) based
only on this frequency matrix. In order to do that, we
dene a threshold t and a function F on a set of frequencies
that determine when a document should be moved to the
re-ranked list. Suppose the re-ranked list contains the
documents D1; :::; Di and we want to know if a document Dk in
the original list can be moved to the re-ranked list at
position i + 1. We compute the frequencies f1; :::; fi between Dk
and each D1; :::; Di and if F (f1; :::; fi) &lt; t, Dk is moved to
the re-ranked list, otherwise it is not. After all the elements
in the original list are processed, if there are still remaining
documents not moved to the re-ranked list, the value of t
is increased and these documents are reprocessed. The
algorithm continues until there are no documents left in the
original list. The functions for F used in this work were
maximum, minimum and mean, but other measures, such
as mode, median or any percentile could be easily employed
as well.
2.4
        </p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Credibility</title>
      <p>Our approaches were based on Machine Learning (ML):
we trained a Logistic Regression classi er to learn if a
document was relevant or not based on the credibility data (used
only face proportion, location similarity, upload frequency
and bulk proportion). We tested two methods: (1)
excluding documents set as irrelevant for Run4 and (2) moving to
the bottom of the list irrelevant documents for Run5.</p>
    </sec>
    <sec id="sec-6">
      <title>3. EXPERIMENTS</title>
      <p>We submitted all 5 runs, varying on the use of pre- ltering,
the re-ranking method, the clustering approach and the use
of credibility. Details are shown in Table 1 and the results
are shown in Table 2. Based on the development data, we
were expecting Run3 and Run4 to be our best runs, but the
results on the test data shows that we probably over tted
the development set for the re-ranking and credibility part.
The best result was that the cluster ensemble proved to be
robust for this task.
4.</p>
    </sec>
    <sec id="sec-7">
      <title>CONCLUSION</title>
      <p>Our experiments show that an ensemble of clusters can be
a robust way to diversify results. Unfortunately our re-rank
method did not work in the test set as well as it did in the
development set. Last, the use of credibility also seems to
have over tted the development data, not being e ective for
the test set.</p>
      <p>ACKNOWLEDGMENTS.</p>
      <p>This research was funded by the Austrian Science Fund
(FWF) project number I1094-N23 (MUCKE) and project
number P25905-N23 (ADmIRE).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>B.</given-names>
            <surname>Ionescu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Popescu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Lupu</surname>
          </string-name>
          ,
          <string-name>
            <surname>A. L.</surname>
          </string-name>
          <article-title>G^nsca, and</article-title>
          <string-name>
            <given-names>H.</given-names>
            <surname>Mu</surname>
          </string-name>
          <article-title>ller</article-title>
          .
          <source>Retrieving Diverse Social Images at MediaEval</source>
          <year>2014</year>
          :
          <article-title>Challenge, Dataset and Evaluation</article-title>
          .
          <source>In MediaEval</source>
          <year>2014</year>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>N.</given-names>
            <surname>Jain</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Hare</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Samangooei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Preston</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Davies</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Dupplaw</surname>
          </string-name>
          , and
          <string-name>
            <given-names>P. H.</given-names>
            <surname>Lewis</surname>
          </string-name>
          .
          <article-title>Experiments in diversifying ickr result sets</article-title>
          .
          <source>In MediaEval</source>
          <year>2013</year>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>G.</given-names>
            <surname>Karypis</surname>
          </string-name>
          and
          <string-name>
            <given-names>V.</given-names>
            <surname>Kumar</surname>
          </string-name>
          .
          <article-title>A fast and high quality multilevel scheme for partitioning irregular graphs</article-title>
          .
          <source>SIAM J. Sci. Comput</source>
          .,
          <year>1998</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>T.</given-names>
            <surname>Mikolov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Chen</surname>
          </string-name>
          , G. Corrado, and
          <string-name>
            <given-names>J.</given-names>
            <surname>Dean</surname>
          </string-name>
          .
          <article-title>E cient estimation of word representations in vector space</article-title>
          .
          <source>CoRR</source>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>J.</given-names>
            <surname>Shi</surname>
          </string-name>
          and
          <string-name>
            <given-names>J.</given-names>
            <surname>Malik</surname>
          </string-name>
          .
          <article-title>Normalized cuts and image segmentation</article-title>
          .
          <source>IEEE Trans. Pattern Anal. Mach</source>
          . Intell.,
          <year>2000</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>J. H.</given-names>
            <surname>Ward</surname>
          </string-name>
          .
          <article-title>Hierarchical grouping to optimize an objective function</article-title>
          .
          <source>JASA</source>
          ,
          <year>1963</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>