<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>CEA LIST's Participation at MediaEval 2013 Retrieving Diverse Social Images Task</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name><given-names>Adrian</given-names> <surname>Popescu</surname></string-name>
          <xref ref-type="aff" rid="aff0" />
        </contrib>
        <aff id="aff0">
          <institution>CEA, LIST, Vision &amp; Content Engineering Laboratory</institution>
          ,
          <addr-line>91190 Gif-sur-Yvette</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2013</year>
      </pub-date>
      <fpage>18</fpage>
      <lpage>19</lpage>
      <abstract>
        <p>Clustering is by far the most popular diversification technique described in the literature. Its aim is to group together images that are related according to some similarity criterion. Here we tackle the problem differently and explore a reranking-based technique that increases diversity by considering the "informativeness" of each new image with respect to the set of images that were already selected. "Informativeness" is defined using social cues, such as user ID and date, visual cues extracted from the low-level representation of the image, or multimedia cues that combine visual and textual processing. For some of the runs, we also exploit an initial k Nearest Neighbors (k-NN) inspired image reranking that is meant to reduce the amount of noise present in the result set.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. INTRODUCTION</title>
      <p>
        An e cient information retrieval system should be able to
summarize search results so that it surfaces results that are
both relevant and that are covering di erent aspects of a
query. Relevance was more thoroughly studied than
diversication and, even though a considerable amount of diversi
cation literature exists, the topic remains a hot one. Usually,
given a set of items to diversify, results clustering is exploited
in order to propose a diversi ed representation of that set
[
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. Our purpose at MediaEval 2013 Diverse Images [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] is to
build on our previous work [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] and adapt it to social image
search. We aim to replace clustering by a simpler method
that is based on the \informativeness" (i.e. the amount of
novelty brought by every new image). We rst describe the
di erent cues that we use to approximate \informativeness"
and a k-NN inspired image reranking procedure that aims
to reduce the amount of noise in the result set. Then we
introduce the reranking procedure used for results diversi
cation. Finally, we present the submitted runs and discuss
the results obtained.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. DIVERSIFICATION CUES</title>
    </sec>
    <sec id="sec-3">
      <title>2.1 Social Cues</title>
      <p>
        Social cues were already successfully exploited in POI image
diversi cation [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. The most straightforward diversi cation
methods rely on the initial Flickr ranking and exploit
simple cues such as user ID or user ID associated to the day
when the photo was taken. The rst cue aims to maximize
the number of unique users that contribute to the results
set. The intuition behind its use is that di erent users will
photograph di erent aspects of a POI. The second cue is a
lighter version of the rst and it assumes that if a user
returns to a POI on a di erent day, she is likely to photograph
another aspect of it.
      </p>
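      <p>As a minimal sketch, the two social cues can be reduced to hashable keys as follows; the photo field names (user_id, date_taken) are our assumptions, not the task's actual schema:</p>
      <preformat>
# Sketch of the two social cues as hash keys; the field names
# "user_id" and "date_taken" are assumed for illustration.
def user_cue(photo):
    """Cue 1: one key per contributing user."""
    return photo["user_id"]

def user_date_cue(photo):
    """Cue 2: one key per user and per day the photo was taken."""
    return (photo["user_id"], photo["date_taken"].date())
</preformat>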
    </sec>
    <sec id="sec-4">
      <title>2.2 Visual Cues</title>
      <p>The visual content of the images is often used in clustering-based diversification techniques. Although they do not convey semantic information directly, visual features can be useful, especially for topics with a small semantic coverage, such as points of interest. Preliminary tests performed with the different features provided by the organizers showed that HOG outperforms the other features, although the differences were not very significant. Given these preliminary results, we decided to exploit HOG features in our runs.</p>
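      <p>For illustration, a minimal sketch of a distance between HOG feature vectors; the choice of the Euclidean metric is our assumption:</p>
      <preformat>
import numpy as np

def visual_distance(hog_a, hog_b):
    """Euclidean distance between two HOG feature vectors (assumed metric)."""
    return float(np.linalg.norm(np.asarray(hog_a) - np.asarray(hog_b)))
</preformat>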
    </sec>
    <sec id="sec-5">
      <title>2.3 Textual Cues</title>
      <p>We tried to exploit the textual models provided with the dev set, but no accuracy improvement compared to the Flickr ranking was observed. This negative result might be explained by the fact that the precision of the Flickr ranking is already high. Consequently, we did not perform any textual processing and simply exploited the text-based ranking provided by Flickr in our runs.</p>
    </sec>
    <sec id="sec-6">
      <title>3. RERANKING FOR NOISE REDUCTION</title>
      <p>The initial result set is noisy and we introduce a k-NN inspired approach that exploits social and visual cues to rerank results. We considered all the images of the POI as a positive set and built a negative set of the same size by sampling images of other POIs from the collection. Then we compared the HOG features of each image to the features of all other images from the positive and negative sets and retained the top 5 most similar results. For each image, we counted the number of different users that contributed to its top 5 neighbors, the number of positive examples among the top 5 neighbors, and the average distance to the first 5 positive neighbors. These cues were cascaded to rerank images, and the top 70% of images from the reranked list are retained for experiments that exploit this reranking technique.</p>
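      <p>A minimal sketch of our reading of this procedure follows; the Euclidean distance and the exact ordering of the cascaded cues are assumptions:</p>
      <preformat>
import numpy as np

def knn_noise_rerank(pos_feats, pos_users, neg_feats, k=5, keep=0.7):
    """k-NN noise-reduction reranking (sketch, assumptions noted above).

    pos_feats: (N, D) HOG features of the POI's images (positive set).
    pos_users: length-N list of user IDs for the positive images.
    neg_feats: (N, D) HOG features sampled from other POIs (negative set).
    Returns the indices of the retained top `keep` fraction of positives.
    """
    all_feats = np.vstack([pos_feats, neg_feats])
    n_pos = len(pos_feats)
    keys = []
    for i, feat in enumerate(pos_feats):
        # Distance of image i to every other image (positive and negative).
        d = np.linalg.norm(all_feats - feat, axis=1)
        d[i] = np.inf                     # exclude the image itself
        nn = np.argsort(d)[:k]            # top-k most similar images
        pos_nn = nn[nn &lt; n_pos]         # neighbors from the positive set
        n_users = len({pos_users[j] for j in pos_nn})
        # Average distance to the first k positive neighbors.
        avg_pos_d = float(np.sort(d[:n_pos])[:k].mean())
        # Cascade: more users, then more positive neighbors, then distance.
        keys.append((-n_users, -len(pos_nn), avg_pos_d, i))
    ranked = [i for *_, i in sorted(keys)]
    return ranked[: int(keep * n_pos)]
</preformat>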
    </sec>
    <sec id="sec-7">
      <title>4. RERANKING FOR DIVERSIFICATION</title>
      <p>Given an initial list of results to diversify, the purpose of this reranking step is to surface different aspects of the topic in the top results. Hash tables are created to store the unique combinations of diversification cues. To diversify results, we start from the initial ranking, create a temporary structure to store the cue combinations already seen, and initialize the reranked list with the first image. We assess the images from the list and add them to the diversified list only if they satisfy an "informativeness" criterion. This criterion is defined using the diversification cues described in Section 2. When we reach the end of the list, we reinitialize the temporary structure and choose images that are not already in the diversified reranking. The process is repeated until all images are added to the diversified list of results.</p>
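      <p>A minimal sketch of this round-based procedure, assuming that the "informativeness" criterion is "the cue combination was not yet seen in the current round"; cue can be any key function such as the social cue helpers sketched in Section 2.1:</p>
      <preformat>
def diversify(ranked, cue):
    """Round-based diversification reranking (sketch of Section 4)."""
    diversified = []
    remaining = list(ranked)       # images in their initial ranking order
    while remaining:
        seen = set()               # temporary structure, reset every round
        postponed = []
        for img in remaining:
            key = cue(img)         # unique combination of cue values
            if key not in seen:    # the "informativeness" criterion
                seen.add(key)
                diversified.append(img)
            else:
                postponed.append(img)   # retry in a later round
        remaining = postponed
    return diversified
</preformat>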
    </sec>
    <sec id="sec-8">
      <title>5. RESULTS AND DISCUSSION</title>
      <p>
        We submitted four di erent runs at this year's Diverse
Social Images Task [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. These runs produced by using di erent
types of cues and their combinations on the same dataset.
Our submissions are brie y described below: RUN1 is based
on the HOG visual feature provided by the organizers. We
rst apply the visual reranking procedure described in 3 to
reduce the amount of noise in the initial results and retain.
Then, we initialize the diversi ed list with the rst image
and then add new images by maximizing their average
visual distance with respect to the images that are already
in the diversi ed list. RUN2 is based on the initial Flickr
ranking and on the hash table of unique users described in
Subsection 2.1. In each diversi cation round, new images
are selected only if there is another image of the same user
was not already chosen in that round. RUN3 is similar to
RUN1 with a di erence concerning the reranking for noise
reduction. This reranking is done through a linear
combination of the ranks of the images in the initial Flickr results set
and of the ranks of the images in the HOG-based reranking
exploited for RUN1. Empirical tests on the dev set showed
that the optimal combination of results is that which gives
a weight of 0.3 to the Flickr ranking and 0.7 to the
HOGbased reranking. RUN4 is similar to RUN2 but it exploits
the user-date hash table instead of the user hash in order to
diversify results.
      </p>
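      <p>A minimal sketch of RUN3's rank fusion; representing ranks as 0-based positions is our assumption, while the 0.3/0.7 weights are the values reported above:</p>
      <preformat>
def fuse_ranks(flickr_rank, hog_rank, w_flickr=0.3, w_hog=0.7):
    """Linear rank fusion used for RUN3's noise reduction (sketch).

    flickr_rank / hog_rank: dicts mapping an image id to its position
    (0 = best) in the Flickr ranking and in the HOG-based reranking.
    """
    fused = {img: w_flickr * flickr_rank[img] + w_hog * hog_rank[img]
             for img in flickr_rank}
    return sorted(fused, key=fused.get)   # lowest fused rank first
</preformat>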
      <p>The results in Table 1 show that the best results for the expert annotations were obtained with the simplest reranking approaches, which exploit only social cues. The user-based reranking (RUN2), which performs only a slight alteration of the Flickr results by maximizing the number of different users represented in the top results, had the best performance. The assumption that different users will capture different aspects of a POI seems to be validated. The exploitation of the user-date combination (RUN4) produces a performance loss compared to RUN2. The good CR@10 scores obtained for RUN2 and RUN4 indicate that the diversification technique based on social cues is efficient. The improvement of diversity is accompanied by a small improvement of P@10 for RUN2 and by a small precision loss for RUN4. Consequently, the F1@10 measure, which combines relevance and diversity, is improved w.r.t. the original Flickr ranking. RUN1 and RUN3, which are based on the exploitation of visual and multimedia cues, have performances that are inferior to those of RUN2 and RUN4. They rely on more complex processing, which includes the maximization of the visual diversity of results, but this processing does not seem to be useful for the test set.</p>
      <p>When considering the crowdsourcing ground truth, the results obtained with social cues (RUN2, RUN4) are inferior to the results obtained with visual and multimedia processing (RUN1 and RUN3). However, the CR@10 difference between the best and the worst run is small and it is difficult to draw definitive conclusions from these scores.</p>
    </sec>
    <sec id="sec-9">
      <title>6. CONCLUSIONS</title>
      <p>
        The results obtained on the expert annotation of the test
set are surprising since initial tests performed on the dev set
gave the following performance order: RUN3, RUN1, RUN2
and RUN4. On the test set, only the order of RUN2 and
RUN4 is respected. The results obtained on the crowd
sourcing ground truth are more inline with those obtained on the
development set. The run performances that we obtained
during the campaign con rm the ndings of [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] regarding the usefulness of social cues in result diversification. The small effect of visual cues is in contradiction with the results of [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] and [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], but we need to investigate further the reasons for this poor performance. One explanation might be the poor adaptation of HOG, a simple global descriptor, to the application domain, i.e. tourism photos. In the future, we plan to explore the integration of social and visual cues in order to obtain a more efficient diversification.
      </p>
    </sec>
    <sec id="sec-10">
      <title>7. ACKNOWLEDGMENT</title>
      <p>This research was supported by the MUCKE project funded
within the FP7 CHIST-ERA scheme.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name><given-names>B.</given-names> <surname>Ionescu</surname></string-name>,
          <string-name><given-names>M.</given-names> <surname>Menendez</surname></string-name>,
          <string-name><given-names>H.</given-names> <surname>Muller</surname></string-name>, and
          <string-name><given-names>A.</given-names> <surname>Popescu</surname></string-name>.
          <article-title>Retrieving diverse social images at MediaEval 2013: Objectives, dataset and evaluation</article-title>.
          In <source>MediaEval 2013 Workshop</source>, CEUR-WS.org, ISSN 1613-0073, Barcelona, Spain, October 18-19, <year>2013</year>.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name><given-names>L. S.</given-names> <surname>Kennedy</surname></string-name> and
          <string-name><given-names>M.</given-names> <surname>Naaman</surname></string-name>.
          <article-title>Generating diverse and representative image search results for landmarks</article-title>.
          In <source>Proc. of WWW 2008</source>, pages <fpage>297</fpage>-<lpage>306</lpage>, New York, NY, USA, <year>2008</year>. ACM.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name><given-names>A.</given-names> <surname>Popescu</surname></string-name>,
          <string-name><given-names>P.-A.</given-names> <surname>Moellic</surname></string-name>,
          <string-name><given-names>I.</given-names> <surname>Kanellos</surname></string-name>, and
          <string-name><given-names>R.</given-names> <surname>Landais</surname></string-name>.
          <article-title>Lightweight web image reranking</article-title>.
          In <source>Proc. of ACM Multimedia 2009</source>, pages <fpage>657</fpage>-<lpage>660</lpage>, New York, NY, USA, <year>2009</year>. ACM.
        </mixed-citation>
      </ref>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name><given-names>R. H.</given-names> <surname>van Leuken</surname></string-name>,
          <string-name><given-names>L.</given-names> <surname>Garcia</surname></string-name>,
          <string-name><given-names>X.</given-names> <surname>Olivares</surname></string-name>, and
          <string-name><given-names>R.</given-names> <surname>van Zwol</surname></string-name>.
          <article-title>Visual diversification of image search results</article-title>.
          In <source>Proc. of WWW 2009</source>, pages <fpage>341</fpage>-<lpage>350</lpage>, New York, NY, USA, <year>2009</year>. ACM.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>