<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>TIA-INAOE's approach for the 2013 Retrieving Diverse Social Images task∗</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Hugo Jair Escalante</string-name>
          <email>C@10</email>
          <email>C@20</email>
          <email>hugojair@inaoep.mx</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alicia Morales-Reyes</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Instituto Nacional de Astrofísica, Óptica y Electrónica Luis Enrique Erro 1</institution>
          ,
          <addr-line>72840, Puebla</addr-line>
          ,
          <country country="MX">Mexico</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2013</year>
      </pub-date>
      <fpage>18</fpage>
      <lpage>19</lpage>
      <abstract>
        <p>This paper describes the approach adopted by the TIAINAOE team for the 2013 Retrieving Diverse Social Images task of MediaEval. The challenge consists in re-ranking a list of images returned by a retrieval system in such a way that visual diversity among images in the first positions is maximized [5]. A database of partially-annotated images of tourist destinations is provided. The problem is tackled as an optimization one with the aim of finding a new image ranking that shows improved diversity according to the criterion defined by MediaEval. The proposal is to apply a multi-objective evolutionary algorithm to simultaneously maximize image diversity in consecutive positions of the list and minimize divergence from the original list. Multi-modal information can be incorporated by the proposed approach. Results obtained in the MediaEval forum are reported and analyzed.</p>
      </abstract>
      <kwd-group>
        <kwd>Evolutionary algorithms</kwd>
        <kwd>Multi-objective optimization</kwd>
        <kwd>Retrieval result diversification</kwd>
        <kwd>Multi-modal image retrieval</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. INTRODUCTION</title>
      <p>Retrieval results diversification has been a very active
research-topic and recently has had an increasing
popularity in information retrieval. In particular, diversification is
essential when searching very large image collections. For
instance, users do not want to see the same or very similar
views/images (e.g., the Eiffel tower) regarding a particular
query/topic (e.g., Paris), even when those images are
relevant to the query. Instead, it is desirable that multiple views
associated to the topic are shown in the first results (e.g.,
Eiffel Tower, Arch of Triumph, Notre Dame, etc.). Therefore,
maximizing relevancy should not be a unique objective for
image retrieval systems, but a trade-off between relevance
and diversity must be aimed.</p>
      <p>
        In this note, the solution to the 2013 Retrieving Diverse
Social Images MediaEval Task proposed by the TIA-INAOE1
research group is outlined. A detailed description of the
considered scenario is provided in [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. In a nutshell, the goal is
∗This work was partially supported by the LACCIR
programm under project id R1212LAC006.
1http://ccc.inaoep.mx/∼tia/
to develop a method able to re-rank a list of images, returned
by an image retrieval system, in such a way that images in
the first ranking positions are visually diverse to each other.
In addition, the images in the list must be ranked in
descending relevance order according to the query. Hence, the
proposed approach targets diversification as an optimization
challenge of two objectives: relevance and diversity [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
2.
      </p>
    </sec>
    <sec id="sec-2">
      <title>RELATED WORK</title>
      <p>
        Several approaches for result diversification have been
proposed: Arni et al. reported results obtained by several
strategies evaluated in the ImageCLEF2008 photographic
retrieval task which focused on diversification [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
Clustering, topic modeling and margin-maximization approaches
were proposed to re-rank images. Whereas these methods
proved to be effective, they still attempt to optimize a
single criterion. Deselaers et al. sustained that
diversification involves two conflicting objectives: relevance and
diversity [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Their proposal is an effective dynamic programming
approach to optimize an objective function that combines
relevance and diversity estimates into a single term. This
paper proposes a multi-objective evolutionary algorithm that
explicitly attempts to optimize relevance and diversity
objectives.
3.
      </p>
    </sec>
    <sec id="sec-3">
      <title>PROPOSED METHOD</title>
      <p>Considering that a list of N images (L = ⟨I1; : : : ; IN ⟩)
relevant to a particular query (Q) has an associated ranking
score: S0 = ⟨s1; : : : ; sN ⟩, where si ≥ sj; ∀i; j : i ̸= j and i &lt;
j (if the scoring value is unknown, an estimation is calculated
by si = 1i ). The goal is to find the ranking score S =
⟨s1; : : : ; sN ⟩ such that the ranking induced by S maximizes
the objectives of Equations (1) and (2):
(S0; S) = 1 − n(n26− 1) ∑ dri(S0; S)2
i
where dri(S0; S) is the difference in rankings at position i
induced by scores lists S0 and S. Spearman’s correlation
coefficient ( ) measures discrepancy between S and the initial
scores in S0. Implicitly, it is assumed that the initial list is
a good one in terms of relevance.</p>
      <p>
        Equation (2) is defined to evaluate visual similarity among
images ranked by S in consecutive positions:
(S) =
(1)
where is the diversity term, dd(Ii; I1;:::;i 1) measures the
visual distance between image in the ith rank position and
the rest of the images appearing in previous positions. Since
dd(Ii; I1;:::;i 1) is not associated to a particular feature
representation, it can be estimated by using any visual feature
provided for the task [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. Moreover, dd can be estimated
by using textual information or meta-data associated to the
images. Calculating in this way to represent diversity
modifies the diversity term defined in [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
      </p>
      <p>
        Aiming to find the score list S that offers the best
tradeoff between and is the main objective of this study. This
problem can be tackled by a multi-objective evolutionary
optimization technique maximizing simultaneously and .
It was decided to use the NSGA-II algorithm to target this
goal [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. NSGA-II is one of the most used multi-objective
evolutionary algorithms. Standard operators for selection,
crossover and mutation were adopted. The NSGA-II
algorithm returns as output the set of non-dominated solutions
(i.e., a set of re-ranked lists that optimize and ), an
estimate of the Pareto front for the problem at hand.
Theoretically, none of these solutions is better or worse than the
others, therefore all of them are valid solutions. However,
for our problem, a single solution has to be selected, thus
a strategy for selecting a single solution from the set of
solutions is also proposed. Specifically, values normalization
of the involved objectives is carried out across the returned
solutions and the sum of normalized objectives is ranked.
In this way, the solution at the first position offers a good
trade-off between relevance and diversity.
      </p>
    </sec>
    <sec id="sec-4">
      <title>EXPERIMENTAL RESULTS</title>
      <p>
        Three runs of the proposed Multi-Objective Result
Diversification (MORD) method were submitted to be considered
for evaluation, these are summarized in Table 1. In a
visual run, HOG features are used to estimate dd, this choice
is justified by preliminary experimentation in the
development data set. Also, a textual run was submitted in which
the term dd was estimated by the cosine dissimilarity
between the bag-of-words (BoW) representation obtained by
tags and the textual images descriptions. For the textual
information character 3-grams were used instead of words
for building the BoW. This type of representation is
particularly helpful for texts dealing with informal language,
because writing style patterns can be discovered (see e.g., [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]).
Finally, a multi-modal run was also submitted where three
objectives are simultaneously maximized: and diversity
terms dvd and dtd, considered for visual and textual runs
respectively.
      </p>
      <p>The average (test-set) performance for a number of
measures and for the three runs are reported in Table 2. We
show results using expert and crowd-sourcing ground-truth
(a 50 images sample was evaluated and the average over
three subjects is reported). Regarding expert’s evaluation,
it can be seen that performance difference among three runs
is roughly the same. Slightly better performance was
obtained when using only textual data to estimate diversity
among images (run 2). This is a somewhat unexpected
result because the aim is to maximize visual diversity.
Regarding the crowd-sourcing evaluation, a similar pattern is
observed, although the results are much higher than in the
expert evaluation. It is difficult to determine how good these
results are on test data since other systems results have not
been revealed yet. However, from results reported in this
working note, it is possible to observe that the achieved
performance on test data is similar regardless of the modality
for the proposed method. Nevertheless, it is worth
mentioning that during the development phase, performance of the
proposed method was evaluated and seemed competitive.
For instance, the multi-modal run using development data
(keywords only) obtained a C@R10 of 0:5043 ± 0:0111 (10
runs of MORD) compared to 0.4635, which is the C@R10
obtained when using the base system (using the list induced
by S0), this is a relative difference of ≈ 8:9%.
5.</p>
    </sec>
    <sec id="sec-5">
      <title>DISCUSSION AND FUTURE WORK</title>
      <p>In this study, visual results diversification has been tackled
as a multi-objective optimization problem while considering
relevance and diversity as two independent but
complementary objectives. To the best of our knowledge, this is the first
approach to explicitly optimize both objectives by using a
multi-objective evolutionary optimization technique.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>T.</given-names>
            <surname>Arni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Clough</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Sanderson</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Grubinger</surname>
          </string-name>
          .
          <article-title>Overview of the imageclefphoto 2008 photographic retrieval task</article-title>
          .
          <source>In CLEF</source>
          <year>2008</year>
          , volume
          <volume>5706</volume>
          <source>of LNCS</source>
          , pages
          <fpage>500</fpage>
          -
          <lpage>511</lpage>
          . Springer,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>K.</given-names>
            <surname>Deb</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Pratap</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Agarwal</surname>
          </string-name>
          , and
          <string-name>
            <given-names>T.</given-names>
            <surname>Meyarivan</surname>
          </string-name>
          .
          <article-title>A fast and elitist multiobjective genetic algorithm: NSGA-II</article-title>
          .
          <source>IEEE Transactions on Evolutionary Computation</source>
          ,
          <volume>6</volume>
          (
          <issue>2</issue>
          ):
          <fpage>182</fpage>
          -
          <lpage>197</lpage>
          ,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>T.</given-names>
            <surname>Deselaers</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Gass</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Dreuw</surname>
          </string-name>
          , and
          <string-name>
            <given-names>H.</given-names>
            <surname>Ney</surname>
          </string-name>
          .
          <article-title>Jointly optimising relevance and diversity in image retrieval</article-title>
          .
          <source>In Proc. of ACM Conference on Image and Video Retrieval</source>
          , pages
          <fpage>1</fpage>
          -
          <lpage>8</lpage>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>H. J.</given-names>
            <surname>Escalante</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Solorio</surname>
          </string-name>
          , and M. Montes y Gomez.
          <article-title>Local histograms of character n-grams for authorship attribution</article-title>
          .
          <source>In Proc. of ACL</source>
          , pages
          <fpage>288</fpage>
          -
          <lpage>298</lpage>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>B.</given-names>
            <surname>Ionescu</surname>
          </string-name>
          ,
          <string-name>
            <surname>M.</surname>
          </string-name>
          <article-title>Men´endez, H. Mu¨ller, and</article-title>
          <string-name>
            <given-names>A.</given-names>
            <surname>Popescu</surname>
          </string-name>
          .
          <article-title>Retrieving diverse social images at mediaeval 2013: Objectives, dataset and evaluation</article-title>
          . In MediaEval 2013 Workshop, Barcelona, Spain, October
          <volume>18</volume>
          -19
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>