TIA-INAOE’s approach for the 2013
                       Retrieving Diverse Social Images task∗

                                     Hugo Jair Escalante, Alicia Morales-Reyes
                                    Instituto Nacional de Astrofísica, Óptica y Electrónica
                                          Luis Enrique Erro 1, 72840, Puebla, Mexico
                                            {hugojair, a.morales}@inaoep.mx

ABSTRACT                                                            to develop a method able to re-rank a list of images, returned
This paper describes the approach adopted by the TIA-               by an image retrieval system, in such a way that images in
INAOE team for the 2013 Retrieving Diverse Social Images            the ﬁrst ranking positions are visually diverse to each other.
task of MediaEval. The challenge consists in re-ranking a           In addition, the images in the list must be ranked in de-
list of images returned by a retrieval system in such a way         scending relevance order according to the query. Hence, the
that visual diversity among images in the ﬁrst positions is         proposed approach targets diversiﬁcation as an optimization
maximized [5]. A database of partially-annotated images             challenge of two objectives: relevance and diversity [3].
of tourist destinations is provided. The problem is tackled
as an optimization one with the aim of ﬁnding a new im-             2.    RELATED WORK
age ranking that shows improved diversity according to the             Several approaches for result diversiﬁcation have been pro-
criterion deﬁned by MediaEval. The proposal is to apply             posed: Arni et al. reported results obtained by several
a multi-objective evolutionary algorithm to simultaneously          strategies evaluated in the ImageCLEF2008 photographic
maximize image diversity in consecutive positions of the list       retrieval task which focused on diversiﬁcation [1]. Cluster-
and minimize divergence from the original list. Multi-modal         ing, topic modeling and margin-maximization approaches
information can be incorporated by the proposed approach.           were proposed to re-rank images. Whereas these methods
Results obtained in the MediaEval forum are reported and            proved to be eﬀective, they still attempt to optimize a sin-
analyzed.                                                           gle criterion. Deselaers et al. sustained that diversiﬁca-
                                                                    tion involves two conﬂicting objectives: relevance and diver-
Keywords                                                            sity [3]. Their proposal is an eﬀective dynamic programming
Evolutionary algorithms; Multi-objective optimization; Re-          approach to optimize an objective function that combines
trieval result diversiﬁcation; Multi-modal image retrieval          relevance and diversity estimates into a single term. This pa-
                                                                    per proposes a multi-objective evolutionary algorithm that
                                                                    explicitly attempts to optimize relevance and diversity ob-
1. INTRODUCTION                                                     jectives.
   Retrieval results diversiﬁcation has been a very active
research-topic and recently has had an increasing popular-
ity in information retrieval. In particular, diversiﬁcation is      3.    PROPOSED METHOD
essential when searching very large image collections. For             Considering that a list of N images (L = ⟨I1 , . . . , IN ⟩)
instance, users do not want to see the same or very similar         relevant to a particular query (Q) has an associated ranking
views/images (e.g., the Eiﬀel tower) regarding a particular         score: S 0 = ⟨s1 , . . . , sN ⟩, where si ≥ sj , ∀i, j : i ̸= j and i <
query/topic (e.g., Paris), even when those images are rele-         j (if the scoring value is unknown, an estimation is calculated
vant to the query. Instead, it is desirable that multiple views     by si = 1i ). The goal is to ﬁnd the ranking score S ∗ =
associated to the topic are shown in the ﬁrst results (e.g., Eif-   ⟨s∗1 , . . . , s∗N ⟩ such that the ranking induced by S ∗ maximizes
fel Tower, Arch of Triumph, Notre Dame, etc.). Therefore,           the objectives of Equations (1) and (2):
maximizing relevancy should not be a unique objective for
image retrieval systems, but a trade-oﬀ between relevance                                               6       ∑
and diversity must be aimed.                                                   ρ(S 0 , S) = 1 −                       dri (S 0 , S)2   (1)
                                                                                                    n(n2 − 1)     i
   In this note, the solution to the 2013 Retrieving Diverse
Social Images MediaEval Task proposed by the TIA-INAOE1             where dri (S 0 , S) is the diﬀerence in rankings at position i
research group is outlined. A detailed description of the con-      induced by scores lists S 0 and S. Spearman’s correlation co-
sidered scenario is provided in [5]. In a nutshell, the goal is     eﬃcient (ρ) measures discrepancy between S and the initial
∗This work was partially supported by the LACCIR pro-               scores in S 0 . Implicitly, it is assumed that the initial list is
gramm under project id R1212LAC006.                                 a good one in terms of relevance.
1
  http://ccc.inaoep.mx/∼tia/                                          Equation (2) is deﬁned to evaluate visual similarity among
                                                                    images ranked by S in consecutive positions:
                                                                                             ∑
                                                                                             N
Copyright is held by the author/owner(s).                                           β(S) =          min(dd (Ii , I1,...,i−1 ))         (2)
MediaEval 2013 Workshop, October 18-19, 2013, Barcelona, Spain                                i=2
     Table 1: Summary of submitted runs.                                 Table 2: Oﬃcial results obtained by MORD.
 ID Description                                                                       Expert evaluation
 1 MORD - HOG / Euclidean distance for dd                                 C@10 C@20 P@10 P@20 F@10 F@20
 2 MORD - Tags + descriptions / cosine dissim. for dr                   1 0.3808 0.5699 0.7146 0.7143 0.4795 0.6125
 3 MORD - 3 objectives: ρ, visual dvd and textual dtd div.              2 0.3885 0.5732 0.7091 0.7136 0.4801 0.6102
                                                                        3 0.3823 0.5690 0.6977 0.7076 0.4728 0.6067
                                                                                  Crowd-sourcing evaluation
where β is the diversity term, dd (Ii , I1,...,i−1 ) measures the         C@10 C@20 P@10 P@20 F@10 F@20
visual distance between image in the ith rank position and              1 0.7194 0.8499 0.6755 0.6745 0.6640 0.7245
the rest of the images appearing in previous positions. Since           2 0.7503 0.8625 0.6755 0.6898 0.6790 0.7392
dd (Ii , I1,...,i−1 ) is not associated to a particular feature rep-    3 0.7479 0.8675 0.6714 0.6918 0.6769 0.7464
resentation, it can be estimated by using any visual feature
provided for the task [5]. Moreover, dd can be estimated
by using textual information or meta-data associated to the
                                                                       tained when using only textual data to estimate diversity
images. Calculating β in this way to represent diversity
                                                                       among images (run 2). This is a somewhat unexpected re-
modiﬁes the diversity term deﬁned in [3].
                                                                       sult because the aim is to maximize visual diversity. Re-
   Aiming to ﬁnd the score list S ∗ that oﬀers the best trade-
                                                                       garding the crowd-sourcing evaluation, a similar pattern is
oﬀ between ρ and β is the main objective of this study. This
                                                                       observed, although the results are much higher than in the
problem can be tackled by a multi-objective evolutionary
                                                                       expert evaluation. It is diﬃcult to determine how good these
optimization technique maximizing simultaneously ρ and β.
                                                                       results are on test data since other systems results have not
It was decided to use the NSGA-II algorithm to target this
                                                                       been revealed yet. However, from results reported in this
goal [2]. NSGA-II is one of the most used multi-objective
                                                                       working note, it is possible to observe that the achieved per-
evolutionary algorithms. Standard operators for selection,
                                                                       formance on test data is similar regardless of the modality
crossover and mutation were adopted. The NSGA-II algo-
                                                                       for the proposed method. Nevertheless, it is worth mention-
rithm returns as output the set of non-dominated solutions
                                                                       ing that during the development phase, performance of the
(i.e., a set of re-ranked lists that optimize ρ and β), an es-
                                                                       proposed method was evaluated and seemed competitive.
timate of the Pareto front for the problem at hand. Theo-
                                                                       For instance, the multi-modal run using development data
retically, none of these solutions is better or worse than the
                                                                       (keywords only) obtained a C@R10 of 0.5043 ± 0.0111 (10
others, therefore all of them are valid solutions. However,
                                                                       runs of MORD) compared to 0.4635, which is the C@R10
for our problem, a single solution has to be selected, thus
                                                                       obtained when using the base system (using the list induced
a strategy for selecting a single solution from the set of so-
                                                                       by S 0 ), this is a relative diﬀerence of ≈ 8.9%.
lutions is also proposed. Speciﬁcally, values normalization
of the involved objectives is carried out across the returned
solutions and the sum of normalized objectives is ranked.              5.   DISCUSSION AND FUTURE WORK
In this way, the solution at the ﬁrst position oﬀers a good              In this study, visual results diversiﬁcation has been tackled
trade-oﬀ between relevance and diversity.                              as a multi-objective optimization problem while considering
                                                                       relevance and diversity as two independent but complemen-
4. EXPERIMENTAL RESULTS                                                tary objectives. To the best of our knowledge, this is the ﬁrst
                                                                       approach to explicitly optimize both objectives by using a
   Three runs of the proposed Multi-Objective Result Diver-            multi-objective evolutionary optimization technique.
siﬁcation (MORD) method were submitted to be considered
for evaluation, these are summarized in Table 1. In a vi-
sual run, HOG features are used to estimate dd , this choice           6.   REFERENCES
is justiﬁed by preliminary experimentation in the develop-             [1] T. Arni, P. Clough, M. Sanderson, and M. Grubinger.
ment data set. Also, a textual run was submitted in which                  Overview of the imageclefphoto 2008 photographic
the term dd was estimated by the cosine dissimilarity be-                  retrieval task. In CLEF 2008, volume 5706 of LNCS,
tween the bag-of-words (BoW) representation obtained by                    pages 500–511. Springer, 2009.
tags and the textual images descriptions. For the textual              [2] K. Deb, A. Pratap, S. Agarwal, and T. Meyarivan. A
information character 3-grams were used instead of words                   fast and elitist multiobjective genetic algorithm:
for building the BoW. This type of representation is partic-               NSGA-II. IEEE Transactions on Evolutionary
ularly helpful for texts dealing with informal language, be-               Computation, 6(2):182–197, 2002.
cause writing style patterns can be discovered (see e.g., [4]).        [3] T. Deselaers, T. Gass, P. Dreuw, and H. Ney. Jointly
Finally, a multi-modal run was also submitted where three                  optimising relevance and diversity in image retrieval. In
objectives are simultaneously maximized: ρ and diversity                   Proc. of ACM Conference on Image and Video
terms dvd and dtd , considered for visual and textual runs re-             Retrieval, pages 1–8, 2009.
spectively.                                                            [4] H. J. Escalante, T. Solorio, and M. Montes y Gomez.
   The average (test-set) performance for a number of mea-                 Local histograms of character n-grams for authorship
sures and for the three runs are reported in Table 2. We                   attribution. In Proc. of ACL, pages 288–298, 2011.
show results using expert and crowd-sourcing ground-truth              [5] B. Ionescu, M. Menéndez, H. Müller, and A. Popescu.
(a 50 images sample was evaluated and the average over                     Retrieving diverse social images at mediaeval 2013:
three subjects is reported). Regarding expert’s evaluation,                Objectives, dataset and evaluation. In MediaEval 2013
it can be seen that performance diﬀerence among three runs                 Workshop, Barcelona, Spain, October 18-19 2013.
is roughly the same. Slightly better performance was ob-