CEA LIST’s Participation at MediaEval 2013 Retrieving
                  Diverse Social Images Task

                                                           Adrian Popescu
                   CEA, LIST, Vision & Content Engineering Laboratory, 91190 Gif-sur-Yvette, France.
                                                      adrian.popescu@cea.fr


ABSTRACT
Clustering is by far the most popular diversification technique described in the literature. Its aim is to group together images that are related according to some similarity criterion. Here we tackle the problem differently and explore reranking-based techniques that increase diversity by considering the “informativeness” of each new image with respect to the set of images that were already selected. “Informativeness” is defined using social cues, such as user ID and date, visual cues extracted from the low-level representation of the image, or multimedia cues that combine visual and textual processing. For some of the runs, we also exploit an initial k Nearest Neighbors (k-NN) inspired image reranking that is meant to reduce the amount of noise present in the result set.

1. INTRODUCTION
An efficient information retrieval system should be able to summarize search results so that it surfaces results that are both relevant and cover different aspects of a query. Relevance has been studied more thoroughly than diversification and, even though a considerable diversification literature exists, the topic remains an active one. Usually, given a set of items to diversify, result clustering is exploited in order to propose a diversified representation of that set [4]. Our purpose at the MediaEval 2013 Retrieving Diverse Social Images Task [1] is to build on our previous work [3] and adapt it to social image search. We aim to replace clustering with a simpler method that is based on “informativeness” (i.e. the amount of novelty brought by every new image). We first describe the different cues that we use to approximate “informativeness” and a k-NN inspired image reranking procedure that aims to reduce the amount of noise in the result set. Then we introduce the reranking procedure used for result diversification. Finally, we present the submitted runs and discuss the results obtained.

2. DIVERSIFICATION CUES

2.1 Social Cues
Social cues have already been successfully exploited in point of interest (POI) image diversification [2]. The most straightforward diversification methods rely on the initial Flickr ranking and exploit simple cues such as the user ID or the user ID associated with the day when the photo was taken. The first cue aims to maximize the number of unique users that contribute to the result set. The intuition behind its use is that different users will photograph different aspects of a POI. The second cue is a lighter version of the first and it assumes that if a user returns to a POI on a different day, she is likely to photograph another aspect of it.

2.2 Visual Cues
The visual content of the images is often used in clustering-based diversification techniques. Although visual features do not convey semantic information directly, they can be useful, especially for topics with a small semantic coverage, such as points of interest. Preliminary tests carried out with the different features provided by the organizers showed that HOG (Histogram of Oriented Gradients) outperforms the other features, although the differences were small. Given these preliminary results, we decided to exploit HOG features in our runs.

2.3 Textual Cues
We tried to exploit the textual models provided with the dev set, but no accuracy improvement compared to the Flickr ranking was observed. This negative result might be explained by the fact that the precision of the Flickr ranking is already high. Consequently, we did not perform any textual processing and simply exploited the text-based ranking provided by Flickr in our runs.

3. RERANKING FOR NOISE REDUCTION
The initial result set is noisy, so we introduce a k-NN inspired approach that exploits social and visual cues to rerank results. We considered all the images of the POI as a positive set and built a negative set of the same size by sampling images of other POIs from the collection. Then we compared the HOG features of each image to the features of all other images from the positive and negative sets and retained the 5 most similar results. We counted the number of different users that contributed to the top 5 neighbors, then the number of positive examples among the top 5 neighbors, and the average distance to the first 5 positive neighbors. These cues were cascaded to rerank images, and the top 70% of the images in the reranked list are retained for the experiments that exploit this reranking technique.
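The noise-reduction step of Section 3 can be illustrated with a minimal Python sketch. It is not the code used for our runs: the Euclidean distance between HOG vectors, the cascade order (number of distinct users among the 5 nearest neighbors, then number of positive neighbors, both descending, then average distance to the first 5 positive neighbors, ascending), the rounding of the 70% cutoff, and the names `Image` and `knn_rerank` are assumptions made for illustration.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class Image:
    """Hypothetical record for one photo: its ID, owner, and HOG descriptor."""
    image_id: str
    user_id: str
    hog: list


def distance(a, b):
    """Euclidean distance between two HOG vectors (the metric is an assumption)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_rerank(positives, other_pois, k=5, keep_percent=70, seed=0):
    """Rerank the images of one POI and keep the top keep_percent of them.

    positives: images returned for the POI, in initial Flickr order.
    other_pois: images of other POIs, sampled to build the negative set.
    """
    rng = random.Random(seed)
    # Negative set of the same size as the positive set (capped by availability).
    negatives = rng.sample(other_pois, min(len(positives), len(other_pois)))
    pool = [(img, True) for img in positives] + [(img, False) for img in negatives]

    scored = []
    for img in positives:
        # Sort every other image of the pool by visual distance to this image.
        neighbors = sorted(
            ((distance(img.hog, other.hog), other, is_pos)
             for other, is_pos in pool if other is not img),
            key=lambda item: item[0],
        )
        top = neighbors[:k]
        n_users = len({other.user_id for _, other, _ in top})
        n_pos = sum(1 for _, _, is_pos in top if is_pos)
        pos_dists = [d for d, _, is_pos in neighbors if is_pos][:k]
        avg_pos_dist = sum(pos_dists) / len(pos_dists) if pos_dists else math.inf
        # Assumed cascade: more users first, then more positives, then smaller distance.
        scored.append(((-n_users, -n_pos, avg_pos_dist), img))

    # Stable sort: images tied on all cues keep their initial Flickr order.
    scored.sort(key=lambda item: item[0])
    # Ceiling of keep_percent% of the list, computed with integers (rounding is an assumption).
    n_keep = -(-keep_percent * len(scored) // 100)
    return [img for _, img in scored[:n_keep]]
```

Because Python's sort is stable, images that tie on all three cues keep their position from the initial Flickr ranking, which seems a natural fallback for a cascade of this kind.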
Copyright is held by the author/owner(s).
MediaEval 2013 Workshop, October 18-19, 2013, Barcelona, Spain

4. RERANKING FOR DIVERSIFICATION
Given an initial list of results to diversify, the purpose of this reranking step is to surface different aspects of the topic in the top results. Hash tables are created to store the unique combinations of diversification cues. To diversify results, we start from the initial ranking, create a temporary structure that stores the diversification cues of the images selected in the current round, and initialize the reranked list with the first image. We go through the images of the list and add them to the diversified list only if they satisfy an “informativeness” criterion. This criterion is defined using the diversification cues described in Section 2. When we reach the end of the list, we reinitialize the temporary structure and choose among the images that are not already in the diversified reranking. The process is repeated until all images are added to the diversified list of results.

5. RESULTS AND DISCUSSION
We submitted four different runs to this year's Retrieving Diverse Social Images Task [1]. These runs were produced by using different types of cues and their combinations on the same dataset. Our submissions are briefly described below.

RUN1 is based on the HOG visual feature provided by the organizers. We first apply the visual reranking procedure described in Section 3 to reduce the amount of noise in the initial results and retain the top 70% of the reranked list. Then, we initialize the diversified list with the first image and add new images by maximizing their average visual distance with respect to the images that are already in the diversified list.

RUN2 is based on the initial Flickr ranking and on the hash table of unique users described in Subsection 2.1. In each diversification round, a new image is selected only if no other image of the same user was already chosen in that round.

RUN3 is similar to RUN1, with a difference concerning the reranking for noise reduction. This reranking is done through a linear combination of the ranks of the images in the initial Flickr result set and of their ranks in the HOG-based reranking exploited for RUN1. Empirical tests on the dev set showed that the optimal combination gives a weight of 0.3 to the Flickr ranking and 0.7 to the HOG-based reranking.

RUN4 is similar to RUN2, but it exploits the user-date hash table instead of the user hash table in order to diversify results.

Table 1: Run performances with three official metrics: CR - cluster recall, P - precision, F1 - harmonic mean of CR and P. All values are computed at a cutoff of 10 results. The first three columns present results obtained with expert annotations and the last three columns results obtained with crowdsourcing (averages over the three workers).

             Expert annotations          Crowdsourcing
  Run      CR@10   P@10   F1@10      CR@10   P@10   F1@10
  RUN1     0.37    0.75   0.48       0.74    0.78   0.73
  RUN2     0.42    0.77   0.52       0.73    0.71   0.68
  RUN3     0.36    0.76   0.48       0.75    0.77   0.73
  RUN4     0.40    0.74   0.50       0.71    0.69   0.67

The results in Table 1 show that the best results for the expert annotations were obtained with the simplest reranking approaches, which exploit only social cues. The user-based reranking (RUN2), which performs only a slight alteration of the Flickr results by maximizing the number of different users represented in the top results, had the best performance. The assumption that different users will capture different aspects of a POI seems to be validated. The exploitation of the user-date combination (RUN4) produces a performance loss compared to RUN2. The good CR@10 scores obtained for RUN2 and RUN4 indicate that the diversification technique based on social cues is effective. The improvement of diversity is accompanied by a small improvement of P@10 for RUN2 and by a small precision loss for RUN4. Consequently, the F1@10 measure, which combines relevance and diversity, is improved w.r.t. the original Flickr ranking. RUN1 and RUN3, which are based on the exploitation of visual and multimedia cues, obtain lower CR@10 and F1@10 scores than RUN2 and RUN4. They rely on more complex processing, which includes the maximization of the visual diversity of results, but this processing does not seem to be useful for the test set.

When considering the crowdsourcing ground truth, the results obtained with social cues (RUN2, RUN4) are inferior to the results obtained with visual and multimedia processing (RUN1 and RUN3). However, the difference in CR@10 between the best and the worst run is small (0.75 for RUN3 vs. 0.71 for RUN4), and it is difficult to draw definitive conclusions based on these scores.

6. CONCLUSIONS
The results obtained on the expert annotation of the test set are surprising, since initial tests performed on the dev set gave the following performance order: RUN3, RUN1, RUN2 and RUN4. On the test set, only the relative order of RUN2 and RUN4 is respected. The results obtained on the crowdsourcing ground truth are more in line with those obtained on the development set. The run performances that we obtained during the campaign confirm the findings of [2] regarding the usefulness of social cues in result diversification. The small effect of visual cues contradicts the results of [2] and [3], and the reasons for these poor performances need further investigation. One explanation might be the poor adaptation of HOG, a simple global descriptor, to the application domain, i.e. tourism photos. In the future, we plan to explore the integration of social and visual cues in order to obtain a more efficient diversification.

7. ACKNOWLEDGMENT
This research was supported by the MUCKE project funded within the FP7 CHIST-ERA scheme.

8. REFERENCES
[1] B. Ionescu, M. Menendez, H. Muller, and A. Popescu. Retrieving diverse social images at MediaEval 2013: Objectives, dataset and evaluation. In MediaEval 2013 Workshop, CEUR-WS.org, ISSN: 1613-0073, Barcelona, Spain, October 18-19, 2013.
[2] L. S. Kennedy and M. Naaman. Generating diverse and representative image search results for landmarks. In Proc. of WWW 2008, pages 297–306, New York, NY, USA, 2008. ACM.
[3] A. Popescu, P.-A. Moëllic, I. Kanellos, and R. Landais. Lightweight web image reranking. In Proc. of ACM Multimedia 2009, pages 657–660, New York, NY, USA, 2009. ACM.
[4] R. H. van Leuken, L. Garcia, X. Olivares, and R. van Zwol. Visual diversification of image search results. In Proc. of WWW 2009, pages 341–350, New York, NY, USA, 2009. ACM.