Recod @ MediaEval 2015: Diverse Social Images Retrieval

Rodrigo T. Calumby¹,², Iago B. A. do C. Araujo², Vinícius P. Santana²,
Javier A. V. Munoz¹, Otávio A. B. Penatti³, Lin T. Li¹, Jurandy Almeida¹,⁵,
Giovani Chiachia¹, Marcos A. Gonçalves⁴, and Ricardo da S. Torres¹

¹ RECOD Lab, University of Campinas, Campinas, SP – Brazil
² University of Feira de Santana, Feira de Santana, BA – Brazil
³ SAMSUNG Research Institute Brazil, Campinas, SP – Brazil
⁴ Federal University of Minas Gerais, Belo Horizonte, MG – Brazil
⁵ GIBIS Lab, Federal University of São Paulo, São José dos Campos, SP – Brazil

                   [rtcalumby,ibacaraujo,vpsantana]@ecomp.uefs.br, jalvarm.acm@gmail.com, o.penatti@samsung.com,
                       jurandy.almeida@unifesp.br, mgoncalv@dcc.ufmg.br, [lintzyli,chiachia,rtorres]@ic.unicamp.br


ABSTRACT
This paper presents the RECOD team's experience in the Retrieving Diverse
Social Images Task at MediaEval 2015. The teams were required to develop a
diversification approach for social photo retrieval. Our proposal is based
on irrelevant image filtering, reranking, rank aggregation, and diversity
promotion. We proposed a multimodal approach and exploited image metadata
and user credibility information.


1. INTRODUCTION
   The relevance-diversity trade-off is an important problem associated
with several search scenarios. Promoting diversity in retrieval results has
been shown to positively impact the user search experience, especially for
ambiguous, underspecified, and visual summarization queries [1]. The
Retrieving Diverse Social Images Task [6] integrates it into a
tourism-related representative image retrieval challenge. This paper
describes the RECOD group contributions via diversity promotion boosted by
multimodal rank fusion.

2. PROPOSED APPROACH
   As a relevance enhancement step, our approach includes irrelevant image
filtering, multimodal image reranking, and rank aggregation. Image filtering
was conducted according to face detection data and the geographic location
of the images. We also evaluated several different visual features and text
similarity measures. For reranking the original retrieval list, we exploited
textual, visual, geographic, and credibility information. As an additional
step, the reranked lists were also aggregated with a Genetic Programming
(GP) [7] approach. Our proposal follows the general workflow presented in
Figure 1. Each of these steps is described next.

Figure 1: Overview of the proposed approach.

2.1 Filtering
   To remove irrelevant images, we used three filtering strategies:
geographic-based, face-based, and blur-based. The GeoFilter eliminated all
images located farther than a given range in relation to the reference
lat/long. Following the results on the development set, the filtering radius
was set to 10 km. In turn, the face-based procedure was introduced to filter
out images containing people as the main subject. We used a face detection
module of Face++¹ and all images with more than one face detected were
removed. Finally, out-of-focus images were eliminated using the method from
[14] with the blur threshold set to 0.8.

2.2 Visual Features and Text Similarity
   For visual similarity, besides the provided features, we also extracted:
(i) two general purpose global descriptors (BIC [13] and GIST [9]); (ii) a
bag of visual words (BoVW) descriptor, based on sparse (Harris-Laplace
detector) SIFT, with 512 visual words (randomly selected), soft assignment
(σ = 150), and max pooling or using Word Spatial Arrangement (WSA) [11] for
encoding the spatial arrangement of visual words; and (iii) fifteen features
available in the Lire package [8].²
   For text-only and multimodal runs, we used the Cosine, BM25, Dice,
Jaccard, and TF-IDF measures, which were computed using the provided TF, DF,
and TF-IDF vectors.

¹ http://www.faceplusplus.com (as of June/2015).
² CEDD, FCTH, OpponentHistogram, JointHistogram, AutoColorCorrelogram,
  ColorLayout, EdgeHistogram, Gabor, JCD, JpegCoefficientHistogram,
  ScalableColor, SimpleColorHistogram, Tamura, LuminanceLayout, and PHOG.
  Available at: http://www.lire-project.net/

Copyright is held by the author/owner(s).
MediaEval 2015 Workshop, Sept. 14-15, 2015, Wurzen, Germany

2.3 Reranking and Aggregation
   For improving the original list ranking, we explored visual, textual,
credibility, and geographic ranking. For text-only and multimodal reranking,
the text-based scores were computed as the similarity between the text
vectors associated
with the images and the localities' text vectors. The visual method reranked
the original list according to the similarity in relation to the location's
representative Wikipedia images. The visual distance from each image to the
representative set was computed as the minimum of its distances to the
representative images. All credibility scores were individually used for
reranking. Additionally, lat/long data were used to rank images according to
the Haversine distance to the reference point.
   For feature fusion, the reranked lists were combined using the GP
approach from [15], which uses several rank aggregation methods. This method
was trained using the development data and combined order-based (MRA [4],
RRF [3], and BordaCount [16]) and score-based (CombMIN, CombMAX, CombSUM,
CombMED, CombANZ [12], and RL-Sim [10]) rank fusion methods.
   As a relevance-based filtering, from the final aggregated list, up to the
150 top-ranked images were selected as the input list for the summarization
procedure.

2.4 Diversification Method
   After the filtering, reranking, and aggregation steps, the improved
relevance-based lists were submitted to explicit diversification. We
evaluated four methods: clustering-based (k-Medoids and agglomerative) and
reranking-based (MMR [2], MSD [5]). In all cases the k-Medoids method
achieved significantly superior results on the development set and was used
in the submitted runs.
   In the clustering step, the initial medoids were selected in an offset
fashion for equally sampling from the top to the bottom of the ranked list.
The medoid updating procedure just selected the best-connected image of the
cluster, using the average distance to all other images in the cluster. The
process iterates until no inter-cluster image transition occurs or up to 50
iterations.
   For runs 1, 2, 3, and 5, the clusters were ranked according to their
sizes in descending order and intra-cluster sorting was applied using
average connectivity. For the credibility-based submission (run 4), the
images were clustered according to their owner (user) and the clusters were
ranked according to the users' credibility computed as a linear combination
of tagSpecificity, uploadFrequency, meanTagRank, faceProportion,
meanImageTagClarity, photoCount, visualScore, and locationSimilarity [6].
   The representative images were selected in a round-robin fashion from the
final clusters. The number of clusters was defined as 30 for runs 1 and 3,
and 40 for runs 2 and 5.

3. RUNS SETUP
   We submitted five runs (Table 1). The features used for diversification
were selected according to the best results on the development set.
   In all runs, the geographic filtering and reranking were only applied for
one-topic queries since multi-topic queries do not have a reference
geo-location. Additionally, in run 5, the face-based filters were also only
applied to one-topic queries since multi-topic queries have a different
relevance constraint in relation to people in the foreground. Finally, in
runs 1, 3, and 5 no visual reranking was applied for multi-topic queries.

Table 1: Runs Configurations (*only for one-topic queries)

Run  Filtering          Reranking                             Diversity
1    Geo*, face, blur   Visual*                               BIC
2    Geo*, face, blur   Textual                               Cosine + Jaccard
3    Geo*, face, blur   Visual*, textual                      Jaccard
4    Geo*, face, blur   Credibility                           Users
5    Geo*, face*, blur  Visual*, textual, credibility, geo*   Jaccard

4. RESULTS AND DISCUSSION
   Table 2 presents the effectiveness results for the five runs for the
development and test sets. The best results (F1@20) on the development set
were achieved on run 5, followed by runs 2 and 3, in which textual
information was used. However, these were the runs with the greatest
effectiveness difference when comparing development and test queries,
especially considering the multi-topic queries.

Table 2: DevSet and TestSet Results

              DevSet                      TestSet
Run   P@20    CR@20   F1@20       P@20    CR@20   F1@20
1     0.7487  0.4336  0.5409      0.7129  0.4111  0.5063
2     0.8013  0.4514  0.5694      0.6996  0.4248  0.5101
3     0.7837  0.4436  0.5592      0.7058  0.3881  0.4883
4     0.7644  0.4446  0.5532      0.7198  0.4309  0.5219
5     0.8190  0.4637  0.5853      0.7324  0.4123  0.5084

   Table 3 presents the effectiveness results for one-topic and multi-topic
test queries. As we can observe, even with no visual reranking, the
visual-only run allowed slightly superior results for multi-topic queries
considering all the run types and also comparing to one-topic queries. All
other run types achieved superior effectiveness on one-topic queries,
especially when the credibility information was used (runs 4 and 5). Our
results suggest that visual features are important when considering
multi-topic queries while the textual information seems more suitable for
one-topic queries.

Table 3: TestSet Results: One-topic and Multi-topic

              One-topic                   Multi-topic
Run   P@20    CR@20   F1@20       P@20    CR@20   F1@20
1     0.6906  0.4000  0.4991      0.7350  0.4221  0.5133
2     0.7130  0.4316  0.5205      0.6864  0.4181  0.4998
3     0.6942  0.3982  0.4970      0.7171  0.3782  0.4798
4     0.7630  0.4301  0.5390      0.6771  0.4318  0.5051
5     0.7290  0.4286  0.5228      0.7357  0.3963  0.4942

5. CONCLUSIONS
   For relevance and diversity maximization we proposed filtering strategies
and the combination of multiple features with a rank fusion method. These
improved ranked lists were used as input for a clustering-based
summarization method. Our experiments suggest that different summarization
alternatives may result in different effectiveness for one-topic and
multi-topic queries.

6. ACKNOWLEDGMENTS
   We thank UEFS/PROBIC and FAPESP for their support.
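
As a rough illustration of the diversification step described in Section 2.4, the following Python sketch clusters a relevance-ordered list with k-Medoids (offset initialization, medoid update by lowest average distance, at most 50 iterations), ranks the clusters by size, and picks images in round-robin order. It is a minimal sketch under stated assumptions, not the code used for the submitted runs: the function name `diversify`, the precomputed distance matrix input, the tie-breaking rules, and the reading of "average connectivity" as lowest average distance to the other cluster members are all assumptions made here for illustration.

```python
def diversify(dist, k, top_n=20, max_iter=50):
    """Return up to top_n image indices chosen for diversity.

    dist is a symmetric n x n distance matrix (list of lists) whose rows
    follow the relevance order of the reranked list. Hypothetical helper.
    """
    n = len(dist)
    if n == 0:
        return []
    k = max(1, min(k, n))
    # Offset initialization: sample medoids evenly from top to bottom.
    medoids = [(c * n) // k for c in range(k)]
    assign = None
    for _ in range(max_iter):
        # Assign every image to its nearest medoid (ties: earlier medoid).
        new_assign = [min(range(k), key=lambda c: dist[i][medoids[c]])
                      for i in range(n)]
        if new_assign == assign:
            break  # no image moved to another cluster
        assign = new_assign
        # Medoid update: the member with the lowest total (hence average)
        # distance to the other members of its cluster.
        for c in range(k):
            members = [i for i in range(n) if assign[i] == c]
            if members:
                medoids[c] = min(
                    members, key=lambda m: sum(dist[m][j] for j in members))
    clusters = [[i for i in range(n) if assign[i] == c] for c in range(k)]
    clusters = [members for members in clusters if members]
    # Assumed intra-cluster order: best-connected images first.
    for members in clusters:
        members.sort(key=lambda m: sum(dist[m][j] for j in members))
    # Largest clusters first; the stable sort keeps ties in their order.
    clusters.sort(key=len, reverse=True)
    # Round-robin selection: one image per cluster on each pass.
    result, depth = [], 0
    limit = min(top_n, n)
    while len(result) < limit:
        for members in clusters:
            if depth < len(members) and len(result) < limit:
                result.append(members[depth])
        depth += 1
    return result


if __name__ == "__main__":
    # Toy data: five "images" on a line, forming two distant groups.
    points = [0, 1, 2, 100, 101]
    toy = [[abs(a - b) for b in points] for a in points]
    print(diversify(toy, k=2, top_n=2))
```

On the toy data, the two selected images come from different groups, which is the behavior the round-robin step is meant to produce. For the credibility-based run, the clustering loop would presumably be replaced by grouping images by owner and ordering the groups by the users' credibility scores.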
7. REFERENCES
 [1] R. T. Calumby, R. da S. Torres, and M. A. Gonçalves.
     Diversity-driven learning for multimodal image
     retrieval with relevance feedback. In Proceedings of the
     21st IEEE International Conference on Image
     Processing, pages 2197–2201, 2014.
 [2] J. Carbonell and J. Goldstein. The use of MMR,
     diversity-based reranking for reordering documents
     and producing summaries. In Proceedings of the 21st
     Annual International ACM SIGIR Conference on
     Research and Development in Information Retrieval,
     pages 335–336, 1998.
 [3] G. V. Cormack, C. L. A. Clarke, and S. Buettcher.
     Reciprocal rank fusion outperforms Condorcet and
     individual rank learning methods. In Proceedings of
     the 32nd International ACM SIGIR Conference on
     Research and Development in Information Retrieval,
     pages 758–759, 2009.
 [4] R. Fagin, R. Kumar, and D. Sivakumar. Efficient
     similarity search and classification via rank
     aggregation. In Proceedings of the 2003 ACM
     SIGMOD International Conference on Management of
     Data, pages 301–312, 2003.
 [5] S. Gollapudi and A. Sharma. An axiomatic approach
     for result diversification. In Proceedings of the 18th
     International Conference on World Wide Web, pages
     381–390, 2009.
 [6] B. Ionescu, A. Gînscă, B. Boteanu, A. Popescu,
     M. Lupu, and H. Müller. Retrieving diverse social
     images at MediaEval 2015: Challenge, dataset and
     evaluation. In Working Notes Proceedings of the
     MediaEval 2015 Workshop, Wurzen, September 14-15
     2015. CEUR-WS.org.
 [7] J. R. Koza. Genetic Programming: On the
     Programming of Computers by Means of Natural
     Selection. MIT Press, Cambridge, MA, USA, 1992.
 [8] M. Lux and S. A. Chatzichristofis. LIRE: Lucene image
     retrieval: an extensible Java CBIR library. In
     Proceedings of the 16th ACM International Conference
     on Multimedia, pages 1085–1088, 2008.
 [9] A. Oliva and A. Torralba. Modeling the shape of the
     scene: A holistic representation of the spatial
     envelope. International Journal of Computer Vision,
     42(3):145–175, 2001.
[10] D. C. G. Pedronette and R. da S. Torres. Image
     re-ranking and rank aggregation based on similarity of
     ranked lists. In Proceedings of the 14th International
     Conference on Computer Analysis of Images and
     Patterns - Volume Part I, pages 369–376, 2011.
[11] O. A. B. Penatti, F. B. Silva, E. Valle,
     V. Gouet-Brunet, and R. da S. Torres. Visual word
     spatial arrangement for image retrieval and
     classification. Pattern Recognition, 47(2):705–720,
     2014.
[12] J. A. Shaw and E. A. Fox. Combination of multiple
     searches. In The Second Text REtrieval Conference
     (TREC-2), pages 243–252, 1994.
[13] R. Stehling, M. Nascimento, and A. Falcão. A
     compact and efficient image retrieval approach based
     on border/interior pixel classification. In Proceedings
     of the 11th International Conference on Information
     and Knowledge Management, pages 102–109, 2002.
[14] H. Tong, M. Li, H. Zhang, and C. Zhang. Blur
     detection for digital images using wavelet transform.
     In Proceedings of the IEEE International Conference
     on Multimedia and Expo, volume 1, pages 17–20, 2004.
[15] J. A. Vargas, R. da S. Torres, and M. A. Gonçalves. A
     soft computing approach for learning to aggregate
     rankings. In Proceedings of the 24th ACM
     International Conference on Information and
     Knowledge Management, 2015. (in press).
[16] H. P. Young. An axiomatization of Borda's rule.
     Journal of Economic Theory, 9(1):43–52, 1974.