Recod @ MediaEval 2016: Diverse Social Images Retrieval

                   Cristiano D. Ferreira1 , Rodrigo T. Calumby2 , Iago B. A. do C. Araujo2
                 Ícaro C. Dourado1 , Javier A. V. Munoz1 , Otávio A. B. Penatti1,3 , Lin T. Li1
                               Jurandy Almeida1,4 and Ricardo da S. Torres1
     1 RECOD Lab, University of Campinas, Campinas, SP – Brazil, 2 University of Feira de Santana, Feira de Santana, BA – Brazil
                                    3 SAMSUNG Research Institute Brazil, Campinas, SP – Brazil
                           4 GIBIS Lab, Federal University of São Paulo, São José dos Campos, SP – Brazil

              [rtcalumby,ibacaraujo]@ecomp.uefs.br, [crferreira,icarocd,jalvarm.acm]@gmail.com, o.penatti@samsung.com,
                                     jurandy.almeida@unifesp.br, [lintzyli,rtorres]@ic.unicamp.br


ABSTRACT
This paper presents the RECOD team experience in the Re-
trieving Diverse Social Images Task at MediaEval 2016. The
teams were required to develop a diversification approach for
social photo retrieval. Our proposal is based on re-ranking,
rank aggregation, and diversity promotion, allowing employ-
ment of textual and visual information apart or fused.


1.    INTRODUCTION
   The relevance-diversity trade-off is an important problem
associated with several search scenarios. Promoting diver-
sity in retrieval results has been shown to positively impact             Figure 1: Overview of the proposed approach.
the user search experience specially for ambiguous, under-
specified, and visual summarization queries [2].
   The Retrieving Diverse Social Images Task 2016 [7] task            in the Lire package [10].1
addresses the problem of image search result diversification            For text-only and multimodal runs, we used the cosine [1],
in the context of social media. This paper describes the RE-          BM25 [1], Dice [9], Jaccard [9], and TF-IDF measures which
COD group contributions via diversity promotion boosted               were computed using the provided TF, DF, and TF-IDF
by rank fusion.                                                       vectors.

                                                                      2.2     Re-ranking and Aggregation
2.    PROPOSED APPROACH                                                 For improving the original list ranking, several textual
  Our proposal follows the general workflow presented in              measures (Section 2.1) were employed for re-ranking. The
Figure 1. The first step, Re-ranking, ranks the original list         text-based scores were computed as the similarity between
provided by Flickr according to a text-based descriptor. The          the text vectors associated with the query topic and the
Fusion step employs Genetic Programming (GP) [8] to ag-               image associated text vectors. For visual only run, the re-
gregate lists re-ranked by several text-based descriptors. Fi-        ranking step was skipped.
nally, the Diversification step exploits visual and textual-            For feature fusion, re-ranked lists were combined using
based descriptors to promote diversification at the resulting         the GP approach from [16], which uses several rank ag-
ranking.                                                              gregation methods. This method took as input the Flickr
  The next sections provide a more detailed description of            query result re-ranked considering each textual similarity
our approach.                                                         measures. Then, it was trained using the development
                                                                      data and combined by order-based (MRA [5], RRF [4], and
2.1     Visual Features and Text Similarity                           BordaCount [17]) and score-based (CombMIN, CombMAX,
  For visual similarity, besides the provided features, we            CombSUM, ComMED, CombANZ [14], and RLSim [12])
also extracted: (i) two general-purpose global descriptors            rank fusion methods.
(BIC [15] and GIST [11]); (ii) a bag-of-visual-words (BoVW)
descriptor, based on sparse (Harris-Laplace detector) SIFT,           2.3     Diversification Method
with 1000 visual words (randomly selected), soft assignment             After re-ranking and aggregation steps, the improved
(σ = 150), and max pooling with spatial pyramids or Word              relevance-based lists were submitted to explicit diversifica-
Spatial Arrangement (WSA) [13] for encoding the spatial ar-
                                                                      1
rangement of visual words; and (iii) fifteen features available         CEDD, FCTH, OpponentHistogram, JointHistogram, Au-
                                                                      toColorCorrelogram, ColorLayout, EdgeHistogram, Ga-
                                                                      bor, JCD, JpegCoefficientHistogram, ScalableColor, Simple-
Copyright is held by the author/owner(s).                             ColorHistogram, Tamura, LuminanceLayout, and PHOG.
MediaEval 2016 Workshop, October 20-21, 2016, Hilversum,              Available at: http://www.lire-project.net/ (As of Sep.
Netherlands.                                                          2016).
tion. Visual and textual descriptors were employed (Sec-              CombMAX(                  MRA(
                                                                       BordaCount(               CombMNZ(
tion 2.1). We evaluated five methods: clustering-based                  CombMAX(                  CombMAX(
(k-Medoids, agglomerative and Birch [18]) and re-ranking-                CombMNZ(COSINE, BM25),    CombSUM(BM25, COSINE),
based (MMR [3] and MSD [6]).                                             COSINE_ME                 RRF(JACCARD, COSINE)),
   Agglomerative and Birch methods achieved significantly               ),                        BM25),
superior results on the development set, thus they were used            CombMAX(                 MRA(TFIDF, COSINE_ME))
                                                                         CombMNZ(DICE, DICE),
in the submitted runs.                                                   COSINE
   In the clustering step, for agglomerative method, centroid           )),
and average link linkage methods were employed using Col-              CombMIN(
orLayout for distance computing. Forty clusters were con-               RRF(
                                                                         CombMNZ(DICE, TFIDF),
sidered in our approach. For Birch method, a maximum of
                                                                         COSINE_ME
51 entries per node was admitted, with a distance threshold             ),
of 0.3, and also considering cluster refining. The representa-          CombMAX(
tive images were selected in a round robin fashion from the              MRA(TFIDF, BM25Orig),
final clusters.                                                          COSINE
                                                                        )))
2.4   Workflow Discussions                                                         (a)                          (b)

  It is important to observe that re-ranking and GP fusion
are optional steps at the workflow presented in Figure 1.             Figure 2: GP individuals for rank aggregation.
Depending on run requirements or on the desired experiment
goals, one or both can be skipped. For example, for run 1,
                                                                          Table 1: DevSet and TestSet Results
which is visual only, neither of those steps were employed.                         DevSet                      TestSet
  Furthermore, the diversification step can employ textual        Run     P@20      CR@20     F1@20    P@20     CR@20     F1@20
or visual information alone or together. It allows adherence       1      0.6821     0.4641   0.5359   0.5180    0.4001   0.4258
                                                                   2      0.6814    0.4643    0.5403   0.5000    0.3709   0.4045
to the single modality requirement of runs 1 and 2. Section 3      3      0.6714     0.4519   0.5268   0.4969    0.3881   0.4107
will present details of each run configuration.                    4      0.6614     0.4578   0.5248   0.5156   0.4173    0.4339
                                                                   5      0.6550     0.4525   0.5196   0.5156    0.4065   0.4379

3.    RUNS SETUP
   We submitted five runs.
   Run 1 – (required) visual information only. No re-ranking     the worst results on this set. However, on the test set, they
and no GP-fusion were employed. Diversification provided         presented the best results.
by Agglomerative method, using average link method with            As we can observe, in most cases, the re-ranking over the
ColorLayout as visual feature and grouping images into 40        Flickr initial ranking improved the overall results, even when
clusters.                                                        employing only textual features for this task.
   Run 2 – (required) text information only. Re-ranking con-       Considering test set results of runs 4 and 5 over 1 and 2, we
sidering cosine similarity was employed. Diversification pro-    can also notice that the retrieval process illustrated in Fig-
vided by Birch method, using 51 as maximum entries per           ure 1 can benefit from fusing visual and textual information.
node with distance threshold of 0.3.                             We believe that textual and visual information have a com-
   Run 3 – (required) text-visual fused. Re-ranking consider-    plementary nature for an effective image retrieval process.
ing cosine similarity was employed. Diversification provided     Textual information introduces the notion of context around
by Birch method, using 51 as maximum entries per node            retrieval, but ignores the image itself by not inspecting its
with distance threshold of 0.19;                                 content. On the other hand, content-based image retrieval
   Run 4 – (optional) general run. GP-fusion rank aggre-         lacks context. By initial retrieval and re-ranking based on
gation employed (Figure 2 - a). Diversification provided         textual information and introducing visual features at the
by Agglomerative method, using average link method with          diversification step this complementary nature is explored
ColorLayout as visual feature and grouping images into 40        and provided the best results.
clusters;
   Run 5 – (optional) general run. GP-fusion rank aggrega-       5.     CONCLUSIONS
tion employed (Figure 2 - b). Diversification provided by
Agglomerative method, using centroid linkage method with            For relevance and diversity maximization, we proposed
ColorLayout as visual feature and grouping images into 40        re-ranking strategies and the combination of multiple fea-
clusters.                                                        tures with a rank fusion method. These improved ranked
   The diversification methods, parameters, features, and        lists were used as input for a clustering-based summariza-
textual similarities used were selected according to the best    tion method. Our experiments suggest that aggregation of
results on the development set.                                  multiple re-ranked lists and fusion of visual and textual in-
                                                                 formation can improve retrieval effectiveness. It is interest-
                                                                 ing to observe the recurrent selection of the cosine measure
4.    RESULTS AND DISCUSSION                                     on both GP aggregation individuals.
   Table 1 presents the results for the five runs for the de-
velopment and test sets. The best results (F1@20) on the         Acknowledgments
development set were achieved on run 2, followed by runs 1
and 3, in which textual information was used. Runs 4 and         We thank the support of CAPES, CNPq, and FAPESP.
5, which employed GP-fusion rank aggregation, have shown
6.   REFERENCES                                                      REtrieval Conference (TREC-2), pages 243–252, 1994.
 [1] R. Baeza-Yates and B. Ribeiro-Neto. Modern                 [15] R. Stehling, M. Nascimento, and A. Falcão. A
     Information Retrieval - the concepts and technology             compact and efficient image retrieval approach based
     behind search, Second edition. Pearson Education Ltd.,          on border/interior pixel classification. In Proceedings
     Harlow, England, 2011.                                          of the 11th International Conference on Information
 [2] R. T. Calumby, R. da S. Torres, and M. A. Gonçalves.           and Knowledge Management, pages 102–109, 2002.
     Diversity-driven learning for multimodal image             [16] J. A. Vargas, R. da S. Torres, and M. A. Gonçalves. A
     retrieval with relevance feedback. In Proceedings of the        soft computing approach for learning to aggregate
     21st IEEE International Conference on Image                     rankings. In Proceedings of the 24th ACM
     Processing, pages 2197–2201, 2014.                              International on Conference on Information and
 [3] J. Carbonell and J. Goldstein. The use of mmr,                  Knowledge Management, CIKM ’15, pages 83–92, New
     diversity-based reranking for reordering documents              York, NY, USA, 2015. ACM.
     and producing summaries. In Proceedings of the 21st        [17] H. P. Young. An axiomatization of borda’s rule.
     Anual International ACM SIGIR Conference on                     Journal of Economic Theory, 9(1):43–52, 1974.
     Research and Development in Information Retrieval,         [18] T. Zhang, R. Ramakrishnan, and M. Livny. Birch: An
     pages 335–336, 1998.                                            efficient data clustering method for very large
 [4] G. V. Cormack, C. L. A. Clarke, and S. Buettcher.               databases. In Proceedings of the 1996 ACM SIGMOD
     Reciprocal rank fusion outperforms condorcet and                International Conference on Management of Data,
     individual rank learning methods. In Proceedings of             SIGMOD ’96, pages 103–114, New York, NY, USA,
     the 32nd International ACM SIGIR Conference on                  1996. ACM.
     Research and Development in Information Retrieval,
     pages 758–759, 2009.
 [5] R. Fagin, R. Kumar, and D. Sivakumar. Efficient
     similarity search and classification via rank
     aggregation. In Proceedings of the 2003 ACM
     SIGMOD International Conference on Management of
     Data, pages 301–312, 2003.
 [6] S. Gollapudi and A. Sharma. An axiomatic approach
     for result diversification. In Proceedings of the 18th
     International Conference on World Wide Web, pages
     381–390, 2009.
 [7] B. Ionescu, A. L. Gı̂nscă, M. Zaharieva, B. Boteanu,
     M. Lupu, and H. Müller. Retrieving diverse social
     images at mediaeval 2016: Challenge, dataset and
     evaluation. In Proceedings of the MediaEval 2016
     Workshop, Hilversum, Netherlands, Oct. 20-21 2016.
 [8] J. R. Koza. Genetic Programming: On the
     Programming of Computers by Means of Natural
     Selection. MIT Press, Cambridge, MA, USA, 1992.
 [9] J. Lewis, S. Ossowski, J. Hicks, M. Errami, and H. R.
     Garner. Text similarity: an alternative way to search
     medline. Bioinformatics, 22(18):2298–2304, 2006.
[10] M. Lux and S. A. Chatzichristofis. LIRE: lucene image
     retrieval: an extensible java CBIR library. In
     Proceedings of the 16th ACM International Conference
     on Multimedia, pages 1085–1088, 2008.
[11] A. Oliva and A. Torralba. Modeling the shape of the
     scene: A holistic representation of the spatial
     envelope. International Journal of Computer Vision,
     42(3):145–175, 2001.
[12] D. C. G. Pedronette and R. da S. Torres. Image
     re-ranking and rank aggregation based on similarity of
     ranked lists. In Proceedings of the 14th International
     Conference on Computer Analysis of Images and
     Patterns - Volume Part I, pages 369–376, 2011.
[13] O. A. B. Penatti, F. B. Silva, E. Valle,
     V. Gouet-Brunet, and R. da S. Torres. Visual word
     spatial arrangement for image retrieval and
     classification. Pattern Recognition, 47(2):705–720,
     2014.
[14] J. A. Shaw, E. A. Fox, J. A. Shaw, and E. A. Fox.
     Combination of multiple searches. In The Second Text