=Paper= {{Paper |id=Vol-1739/MediaEval_2016_paper_17 |storemode=property |title=UNED-UV @ Retrieving Diverse Social Images Task |pdfUrl=https://ceur-ws.org/Vol-1739/MediaEval_2016_paper_17.pdf |volume=Vol-1739 |dblpUrl=https://dblp.org/rec/conf/mediaeval/GonzalezGGC16 }} ==UNED-UV @ Retrieving Diverse Social Images Task== https://ceur-ws.org/Vol-1739/MediaEval_2016_paper_17.pdf
     UNED-UV@2016 Retrieving Diverse Social Images Task

                   A. Castellanos                              X. Benavent                    A. García-Serrano
           NLP&IR Group UNED, Madrid                     Informatics, U. Valencia        NLP&IR Group UNED, Madrid
            acastellanos@lsi.uned.es                      xaro.benavent@uv.es               agarcia@lsi.uned.es
                                                                 E. de Ves
                                                         Informatics, U. Valencia
                                                           esther.deves@uv.es

Copyright is held by the author/owner(s).
MediaEval 2016 Workshop, October 20-21, 2016, Hilversum, Netherlands.

ABSTRACT
This paper details the participation of the UNED-UV group at the
MediaEval 2016 Retrieving Diverse Social Images Task using a multimodal
approach. Several Local Logistic Regression models, which use the visual
low-level features, estimate the relevance probability for all the images
in the dataset. Then, the images are ranked by selecting the highest
probability image in each of the textual clusters. These textual clusters
are generated by a textual algorithm based on Formal Concept Analysis
(FCA) and Hierarchical Agglomerative Clustering (HAC) that detects the
latent topics addressed. The images are then diversified according to the
detected topics.

1.    INTRODUCTION
   Information retrieval systems have mainly relied on optimizing the
result list according to accuracy-based metrics. However, when dealing
with image retrieval, systems should be able to offer relevant but also
diverse results [1]. We therefore propose a multimodal approach that
addresses both relevance and diversity. It uses a relevance feedback
algorithm developed by the UV group [3, 6, 7]. This method estimates the
similarity probability of all the images in the dataset from their visual
low-level features by means of several Local Logistic Regression (LLR)
models [10]. It has been extensively shown that visual information has a
great impact on information retrieval systems [12]. For diversification,
we propose to represent an image by the concepts covered by its textual
information. This conceptual representation is tackled by means of Formal
Concept Analysis (FCA) [13], a data organization technique. In our
participation in previous editions of this task, we showed that this
approach was able to identify the different topics addressed in the
images, allowing the diversification of the result list according to
them [5, 4].

2.    SYSTEM DESCRIPTION
   We present a two-step system that ranks a list of retrieved images
taking into account the relevance of the retrieved images to the given
query and showing as much diversity as possible. The first step is based
on a relevance feedback algorithm that estimates the relevance of each
image to the query by using the provided visual low-level features of the
images [8]. In the next step, we select the highest probability image in
each textual FCA cluster or visual k-means cluster to generate the list
of similar and diverse images.

2.1    Relevancy via relevance feedback
   We estimate the relevance of each image to the given query by using a
relevance feedback algorithm [7]. The general methodology involves four
steps:
1. Reduction of the data dimensionality. The provided low-level visual
features [8] are used to generate a feature vector associated to each
image, generically denoted as x, in a space of dimension N = 8194. These
features are reduced using Principal Component Analysis (PCA). We retain
only the first components, those that account for 80% of the data
variability. This reduces the original feature space to a new feature
vector of dimension M < N, with M = 52. One of the advantages of this
reduction is that the new transformed components are in decreasing order
with respect to the variance explained by the corresponding principal
component;
2. Selecting the relevant and non-relevant sets. The user looks at a few
screens, each showing some images, and marks some of them as relevant and
non-relevant (run4). For the automatic runs (run1, run3 and run5), these
sets are automatically selected. Relevant images are the first P images
of the ranked Flickr list that belong to different textual FCA or visual
(k-means) clusters, with P = 5. The goal of selecting relevant images
from different clusters is to give the model diverse relevant example
images and so improve the diversity of the estimation. We use k-means
clustering over the provided visual low-level features in run1 and run5,
and textual FCA clustering over the provided textual features [8] in
run3. Non-relevant images are selected from other topics at each query;
3. Parameter estimation of the Local Logistic Regression models [10].
The reduced feature vectors (PCA) and the relevant and non-relevant sets
are the inputs of several Local Logistic Regression models whose outputs
are the probabilities that an image belongs to the relevant set. The
feature vector is split dynamically into m groups of non-fixed size.
Each group is used for adjusting the model of higher order, given the
input sets and PCA components;
4. Ranking of the database. Models are evaluated on all the images of the
database and return the probabilities of being relevant for each
estimated model; as a result, we have a probability vector (p) of
dimension m for each individual
image. We combine these probabilities into a single one by using a
weighted average. The weight (w) for a given probability is the amount of
variance accounted for by the group of components used to adjust the
model. Finally, this procedure gives us a score/probability for each
image.

2.2     Clusters for diversity
   The final rank is generated by selecting the highest probability image
from each of the different clusters, trying to give as much diversity as
possible to the final list. If there are fewer than 50 clusters, the
second highest probability image of each cluster is then selected. We
have generated clusters for the ranking using textual information (run2
to run5) and k-means clusters using the visual information of the images
(run1).

2.2.1    Textual clusters with FCA
   The presented clustering procedure is based on the discovery of the
latent topics addressed by the textual information of the images. To that
end, Formal Concept Analysis is proposed to detect these topics and
Hierarchical Agglomerative Clustering (HAC) [11] to group similar images
together into the detected topics. Each HAC-based cluster contains the
set of images covering a similar topic. Thereafter, the images of each
cluster are ranked according to their diversity based on their visual
features.

FCA-based Modelling.
   Formal Concept Analysis (FCA) is a theory of concept formation [13]
for the organization of content according to its related features. The
basic structure of FCA is the formal context, K := (G, M, I), where G is a
set of objects, M a set of attributes related to these objects and I a
binary relation between G and M, denoted by gIm: the object g has the
attribute m. From the formal context, a set of formal concepts can be
inferred: a formal concept is a pair (A, B) of a set of images A and the
features B shared by those images. The formal concepts are organized in a
lattice from the most generic to the most specific one. By applying FCA
to the formal context containing the textual information of the images,
the images are modelled in terms of formal concepts, which group together
the images sharing the same set of features. In order to select only the
most representative features, we applied Kullback-Leibler Divergence
(KLD) [9] on the textual contents related to the images. This KLD-based
selection represents each image by the textual contents that best
differentiate it from the other images.

HAC-based grouping.
   From the FCA formal concepts, a set of diverse image groups is created
by applying a HAC algorithm [11]. Specifically, we propose a single
linkage hierarchical clustering that groups together similar formal
concepts, with the Zeros-Induced index [2] as the cluster similarity.

2.2.2    Visual clusters
   The clusters are made by a k-means procedure (k = 18) over the PCA
components of the provided visual low-level features [8].

3.    RESULTS
   We submitted five runs (see Table 1): four of them are automatic (run1,
run2, run3 and run5) and one, run4, is a human-based run. All of them use
our two-step system except run2, which uses only the second step, ranking
the images by textual clusters with FCA. Three of them are multimodal
runs (run3, run4 and run5) using both textual and visual information.

Table 1: Description of the runs.
        Relevance algorithm                        Ranking
        Relevant images       Non-relevant images  clusters
        text  visual  human   other  human         text  visual
run1          X               X                          X
run2                                               X
run3    X                     X                    X
run4                  X              X             X
run5          X               X                    X

   Results are presented in Table 2. It is interesting to point out that
our best result for both precision and diversification is obtained with
the multimodal human-based approach, run4 (F1@20 = 0.4597): estimating
the probability of relevance of each image to the query by an LLR model
and then ranking the final list with FCA clusters. The second best
result, run5 (F1@20 = 0.4562), uses the same approach as run4 but
automatically selects the relevant and non-relevant image sets for
estimating the LLR models. It is also important to observe that the
multimodal runs run4 and run5 outperform the automatic monomodal runs,
run1 and run2, as expected.

Table 2: Official metrics for the Retrieving Diverse Social Images Task.
The best result is the human-based run4; the second best result is the
automatic run5.
run     P@20     CR@20    F1@20
run1    0.4180   0.3538   0.3637
run2    0.5367   0.4133   0.4425
run3    0.4305   0.3544   0.3745
run4    0.5734   0.4252   0.4597
run5    0.5602   0.4179   0.4562

4.   CONCLUSIONS
   We presented a multimodal approach that estimates the relevance of the
images by a relevance feedback algorithm using the visual features for
determining the similarity. To handle image diversification, we apply a
concept-based procedure (based on FCA and HAC) to cluster the images
according to the latent topics addressed by their textual content.
Results show that our multimodal approach works properly for retrieving
similar diverse images. Results also show the importance of a proper
selection of relevant and non-relevant sets for the relevance algorithm.
A human knows better the meaning and the diversity of the topics. Our
challenge is to make the automatic approach able to select these relevant
and non-relevant images as a human being would. The presented results are
encouraging.

Acknowledgements. This work has been partially supported by PhD
fellowship (FPI-UNED 2014), VOXPOPULI (TIN2013-47090-C3-1-P), MUSACCES
(S2015/HUM3494) and DPI2013-47279-C2-1-R projects.
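
Illustrative sketch for Section 2.1 (steps 1 and 4). The following Python
fragment is not the authors' implementation; it only shows, under stated
assumptions, how the number of retained PCA components can be chosen from
an 80% variance threshold and how the m model probabilities can be merged
with variance-based weights. The function names and the example numbers
are hypothetical.

```python
def components_for_variance(eigenvalues, threshold=0.80):
    """Smallest number of leading components whose cumulative share of the
    total variance reaches the threshold (eigenvalues = PCA variances)."""
    total = sum(eigenvalues)
    cumulative = 0.0
    # PCA components are taken in decreasing order of explained variance.
    for count, value in enumerate(sorted(eigenvalues, reverse=True), start=1):
        cumulative += value
        if cumulative / total >= threshold:
            return count
    return len(eigenvalues)


def combine_probabilities(probs, group_variances):
    """Weighted average of the per-model relevance probabilities.

    probs: one probability per fitted model (length m).
    group_variances: variance explained by the group of components that
    each model was fitted on; used here as the weights (w).
    """
    if len(probs) != len(group_variances):
        raise ValueError("one weight is needed per model")
    total = sum(group_variances)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(p * w for p, w in zip(probs, group_variances)) / total


# Hypothetical toy values, not taken from the task data.
kept = components_for_variance([4.0, 3.0, 2.0, 1.0])
score = combine_probabilities([0.9, 0.4, 0.2], [0.5, 0.2, 0.1])
```

With these toy values, three components are kept and the combined score
is 0.6875, dominated by the model fitted on the highest-variance group.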
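
Illustrative sketch for Section 2.2. The selection of the highest
probability image per cluster, followed by a second pass when the list is
still short, can be read as a round-robin over the clusters. The fragment
below is a minimal sketch of that reading. The order in which clusters are
visited within a pass is not stated in the text; visiting them by the
score of their best image is an assumption, as are the function name and
the toy data.

```python
def diversify(clusters, scores, limit=50):
    """Round-robin over clusters: the best image of every cluster first,
    then the second best of every cluster, until `limit` images are chosen."""
    # Sort the images of each cluster by decreasing relevance probability.
    ranked = [sorted(c, key=lambda img: scores[img], reverse=True)
              for c in clusters if c]
    # Assumption: clusters are visited in order of their best image.
    ranked.sort(key=lambda c: scores[c[0]], reverse=True)
    result = []
    depth = 0
    while len(result) < limit and any(depth < len(c) for c in ranked):
        for cluster in ranked:
            if depth < len(cluster) and len(result) < limit:
                result.append(cluster[depth])
        depth += 1
    return result


# Hypothetical clusters and relevance probabilities.
toy_clusters = [["a", "b"], ["c"], ["d", "e"]]
toy_scores = {"a": 0.9, "b": 0.8, "c": 0.7, "d": 0.6, "e": 0.95}
final_list = diversify(toy_clusters, toy_scores)
```

Here the first three positions come from three different clusters
("e", "a", "c") before any cluster contributes a second image.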
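
Illustrative sketch for Section 2.2.1. To make the definition of a formal
concept (A, B) concrete, the fragment below enumerates all formal concepts
of a tiny formal context by closing every attribute subset. This brute
force enumeration is exponential in the number of attributes and is only
meant for toy inputs; it is not the algorithm used in the runs, and the
image names and tags are invented for the example.

```python
from itertools import combinations


def formal_concepts(context):
    """All formal concepts (extent, intent) of a context given as
    {object: set_of_attributes}."""
    objects = set(context)
    attributes = set().union(*context.values()) if context else set()

    def extent(attrs):
        # Objects that have every attribute in attrs.
        return frozenset(g for g in objects if attrs <= context[g])

    def intent(objs):
        # Attributes shared by every object in objs
        # (all attributes when objs is empty).
        shared = set(attributes)
        for g in objs:
            shared &= context[g]
        return frozenset(shared)

    concepts = set()
    ordered = sorted(attributes)
    for size in range(len(ordered) + 1):
        for subset in combinations(ordered, size):
            a = extent(set(subset))
            # (B', B'') is a formal concept for any attribute set B.
            concepts.add((a, intent(a)))
    return concepts


# Hypothetical context: images (G) described by textual tags (M).
toy_context = {
    "img1": {"beach", "sea"},
    "img2": {"sea", "boat"},
    "img3": {"beach", "sea", "sunset"},
}
toy_concepts = formal_concepts(toy_context)
```

In this toy context the most generic concept groups all three images under
the shared tag "sea", while more specific concepts such as
({img1, img3}, {beach, sea}) sit below it in the lattice.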
5.   REFERENCES
 [1] R. Agrawal, S. Gollapudi, A. Halverson, and S. Ieong.
     Diversifying search results. In Proceedings of the
     Second ACM International Conference on Web Search
     and Data Mining, pages 5–14. ACM, 2009.
 [2] F. Alqadah and R. Bhatnagar. Similarity measures in
     formal concept analysis. Annals of Mathematics and
     Artificial Intelligence, 61(3):245–256, 2011.
 [3] X. Benavent, A. Garcia-Serrano, R. Granados,
     J. Benavent, and E. de Ves. Multimedia information
     retrieval based on late semantic fusion approaches:
     Experiments on a wikipedia image collection.
     Multimedia, IEEE Transactions on, PP(99):1–1, 2013.
 [4] A. Castellanos, X. Benavent, A. García-Serrano,
     E. de Ves, and J. Cigarrán. Uned-uv @ retrieving
     diverse social images task. In MediaEval Multimedia
     Benchmark Workshop, CEUR-WS.org, 1436, ISSN
     1613-0073, 2015.
 [5] A. Castellanos, J. Cigarrán, and A. García-Serrano.
     Uned @ retrieving diverse social images task. In
     MediaEval Multimedia Benchmark Workshop,
     CEUR-WS.org, 1263, ISSN 1613-0073, 2014.
 [6] E. de Ves, G. Ayala, X. Benavent, J. Domingo, and
     E. Dura. Modeling user preferences in content-based
     image retrieval: a novel attempt to bridge the
     semantic gap. Neurocomputing, 168:829–845, 2015.
 [7] E. de Ves, X. Benavent, I. Coma, and G. Ayala. A
     novel dynamic multi-model relevance feedback
     procedure for content-based image retrieval.
     Neurocomputing, pages 99–107, October 2016.
 [8] B. Ionescu, A. L. Gînscă, M. Zaharieva, B. Boteanu,
     M. Lupu, and H. Müller. Retrieving Diverse Social
     Images at MediaEval 2016: Challenge, Dataset and
     Evaluation. In MediaEval 2016 Workshop, October
     20-21, Hilversum, Netherlands, 2016.
 [9] S. Kullback and R. A. Leibler. On information and
     sufficiency. The Annals of Mathematical Statistics,
     22(1):79–86, 1951.
[10] C. Loader. Local regression and likelihood. New York:
     Springer-Verlag, 1999.
[11] C. D. Manning, P. Raghavan, and H. Schütze.
     Hierarchical clustering. pages 377–403. 2008.
[12] S. Rudinac, A. Hanjalic, and M. Larson. Generating
     visual summaries of geographic areas using
     community-contributed images. IEEE Transactions on
     Multimedia, 15(4):921–932, 2013.
[13] R. Wille. Concept lattices and conceptual knowledge
     systems. Computers & mathematics with applications,
     23(6):493–515, 1992.