=Paper= {{Paper |id=Vol-1739/MediaEval_2016_paper_17 |storemode=property |title=UNED-UV @ Retrieving Diverse Social Images Task |pdfUrl=https://ceur-ws.org/Vol-1739/MediaEval_2016_paper_17.pdf |volume=Vol-1739 |dblpUrl=https://dblp.org/rec/conf/mediaeval/GonzalezGGC16 }} ==UNED-UV @ Retrieving Diverse Social Images Task== https://ceur-ws.org/Vol-1739/MediaEval_2016_paper_17.pdf

UNED-UV@2016 Retrieving Diverse Social Images Task

A. Castellanos, NLP&IR Group UNED, Madrid, acastellanos@lsi.uned.es
X. Benavent, Informatics, U. Valencia, xaro.benavent@uv.es
A. García-Serrano, NLP&IR Group UNED, Madrid, agarcia@lsi.uned.es
E. de Ves, Informatics, U. Valencia, esther.deves@uv.es

ABSTRACT
This paper details the participation of the UNED-UV group at the MediaEval 2016 Retrieving Diverse Social Images Task using a multimodal approach. Several Local Logistic Regression models, which use the visual low-level features, estimate the relevance probability for all the images in the dataset. Then, the images are ranked by selecting the highest-probability image in each of the textual clusters. These textual clusters are generated by a textual algorithm based on Formal Concept Analysis (FCA) and Hierarchical Agglomerative Clustering (HAC) that detects the latent topics addressed. The images are then diversified according to the detected topics.

1. INTRODUCTION
Information retrieval systems have mainly relied on optimizing the result list according to accuracy-based metrics. However, when dealing with image retrieval, systems should be able to offer results that are relevant but also diverse [1]. We therefore propose a multimodal approach that addresses both relevance and diversity. It uses a relevance feedback algorithm developed by the UV group [3, 6, 7]. This method estimates the similarity probability of all the images in the dataset from visual low-level features by means of several Local Logistic Regression (LLR) models [10]. Visual information has been extensively shown to have a great impact on information retrieval systems [12]. For diversification, we propose representing an image by the concepts covered by its textual information. This conceptual representation is tackled by means of Formal Concept Analysis (FCA) [13], a data organization technique. In our participation in previous editions of this task, we showed that this approach was able to identify the different topics addressed in the images, allowing the results list to be diversified according to them [5, 4].

2. SYSTEM DESCRIPTION
We present a two-step system that ranks a list of retrieved images taking into account the relevance of the retrieved images to the given query while showing as much diversity as possible. The first step is based on a relevance feedback algorithm that estimates the relevance of each image to the query by using the provided visual low-level features of the images [8]. In the next step, we select the highest-probability image in each textual FCA cluster or visual k-means cluster to generate the list of similar and diverse images.

2.1 Relevancy via relevance feedback
We estimate the relevance of each image to the given query by using a relevance feedback algorithm [7]. The general methodology involves four steps:

1. Reduction of the data dimensionality. The provided low-level visual features [8] are used to generate a feature vector associated with each image, generically denoted as x, in a space of dimension N = 8194. These features are reduced using Principal Component Analysis (PCA). We retain only the first components that account for 80% of the data variability. This reduces the original feature space to a new feature vector of dimension M < N, with M = 52. One of the advantages of this reduction is that the new transformed components are in decreasing order with respect to the variance explained by the corresponding principal component;

2. Selecting the relevant and non-relevant sets. The user looks at a few screens, each showing some images, and marks some of them as relevant and others as non-relevant (run4). For the automatic runs (run1, run3 and run5), these sets are selected automatically. Relevant images are the first P images of the ranked Flickr list that belong to different textual FCA or visual (k-means) clusters, with P = 5. The goal of selecting relevant images from different clusters is to give the model diverse relevant example images and thereby improve the diversity in the estimation. We use k-means clustering on the provided visual low-level features in run1 and run5, and textual FCA clustering on the provided textual features [8] in run3. Non-relevant images are selected from other topics for each query;

3. Parameter estimation of the Local Logistic Regression models [10]. The reduced feature vectors (PCA) and the relevant and non-relevant sets are the inputs of several Local Logistic Regression models whose outputs are the probabilities that an image belongs to the relevant set. The feature vector is split dynamically into m groups of non-fixed size. Each group is used to fit the model of highest order, given the input sets and the PCA components;
4. Ranking of the database. The models are evaluated on all the images of the database and return the probability of being relevant under each estimated model; as a result, we have a probability vector (p) of dimension m for each individual image. We combine these probabilities into a single one by using a weighted average. The weight (w) for a given probability is given by the amount of variance accounted for by the group of components used to adjust the model. Finally, this procedure gives us a score/probability for each image.

2.2 Clusters for diversity
The final ranking is generated by selecting the highest-probability image from each of the different clusters, so as to give as much diversity as possible to the final list. If there are fewer than 50 clusters, a second round selects the second-highest-probability image of each cluster. We have generated clusters for the ranking using textual information (run2 to run5) and k-means clusters using the visual information of the images (run1).

2.2.1 Textual clusters with FCA
The presented clustering procedure is based on discovering the latent topics addressed by the textual information of the images. To that end, Formal Concept Analysis is used to detect these topics and Hierarchical Agglomerative Clustering (HAC) [11] to group similar images together into the detected topics. Each HAC-based cluster contains the set of images covering a similar topic. Thereafter, the images of each cluster are ranked according to their diversity based on their visual features.

FCA-based Modelling.
Formal Concept Analysis (FCA) is a theory of concept formation [13] for the organization of content according to its related features. The basic structure of FCA is the formal context, K := (G, M, I), where G is a set of objects, M a set of attributes related to these objects and I a binary relation between G and M, denoted by gIm: the object g has the attribute m. From the formal context, a set of formal concepts can be inferred, i.e., a formal concept is a pair (A, B) of a set of images A and the features B shared by those images. The formal concepts are organized in a lattice from the most generic to the most specific one. By applying FCA to the formal context containing the textual information of the images, the images are modelled in terms of formal concepts, which group together the images sharing the same set of features.
In order to select only the most representative features, we applied Kullback-Leibler Divergence (KLD) [9] to the textual contents related to the images. This KLD-based selection represents each image by the textual contents that best differentiate it from the other images.

HAC-based grouping.
From the FCA formal concepts, a set of diverse image groups is created by applying a HAC algorithm [11]. Specifically, we propose a single-linkage hierarchical clustering that groups together similar formal concepts, using the Zeros-Induced index to set the cluster similarity [2].

2.2.2 Visual clusters
The clusters are produced by a k-means procedure (k = 18) over the PCA components of the provided visual low-level features [8].

3. RESULTS
We submitted five runs (see Table 1): four of them are automatic (run1, run2, run3 and run5) and one, run4, is a human-based run. All of them use our two-step system except run2, which uses only the second step, ranking the images by textual clusters with FCA. Three of them are multimodal runs (run3, run4 and run5), using both textual and visual information.

Table 1: Description of the runs. For the relevance algorithm, the columns give the source of the relevant images (text clusters, visual clusters or human selection) and of the non-relevant images (other topics or human selection); the last two columns give the clusters used for ranking.

        Relevant images         Non-relevant images   Ranking clusters
run     text   visual   human   other   human         text   visual
run1    -      X        -       X       -             -      X
run2    -      -        -       -       -             X      -
run3    X      -        -       X       -             X      -
run4    -      -        X       -       X             X      -
run5    -      X        -       X       -             X      -

Results are presented in Table 2. It is interesting to point out that our best result for both precision and diversification is obtained with the multimodal human-based approach, which estimates the probability of relevance of each image to the query with an LLR model and then ranks the final list with FCA clusters (run4, F1@20 = 0.4597). The second-best result (run5, F1@20 = 0.4562) uses the same approach as run4, but selects the relevant and non-relevant image sets for estimating the LLR models automatically. It is also important to observe that the multimodal runs run4 and run5 outperform the automatic monomodal runs, run1 and run2, as expected.

Table 2: Official metrics for the Retrieving Diverse Social Images Task. The best result is obtained by the human-based run (run4) and the second-best by an automatic run (run5).

run     P@20     CR@20    F1@20
run1    0.4180   0.3538   0.3637
run2    0.5367   0.4133   0.4425
run3    0.4305   0.3544   0.3745
run4    0.5734   0.4252   0.4597
run5    0.5602   0.4179   0.4562

4. CONCLUSIONS
We presented a multimodal approach that estimates the relevance of the images by a relevance feedback algorithm that uses the visual features to determine similarity. To handle image diversification, we apply a concept-based procedure (based on FCA and HAC) to cluster the images according to the latent topics addressed by their textual content. Results show that our multimodal approach works properly for retrieving similar diverse images, reaching F1@20 = 0.4597 with the human-based run and F1@20 = 0.4562 with the best automatic run. Results also show the importance of a proper selection of the relevant and non-relevant sets for the relevance algorithm: a human has a better grasp of the meaning and the diversity of the topics. Our challenge is to make the automatic approach able to select these relevant and non-relevant images as a human would. The presented results are encouraging.

Acknowledgements. This work has been partially supported by a PhD fellowship (FPI-UNED 2014) and by the VOXPOPULI (TIN2013-47090-C3-1-P), MUSACCES (S2015/HUM3494) and DPI2013-47279-C2-1-R projects.

Copyright is held by the author/owner(s).
MediaEval 2016 Workshop, October 20-21, 2016, Hilversum, Netherlands.
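
Illustrative sketch. The score fusion of step 4 in Section 2.1 and the cluster-based selection of Section 2.2 can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used for the submitted runs. The paper only specifies the weighted average (weights given by the variance accounted for by each group of components) and the selection of the highest-probability image per cluster, followed by a second selection when there are fewer than 50 clusters. The function names `fuse_probabilities` and `diversify`, the toy data, and the ordering of the picks within a round (by fused score) are assumptions made for this example.

```python
from collections import defaultdict


def fuse_probabilities(probs, explained_variance):
    """Weighted average of the per-model probabilities of one image.

    probs[j] is the relevance probability returned by model j;
    explained_variance[j] is the variance accounted for by the group of
    PCA components used to fit model j (used as the weight w_j).
    """
    if len(probs) != len(explained_variance):
        raise ValueError("one weight per model is required")
    total = sum(explained_variance)
    return sum(p * w for p, w in zip(probs, explained_variance)) / total


def diversify(scores, cluster_of, limit=50):
    """Round-based ranking: in each round, take the best remaining image
    of every cluster, until `limit` images are ranked."""
    # Group images by cluster and sort each group by descending score.
    groups = defaultdict(list)
    for image in scores:
        groups[cluster_of[image]].append(image)
    for members in groups.values():
        members.sort(key=lambda img: scores[img], reverse=True)

    ranking = []
    round_idx = 0
    while len(ranking) < limit:
        # Candidates of this round: the round_idx-th best image per cluster.
        picks = [m[round_idx] for m in groups.values() if len(m) > round_idx]
        if not picks:
            break  # every cluster is exhausted
        # Assumption: within a round, picks are ordered by fused score.
        picks.sort(key=lambda img: scores[img], reverse=True)
        ranking.extend(picks[: limit - len(ranking)])
        round_idx += 1
    return ranking


# Hypothetical toy data: four images, m = 2 models, two clusters.
variances = [0.6, 0.2]
scores = {
    "a": fuse_probabilities([0.9, 0.5], variances),
    "b": fuse_probabilities([0.7, 0.8], variances),
    "c": fuse_probabilities([0.4, 0.2], variances),
    "d": fuse_probabilities([0.3, 0.9], variances),
}
cluster_of = {"a": 0, "b": 0, "c": 1, "d": 1}
ranking = diversify(scores, cluster_of, limit=3)
```

In this toy case the first round takes the best image of each cluster and the second round fills the remaining slot, so images from both clusters appear before any cluster contributes a second image.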
5. REFERENCES
[1] R. Agrawal, S. Gollapudi, A. Halverson, and S. Ieong.
Diversifying search results. In Proceedings of the
Second ACM International Conference on Web Search
and Data Mining, pages 5–14. ACM, 2009.
[2] F. Alqadah and R. Bhatnagar. Similarity measures in
formal concept analysis. Annals of Mathematics and
Artificial Intelligence, 61(3):245–256, 2011.
[3] X. Benavent, A. Garcia-Serrano, R. Granados,
J. Benavent, and E. de Ves. Multimedia information
retrieval based on late semantic fusion approaches:
Experiments on a Wikipedia image collection.
IEEE Transactions on Multimedia, PP(99):1–1, 2013.
[4] A. Castellanos, X. Benavent, A. García-Serrano,
E. de Ves, and J. Cigarrán. UNED-UV @ retrieving
diverse social images task. In MediaEval Multimedia
Benchmark Workshop, CEUR-WS.org, 1436, ISSN
1613-0073, 2015.
[5] A. Castellanos, J. Cigarrán, and A. García-Serrano.
UNED @ retrieving diverse social images task. In
MediaEval Multimedia Benchmark Workshop,
CEUR-WS.org, 1263, ISSN 1613-0073, 2014.
[6] E. de Ves, G. Ayala, X. Benavent, J. Domingo, and
E. Dura. Modeling user preferences in content-based
image retrieval: a novel attempt to bridge the
semantic gap. Neurocomputing, 168:829–845, 2015.
[7] E. de Ves, X. Benavent, I. Coma, and G. Ayala. A
novel dynamic multi-model relevance feedback
procedure for content-based image retrieval.
Neurocomputing, pages 99–107, October 2016.
[8] B. Ionescu, A. L. Gînscă, M. Zaharieva, B. Boteanu,
M. Lupu, and H. Müller. Retrieving Diverse Social
Images at MediaEval 2016: Challenge, Dataset and
Evaluation. In MediaEval 2016 Workshop, October
20-21, Hilversum, Netherlands, 2016.
[9] S. Kullback and R. A. Leibler. On information and
sufficiency. The Annals of Mathematical Statistics,
22(1):79–86, 1951.
[10] C. Loader. Local regression and likelihood. New York:
Springer-Verlag, 1999.
[11] C. D. Manning, P. Raghavan, and H. Schütze.
Hierarchical clustering. In Introduction to Information
Retrieval, pages 377–403. Cambridge University Press, 2008.
[12] S. Rudinac, A. Hanjalic, and M. Larson. Generating
visual summaries of geographic areas using
community-contributed images. IEEE Transactions on
Multimedia, 15(4):921–932, 2013.
[13] R. Wille. Concept lattices and conceptual knowledge
systems. Computers & mathematics with applications,
23(6):493–515, 1992.