=Paper= {{Paper |id=Vol-1739/MediaEval_2016_paper_20 |storemode=property |title=LAPI @ 2016 Retrieving Diverse Social Images Task: A Pseudo-Relevance Feedback Diversification Perspective |pdfUrl=https://ceur-ws.org/Vol-1739/MediaEval_2016_paper_20.pdf |volume=Vol-1739 |dblpUrl=https://dblp.org/rec/conf/mediaeval/BoteanuCI16 }} ==LAPI @ 2016 Retrieving Diverse Social Images Task: A Pseudo-Relevance Feedback Diversification Perspective== https://ceur-ws.org/Vol-1739/MediaEval_2016_paper_20.pdf
   LAPI @ 2016 Retrieving Diverse Social Images Task:
A Pseudo-Relevance Feedback Diversification Perspective

                          Bogdan Boteanu, Mihai Gabriel Constantin, Bogdan Ionescu
                                      LAPI, University “Politehnica” of Bucharest, Romania
                             {bboteanu, mgconstantin, bionescu}@alpha.imag.pub.ro



ABSTRACT
In this paper we present the results achieved during the 2016
MediaEval Retrieving Diverse Social Images Task, using an approach
based on pseudo-relevance feedback, in which human feedback is
replaced by an automatic selection of images. The proposed approach
is designed to prioritize the diversification of the results, in
contrast to most existing techniques, which address only relevance.
Diversification is achieved by exploiting a hierarchical clustering
scheme followed by a diversification strategy. Methods are tested on
the benchmarking data and results are analyzed. Insights for future
work conclude the paper.

1.     INTRODUCTION
   An efficient information retrieval system should be able to provide
search results which are at the same time relevant for the query and
cover different aspects of it, i.e., diverse. The 2016 Retrieving
Diverse Social Images Task [1] addresses this issue in the context of
a general ad-hoc image retrieval system, which provides the user with
diverse representations of the queries. The system should be able to
tackle complex and general-purpose multi-concept queries. Given a
ranked list of photos retrieved from Flickr^1, participating systems
are expected to refine the results by providing up to 50 images that
are at the same time relevant and provide a diversified summary of the
query. The process is based on the social metadata associated with the
images and/or on the visual characteristics. A complete overview of
the task is presented in [1].
   Despite the current advances of machine intelligence techniques
used in the area of information retrieval and multimedia, in the
search for high performance and adaptation to user needs, more and
more research is now turning towards the concept of “human in the
loop” [2]. The idea is to bring human expertise into the processing
chain, thus combining the accuracy of human judgements with the
computational power of machines.
   In this work we use an adapted version of the work in [4] that
exploits the concept of pseudo-relevance feedback. Relevance feedback
(RF) techniques attempt to introduce the user in the loop by
harvesting feedback about the relevance of the search results. This
information is used as ground truth for re-computing a better
representation of the data needed. Relevance feedback proved efficient
in improving the precision of the results [7], but its potential has
not been fully exploited for diversification. The main contribution of
our approach is in proposing a pseudo-relevance feedback technique
which substitutes the user needed in traditional RF and in proposing
several diversity-adapted relevance feedback schemes.

2.    PROPOSED APPROACH
   In traditional RF techniques, recording actual user feedback is
inefficient in terms of time and human resources. The proposed
approach, denoted in the following HC-RF, attempts to replace user
input with machine generated ground truth. It exploits the concept of
pseudo-relevance feedback. The concept is based on the assumption that
the top k ranked documents are relevant and the feedback is learned as
in traditional RF under this assumption [5]. A general diagram of the
approach is depicted in Figure 1.

Figure 1: General scheme of the proposed approach

   This year we decided not to use the pre-processing step, i.e., the
use of filters to remove non-relevant images. The motivation lies in
the specificity of the dataset proposed for this year [1], in
particular the use of multi-topic queries in the development and
evaluation sets. An image containing people or depicting a location or
a place which is geographically far away from the query can be
considered relevant as long as it is a common photo representation of
the query topics (all at once). Also, we noticed in an extensive study
[6] that the blur filter does not significantly improve the overall
performance, thus we decided to remove it to reduce complexity. The
algorithm is as follows.
   First, we employ a pseudo-relevance feedback scheme based on the
selection of the images assessed in an automated manner. We consider
that most of the first returned results are relevant (i.e., positive
examples). For instance, on devset [1], on average, 35 out of 50
returned images are relevant, which supports our assumption. In
contrast, the very last of the results are more likely non-relevant
and are considered accordingly (i.e., negative examples). The positive
and negative examples are fed to a Hierarchical Clustering^2 scheme
which yields a dendrogram of classes. For a certain cutting point
(i.e., number of classes), a class is declared non-relevant if it
contains only negative examples or if the number of negative examples
is higher than the number of positive ones. The final step is the
actual diversification scheme, which is a round robin approach. We
select from each of the relevant classes the image which has the
highest rank according to the initial ranking of the system. Then, we
remove the selected images from the clusters and proceed by selecting
the remaining ones in the same manner. The process is repeated until a
maximum number of images is reached. The resulting images represent
the output of the proposed system.

^1 http://flickr.com/
^2 http://www.mathworks.com/help/stats/hierarchical-clustering.html

Copyright is held by the author/owner(s).
MediaEval 2016 Workshop, Oct. 20-21, 2016, Hilversum, Netherlands.

3.    EXPERIMENTAL RESULTS
   This section presents the experimental results achieved on devset,
which consists of 70 multi-topic queries and 20,757 images, and on
testset, which consists of 64 multi-topic queries and 19,017 images.
We optimized the parameters of the proposed approaches on devset to
obtain the best precision and diversity. Ground truth was also
provided with the data for this set for preliminary validation of the
approaches. The final benchmarking is however conducted on testset.
   In our approaches, images are represented with the content
descriptors that were provided with the task data, i.e., visual (e.g.,
convolutional neural network based descriptors), text (e.g., term
frequency - inverse document frequency representations of metadata)
and user annotation credibility (e.g., face proportions, upload
frequency) information. Detailed information about the provided
content descriptors is available in [1]. Performance is assessed with
Precision at X images (P@X), Cluster Recall at X (CR@X) and F1-measure
at X (F1@X).

3.1     Results on devset
   Several tests were performed with different descriptor combinations
and various cutoff points. Descriptors are combined with an early
fusion approach. We varied the number of initial images considered as
positive examples, from 100 to 280 with a step of 20 images, the
number of last images considered as negative examples, from 0 to 20
with a step of 10, and the inconsistency coefficient threshold for
which HC divides the data into well-separated clusters, from 0.5 to
1.3 with a step of 0.2. We select the combinations yielding the
highest F1@20, which is the official metric.
   While experimenting, we observed that, by increasing the number of
analyzed images, precision tends to slightly decrease as the
probability of obtaining non-relevant images increases; at the same
time, diversity increases, as a larger set of images is more likely to
contain more diverse representations. For brevity, in the following we
focus on presenting only the results at a cutoff of 20 images, which
is the official cutoff point. These results are presented in Table 1.
To serve as a baseline for the evaluation, we also present the Flickr
initial retrieval results. From the modality point of view, text
descriptors (TF, DF, TFIDF), separately and combined with other
descriptors (all visual-all textual, all visual-all textual-all
credibility), lead to the highest results (F1@20=0.5468), followed
closely by visual (CNN-ad) descriptors (F1@20=0.5202) and then by all
credibility information (F1@20=0.5014).

Table 1: Best pseudo-relevance feedback results for each modality or
combination of modalities on devset (best results are marked with *).

  metric/run   HC-RF     HC-RF     HC-RF     HC-RF     HC-RF     Flickr
               visual    text      vis-text  cred.     All       init. res.
  P@20         0.6514    0.7479*   0.7479*   0.645     0.7479*   0.6979
  CR@20        0.446     0.4572*   0.4572*   0.4274    0.4572*   0.3717
  F1@20        0.5202    0.5468*   0.5468*   0.5014    0.5468*   0.4674

3.2     Official results on testset
   Following the previous experiments, the final runs were determined
for the best modality/parameter combinations obtained on devset (see
Table 1). We submitted five runs, computed as follows: Run1 -
automated using visual information only: HC-RF visual CNN-ad; Run2 -
automated using text information only: HC-RF all text; Run3 -
automated using visual-text information: HC-RF all visual-all text;
Run4 - everything allowed: HC-RF all cred.; and Run5 - everything
allowed: HC-RF all visual-all text-all cred. Results are presented in
Table 2.

Table 2: Results for the official runs on testset (best results are
marked with *).

  metric/run   Run1      Run2      Run3      Run4      Run5
  P@20         0.5437    0.5758*   0.5219    0.5484    0.5539
  CR@20        0.3455    0.3881    0.3868    0.4374*   0.3732
  F1@20        0.4037    0.447     0.4208    0.4638*   0.4223

   It is interesting to observe that the highest precision is achieved
using textual information (Run2 - P@20 = 0.5758), whereas maximum
diversification is achieved using credibility information (Run4 -
CR@20 = 0.4374), with more than 2 percentage points over the other
types of descriptors, although this type of information had the lowest
performance on devset in terms of diversification. Another interesting
observation is that relevance was also preserved in this case,
compared to other modalities, which leads to the conclusion that
credibility information was useful in the context of overall
diversification. Credibility information gives an automatic estimation
of the quality of tag-image content relationships, telling which users
are most likely to share relevant images on Flickr. The best
diversification, CR@20 = 0.4374, is achieved due to the high
probability that different relevant images belong to different users
with a good credibility score. In terms of the F1 metric, the use of
credibility information (Run4 - F1@20 = 0.4638) allows for better
performance than all text descriptors by almost 2 percentage points
and than the visual descriptor (CNN-ad) by 6 percentage points.

4.     CONCLUSIONS
   We approached the image search result diversification issue from
the perspective of relevance feedback techniques, where user feedback
is substituted with an automatic pseudo-relevance feedback approach.
Results show that, in general, the automatic techniques improve both
precision and diversification, which indicates the real potential of
relevance feedback for diversification. Future developments will
mainly address different efficient exploitations of re-ranking
approaches, e.g., relevance-score estimation techniques, to improve
the relevance and consequently the overall diversification. Another
perspective is to also exploit the advantages of deep neural networks
and use them in the context of automatic relevance-feedback-based
diversification scenarios, by classifying the selected positive and
negative examples using unsupervised deep-learning-based classifiers.

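
   As an illustration only, the selection steps described in Section 2
(automatic labelling of positive and negative examples, rejection of
non-relevant classes, round robin selection) and the metrics named in
Section 3 can be sketched in Python as below. This is a minimal sketch
under stated assumptions, not the code used for the submitted runs. All
function names and the toy data are hypothetical. Cluster assignments
are assumed to be supplied by any hierarchical clustering
implementation. The handling of overlapping positive/negative examples,
the eligibility of every member of a relevant class, and the order in
which classes are visited are assumptions, since the text does not fix
them. The metric functions follow the usual per-query definitions; the
official ones are given in [1].

```python
from collections import defaultdict


def pseudo_labels(ranked_ids, n_pos, n_neg):
    """First n_pos results become positives (1), last n_neg negatives (0)."""
    labels = {img: 1 for img in ranked_ids[:n_pos]}
    if n_neg > 0:  # guard: ranked_ids[-0:] would return the whole list
        for img in ranked_ids[-n_neg:]:
            labels.setdefault(img, 0)  # assumption: positives win on overlap
    return labels


def relevant_classes(cluster_of, labels):
    """Drop classes with only negatives or more negatives than positives."""
    members = defaultdict(list)
    for img in labels:
        members[cluster_of[img]].append(img)
    kept = {}
    for cid, imgs in members.items():
        pos = sum(labels[img] for img in imgs)
        neg = len(imgs) - pos
        if neg <= pos:  # also rejects all-negative classes (pos == 0 < neg)
            kept[cid] = imgs  # assumption: every member stays eligible
    return kept


def round_robin(classes, rank_of, max_images):
    """Take the best-ranked remaining image of each class, pass after pass."""
    queues = [sorted(imgs, key=rank_of.__getitem__) for imgs in classes.values()]
    queues.sort(key=lambda q: rank_of[q[0]])  # assumption: class visiting order
    selected = []
    while queues and len(selected) < max_images:
        for q in queues:
            selected.append(q.pop(0))
            if len(selected) == max_images:
                break
        queues = [q for q in queues if q]  # forget exhausted classes
    return selected


def precision_at(selected, relevant, x):
    """Fraction of the first x selected images that are relevant."""
    return sum(img in relevant for img in selected[:x]) / x


def cluster_recall_at(selected, gt_cluster, n_clusters, x):
    """Fraction of ground-truth clusters covered by the first x images."""
    found = {gt_cluster[img] for img in selected[:x] if img in gt_cluster}
    return len(found) / n_clusters


def f1_at(p, cr):
    """Harmonic mean of precision and cluster recall for one query."""
    return 0.0 if p + cr == 0 else 2 * p * cr / (p + cr)


# Toy query: nine ranked images and hypothetical cluster assignments.
ranked = list("abcdefghi")
rank_of = {img: pos for pos, img in enumerate(ranked)}
cluster_of = {"a": 0, "b": 0, "e": 0, "c": 1, "f": 1,
              "d": 2, "g": 2, "h": 2, "i": 3}
labels = pseudo_labels(ranked, n_pos=6, n_neg=3)
classes = relevant_classes(cluster_of, labels)  # classes 2 and 3 are dropped
print(round_robin(classes, rank_of, max_images=4))  # ['a', 'c', 'b', 'f']
```

   In the toy query, class 2 holds one positive and two negative
examples and class 3 holds a single negative example, so both are
rejected; the remaining two classes are then visited alternately.
Averaging f1_at over queries is not the same as applying it to averaged
P@X and CR@X values, so this sketch is not expected to reproduce the
table entries from their rows.
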
5.   REFERENCES
[1] B. Ionescu, A.L. Gînscă, M. Zaharieva, B. Boteanu, M.
    Lupu, H. Müller, “Retrieving Diverse Social Images at
    MediaEval 2016: Challenge, Dataset and Evaluation”,
    MediaEval 2016 Workshop, October 20-21, Hilversum,
    Netherlands, 2016.
[2] B. Emond, “Multimedia and Human-in-the-loop: Interaction
    as Content Enrichment”, ACM Int. Workshop on
    Human-Centered Multimedia, pp. 77–84, 2007.
[3] B. Boteanu, I. Mironică, B. Ionescu, “A Relevance Feedback
    Perspective to Image Search Result Diversification”, IEEE
    ICCP, September 4-6, Cluj-Napoca, Romania, 2014.
[4] B. Boteanu, I. Mironică, B. Ionescu, “LAPI @ 2015
    Retrieving Diverse Social Images Task: A Pseudo-Relevance
    Feedback Diversification Perspective”, MediaEval 2015
    Workshop, September 15-16, Wurzen, Germany, 2015.
[5] B. Boteanu, I. Mironică, B. Ionescu, “Hierarchical
    Clustering Pseudo-Relevance Feedback for Social Image
    Search Result Diversification”, IEEE CBMI, June 10-12,
    Prague, Czech Republic, 2015.
[6] B. Boteanu, I. Mironică, B. Ionescu, “Pseudo-Relevance
    Feedback Diversification of Social Image Retrieval Results”,
    Multimedia Tools and Applications, pp. 1–28, Springer 2016.
[7] J. Li, N.M. Allinson, “Relevance Feedback in Content-Based
    Image Retrieval: A Survey”, Handbook on Neural
    Information Processing, 49, pp. 433–469, Springer 2013.