=Paper=
{{Paper
|id=Vol-1739/MediaEval_2016_paper_20
|storemode=property
|title=LAPI @ 2016 Retrieving Diverse Social Images Task: A Pseudo-Relevance Feedback Diversification Perspective
|pdfUrl=https://ceur-ws.org/Vol-1739/MediaEval_2016_paper_20.pdf
|volume=Vol-1739
|dblpUrl=https://dblp.org/rec/conf/mediaeval/BoteanuCI16
}}
==LAPI @ 2016 Retrieving Diverse Social Images Task: A Pseudo-Relevance Feedback Diversification Perspective==
Bogdan Boteanu, Mihai Gabriel Constantin, Bogdan Ionescu

LAPI, University “Politehnica” of Bucharest, Romania

{bboteanu, mgconstantin, bionescu}@alpha.imag.pub.ro

Copyright is held by the author/owner(s). MediaEval 2016 Workshop, Oct. 20-21, 2016, Hilversum, Netherlands.

===ABSTRACT===
In this paper we present the results achieved during the 2016 MediaEval Retrieving Diverse Social Images Task, using an approach based on pseudo-relevance feedback, in which human feedback is replaced by an automatic selection of images. The proposed approach is designed to prioritize the diversification of the results, in contrast to most existing techniques, which address only relevance. Diversification is achieved by exploiting a hierarchical clustering scheme followed by a diversification strategy. Methods are tested on the benchmarking data and results are analyzed. Insights for future work conclude the paper.

===1. INTRODUCTION===
An efficient information retrieval system should be able to provide search results which are at the same time relevant for the query and cover different aspects of it, i.e., diverse. The 2016 Retrieving Diverse Social Images Task [1] addresses this issue in the context of a general ad-hoc image retrieval system, which provides the user with diverse representations of the queries. The system should be able to tackle complex and general-purpose multi-concept queries. Given a ranked list of photos retrieved from Flickr (http://flickr.com/), participating systems are expected to refine the results by providing up to 50 images that are at the same time relevant and provide a diversified summary of the query. The process is based on the social metadata associated with the images and/or on the visual characteristics. A complete overview of the task is presented in [1].

Despite the current advances of machine intelligence techniques used in information retrieval and multimedia, the search for higher performance and better adaptation to user needs is turning more and more research towards the concept of “human in the loop” [2]. The idea is to bring human expertise into the processing chain, thus combining the accuracy of human judgements with the computational power of machines.

In this work we use an adapted version of the work in [4] that exploits the concept of pseudo-relevance feedback. Relevance feedback (RF) techniques attempt to introduce the user in the loop by harvesting feedback about the relevance of the search results. This information is used as ground truth for re-computing a better representation of the data needed. Relevance feedback proved efficient in improving the precision of the results [7], but its potential has not been fully exploited for diversification. The main contribution of our approach is in proposing a pseudo-relevance feedback technique which substitutes the user needed in traditional RF, and in proposing several diversity-adapted relevance feedback schemes.

===2. PROPOSED APPROACH===
In traditional RF techniques, recording actual user feedback is inefficient in terms of time and human resources. The proposed approach, denoted in the following HC-RF, attempts to replace user input with machine-generated ground truth. It exploits the concept of pseudo-relevance feedback, which is based on the assumption that the top k ranked documents are relevant; the feedback is then learned as in traditional RF under this assumption [5]. A general diagram of the approach is depicted in Figure 1.

Figure 1: General scheme of the proposed approach.

This year we decided not to use the pre-processing step, i.e., the use of filters to remove non-relevant images. The motivation lies in the specificity of the dataset proposed for this year [1], in particular the use of multi-topic queries in the development and evaluation sets. An image containing people, or depicting a location or a place which is geographically far away from the query, can be considered relevant as long as it is a common photo representation of the query topics (all at once). Also, we noticed in an extensive study [6] that the blur filter does not significantly improve the overall performance, thus we decided to remove it to reduce complexity. The algorithm is as follows.

First, we employ a pseudo-relevance feedback scheme based on the selection of the images assessed in an automated manner. We consider that most of the first returned results are relevant (i.e., positive examples). For instance, on devset [1], on average, 35 out of 50 returned images are relevant, which supports our assumption. In contrast, the very last of the results are more likely non-relevant and are considered accordingly (i.e., negative examples). The positive and negative examples are fed to a Hierarchical Clustering (HC) scheme (http://www.mathworks.com/help/stats/hierarchical-clustering.html), which yields a dendrogram of classes. For a certain cutting point (i.e., number of classes), a class is declared non-relevant if it contains only negative examples or if its number of negative examples is higher than its number of positive ones. The final step is the actual diversification scheme, which is a round robin approach. We select from each of the relevant classes the one image which has the highest rank according to the initial ranking of the system. Then, we remove the selected images from the clusters and proceed by selecting the remaining ones in the same manner. The process is repeated until a maximum number of images is reached. The resulting images represent the output of the proposed system.

===3. EXPERIMENTAL RESULTS===
This section presents the experimental results achieved on devset, which consists of 70 multi-topic queries and 20,757 images, and on testset, which consists of 64 multi-topic queries and 19,017 images. We optimized the parameters of the proposed approaches on devset to obtain the best precision and diversity. Ground truth was also provided with the data for this set for preliminary validation of the approaches. The final benchmarking is, however, conducted on testset.

In our approaches, images are represented with the content descriptors that were provided with the task data, i.e., visual (e.g., convolutional neural network based descriptors), text (e.g., term frequency - inverse document frequency representations of metadata) and user annotation credibility (e.g., face proportions, upload frequency) information. Detailed information about the provided content descriptors is available in [1]. Performance is assessed with Precision at X images (P@X), Cluster Recall at X (CR@X) and F1-measure at X (F1@X).

====3.1 Results on devset====
Several tests were performed with different descriptor combinations and various cutoff points. Descriptors are combined with an early fusion approach. We varied the number of initial images considered as positive examples, from 100 to 280 with a step of 20 images; the number of last images considered as negative examples, from 0 to 20 with a step of 10; and the inconsistency coefficient threshold for which HC divides the data into well-separated clusters, from 0.5 to 1.3 with a step of 0.2. We select the combinations yielding the highest F1@20, which is the official metric.

While experimenting, we observed that, by increasing the number of analyzed images, precision tends to slightly decrease as the probability of obtaining non-relevant images increases; at the same time, diversity increases, as having more images makes it more likely to obtain more diverse representations. For brevity, in the following we focus on presenting only the results at a cutoff of 20 images, which is the official cutoff point. These results are presented in Table 1. To serve as a baseline for the evaluation, we also present the Flickr initial retrieval results. From the modality point of view, text descriptors (TF, DF, TFIDF), separately and combined with other descriptors (all visual-all textual, all visual-all textual-all credibility), lead to the highest results (F1@20 = 0.5468), followed closely by visual (CNN-ad) descriptors (F1@20 = 0.5202) and then by all credibility information (F1@20 = 0.5014).

{| class="wikitable"
|+ Table 1: Best pseudo-relevance feedback results for each modality or combination of modalities on devset (best results are depicted in bold).
! metric/run !! HC-RF visual !! HC-RF text !! HC-RF vis-text !! HC-RF cred. !! HC-RF All !! Flickr init. res.
|-
| P@20 || 0.6514 || '''0.7479''' || '''0.7479''' || 0.645 || '''0.7479''' || 0.6979
|-
| CR@20 || 0.446 || '''0.4572''' || '''0.4572''' || 0.4274 || '''0.4572''' || 0.3717
|-
| F1@20 || 0.5202 || '''0.5468''' || '''0.5468''' || 0.5014 || '''0.5468''' || 0.4674
|}

====3.2 Official results on testset====
Following the previous experiments, the final runs were determined for the best modality/parameter combinations obtained on devset (see Table 1). We submitted five runs, computed as follows: Run1 - automated using visual information only: HC-RF visual CNN-ad; Run2 - automated using text information only: HC-RF all text; Run3 - automated using visual-text information: HC-RF all visual-all text; Run4 - everything allowed: HC-RF all cred.; and Run5 - everything allowed: HC-RF all visual-all text-all cred. Results are presented in Table 2.

{| class="wikitable"
|+ Table 2: Results for the official runs on testset (best results are depicted in bold).
! metric/run !! Run1 !! Run2 !! Run3 !! Run4 !! Run5
|-
| P@20 || 0.5437 || '''0.5758''' || 0.5219 || 0.5484 || 0.5539
|-
| CR@20 || 0.3455 || 0.3881 || 0.3868 || '''0.4374''' || 0.3732
|-
| F1@20 || 0.4037 || 0.447 || 0.4208 || '''0.4638''' || 0.4223
|}

It is interesting to observe that the highest precision is achieved using textual information (Run2 - P@20 = 0.5758), whereas maximum diversification is achieved using credibility information (Run4 - CR@20 = 0.4374), almost 5 percentage points above the next best run (Run2 - CR@20 = 0.3881), although this type of information had the lowest performance on devset in terms of diversification. Another interesting observation is that relevance was also preserved in this case, compared to other modalities, which leads to the conclusion that credibility information was useful in the context of overall diversification. Credibility information gives an automatic estimation of the quality of tag-image content relationships, indicating which users are most likely to share relevant images on Flickr. The best diversification (CR@20 = 0.4374) is achieved due to the high probability that different relevant images belong to different users with a good credibility score. In terms of the F1 metric, the use of credibility information (Run4 - F1@20 = 0.4638) allows for better performance than all text descriptors by almost 2 percentage points and than the visual descriptor (CNN-ad) by 6 percentage points.

===4. CONCLUSIONS===
We approached the image search result diversification issue from the perspective of relevance feedback techniques, in which user feedback is substituted with an automatic pseudo-relevance feedback approach. Results show that, in general, the automatic techniques improve precision and diversification, which indicates the real potential of relevance feedback for diversification. Future developments will mainly address different efficient exploitations of re-ranking approaches, e.g., relevance-score estimation techniques, to improve the relevance and consequently the overall diversification. Another perspective is to also exploit the advantages of deep neural networks and use them in the context of automatic relevance-feedback-based diversification scenarios, by classifying the selected positive and negative examples using unsupervised deep-learning-based classifiers.

===5. REFERENCES===
[1] B. Ionescu, A.L. Gînscă, M. Zaharieva, B. Boteanu, M. Lupu, H. Müller, “Retrieving Diverse Social Images at MediaEval 2016: Challenge, Dataset and Evaluation”, MediaEval 2016 Workshop, October 20-21, Hilversum, Netherlands, 2016.

[2] B. Emond, “Multimedia and Human-in-the-loop: Interaction as Content Enrichment”, ACM Int. Workshop on Human-Centered Multimedia, pp. 77–84, 2007.

[3] B. Boteanu, I. Mironică, B. Ionescu, “A Relevance Feedback Perspective to Image Search Result Diversification”, IEEE ICCP, September 4-6, Cluj-Napoca, Romania, 2014.

[4] B. Boteanu, I. Mironică, B. Ionescu, “LAPI @ 2016 Retrieving Diverse Social Images Task: A Pseudo-Relevance Feedback Diversification Perspective”, MediaEval 2015 Workshop, September 15-16, Wurzen, Germany, 2016.

[5] B. Boteanu, I. Mironică, B. Ionescu, “Hierarchical Clustering Pseudo-Relevance Feedback for Social Image Search Result Diversification”, IEEE CBMI, June 10-12, Prague, Czech Republic, 2015.

[6] B. Boteanu, I. Mironică, B. Ionescu, “Pseudo-Relevance Feedback Diversification of Social Image Retrieval Results”, Multimedia Tools and Applications, pp. 1–28, Springer 2016.

[7] J. Li, N.M. Allinson, “Relevance Feedback in Content-Based Image Retrieval: A Survey”, Handbook on Neural Information Processing, 49, pp. 433–469, Springer 2013.
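
As an illustration of the selection procedure described in the proposed approach (pseudo-labelling of the first and last results, rejection of non-relevant classes, round robin selection), the following is a minimal Python sketch and not the authors' implementation. The function names (pseudo_labels, relevant_classes, round_robin) and the toy data are hypothetical. The cluster labels are assumed to come from any hierarchical clustering routine cut at some threshold; the clustering itself is not reproduced here. Two further assumptions go beyond the text: within one round robin pass, images are emitted in initial-rank order, and negative examples that fall in a relevant class remain selectable.

```python
from collections import defaultdict


def pseudo_labels(ranked_ids, n_pos, n_neg):
    """Treat the first n_pos results as positive and the last n_neg as negative."""
    positives = set(ranked_ids[:n_pos])
    negatives = set(ranked_ids[len(ranked_ids) - n_neg:])
    return positives, negatives - positives


def relevant_classes(cluster_of, positives, negatives):
    """Return the classes that are NOT declared non-relevant.

    cluster_of maps each pseudo-labelled image id to its class label.
    A class is rejected when negatives outnumber positives, which also
    covers classes made up of negative examples only.
    """
    pos_count = defaultdict(int)
    neg_count = defaultdict(int)
    for image_id, cluster in cluster_of.items():
        if image_id in positives:
            pos_count[cluster] += 1
        elif image_id in negatives:
            neg_count[cluster] += 1
    return {c for c in set(cluster_of.values()) if neg_count[c] <= pos_count[c]}


def round_robin(ranked_ids, cluster_of, keep, max_images=50):
    """Repeatedly pick the best-ranked remaining image of every kept class."""
    rank = {image_id: i for i, image_id in enumerate(ranked_ids)}
    # Members of each kept class, best initial rank first.
    queues = defaultdict(list)
    for image_id in sorted(cluster_of, key=rank.__getitem__):
        if cluster_of[image_id] in keep:
            queues[cluster_of[image_id]].append(image_id)
    selected = []
    while len(selected) < max_images and any(queues.values()):
        # One pass: remove the head of every non-empty class.
        heads = [q.pop(0) for q in queues.values() if q]
        # Assumption: order the images of one pass by their initial rank.
        heads.sort(key=rank.__getitem__)
        selected.extend(heads)
    return selected[:max_images]


# Toy example: 8 ranked results, first 5 positive, last 2 negative.
ranked = list("abcdefgh")
pos, neg = pseudo_labels(ranked, n_pos=5, n_neg=2)
# Hypothetical class labels, as if read off a cut dendrogram.
clusters = {"a": 0, "b": 0, "c": 1, "d": 0, "e": 1, "g": 2, "h": 2}
keep = relevant_classes(clusters, pos, neg)
print(round_robin(ranked, clusters, keep))  # ['a', 'c', 'b', 'e', 'd']
```

In the toy example, class 2 contains only negative examples and is discarded, while the two remaining classes alternate in the output, so that the second selected image already comes from a different class than the first.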