LAPI @ 2017 Retrieving Diverse Social Images Task: A Pseudo-Relevance Feedback Diversification Perspective

Bogdan Boteanu, Mihai Gabriel Constantin, Bogdan Ionescu
LAPI, University “Politehnica” of Bucharest, Romania
{bboteanu,mgconstantin,bionescu}@alpha.imag.pub.ro

Copyright held by the owner/author(s).
MediaEval’17, 13-15 September 2017, Dublin, Ireland

ABSTRACT

In this paper we present the results achieved during the 2017 MediaEval Retrieving Diverse Social Images Task, using an approach based on pseudo-relevance feedback (RF), in which human feedback is replaced by an automatic selection of images. The proposed approach is designed to prioritize the diversification of the results, in contrast to most of the existing techniques, which address only relevance. Diversification is achieved by exploiting a hierarchical clustering (HC) scheme followed by a diversification strategy. Methods are tested on the benchmarking data and results are analyzed. Insights for future work conclude the paper.

1 INTRODUCTION

An efficient information retrieval system should be able to provide search results which are at the same time relevant for the query and cover different aspects of it, i.e., diverse. The 2017 Retrieving Diverse Social Images Task [7] addresses this issue in the context of a general ad-hoc image retrieval system, which provides the user with diverse representations of the queries. The system should be able to tackle complex and general-purpose multi-concept queries. Given a ranked list of photos retrieved from Flickr (http://flickr.com/), participating systems are expected to refine the results by providing up to 50 images that are at the same time relevant and provide a diversified summary of the query. The process is based on the social metadata associated with the images and/or on the visual characteristics. A complete overview of the task is presented in [7].

Despite the current advances of machine intelligence techniques in information retrieval and multimedia, the search for high performance and adaptation to user needs is turning more and more research towards the concept of “human in the loop” [5]. The idea is to bring human expertise into the processing chain, thus combining the accuracy of human judgements with the computational power of machines.

Given the good performance achieved in [3], this year we follow the same approach, which is an adapted version of the work in [2] that exploits the concept of RF. RF techniques attempt to introduce the user in the loop by harvesting feedback about the relevance of the search results. This information is used as ground truth for re-computing a better representation of the data needed. Relevance feedback proved efficient in improving the precision of the results [6], but its potential has not been fully exploited for diversification. The main contribution of our approach is in proposing a pseudo-relevance feedback technique which substitutes the user needed in traditional RF and in proposing several diversity-adapted relevance feedback schemes.

2 APPROACH

In traditional RF techniques, recording actual user feedback is inefficient in terms of time and human resources. The proposed approach, denoted in the following as HC-RF, attempts to replace user input with machine-generated ground truth. It exploits the concept of pseudo-relevance feedback. The concept is based on the assumption that the top k ranked documents are relevant, and the feedback is learned as in traditional RF under this assumption [1]. A general diagram of the approach is depicted in Figure 1.

Figure 1: General scheme of the proposed approach

Similarly to [3], we did not use the pre-processing step, i.e., the filters for non-relevant images. The motivation lies in the specificity of the dataset proposed for this year [7], i.e., the use of multi-topic queries in the development and evaluation sets. An image containing people, or depicting a location or a place which is geographically far away from the query, can be considered relevant as long as it is a common photo representation of the query topics (all at once). Also, we noticed in an extensive study [4] that the blur filter does not significantly improve the overall performance, thus we decided to remove it to reduce complexity. The algorithm is as follows.

First, we employ a pseudo-relevance feedback scheme based on an automatic selection of the images. We consider that the first returned results are relevant (i.e., positive examples). For instance, on devset [7], on average 26 out of 50 returned images are relevant, which supports our assumption. In contrast, the very last of the results are more likely non-relevant and are considered accordingly (i.e., negative examples). The positive and negative examples are fed to an HC scheme (http://www.mathworks.com/help/stats/hierarchical-clustering.html) which yields a dendrogram of classes. For a certain cutting point (i.e., number of classes), a class is declared non-relevant if it contains only negative examples or if the number of negative examples is higher than the number of positive ones. The final step is the actual diversification scheme, which is a round robin approach. We select from each of the relevant classes the one image which has the highest rank according to the initial ranking of the system. Then, we remove the selected images from the clusters and proceed by selecting the remaining ones in the same manner. The process is repeated until a maximum number of images is reached. The resulting images represent the output of the proposed system.

3 EXPERIMENTAL RESULTS

This section presents the experimental results achieved on devset, which consists of 110 multi-topic queries and 32,487 images, and on testset, which consists of 84 multi-topic queries and 24,986 images. We optimized the parameters of the proposed approach on devset to obtain the best precision and diversity. The final benchmarking is, however, conducted on testset.

In our approaches, images are represented with the content descriptors that were provided with the task data, i.e., visual (e.g., convolutional neural network based descriptors), text (e.g., term frequency - inverse document frequency representations of metadata) and user annotation credibility (e.g., upload frequency) information. Detailed information about the provided content descriptors is available in [7]. Performance is assessed with Precision at X images (P@X), Cluster Recall at X (CR@X) and F1-measure at X (F1@X).

3.1 Results on devset

Several tests were performed with different descriptor combinations and various cutoff points. Descriptors are combined with an early fusion approach (normalization and concatenation). We varied the number of initial images considered as positive examples (Np) from 100 to 280 with a step of 20 images, the number of last images considered as negative examples (Nn) from 0 to 20 with a step of 10, and the inconsistency coefficient threshold for which HC divides the data into well-separated clusters (Nc) from 0.5 to 1.3 with a step of 0.2. We select the combinations yielding the highest F1@20, which is the official metric.

While experimenting, we observed that, by increasing the number of analyzed images, precision tends to decrease as the probability of obtaining non-relevant images increases; at the same time, diversity increases, as a larger set of images is more likely to contain diverse representations. For brevity, in the following we present only the results at a cutoff of 20 images, which is the official cutoff point. These results are presented in Table 1. We also present the Flickr initial retrieval results to serve as a baseline for the evaluation.

Table 1: Best RF results for each modality or combination of modalities on devset (best results are marked with *).

metric/run   HC-RF visual   HC-RF text   HC-RF vis-text   HC-RF CNN   HC-RF cred.   Flickr init. res.
P@20         0.575          0.575        0.6136*          0.575       0.575         0.5864
CR@20        0.3969         0.3969       0.4234*          0.3969      0.3969        0.3646
F1@20        0.4473         0.4473       0.4773*          0.4473      0.4473        0.4277

From the modality point of view, visual-text information (visual-all textual-all) with the parameter setup (Np-Nn-Nc)=(180-0-1.1) led to the highest results (F1@20=0.4773). This performance was followed by visual (visual-all without CNN), textual (textual-all), CNN and credibility descriptors (F1@20=0.4473), all with the (180-20-1.1) parameter setup.

3.2 Official results on testset

Following the previous experiments, the final runs were determined for the best modality/parameter combinations obtained on devset (see Table 1). We submitted five runs, computed as follows:

Run1 - automated using visual information only: HC-RF all visual;
Run2 - automated using text information only: HC-RF all text;
Run3 - automated using visual-text information: HC-RF all visual-all text;
Run4 - everything allowed: HC-RF CNN;
Run5 - everything allowed: HC-RF cred.

Results are presented in Table 2.

Table 2: Results for the official runs on testset (best results are marked with *).

metric/run   Run1      Run2     Run3     Run4     Run5
P@20         0.6333*   0.6214   0.6196   0.5845   0.6018
CR@20        0.5791    0.5794   0.5729   0.5216   0.6045*
F1@20        0.5753    0.5733   0.5741   0.5253   0.5777*

Interestingly, the highest precision is achieved using visual information (Run1, P@20 = 0.6333), whereas maximum diversification is achieved using credibility information (Run5, CR@20 = 0.6045), more than 2 percentage points above the other types of descriptors. Relevance was also preserved, which leads to the conclusion that credibility information was useful in the context of overall diversification. Credibility information estimates the quality of tag-image content relationships, telling which users are most likely to share relevant images on Flickr. The best diversification is achieved in this case due to a high probability that different and relevant images belong to different users with a good credibility score. In terms of the F1 metric, the use of credibility information (Run5, F1@20 = 0.5777) allows for the best performance, followed closely by visual descriptors (Run1, F1@20 = 0.5753). Visual-textual information also achieved good performance (Run3, F1@20 = 0.5741), followed by textual information (Run2, F1@20 = 0.5733). The CNN descriptors had the lowest performance (Run4, F1@20 = 0.5253), more than 5 percentage points below the credibility information.

4 CONCLUSIONS

We approached the image search result diversification issue from the perspective of relevance feedback techniques, in which user feedback is substituted with an automatic pseudo-relevance feedback approach. Results show that, in general, the automatic techniques improve the precision and diversification, which shows the real potential of relevance feedback for diversification. Future developments will mainly address different efficient exploitations of re-ranking approaches, e.g., relevance-score estimation techniques, to improve the relevance and consequently the overall diversification. Another perspective is to also exploit the advantages of deep neural networks and use them in the context of automatic relevance-feedback-based diversification scenarios, by classifying the selected positive and negative examples using unsupervised deep-learning-based classifiers.
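
As an illustration of the HC-RF procedure described in Section 2, the following is a minimal Python sketch, not the implementation used for the submitted runs. It rests on several assumptions that the text does not specify: the function names (agglomerate, hc_rf) and their parameters are hypothetical; the clustering is a plain average-linkage agglomerative scheme cut at a fixed number of classes rather than the inconsistency-coefficient threshold (Nc) used in the experiments; every member of a relevant class remains a candidate for selection; and relevant classes are visited in the order of their best-ranked image.

```python
from itertools import combinations
from math import dist


def agglomerate(points, n_classes):
    """Average-linkage agglomerative clustering, stopped at n_classes.

    Returns a list of clusters, each a list of indices into `points`.
    (Assumption: a fixed number of classes is used as the cutting point.)
    """
    clusters = [[i] for i in range(len(points))]

    def linkage(pair):
        # Mean pairwise Euclidean distance between two clusters.
        a, b = clusters[pair[0]], clusters[pair[1]]
        total = sum(dist(points[i], points[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > n_classes:
        a, b = min(combinations(range(len(clusters)), 2), key=linkage)
        clusters[a] = clusters[a] + clusters[b]  # a < b, so index a stays valid
        del clusters[b]
    return clusters


def hc_rf(features, n_pos, n_neg, n_classes, max_images):
    """Pseudo-relevance feedback diversification (illustrative sketch).

    `features` holds one descriptor per image, in the initial rank order
    (index 0 is the top-ranked result). Returns the selected image ranks.
    """
    n = len(features)
    positives = list(range(min(n_pos, n)))                      # first results
    negatives = list(range(max(n - n_neg, len(positives)), n))  # last results
    examples = positives + negatives
    negative_set = set(negatives)

    clusters = agglomerate([features[i] for i in examples], n_classes)

    relevant = []
    for members in clusters:
        ranks = sorted(examples[m] for m in members)
        n_negative = sum(r in negative_set for r in ranks)
        n_positive = len(ranks) - n_negative
        # A class is non-relevant if negatives outnumber positives
        # (this also covers classes made of negative examples only).
        if n_negative > n_positive:
            continue
        relevant.append(ranks)

    # Round robin: take the best-ranked remaining image of each relevant
    # class in turn until max_images is reached or the classes are empty.
    relevant.sort(key=lambda ranks: ranks[0])
    output = []
    while relevant and len(output) < max_images:
        for ranks in relevant:
            if len(output) == max_images:
                break
            output.append(ranks.pop(0))
        relevant = [ranks for ranks in relevant if ranks]
    return output


if __name__ == "__main__":
    # Toy 2-D descriptors: two groups among the top results, one far group last.
    toy = [(0, 0), (0, 1), (10, 10), (10, 11), (0, 0.5), (10, 10.5), (50, 50), (50, 51)]
    print(hc_rf(toy, n_pos=6, n_neg=2, n_classes=3, max_images=4))  # [0, 2, 1, 3]
```

In the toy example the two last images form a class of negative examples only and are discarded, while the remaining two classes alternate in the output, so the top of the refined list covers both groups before repeating either.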
REFERENCES

[1] Bogdan Boteanu, Ionuţ Mironică, and Bogdan Ionescu. 2015. Hierarchical Clustering Pseudo-Relevance Feedback for Social Image Search Result Diversification. Content-Based Multimedia Indexing (CBMI), 2015 13th International Workshop on (September 2015), 1–6.
[2] Bogdan Boteanu, Ionuţ Mironică, and Bogdan Ionescu. 2015. LAPI @ 2015 Retrieving Diverse Social Images Task: A Pseudo-Relevance Feedback Diversification Perspective. MediaEval 2015 Workshop (September 2015).
[3] Bogdan Boteanu, Ionuţ Mironică, and Bogdan Ionescu. 2016. LAPI @ 2016 Retrieving Diverse Social Images Task: A Pseudo-Relevance Feedback Diversification Perspective. MediaEval 2016 Workshop (October 2016).
[4] Bogdan Boteanu, Ionuţ Mironică, and Bogdan Ionescu. 2016. Pseudo-Relevance Feedback Diversification of Social Image Retrieval Results. Multimedia Tools and Applications 76, 9 (2016), 11889–11916.
[5] Bruno Emond. 2007. Multimedia and Human-in-the-loop: Interaction as Content Enrichment. ACM International Workshop on Human-Centered Multimedia (2007), 77–84.
[6] Jing Li and Nigel M Allinson. 2013. Relevance Feedback in Content-Based Image Retrieval: A Survey. Handbook on Neural Information Processing 49 (2013), 433–469.
[7] Maia Zaharieva, Bogdan Ionescu, Alexandru Lucian Gînscă, Rodrygo L.T. Santos, and Henning H. Müller. 2017. Retrieving Diverse Social Images at MediaEval 2017: Challenges, Dataset and Evaluation. MediaEval 2017 Workshop (September 2017).