REGIM @ 2016 Retrieving Diverse Social Images Task

Ghada Feki, Rim Fakhfakh, Noura Bouhlel, Anis Ben Ammar and Chokri Ben Amar
REsearch Groups on Intelligent Machines (REGIM), University of Sfax, National Engineering School of Sfax (ENIS), Sfax, Tunisia
{ghada.feki, rim.fakhfakh, noura.bouhlel.tn, anis.benammar.tn, chokri.benamar}@ieee.org

Copyright is held by the author/owner(s). MediaEval 2016 Workshop, October 20-21, 2016, Hilversum, Netherlands.

ABSTRACT
In this paper, we describe our participation in the MediaEval 2016 Retrieving Diverse Social Images Task. The proposed approach relies on Hierarchical, EM and Make Density-based clustering and on hypergraph-based learning to exploit visual, textual and user-credibility descriptions in order to generate diversified results. We achieved promising results: our best run reached an F1@20 of 0.4105.

1. INTRODUCTION
Diversity is currently a hot research topic in the image retrieval context. The Retrieving Diverse Social Images task [1] deals with the problem of removing redundant images from the top-ranked results. Existing works usually build clustering schemes to address this problem; the most commonly used clustering techniques are hierarchical [2][3][4][5], spectral [6] and k-Medoids [7] clustering. The general idea is to return images from different clusters in order to maximize the diversity among the results. We propose a method that considers both relevance and diversity: based on clustering techniques and hypergraph-based learning, we exploit visual, textual and user-credibility descriptions to generate diversified results.

2. PREVIOUS WORK
In 2012, our group within the REGIM research laboratory participated in the Personal Photo Retrieval task [8], which also aims to diversify the results of image retrieval systems by removing redundant images from the top-ranked results. The general idea of our approach was to browse graphs that describe the similarity between images: we generate an inter-image semantic similarity graph and an inter-image visual similarity graph. The retrieval process for each query takes into account not only the relevance-based ranking but also a diversity-based ranking, which uses the inter-image graphs to generate diversity scores. The approach is evaluated in more detail in [9][10].

3. RUN DESCRIPTION
For our participation in the Retrieving Diverse Social Images task, we focused on clustering techniques and hypergraph-based learning, since the pairwise simple graphs used in our previous work can scarcely represent higher-order relationships among images. Five runs were submitted, as follows.

3.1 Run 1
The first run consists in EM and Make Density-based Clustering [21] using only the visual information. The visual description used for this run is CNN generic, a descriptor based on the reference convolutional neural network (CNN) model trained on the 1,000 ImageNet (image-net.org) classes. The visual process contains four steps. First, we estimate the number of clusters k for each query by running EM clustering on the CNN generic descriptions. Second, we carry out Make Density-based clustering (which wraps the k-means algorithm) with a new empirical value k' = k + n. Third, we extract the descriptions of the cluster centers. Fourth, based on cosine similarity, we sort the images by alternating between the centers and, for each selected center, choosing the closest remaining image.
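As an illustration of the run 1 pipeline, the following is a minimal Python sketch of the four steps using scikit-learn. It is a sketch under stated assumptions: a BIC-selected Gaussian mixture stands in for the EM estimation of k, plain k-means stands in for the Make Density-based clusterer that wraps it, and the offset n, the candidate range and the function name diversify_run1 are illustrative, not the exact implementation.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.mixture import GaussianMixture


def diversify_run1(features, n=2, max_k=10):
    """Re-rank one query's top images by round-robin over visual clusters.

    features: array of shape (num_images, dim) holding the CNN-generic
    descriptors of the top-ranked images, in their initial relevance order.
    """
    num_images = len(features)

    # Step 1: estimate the number of clusters k with an EM-style model
    # selection (BIC over Gaussian mixtures is an illustrative stand-in
    # for the EM clusterer used in the run).
    candidates = list(range(2, min(max_k, num_images - 1) + 1))
    bics = [GaussianMixture(n_components=k, random_state=0)
            .fit(features).bic(features) for k in candidates]
    k = candidates[int(np.argmin(bics))]

    # Step 2: clustering that wraps k-means, run with k' = k + n.
    k_prime = min(k + n, num_images)
    km = KMeans(n_clusters=k_prime, n_init=10, random_state=0).fit(features)

    # Step 3: descriptions of the cluster centers.
    centers = km.cluster_centers_

    # Step 4: alternate between the centers; for the current center, emit
    # the not-yet-selected image with the highest cosine similarity to it.
    sims = cosine_similarity(features, centers)  # (num_images, k')
    remaining, order, c = set(range(num_images)), [], 0
    while remaining:
        best = max(remaining, key=lambda i: sims[i, c])
        order.append(best)
        remaining.remove(best)
        c = (c + 1) % k_prime
    return order  # image indices in diversified order
```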
3.2 Run 2
The second run consists in Hierarchical Clustering using only the textual information. We generate a hierarchical clustering over the corresponding image dataset based on the textual descriptions [11][12]. The proposed hierarchical clustering contains four levels:
− First level: Description
− Second level: Tags
− Third level: Title
− Fourth level: Location

3.3 Run 3
The third run combines the EM and Make Density-based Clustering using the visual information (run 1) with the Hierarchical Clustering using the textual information (run 2) [13]. The final scores are generated as the mean average of the clustering-based visual scores and the textual scores [14].

3.4 Run 4
The fourth run combines the aforementioned textual approach with a hypergraph-based visual approach. The visual description used for this run is CNN adapted, a descriptor based on the reference convolutional neural network (CNN) model trained on 1,000 tourist point-of-interest classes whose images were automatically collected from the Web. The hypergraph-based visual approach consists in the following steps. First, we use a hypergraph to model higher-order relationships between images: the set of vertices denotes the images to be ranked, each image is taken as a "centroid" vertex [15] and forms a hyperedge with its k nearest neighbors under the Euclidean distance, and each hyperedge is weighted with a positive scalar denoting its importance in the hypergraph. Second, given the constructed hypergraph and an image query, we perform a hypergraph-based diverse ranking algorithm [16] with absorbing nodes to rank all the remaining vertices in the hypergraph with respect to the query. The absorbing nodes [17] prevent redundant vertices from obtaining high ranking scores. The final scores are generated as the mean average of the hypergraph-based visual scores and the textual scores.
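To make the hypergraph construction and ranking more tangible, here is a minimal Python sketch. The k-nearest-neighbour hyperedges under Euclidean distance and the positive hyperedge weights follow the description above; however, the Gaussian affinity used for the weights, the closed-form score f = (I - alpha * Theta)^(-1) y, and the greedy removal of already selected vertices (a rough surrogate for the absorbing-node mechanism of [16][17]) are assumptions, not the authors' exact algorithm.

```python
import numpy as np


def build_hypergraph(features, k=5, sigma=None):
    """Incidence matrix H (vertices x hyperedges) and positive hyperedge
    weights w: each image is a "centroid" vertex whose hyperedge groups
    the image with its k nearest neighbours (Euclidean distance)."""
    n = len(features)
    dist = np.linalg.norm(features[:, None, :] - features[None, :, :], axis=-1)
    sigma = sigma if sigma is not None else dist.mean()  # bandwidth: assumed
    affinity = np.exp(-(dist ** 2) / (2 * sigma ** 2 + 1e-12))
    H = np.zeros((n, n))
    for j in range(n):
        members = np.argsort(dist[j])[:k + 1]  # vertex j plus its k neighbours
        H[members, j] = 1.0
    # Assumed weighting: total affinity of the hyperedge members to its centroid.
    w = np.array([affinity[H[:, j] > 0, j].sum() for j in range(n)])
    return H, w


def hypergraph_scores(H, w, y, alpha=0.9):
    """Closed-form ranking f = (I - alpha * Theta)^(-1) y, where Theta is the
    normalised vertex transition matrix of the weighted hypergraph."""
    Dv = H @ w                      # vertex degrees
    De = H.sum(axis=0)              # hyperedge degrees
    Dv_inv_sqrt = np.diag(1.0 / np.sqrt(np.maximum(Dv, 1e-12)))
    Theta = Dv_inv_sqrt @ H @ np.diag(w) @ np.diag(1.0 / De) @ H.T @ Dv_inv_sqrt
    return np.linalg.solve(np.eye(len(y)) - alpha * Theta, y)


def diverse_rank(features, query_idx, k=5, alpha=0.9):
    """Greedy diverse ranking: once an image is selected it is dropped from
    the hypergraph before re-scoring, a crude stand-in for absorbing nodes
    that stop redundant vertices from keeping high scores."""
    order = []
    active = [i for i in range(len(features)) if i != query_idx]
    while active:
        idx = np.array([query_idx] + active)
        H, w = build_hypergraph(features[idx], k=min(k, len(idx) - 1))
        y = np.zeros(len(idx))
        y[0] = 1.0                  # the query is the only labelled vertex
        f = hypergraph_scores(H, w, y, alpha)
        best = active[int(np.argmax(f[1:]))]
        order.append(best)
        active.remove(best)
    return order                    # image indices, best first
```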
3.5 Run 5
In addition to the basic treatment provided by run 3, the fifth run adds a refinement process based on the user credibility scores. Each image receives a credibility-based score computed as the mean average of the descriptors visualScore, the inverse of faceProportion, tagSpecificity and meanImageTagClarity. The final ranking combines this user-based image ranking [18][19][20] with the ranking provided by the fusion of the EM and Make Density-based Clustering using the visual information and the Hierarchical Clustering using the textual information.
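A compact Python sketch of this refinement step is given below. The descriptor names follow the credibility descriptors listed above; the epsilon guard on the inverse and the min-max fusion of the two score lists are illustrative assumptions rather than the authors' exact choices.

```python
import numpy as np


def credibility_score(visual_score, face_proportion,
                      tag_specificity, mean_image_tag_clarity, eps=1e-6):
    """Per-image credibility score: the mean average of visualScore, the
    inverse of faceProportion, tagSpecificity and meanImageTagClarity.
    Reading "inverse" literally as 1 / (faceProportion + eps) is an assumption."""
    return float(np.mean([visual_score,
                          1.0 / (face_proportion + eps),
                          tag_specificity,
                          mean_image_tag_clarity]))


def refine_with_credibility(run3_scores, credibility_scores):
    """Run 5 refinement: combine the run-3 (visual + textual) scores with the
    credibility-based scores. Min-max normalising both lists and averaging
    them is an illustrative fusion choice."""
    run3 = np.asarray(run3_scores, dtype=float)
    cred = np.asarray(credibility_scores, dtype=float)

    def scale(s):
        return (s - s.min()) / (s.max() - s.min() + 1e-12)

    fused = 0.5 * (scale(run3) + scale(cred))
    return list(np.argsort(-fused))   # image indices, best first
```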
4. RESULTS AND DISCUSSION
This section presents the experimental results achieved on the test set, which contains 65 queries and about 19,500 images. Table 1 shows the performance of the submitted runs in terms of both diversity and relevance. The evaluation metrics are Cluster Recall at X (CR@X), which assesses how many different clusters from the ground truth are represented among the top X results, Precision at X (P@X), which measures the proportion of relevant photos among the top X results, and F1-measure at X (F1@X), the harmonic mean of the previous two.

Table 1. Run performance on the test set
         Run 1    Run 2    Run 3    Run 4    Run 5
P@20     0.5086   0.4797   0.5039   0.4852   0.5266
CR@20    0.36     0.3542   0.3501   0.3738   0.3702
F1@20    0.4024   0.3862   0.3964   0.4013   0.4105

Our clustering-based visual run (run 1) outperformed our textual run (run 2) on all metrics. The run that combines these two approaches (run 3) performed better than the textual run in terms of P@20 but not in terms of CR@20: compared with the CR@20 achieved separately by the visual and textual approaches, the combination decreases the diversity rate. However, with the refinement process based on user credibility (run 5), all the metrics increased. In fact, run 5 (clustering-based visual + textual + user credibility) outperformed run 1 (clustering-based visual), run 2 (textual) and run 3 (clustering-based visual + textual).

Like run 3, run 4 is also a combined visual-textual run, but with a completely different visual approach. It outperformed run 3 in terms of CR@20 but not in terms of P@20.

Thus, the best runs are run 4 in terms of CR@20 and run 5 in terms of P@20 and F1@20, so we detail their results further. As shown in Figure 1, run 5 significantly outperforms run 4 in terms of precision, especially for the top-ranked images. Concerning cluster recall (Figure 2), the two runs have close values: up to the top 10 ranked images run 5 slightly outperforms run 4, while over the top 50 ranked images run 4 slightly outperforms run 5. Finally, according to the F1-measure (Figure 3), run 5 is almost always the best run.

Figure 1. Precision comparison between Run 4 and Run 5 (P@N for N = 5, 10, 20, 30, 40, 50).
Figure 2. Cluster Recall comparison between Run 4 and Run 5 (CR@N for N = 5, 10, 20, 30, 40, 50).
Figure 3. F1-measure comparison between Run 4 and Run 5 (F1@N for N = 5, 10, 20, 30, 40, 50).

5. CONCLUSION AND FUTURE WORKS
Our participation in the MediaEval 2016 Retrieving Diverse Social Images Task achieved promising results. The proposed approach, which relies on clustering techniques and hypergraph-based learning to exploit visual, textual and user-credibility descriptions, takes both relevance and diversity into account. The visual runs outperformed the textual one, so further research will mainly target the enhancement of the textual approach. Finally, we will also focus on exploiting the user-credibility descriptors, which seem to be very useful in social media based retrieval.

ACKNOWLEDGEMENT
The authors would like to acknowledge the financial support of this work by grants from the General Direction of Scientific Research (DGRST), Tunisia, under the ARUB program.

REFERENCES
[1] B. Ionescu, A. L. Gînsca, M. Zaharieva, B. Boteanu, M. Lupu, H. Müller. Retrieving Diverse Social Images at MediaEval 2016: Challenge, Dataset and Evaluation. MediaEval 2016 Workshop, October 20-21, Hilversum, Netherlands, 2016.
[2] M. Zaharieva, L. Diem. MIS @ Retrieving Diverse Social Images Task 2015. MediaEval 2015 Workshop, September 14-15, Wurzen, Germany, 2015.
[3] B. Boteanu, I. Mironica, B. Ionescu. LAPI @ 2015 Retrieving Diverse Social Images Task: A Pseudo-Relevance Feedback Diversification Perspective. MediaEval 2015 Workshop, September 14-15, Wurzen, Germany, 2015.
[4] X. Chen, H. Liu, Z. Deng, Y. Yang. Retrieving Relevant and Diverse Image from Social Media Images. MediaEval 2015 Workshop, September 14-15, Wurzen, Germany, 2015.
[5] A. Castellanos, X. Benavent, A. García-Serrano, E. de Ves, J. Cigarrán. UNED-UV @ Retrieving Diverse Social Images Task. MediaEval 2015 Workshop, September 14-15, Wurzen, Germany, 2015.
[6] Z. Paróczi, M. Kis-Király, B. Fodor. DCLab at MediaEval 2015 Retrieving Diverse Social Images Task. MediaEval 2015 Workshop, September 14-15, Wurzen, Germany, 2015.
[7] R. Calumby, I. Araujo, V. Santana, J. Munoz, O. Penatti, L. Li, J. Almeida, G. Chiachia, M. Gonçalves, R. Torres. Recod @ MediaEval 2015: Diverse Social Images Retrieval. MediaEval 2015 Workshop, September 14-15, Wurzen, Germany, 2015.
[8] D. Zellhofer. Overview of the Personal Photo Retrieval Pilot Task at ImageCLEF 2012. ImageCLEF 2012 Workshop.
[9] G. Feki, A. Ksibi, A. Ben Ammar, C. Ben Amar. "REGIMvid at ImageCLEF2012: Improving Diversity in Personal Photo Ranking Using Fuzzy Logic". CLEF 2012.
[10] G. Feki, A. Ksibi, A. Ben Ammar, C. Ben Amar. "Improving image search effectiveness by integrating contextual information". 11th International Workshop on Content-Based Multimedia Indexing, CBMI 2013, pp. 149-154.
[11] G. Feki, A. Ben Ammar, C. Ben Amar. "Adaptive Semantic Construction for Diversity-based Image Retrieval". Proceedings of the International Conference on Knowledge Discovery and Information Retrieval 2014, pp. 444-449.
[12] A. Ksibi, G. Feki, A. Ben Ammar, C. Ben Amar. "Effective Diversification for Ambiguous Queries in Social Image Retrieval". 15th International Conference on Computer Analysis of Images and Patterns, CAIP 2013, pp. 571-578.
[13] R. Fakhfakh, G. Feki, A. Ksibi, A. Ben Ammar, C. Ben Amar. "REGIMvid at ImageCLEF2012: Concept-based Query Refinement and Relevance-based Ranking Enhancement for Image Retrieval". CLEF 2012.
[14] G. Feki, R. Fakhfakh, A. Ben Ammar, C. Ben Amar. "Knowledge Structures: Which one to use for the query disambiguation?". 15th International Conference on Intelligent Systems Design and Applications 2015, pp. 499-504.
[15] N. Bouhlel, A. Ksibi, A. Ben Ammar, C. Ben Amar. "Semantic-aware framework for Mobile Image Search". 15th International Conference on Intelligent Systems Design and Applications (ISDA), 2015, pp. 479-484.
[16] Y. Huang, Q. Liu, S. Zhang, D. Metaxas. "Image retrieval via probabilistic hypergraph ranking". Proc. IEEE Int. Conf. Comput. Vis. Pattern Recognit., Jun. 2010, pp. 3376-3383.
[17] S. Sunderrajan, B. Manjunath. "Context-Aware Hypergraph Modeling for Re-identification and Summarization". IEEE Transactions on Multimedia, vol. 18, pp. 51-63, 2016.
[18] R. Fakhfakh, A. Ksibi, A. Ben Ammar, C. Ben Amar. Fuzzy User Profile Modeling for Information Retrieval. KDIR 2014 (Knowledge Discovery and Information Retrieval), Rome, Italy, October 21-24, 2014, pp. 431-436.
[19] R. Fakhfakh, A. Ksibi, A. Ben Ammar, C. Ben Amar. Enhancing query interpretation by combining textual and visual analyses. International Conference on Advanced Logistics and Transport (ICALT), 2013, pp. 170-175.
[20] G. Feki, R. Fakhfakh, A. Ben Ammar, C. Ben Amar. "Query Disambiguation: User-centric Approach". Journal of Information Assurance and Security, 2016, vol. 11, pp. 144-156.
[21] T. S. Madhulatha. "An overview on clustering methods". IOSR Journal of Engineering, Apr. 2012, Vol. 2(4), pp. 719-725.