REGIM @ 2016 Retrieving Diverse Social Images Task

Ghada Feki, Rim Fakhfakh, Noura Bouhlel, Anis Ben Ammar and Chokri Ben Amar
REsearch Groups on Intelligent Machines (REGIM), University of Sfax, National Engineering School of Sfax (ENIS), Sfax, Tunisia
{ghada.feki, rim.fakhfakh, noura.bouhlel.tn, anis.benammar.tn, chokri.benamar}@ieee.org

Copyright is held by the author/owner(s). MediaEval 2016 Workshop, October 20-21, 2016, Hilversum, Netherlands.

ABSTRACT
In this paper, we describe our participation in the MediaEval 2016 Retrieving Diverse Social Images Task. The proposed approach relies on Hierarchical, EM and Make Density-based clustering and on hypergraph-based learning to exploit visual, textual and user-credibility descriptions in order to generate diversified results. We achieved promising results: our best run reached an F1@20 of 0.4105.

1. INTRODUCTION
Diversity is currently a hot research topic in the image retrieval context. The Retrieving Diverse Social Images task [1] deals with the problem of removing redundant images from the top-ranked results. Existing works usually build clustering schemes to address this problem; the most commonly used clustering techniques are hierarchical [2][3][4][5], spectral [6] and k-Medoids [7] clustering. The general idea is to return images from different clusters in order to maximize the diversity among the results. We propose a method that considers both relevance and diversity: based on clustering techniques and hypergraph-based learning, we exploit visual, textual and user-credibility descriptions to generate diversified results.

2. PREVIOUS WORK
In 2012, our group within the REGIM research laboratory participated in the Personal Photo Retrieval task [8], which also aims to diversify the results of image retrieval systems by removing redundant images from the top-ranked results. The general idea of our approach was to browse graphs that describe the similarity between images: we generate an inter-image semantic similarity graph and an inter-image visual similarity graph. The retrieval process for each query takes into account not only the relevance-based ranking but also a diversity-based ranking, which uses the inter-image graphs to generate diversity scores. The approach is evaluated in more detail in [9][10].

3. RUN DESCRIPTION
For our participation in the Retrieving Diverse Social Images task, we focused on clustering techniques and hypergraph-based learning, since the pairwise simple graphs used in our previous work can scarcely represent higher-order relationships among images. Five runs were submitted, as follows.

3.1 Run 1
The first run consists in EM and Make Density-based Clustering [21] using only the visual information. The visual description used for this run is CNN generic, a descriptor based on the reference convolutional neural network (CNN) model trained on the 1,000 ImageNet (image-net.org) classes. The visual process contains four steps. First, we estimate the number of clusters k for each query by running EM clustering on the CNN generic descriptions. Second, we carry out Make Density-based clustering (which wraps the k-means algorithm) with a new empirical value k' = k + n. Third, we extract the descriptions of the cluster centers. Fourth, based on cosine similarity, we sort the images by alternating between the centers and, for each selected center, choosing the closest remaining image.
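As an illustration of the run 1 pipeline, the following is a minimal Python sketch of the four steps using scikit-learn. It is a sketch under stated assumptions: a BIC-selected Gaussian mixture stands in for the EM estimation of k, plain k-means stands in for the Make Density-based clusterer that wraps it, and the offset n, the candidate range and the function name diversify_run1 are illustrative, not the exact implementation.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.mixture import GaussianMixture


def diversify_run1(features, n=2, max_k=10):
    """Re-rank one query's top images by round-robin over visual clusters.

    features: array of shape (num_images, dim) holding the CNN-generic
    descriptors of the top-ranked images, in their initial relevance order.
    """
    num_images = len(features)

    # Step 1: estimate the number of clusters k with an EM-style model
    # selection (BIC over Gaussian mixtures is an illustrative stand-in
    # for the EM clusterer used in the run).
    candidates = list(range(2, min(max_k, num_images - 1) + 1))
    bics = [GaussianMixture(n_components=k, random_state=0)
            .fit(features).bic(features) for k in candidates]
    k = candidates[int(np.argmin(bics))]

    # Step 2: clustering that wraps k-means, run with k' = k + n.
    k_prime = min(k + n, num_images)
    km = KMeans(n_clusters=k_prime, n_init=10, random_state=0).fit(features)

    # Step 3: descriptions of the cluster centers.
    centers = km.cluster_centers_

    # Step 4: alternate between the centers; for the current center, emit
    # the not-yet-selected image with the highest cosine similarity to it.
    sims = cosine_similarity(features, centers)  # (num_images, k')
    remaining, order, c = set(range(num_images)), [], 0
    while remaining:
        best = max(remaining, key=lambda i: sims[i, c])
        order.append(best)
        remaining.remove(best)
        c = (c + 1) % k_prime
    return order  # image indices in diversified order
```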
3.2 Run 2
The second run consists in Hierarchical Clustering using only the textual information. We generate a hierarchical clustering over the corresponding image dataset based on the textual descriptions [11][12]. The proposed hierarchical clustering contains four levels:
− First level: Description
− Second level: Tags
− Third level: Title
− Fourth level: Location

3.3 Run 3
The third run combines the EM and Make Density-based Clustering using the visual information (run 1) with the Hierarchical Clustering using the textual information (run 2) [13]. The final scores are generated as the mean average of the clustering-based visual scores and the textual scores [14].

3.4 Run 4
The fourth run combines the aforementioned textual approach with a hypergraph-based visual approach. The visual description used for this run is CNN adapted, a descriptor based on the reference convolutional neural network (CNN) model trained on 1,000 tourist point-of-interest classes whose images were automatically collected from the Web. The hypergraph-based visual approach consists in the following steps. First, we use a hypergraph to model higher-order relationships between images: the set of vertices denotes the images to be ranked, each image is taken as a "centroid" vertex [15] and forms a hyperedge with its k nearest neighbors under the Euclidean distance, and each hyperedge is weighted with a positive scalar denoting its importance in the hypergraph. Second, given the constructed hypergraph and an image query, we perform a hypergraph-based diverse ranking algorithm [16] with absorbing nodes to rank all the remaining vertices in the hypergraph with respect to the query. The absorbing nodes [17] prevent redundant vertices from obtaining high ranking scores. The final scores are generated as the mean average of the hypergraph-based visual scores and the textual scores.
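To make the hypergraph construction and ranking more tangible, here is a minimal Python sketch. The k-nearest-neighbour hyperedges under Euclidean distance and the positive hyperedge weights follow the description above; however, the Gaussian affinity used for the weights, the closed-form score f = (I - alpha * Theta)^(-1) y, and the greedy removal of already selected vertices (a rough surrogate for the absorbing-node mechanism of [16][17]) are assumptions, not the authors' exact algorithm.

```python
import numpy as np


def build_hypergraph(features, k=5, sigma=None):
    """Incidence matrix H (vertices x hyperedges) and positive hyperedge
    weights w: each image is a "centroid" vertex whose hyperedge groups
    the image with its k nearest neighbours (Euclidean distance)."""
    n = len(features)
    dist = np.linalg.norm(features[:, None, :] - features[None, :, :], axis=-1)
    sigma = sigma if sigma is not None else dist.mean()  # bandwidth: assumed
    affinity = np.exp(-(dist ** 2) / (2 * sigma ** 2 + 1e-12))
    H = np.zeros((n, n))
    for j in range(n):
        members = np.argsort(dist[j])[:k + 1]  # vertex j plus its k neighbours
        H[members, j] = 1.0
    # Assumed weighting: total affinity of the hyperedge members to its centroid.
    w = np.array([affinity[H[:, j] > 0, j].sum() for j in range(n)])
    return H, w


def hypergraph_scores(H, w, y, alpha=0.9):
    """Closed-form ranking f = (I - alpha * Theta)^(-1) y, where Theta is the
    normalised vertex transition matrix of the weighted hypergraph."""
    Dv = H @ w                      # vertex degrees
    De = H.sum(axis=0)              # hyperedge degrees
    Dv_inv_sqrt = np.diag(1.0 / np.sqrt(np.maximum(Dv, 1e-12)))
    Theta = Dv_inv_sqrt @ H @ np.diag(w) @ np.diag(1.0 / De) @ H.T @ Dv_inv_sqrt
    return np.linalg.solve(np.eye(len(y)) - alpha * Theta, y)


def diverse_rank(features, query_idx, k=5, alpha=0.9):
    """Greedy diverse ranking: once an image is selected it is dropped from
    the hypergraph before re-scoring, a crude stand-in for absorbing nodes
    that stop redundant vertices from keeping high scores."""
    order = []
    active = [i for i in range(len(features)) if i != query_idx]
    while active:
        idx = np.array([query_idx] + active)
        H, w = build_hypergraph(features[idx], k=min(k, len(idx) - 1))
        y = np.zeros(len(idx))
        y[0] = 1.0                  # the query is the only labelled vertex
        f = hypergraph_scores(H, w, y, alpha)
        best = active[int(np.argmax(f[1:]))]
        order.append(best)
        active.remove(best)
    return order                    # image indices, best first
```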
3.5 Run 5
In addition to the basic treatment provided by run 3, the fifth run adds a refinement process based on the user credibility scores. Each image receives a credibility-based score computed as the mean average of the descriptors visualScore, the inverse of faceProportion, tagSpecificity and meanImageTagClarity. The final ranking combines this user-based image ranking [18][19][20] with the ranking provided by the fusion of the EM and Make Density-based Clustering using the visual information and the Hierarchical Clustering using the textual information.
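A compact Python sketch of this refinement step is given below. The descriptor names follow the credibility descriptors listed above; the epsilon guard on the inverse and the min-max fusion of the two score lists are illustrative assumptions rather than the authors' exact choices.

```python
import numpy as np


def credibility_score(visual_score, face_proportion,
                      tag_specificity, mean_image_tag_clarity, eps=1e-6):
    """Per-image credibility score: the mean average of visualScore, the
    inverse of faceProportion, tagSpecificity and meanImageTagClarity.
    Reading "inverse" literally as 1 / (faceProportion + eps) is an assumption."""
    return float(np.mean([visual_score,
                          1.0 / (face_proportion + eps),
                          tag_specificity,
                          mean_image_tag_clarity]))


def refine_with_credibility(run3_scores, credibility_scores):
    """Run 5 refinement: combine the run-3 (visual + textual) scores with the
    credibility-based scores. Min-max normalising both lists and averaging
    them is an illustrative fusion choice."""
    run3 = np.asarray(run3_scores, dtype=float)
    cred = np.asarray(credibility_scores, dtype=float)

    def scale(s):
        return (s - s.min()) / (s.max() - s.min() + 1e-12)

    fused = 0.5 * (scale(run3) + scale(cred))
    return list(np.argsort(-fused))   # image indices, best first
```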
4. RESULTS AND DISCUSSION
This section presents the experimental results achieved on the test set, which contains 65 queries and about 19,500 images. Table 1 shows the performance of the submitted runs in terms of both diversity and relevance. The evaluation metrics are Cluster Recall at X (CR@X), which assesses how many different clusters from the ground truth are represented among the top X results, Precision at X (P@X), which measures the proportion of relevant photos among the top X results, and F1-measure at X (F1@X), the harmonic mean of the previous two.

Table 1. Run performance on the test set
         Run 1    Run 2    Run 3    Run 4    Run 5
P@20     0.5086   0.4797   0.5039   0.4852   0.5266
CR@20    0.36     0.3542   0.3501   0.3738   0.3702
F1@20    0.4024   0.3862   0.3964   0.4013   0.4105

Our clustering-based visual run (run 1) outperformed our textual run (run 2) on all metrics. The run that combines these two approaches (run 3) performed better than the textual run in terms of P@20 but not in terms of CR@20: compared with the CR@20 achieved separately by the visual and textual approaches, the combination decreases the diversity rate. However, with the refinement process based on user credibility (run 5), all the metrics increased. In fact, run 5 (clustering-based visual + textual + user credibility) outperformed run 1 (clustering-based visual), run 2 (textual) and run 3 (clustering-based visual + textual).

Like run 3, run 4 is also a combined visual-textual run, but with a completely different visual approach. It outperformed run 3 in terms of CR@20 but not in terms of P@20.

Thus, the best runs are run 4 in terms of CR@20 and run 5 in terms of P@20 and F1@20, so we detail their results further. As shown in Figure 1, run 5 significantly outperforms run 4 in terms of precision, especially for the top-ranked images. Concerning cluster recall (Figure 2), the two runs have close values: up to the top 10 ranked images run 5 slightly outperforms run 4, while over the top 50 ranked images run 4 slightly outperforms run 5. Finally, according to the F1-measure (Figure 3), run 5 is almost always the best run.

Figure 1. Precision comparison between Run 4 and Run 5 (P@N for N = 5, 10, 20, 30, 40, 50).
Figure 2. Cluster Recall comparison between Run 4 and Run 5 (CR@N for N = 5, 10, 20, 30, 40, 50).
Figure 3. F1-measure comparison between Run 4 and Run 5 (F1@N for N = 5, 10, 20, 30, 40, 50).

5. CONCLUSION AND FUTURE WORKS
Our participation in the MediaEval 2016 Retrieving Diverse Social Images Task achieved promising results. The proposed approach, which relies on clustering techniques and hypergraph-based learning to exploit visual, textual and user-credibility descriptions, takes both relevance and diversity into account. The visual runs outperformed the textual one, so further research will mainly target the enhancement of the textual approach. Finally, we will also focus on exploiting the user-credibility descriptors, which seem to be very useful in social media based retrieval.

ACKNOWLEDGEMENT
The authors would like to acknowledge the financial support of this work by grants from the General Direction of Scientific Research (DGRST), Tunisia, under the ARUB program.

REFERENCES
[1] B. Ionescu, A. L. Gînsca, M. Zaharieva, B. Boteanu, M. Lupu, H. Müller. Retrieving Diverse Social Images at MediaEval 2016: Challenge, Dataset and Evaluation. MediaEval 2016 Workshop, October 20-21, Hilversum, Netherlands, 2016.
[2] M. Zaharieva, L. Diem. MIS @ Retrieving Diverse Social Images Task 2015. MediaEval 2015 Workshop, September 14-15, Wurzen, Germany, 2015.
[3] B. Boteanu, I. Mironica, B. Ionescu. LAPI @ 2015 Retrieving Diverse Social Images Task: A Pseudo-Relevance Feedback Diversification Perspective. MediaEval 2015 Workshop, September 14-15, Wurzen, Germany, 2015.
[4] X. Chen, H. Liu, Z. Deng, Y. Yang. Retrieving Relevant and Diverse Image from Social Media Images. MediaEval 2015 Workshop, September 14-15, Wurzen, Germany, 2015.
[5] A. Castellanos, X. Benavent, A. García-Serrano, E. de Ves, J. Cigarrán. UNED-UV @ Retrieving Diverse Social Images Task. MediaEval 2015 Workshop, September 14-15, Wurzen, Germany, 2015.
[6] Z. Paróczi, M. Kis-Király, B. Fodor. DCLab at MediaEval 2015 Retrieving Diverse Social Images Task. MediaEval 2015 Workshop, September 14-15, Wurzen, Germany, 2015.
[7] R. Calumby, I. Araujo, V. Santana, J. Munoz, O. Penatti, L. Li, J. Almeida, G. Chiachia, M. Gonçalves, R. Torres. Recod @ MediaEval 2015: Diverse Social Images Retrieval. MediaEval 2015 Workshop, September 14-15, Wurzen, Germany, 2015.
[8] D. Zellhofer. Overview of the Personal Photo Retrieval Pilot Task at ImageCLEF 2012. ImageCLEF 2012 Workshop.
[9] G. Feki, A. Ksibi, A. Ben Ammar, C. Ben Amar. "REGIMvid at ImageCLEF2012: Improving Diversity in Personal Photo Ranking Using Fuzzy Logic". CLEF 2012.
[10] G. Feki, A. Ksibi, A. Ben Ammar, C. Ben Amar. "Improving image search effectiveness by integrating contextual information". 11th International Workshop on Content-Based Multimedia Indexing, CBMI 2013, pp. 149-154.
[11] G. Feki, A. Ben Ammar, C. Ben Amar. "Adaptive Semantic Construction for Diversity-based Image Retrieval". Proceedings of the International Conference on Knowledge Discovery and Information Retrieval 2014, pp. 444-449.
[12] A. Ksibi, G. Feki, A. Ben Ammar, C. Ben Amar. "Effective Diversification for Ambiguous Queries in Social Image Retrieval". 15th International Conference on Computer Analysis of Images and Patterns, CAIP 2013, pp. 571-578.
[13] R. Fakhfakh, G. Feki, A. Ksibi, A. Ben Ammar, C. Ben Amar. "REGIMvid at ImageCLEF2012: Concept-based Query Refinement and Relevance-based Ranking Enhancement for Image Retrieval". CLEF 2012.
[14] G. Feki, R. Fakhfakh, A. Ben Ammar, C. Ben Amar. "Knowledge Structures: Which one to use for the query disambiguation?". 15th International Conference on Intelligent Systems Design and Applications 2015, pp. 499-504.
[15] N. Bouhlel, A. Ksibi, A. Ben Ammar, C. Ben Amar. "Semantic-aware framework for Mobile Image Search". 15th International Conference on Intelligent Systems Design and Applications (ISDA), 2015, pp. 479-484.
[16] Y. Huang, Q. Liu, S. Zhang, D. Metaxas. "Image retrieval via probabilistic hypergraph ranking". Proc. IEEE Int. Conf. Comput. Vis. Pattern Recognit., Jun. 2010, pp. 3376-3383.
[17] S. Sunderrajan, B. Manjunath. "Context-Aware Hypergraph Modeling for Re-identification and Summarization". IEEE Transactions on Multimedia, vol. 18, pp. 51-63, 2016.
[18] R. Fakhfakh, A. Ksibi, A. Ben Ammar, C. Ben Amar. Fuzzy User Profile Modeling for Information Retrieval. KDIR 2014 (Knowledge Discovery and Information Retrieval), Rome, Italy, October 21-24, 2014, pp. 431-436.
[19] R. Fakhfakh, A. Ksibi, A. Ben Ammar, C. Ben Amar. Enhancing query interpretation by combining textual and visual analyses. International Conference on Advanced Logistics and Transport (ICALT), 2013, pp. 170-175.
[20] G. Feki, R. Fakhfakh, A. Ben Ammar, C. Ben Amar. "Query Disambiguation: User-centric Approach". Journal of Information Assurance and Security, 2016, vol. 11, pp. 144-156.
[21] T. S. Madhulatha. "An overview on clustering methods". IOSR Journal of Engineering, Apr. 2012, Vol. 2(4), pp. 719-725.