=Paper= {{Paper |id=Vol-1984/Mediaeval_2017_paper_23 |storemode=property |title=CFM@MediaEval 2017 Retrieving Diverse Social Images Task via Re-ranking and Hierarchical Clustering |pdfUrl=https://ceur-ws.org/Vol-1984/Mediaeval_2017_paper_23.pdf |volume=Vol-1984 |authors=Liang Peng,Yi Bin,Xiyao Fu,Jie Zhou,Yang Yang,Heng Tao Shen |dblpUrl=https://dblp.org/rec/conf/mediaeval/PengBFZYS17 }} ==CFM@MediaEval 2017 Retrieving Diverse Social Images Task via Re-ranking and Hierarchical Clustering== https://ceur-ws.org/Vol-1984/Mediaeval_2017_paper_23.pdf
    CFM@MediaEval 2017 Retrieving Diverse Social Images Task
         via Re-ranking and Hierarchical Clustering
                                                           Liang Peng, Yi Bin, Xiyao Fu
                                                      Jie Zhou, Yang Yang, Heng Tao Shen
                                   Center for Future Media and School of Computer Science and Engineering
                                           University of Electronic Science and Technology of China
                                   pliang951125@outlook.com,yi.bin@hotmail.com,fu.xiyao.gm@gmail.com
                                   jiezhou0714@gmail.com,dlyyang@gmail.com,shenhengtao@hotmail.com

Copyright held by the owner/author(s).
MediaEval’17, 13-15 September 2017, Dublin, Ireland

ABSTRACT
This paper presents an approach based on re-ranking and hierarchical clustering (HC) for the MediaEval 2017 Retrieving Diverse Social Images Task. The experimental results on the development and test sets demonstrate that the proposed approach can significantly improve the relevance and visual diversity of the query results. Our approach achieves a good tradeoff between relevance and diversity and an F1@20 of 0.6533 on the employed test data.

1    INTRODUCTION
Modern image retrieval systems should be able to give both relevant and visually diverse results. In other words, the presented images should not only be relevant to the query, but also cover different visual facets. The MediaEval 2017 Retrieving Diverse Social Images Task [8] fosters the development of such image retrieval systems, which aim at the improvement of both the relevance and the diversity of image search results. More specifically, given images retrieved from Flickr using text queries, the goal of the task is to refine the results by ranking images according to their relevance to the query and offering visual diversity. More details about the task and data description can be found in [8].

Previous participants, such as [3, 7], employed a re-ranking method to improve relevance and showed promising performance. Clustering algorithms have been widely used in this task to group similar images, such as k-Medoids [4], hierarchical clustering [1, 2], random forest [6], etc.

In this work, we propose to re-rank images using the available social metadata in order to refine the original relevance rank list of images provided by Flickr. Then, a hierarchical clustering (HC) algorithm is employed to diversify the top k images. Unlike Tollari [7], who merges different features based on similarities, we merge features using weighted distances. In order to balance relevance and diversity well, we apply a diversification strategy similar to [1].

2    APPROACH
As shown in Figure 1, our approach is composed of three components: Re-ranking, Clustering and Diversification. Specifically, to improve the semantic relevance between images and their corresponding queries, we first re-rank the images retrieved for each topic using the available textual metadata. Then we apply a hierarchical clustering algorithm to diversify the refined image rank list and balance relevance and diversity of the final results using a diversification approach.

[Figure 1: Overview of our approach. Stages shown: Re-ranking (images, textual metadata); Clustering (top k, feature fusion, HC); Diversification (representative selection, 50 images).]

2.1    Re-ranking
The given dataset consists of images that are retrieved from Flickr using text-based, multi-topic queries. For each topic, images are arranged using the initial rank list by the Flickr retrieval system. The rank list, however, commonly mismatches the probability of relevance to its query topic. To improve the relevance of the initial rank list, we propose to re-rank the images of each topic by exploiting their social metadata (e.g., titles, geo-tags, and usernames). Concretely, we compute the Cosine Similarity according to the TF-IDF weights of the combined social metadata (i.e., concatenating corresponding feature vectors). Note that we do not utilize the descriptions in the metadata, because we find that there is much redundant or irrelevant information in the descriptions, which may decrease the relevance of the underlying images and the final performance notably. Since the queries are represented by text, our re-ranking could narrow the semantic gap between the queries and the relevance of the corresponding image results.

2.2    Clustering
After re-ranking, we obtain a more relevant rank list of the initial N images for each topic (N = 300 in the task). For the purpose of reducing irrelevant images, we only select the top k images from the refined rank list for the hierarchical clustering (see Section 4
for more details). In other words, the images with low ranks, which are assumed to be the most irrelevant to the topic, are discarded. The experimental results indicate that the employed performance scores and F1@20 benefit from this selective operation.

At the clustering step, one or more visual features, textual features and credibility descriptors are utilized. In order to take advantage of several features, we merge them by summing the weighted distances:

$$\mathrm{dist}_{f_1, f_2, \cdots, f_n}(I_1, I_2) = \sum_{i=1}^{n} w_i \cdot \mathrm{dist}_{f_i}(I_1, I_2), \qquad (1)$$

where $f_i$ denotes the i-th feature, $\mathrm{dist}_{f_i}(I_1, I_2)$ is the distance between image 1 and image 2 based on the i-th feature, and $w_i$ is a manually set weight for the distance of the i-th feature.

We employ HC with single, complete, ward and average linkage methods [5] respectively based on the Euclidean distance, and form 50 clusters for each topic.

2.3    Diversification
HC distributes the selected top k images to several sets, each of which potentially covers one scene or situation with related semantic meaning. Intuitively, to improve the diversity of a retrieval system, the results should cover as many clusters as possible. Similarly, higher ranks indicate higher relevance. To balance diversity and relevance, we directly select the images with the highest relevance ranks from each cluster and sort them according to their relevance as the final result (see re-ranking in Section 2.1).

3    RUN SETUP
We submit 5 runs with different experimental settings.
     • Run1: visual-only. No re-ranking. HC uses a complete linkage method and the provided visual feature auto color correlogram (acc) [8].
     • Run2: text-only. Text-based re-ranking. HC uses a complete linkage method and the provided textual feature [8].
     • Run3: text-visual. Text-based re-ranking. HC uses a ward linkage method and the provided visual feature scalable color (sc) and textual feature [8].
     • Run4: text-cred. Text-based re-ranking. HC uses a single linkage method and the provided credibility descriptors [8].
     • Run5: text-cred. Text-based re-ranking. HC uses a single linkage method and the provided credibility descriptors and textual feature [8].

4    EXPERIMENT AND RESULTS
Table 1 and Table 2 present the results for the five runs evaluated on Devset and Testset respectively. Concretely, Devset is the development set and Testset is the official test set. Results are assessed with Precision at X images (P@X), Cluster Recall at X (CR@X), and F1-measure at X (F1@X). F1@20 is the official ranking metric. Best results are depicted in bold. Flickr is the baseline result using the top 50 images based on the initial Flickr ranking.

Table 1: Run performance on Devset

             run1      run2      run3      run4      run5      Flickr
    P@20     0.55      0.6473    0.6595    0.6509    0.6605    0.5864
    CR@20    0.3959    0.453     0.4768    0.4838    0.4888    0.3646
    F1@20    0.4409    0.5152    0.5323    0.534     0.5402    0.4277

Table 2: Run performance on Testset

             run1      run2      run3      run4      run5
    P@20     0.628     0.6833    0.6881    0.6863    0.6881
    CR@20    0.6009    0.6274    0.6561    0.6636    0.6671
    F1@20    0.5871    0.6334    0.6488    0.6505    0.6533

In run1, we do not use the top k strategy before clustering because there is no re-ranking. Our preliminary experiments show that acc achieves better performance than the other provided visual features (e.g., CNN, sc) in run1. The performance of run1 in Table 1 shows that clustering with the visual feature acc can improve the diversity to a certain degree. In run2, we only employ textual features (title, tags, username) for both re-ranking and clustering. Before clustering, we select the top k (100, 150, 200, 250, 300) images. Preliminary experiments indicate that k=150 achieves the best performance in terms of F1@20. The results achieved on the development and test data show that re-ranking and hierarchical clustering using textual features can improve the relevance and the diversity in comparison to run1 and Flickr to a large degree.

The core difference between run2, run3, run4 and run5 is that they employ different features for clustering. Run3 is a combined visual-textual run and employs a ward linkage method for clustering. The reason we do not use the same visual feature as run1 is that better performance is achieved using sc than acc or any other provided visual feature in run3. The weights of the textual and visual features are 1 and 0.02 respectively. Compared with the results of run2 on Devset and Testset, the value of every metric is increased, CR@20 especially. The results on Devset and Testset indicate that the combination of visual and textual features outperforms the single feature. Note that run4, which employs only the credibility feature in clustering, achieves a better CR@20 and F1@20 than run2 and run3 on both Devset and Testset. In our experiments, the credibility feature is the best feature for clustering. Run5 combines credibility and textual features in clustering and achieves the best performance in P@20, CR@20 and F1@20 on both Devset (0.6605, 0.4888, 0.5402) and Testset (0.6881, 0.6671, 0.6533). We do not use any visual feature, because our experimental results on the development set show that none of the provided visual features helps to attain a higher F1@20 in run5. The weights of the textual and credibility features are 0.2 and 1 respectively.

5    CONCLUSION
We presented an approach which employs re-ranking and hierarchical clustering for retrieving diverse social images. Experimental results demonstrated the effectiveness of our approach in improving relevance and diversity. Credibility descriptors provided a mark of the quality of the user’s tag-image content relationships and were very helpful to cluster similar images to improve the diversity. Merging textual and credibility features could take advantage of different features and achieved satisfactory performance.
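As an illustration of the text-based re-ranking described in Section 2.1, the following minimal sketch scores each image by the cosine similarity between TF-IDF vectors of the query and of the image's concatenated metadata. Several details are assumptions made only for this example and are not stated in the paper: whitespace tokenization, the smoothed IDF formula, using the topic text itself as the query document, and the helper names `tfidf_vectors`, `cosine` and `rerank`.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build sparse TF-IDF vectors (dicts) for a list of token lists."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    # Smoothed IDF (assumption): the exact weighting in the paper is not stated.
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    return [{t: tf * idf[t] for t, tf in Counter(doc).items()} for doc in docs]


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def rerank(query, images):
    """Return image ids sorted by descending similarity to the query.

    `images` maps an image id to its concatenated metadata text
    (e.g. title, tags, username); descriptions are left out.
    """
    ids = list(images)
    docs = [query.lower().split()] + [images[i].lower().split() for i in ids]
    vecs = tfidf_vectors(docs)
    scores = {i: cosine(vecs[0], v) for i, v in zip(ids, vecs[1:])}
    # sorted() is stable, so tied images keep their initial order.
    return sorted(ids, key=lambda i: -scores[i])


# Toy data, invented for illustration only.
images = {
    "img1": "sunset beach holiday alice",
    "img2": "eiffel tower paris night bob",
    "img3": "paris street cafe carol",
}
ranking = rerank("eiffel tower paris", images)
print(ranking)
```

In this toy example the image sharing all three query terms is ranked first, the one sharing only "paris" second, and the unrelated image last. The top k entries of such a ranking would then be passed on to the clustering step.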
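The weighted distance fusion of Equation (1) and the cluster-based selection of Section 2.3 can be sketched as follows. This is a hedged illustration rather than the authors' implementation: the function names, the toy feature vectors and the cluster labels are hypothetical, and picking exactly one image per cluster is an assumption about how the 50 final images are obtained from the 50 clusters. The hierarchical clustering itself is omitted; any agglomerative routine that accepts precomputed pairwise distances could consume the fused distances.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def fused_distance(feats_a, feats_b, weights):
    """Equation (1): weighted sum of per-feature distances.

    `feats_a` and `feats_b` map a feature name to a vector;
    `weights` maps the same names to manually set weights w_i.
    """
    return sum(w * euclidean(feats_a[name], feats_b[name])
               for name, w in weights.items())


def select_representatives(ranked_ids, cluster_of):
    """Keep the best-ranked image of each cluster, in relevance order.

    Assumption: one representative per cluster.
    """
    seen, result = set(), []
    for image_id in ranked_ids:  # already sorted by relevance
        label = cluster_of[image_id]
        if label not in seen:
            seen.add(label)
            result.append(image_id)
    return result


# Weights as reported for run5: textual 0.2, credibility 1.
weights = {"text": 0.2, "cred": 1.0}
img_a = {"text": [1.0, 0.0], "cred": [0.5]}  # toy vectors (hypothetical)
img_b = {"text": [0.0, 0.0], "cred": [0.1]}
d = fused_distance(img_a, img_b, weights)

# Hypothetical cluster labels, as an HC step might produce them.
ranked = ["a", "b", "c", "d", "e"]
labels = {"a": 0, "b": 0, "c": 1, "d": 2, "e": 1}
final = select_representatives(ranked, labels)
print(round(d, 4), final)
```

Because the ranked list is traversed in relevance order, the selected representatives come out already sorted by relevance, which matches the final sorting step described in Section 2.3.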


REFERENCES
[1] B. Boteanu, M.G. Constantin, and B. Ionescu. Lapi@2016 retrieving
    diverse social images task: A pseudo-relevance feedback diversification
    perspective. In MediaEval, 2016.
[2] X. Chen, L. Haokun, D. Zhi-Hong, and Y. Yunlun. Retrieving relevant
    and diverse image from social media images. In MediaEval, 2015.
[3] C. D Ferreira, R. T. Calumby, I. B. do C Araujo, Ícaro C Dourado, J. AV
    Munoz, O. A. B. Penatti, L. T. Li, J. Almeida, and R. da Silva Torres.
    Recod@ mediaeval 2016: Diverse social images retrieval. In MediaEval,
    2016.
[4] A. L. Gînsca, A. Popescu, and N. Rekabsaz. Cea list’s participation at
    the mediaeval 2014 retrieving diverse social images task. In MediaEval,
    2014.
[5] D. Müllner. Modern hierarchical, agglomerative clustering algorithms.
    arXiv preprint arXiv:1109.2378, 2011.
[6] C. Spampinato and S. Palazzo. Perceive lab@unict at mediaeval 2014
    diverse images: Random forests for diversity-based clustering. In
    MediaEval, 2014.
[7] S. Tollari. Upmc at mediaeval 2016 retrieving diverse social images task.
    In MediaEval 2016 Workshop, 2016.
[8] M. Zaharieva, B. Ionescu, A. L. Gînsca, R. L. T. Santos, and H. Müller.
    Retrieving diverse social images at mediaeval 2017: Challenge, dataset
    and evaluation. In MediaEval, 2017.