                         Retrieving Relevant and Diverse Image
                               from Social Media Images
Xi Chen (mrcx@pku.edu.cn), Haokun Liu (haokun-liu@pku.edu.cn),
Zhihong Deng (zhdeng@cis.pku.edu.cn), Yunlun Yang (incomparable-lun@pku.edu.cn)
Peking University, Beijing, China


ABSTRACT
We describe our approach and its results for the MediaEval 2015 Retrieving Diverse Social Images Task. The basic idea is to remove the irrelevant images and then obtain diverse images using a greedy strategy. Experimental results show that our method can retrieve diverse images with moderate relevance to the topic.
1. INTRODUCTION
Retrieving social media information is currently a hot research topic. The Retrieving Diverse Social Images task poses the problem of generating a brief summary of up to 50 images that is both relevant and diverse from Flickr photos [1].
The problem is challenging. On the one hand, we need to filter out the irrelevant images in Flickr to guarantee relevance. On the other hand, we need to keep different images in order to enhance diversity.
Much previous work has addressed this problem. Spampinato and Palazzo proposed a method including cluster-wise filtering, which attained high relevance but lowered diversity [2]. Dang-Nguyen, Piras and Giacinto introduced a method that removes irrelevant images before clustering, yielding high scores on both criteria [3], but this method may fail when irrelevant images form the majority of the set. Other methods try to make use of the given ground truth with a neural network [4], but are not very effective in improving diversity.
We propose a method that considers both relevance and diversity; briefly, it extracts images from the given set through clustering and sorting, with the help of images from Wikipedia. The basic idea is to remove the irrelevant images and then obtain diverse images using a greedy strategy.
2. APPROACH
Our approach consists of 6 steps (Figure 1): pre-filtering, clustering, cluster-filtering, sorting, distributing and extracting. Each of them is described in the following.

Figure 1. Schema of our approach:
• Pre-Filtering: filter out part of the irrelevant images based on distance and a face detector
• Clustering: based on general CNN, adaptive CNN and TF-IDF similarity features
• Cluster-Filtering: select clusters based on average user visual score
• Sorting: sort the clusters based on CNN feature similarity with Wikipedia images, or on user credibility
• Distributing: choose images from each cluster evenly
• Extracting: use a greedy strategy to extract images that are not too close to each other
2.1 Pre-Filtering
In this step, we filter out part of the irrelevant images based on the following rules:
1. The distance between the photo's location and the query location is over 30 km.
2. The proportion of human faces in the image is over 0.05.
Because the face detector we used is not absolutely accurate, and some images containing people are still relevant to the query topic, the images we filter out are not all truly irrelevant, but the filtering has considerable accuracy.
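As a minimal sketch of these two rules (the image fields, the query coordinates and the haversine helper are our own assumptions for illustration, not part of the task API), the pre-filtering can be expressed as:

import math

MAX_DISTANCE_KM = 30.0   # rule 1: drop photos taken more than 30 km from the query location
MAX_FACE_RATIO = 0.05    # rule 2: drop photos whose face area exceeds 5% of the image

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points in kilometres."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = math.radians(lat2 - lat1), math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))

def pre_filter(images, query_lat, query_lon):
    """Keep images close to the query location and containing few faces.

    Each image is assumed to be a dict with 'lat', 'lon' and 'face_ratio' keys.
    """
    kept = []
    for img in images:
        if haversine_km(img["lat"], img["lon"], query_lat, query_lon) > MAX_DISTANCE_KM:
            continue
        if img["face_ratio"] > MAX_FACE_RATIO:
            continue
        kept.append(img)
    return kept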

2.2 Clustering
In this step, we use general CNN (Convolutional Neural Network) features and adaptive CNN features, combined with the cosine similarity between the TF-IDF text vector of the image and those of all provided images, to form a (4096+4096+N)-dimensional feature, where N is the number of provided images after pre-filtering.
We use a hierarchical clustering algorithm [5] to cluster the images into 30 clusters.
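A minimal sketch of this step with scikit-learn [5]; the variable names and the default agglomerative-clustering settings are our assumptions, and the two CNN descriptor matrices are taken as given:

import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

N_CLUSTERS = 30

def build_features(cnn_general, cnn_adaptive, texts):
    # cnn_general, cnn_adaptive: (N, 4096) CNN descriptors for the N pre-filtered images
    # texts: the N image text documents (e.g. title, tags, description)
    tfidf = TfidfVectorizer().fit_transform(texts)            # (N, vocabulary)
    text_sim = cosine_similarity(tfidf)                       # (N, N) cosine similarity to every image
    return np.hstack([cnn_general, cnn_adaptive, text_sim])   # (N, 4096 + 4096 + N)

def cluster_images(features, n_clusters=N_CLUSTERS):
    # hierarchical (agglomerative) clustering into a fixed number of clusters
    return AgglomerativeClustering(n_clusters=n_clusters).fit_predict(features)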
2.3 Cluster-Filtering
In the previous step, similar images are clustered into the same cluster. As a result, in some clusters most images are relevant, while in other clusters most images are irrelevant. However, relevant images of different classes may not be cleanly separated into different clusters; instead, similar classes are probably merged into one cluster.
We therefore remove irrelevant clusters first and deal with the diversity problem later. Using the user information, we calculate the average user score of each cluster and keep only the clusters whose score is higher than the average score of the whole image set, filtering out the others.
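A minimal sketch of this filter, assuming a per-image user visual score array aligned with the cluster labels:

import numpy as np

def filter_clusters(labels, user_scores):
    # keep clusters whose mean user visual score exceeds the mean over the whole image set
    labels = np.asarray(labels)
    user_scores = np.asarray(user_scores, dtype=float)
    global_mean = user_scores.mean()
    return [c for c in np.unique(labels) if user_scores[labels == c].mean() > global_mean]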
2.4 Sorting
To sort the clusters, we calculate the distance between each image in a cluster and its nearest image in Wikipedia, and sort the clusters by the average of these distances.
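A minimal sketch of this sorting step; the use of Euclidean distance on the visual features is an illustrative assumption:

import numpy as np
from sklearn.metrics.pairwise import euclidean_distances

def sort_clusters(features, labels, wiki_features, kept_clusters):
    # distance from every image to its nearest Wikipedia image
    nearest_wiki = euclidean_distances(features, wiki_features).min(axis=1)
    labels = np.asarray(labels)
    avg_dist = {c: nearest_wiki[labels == c].mean() for c in kept_clusters}
    # a smaller average distance to Wikipedia means a more relevant cluster, ranked first
    return sorted(kept_clusters, key=lambda c: avg_dist[c])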
2.5 Distributing
After sorting, we need to determine the number of images to be extracted from each cluster, i.e., to distribute the number of images among the clusters. Here we use a uniform distribution, i.e., we choose the same number of images from each cluster. If the total cannot be divided evenly, the remaining images are taken from the top-ranked clusters.
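As a small illustration (the function name is ours), distributing the required number of slots over the sorted clusters:

def distribute(n_images, sorted_clusters):
    # split evenly; any remainder goes to the clusters ranked first
    base, rest = divmod(n_images, len(sorted_clusters))
    return {c: base + (1 if i < rest else 0) for i, c in enumerate(sorted_clusters)}

# e.g. distribute(50, [4, 9, 2]) -> {4: 17, 9: 17, 2: 16}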
2.6 Extracting
After the distributing step, we apply a greedy strategy to extract images from each cluster. In the extracting process, we choose the image with the highest visual score in the first cluster as the first image. After that, in each step the image with the highest score that is not among the top 15 nearest neighbours of any previously chosen image is selected.
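A minimal sketch of the greedy extraction, assuming the per-cluster quotas from the distributing step and Euclidean distances on the clustering features (both the distance choice and the helper names are illustrative assumptions):

import numpy as np
from sklearn.metrics.pairwise import euclidean_distances

TOP_K = 15  # an image is "too close" if it is among the 15 nearest neighbours of a chosen one

def extract(features, labels, visual_scores, quota, sorted_clusters):
    dist = euclidean_distances(np.asarray(features))
    # the TOP_K nearest neighbours of every image (column 0 is the image itself, so it is skipped)
    neighbours = np.argsort(dist, axis=1)[:, 1:TOP_K + 1]
    order = np.argsort(-np.asarray(visual_scores))   # image indices by descending visual score
    selected = []
    for c in sorted_clusters:
        taken = 0
        for i in order:
            if labels[i] != c or taken >= quota.get(c, 0):
                continue
            # skip the candidate if it is among the nearest neighbours of any chosen image
            if any(i in neighbours[j] for j in selected):
                continue
            selected.append(i)
            taken += 1
    return selected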
            Visual features              Text features    User credibility
Run 1       CNN adaptive, CNN general    -                -
Run 2       -                            TF-IDF           -
Run 3       CNN adaptive, CNN general    TF-IDF           -
Run 4       -                            -                Visual score
Run 5       CNN adaptive, CNN general    TF-IDF           Visual score
                 Table 1. Features used in the 5 runs

3. RESULTS AND DISCUSSION
3.1 Running Options
We submitted 5 runs, as follows:
Run 1. We use only image features in clustering and extracting; clusters are sorted by size without being filtered. We distribute evenly and extract using a greedy strategy.
Run 2. We use only text features in clustering and extracting; clusters are sorted by size without being filtered. We distribute evenly and extract using a greedy strategy.
Run 3. We use both image and text features. Clusters are sorted by size without being filtered. We distribute evenly and extract using a greedy strategy. Features were weighted differently in clustering and extracting.
Run 4. We simply select the first fifty pictures sorted by user visual score.
Run 5. We use both image and text features. The clusters with low average visual score are filtered out before being sorted by distance to the Wikipedia images (or by average visual score if no Wikipedia images are available). We distribute evenly and extract using a greedy strategy. Features were weighted differently in clustering and extracting.
The features used in each run are shown in Table 1 above.
3.2 Results
This section presents the experimental results achieved on the development set (153 queries, 45,375 photos) and the test set (139 queries, 41,394 photos). CR@20 is the proportion of ground-truth classes covered among the top 20 results, P@20 is the proportion of relevant photos among the top 20 results, and F1@20 is the harmonic mean of the previous two.
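The official evaluation tool provides these scores; purely to make the definitions concrete, a minimal sketch (function and argument names are ours):

def precision_at_k(is_relevant, k=20):
    # P@k: fraction of the top-k results that are relevant (list of booleans)
    top = is_relevant[:k]
    return sum(top) / float(len(top))

def cluster_recall_at_k(result_classes, ground_truth_classes, k=20):
    # CR@k: fraction of the ground-truth classes covered by the top-k results
    covered = set(result_classes[:k]) & set(ground_truth_classes)
    return len(covered) / float(len(set(ground_truth_classes)))

def f1_at_k(p, cr):
    # F1@k: harmonic mean of P@k and CR@k
    return 2 * p * cr / (p + cr) if (p + cr) > 0 else 0.0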
            P@20      CR@20     F1@20
Run 1       0.6840    0.4270    0.5197
Run 2       0.6678    0.4068    0.4988
Run 3       0.6831    0.4347    0.5264
Run 4       0.7564    0.3159    0.4373
Run 5       0.7995    0.4634    0.5866
     Table 2(a). Run performance on the Development Set

            P@20      CR@20     F1@20
Run 1       0.6275    0.4053    0.4828
Run 2       0.6254    0.3980    0.4797
Run 3       0.6217    0.4086    0.4811
Run 4       0.6304    0.3245    0.4097
Run 5       0.6804    0.4521    0.5298
     Table 2(b). Run performance on the Test Set, single-topic

            P@20      CR@20     F1@20
Run 1       0.6864    0.4253    0.5090
Run 2       0.6964    0.4419    0.5261
Run 3       0.6850    0.4626    0.5393
Run 4       0.7136    0.3143    0.4109
Run 5       0.7107    0.4350    0.5185
     Table 2(c). Run performance on the Test Set, multi-topic

            P@20      CR@20     F1@20
Run 1       0.6572    0.4154    0.4960
Run 2       0.6612    0.4201    0.5031
Run 3       0.6536    0.4358    0.5104
Run 4       0.6273    0.3139    0.4103
Run 5       0.6957    0.4435    0.5241
     Table 2(d). Run performance on the Test Set (overall)

We obtained the best result on the development set with Run 5: F1@20 of 0.5866, P@20 of 0.7995 and CR@20 of 0.4634. In the single-topic test, Run 5 gains the highest score on both relevance and diversity. In the multi-topic test, although P@20 of Run 5 is not the highest, it is close to Run 4, while its CR@20 and F1@20 are lower than those of Run 3. Overall, according to the average performance on the test set, Run 5 obtains the best scores: P@20 of 0.6957, CR@20 of 0.4435 and F1@20 of 0.5241.

4. DISCUSSION
On the development set our approach achieved good results, while on the test set P@20 performs relatively poorly. We believe this is due to the method we used and the problems we observed.
We noticed two problems in the experiments. The first is that images of different classes may be clustered into the same cluster. We therefore use a greedy strategy and set different image/text weights in the clustering and extracting steps, which brings a significant improvement in the experiments.
The second issue is that users commonly take multiple shots of one scene, so irrelevant images may be tightly clustered into several clusters containing few relevant images. Because removing such irrelevant clusters is difficult, we use the user visual score to decide whether a cluster is irrelevant or not. This method relies on the accuracy of the user information: in the experiments it generally brings improvements, but it reduces the score at some test points and makes the approach more sensitive to the quality of the user information. This problem has therefore not yet reached a satisfactory solution.

5. CONCLUSION
We proposed an approach to generating a brief image summary from Flickr that considers both relevance and diversity. The approach performs well in our experiments. We also pointed out two problems that restrict its performance.
Further research will mainly focus on how to identify whether a similar picture is related to the topic or not.

ACKNOWLEDGMENT
This work is supported by the Peking University Education Foundation (URTP2015PKU003).
6. REFERENCES
[1] Ionescu, B., Gînscă, A.L., Boteanu, B., Popescu, A., Lupu, M., Müller, H. Retrieving Diverse Social Images at MediaEval 2015: Challenge, Dataset and Evaluation. Working Notes Proceedings of the MediaEval 2015 Workshop, Wurzen, Germany, September 14-15, CEUR-WS.org, 2015.
[2] Spampinato, C., Palazzo, S. PeRCeiVe Lab@UNICT at MediaEval 2014 Diverse Images: Random Forests for Diversity-based Clustering.
[3] Dang-Nguyen, D.-T., Piras, L., Giacinto, G., Boato, G., De Natale, F. Retrieval of Diverse Images by Pre-filtering and Hierarchical Clustering.
[4] Karamti, H., Tmar, M., Gargouri, F. MIRACL's participation at MediaEval 2014 Retrieving Diverse Social Images Task.
[5] Pedregosa, F., et al. Scikit-learn: Machine Learning in Python. JMLR 12, pp. 2825-2830, 2011.

Copyright is held by the author/owner(s).
MediaEval 2015 Workshop, Sept. 14-15, 2015, Wurzen, Germany.