=Paper=
{{Paper
|id=Vol-1436/Paper27
|storemode=property
|title=Retrieving Relevant and Diverse Image from Social Media Images
|pdfUrl=https://ceur-ws.org/Vol-1436/Paper27.pdf
|volume=Vol-1436
|dblpUrl=https://dblp.org/rec/conf/mediaeval/ChenLDY15
}}
==Retrieving Relevant and Diverse Image from Social Media Images==
Xi Chen, Haokun Liu, Zhihong Deng, Yunlun Yang
Peking University, Beijing, China
mrcx@pku.edu.cn, haokun-liu@pku.edu.cn, zhdeng@cis.pku.edu.cn, incomparable-lun@pku.edu.cn

ABSTRACT

We describe our approach and its results for the MediaEval 2015 Retrieving Diverse Social Images Task. The basic idea is to remove the irrelevant images and then obtain diverse images using a greedy strategy. Experimental results show that our method can retrieve diverse images with moderate relevance to the topic.

1. INTRODUCTION

Retrieving social media information is currently a hot research topic. The Retrieving Diverse Social Images task poses the problem of generating a brief summary of up to 50 images, both relevant and diverse, from Flickr photos [1].

The problem is challenging. On the one hand, we need to filter out the irrelevant images in Flickr to guarantee relevance. On the other hand, we need to retain different images in order to enhance diversity.

Much previous work exists on this problem. Spampinato and Palazzo proposed a method including cluster-wise filtering, which attained high relevance but lowered diversity [2]. Dang-Nguyen, Piras and Giacinto introduced a method that removes irrelevant images before clustering and yields high scores on both criteria [3].

Our approach consists of six stages (Figure 1):

* Pre-Filtering: filter out part of the irrelevant images based on distance and a face detector
* Clustering: cluster images on general CNN, adaptive CNN and TF-IDF similarity features
* Cluster-Filtering: select clusters based on their average user visual score
* Sorting: sort the clusters by CNN-feature similarity with Wikipedia images, or by user credibility
* Distributing: choose images from each cluster evenly
* Extracting: use a greedy strategy to extract images that are not too close to each other

Figure 1. Schema of our approach
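The last bullet, greedy extraction, is the core of our diversity strategy. As a minimal illustration (the point coordinates, scores, Euclidean distance and neighbourhood size below are invented stand-ins for real image features and visual scores, not our actual system), such a greedy extractor can be sketched as:

```python
# Minimal sketch of the greedy extracting step: take the highest-scored image
# first, then repeatedly take the highest-scored image that is not too close
# to (here: among the k nearest neighbours of) any previously chosen image.
# Points, scores, and the distance function are illustrative stand-ins.

def top_k_nearest(idx, points, k):
    """Indices of the k nearest other points to points[idx] (squared Euclidean)."""
    dist = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b))
    others = sorted((j for j in range(len(points)) if j != idx),
                    key=lambda j: dist(points[idx], points[j]))
    return set(others[:k])

def greedy_extract(points, scores, n, k=15):
    """Greedily pick up to n diverse, high-scoring images."""
    chosen = []
    for i in sorted(range(len(points)), key=lambda i: -scores[i]):
        if len(chosen) == n:
            break
        # reject i if it is among the top-k nearest neighbours of a chosen image
        if any(i in top_k_nearest(c, points, k) for c in chosen):
            continue
        chosen.append(i)
    return chosen
```

Because the set of excluded images only grows as more images are chosen, a single pass over the images in descending score order is equivalent to re-scanning at every step.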
However, this method may fail when irrelevant images make up the majority of the collection [3]. There are also methods that try to make use of the given ground truth with a neural network [4], but they are not very effective in improving diversity.

We propose a method that considers both relevance and diversity, and that can be described briefly as extracting images from the given set through clustering and sorting, with the help of images from Wikipedia. The basic idea is to remove the irrelevant images and then obtain diverse images using a greedy strategy.

2. APPROACH

Our approach contains six steps (Figure 1): pre-filtering, clustering, cluster-filtering, sorting, distributing and extracting. Each of them is described in the following.

2.1 Pre-Filtering

In this step, we filter out an image as irrelevant if:

1. the distance between the spot of photography and the query location is over 30 km, or
2. the proportion of human faces in the image is over 0.05.

2.2 Clustering

In this step, we use general CNN (Convolutional Neural Network) and adaptive CNN features, combined with the cosine similarity of the image-text TF-IDF between the image and all provided images, to form a (4096+4096+N)-dimensional feature vector, where N is the number of provided images after pre-filtering. We then use a hierarchical clustering algorithm [5] to group the images into 30 clusters.

2.3 Cluster-Filtering

In the previous step, similar images were clustered together. As a result, in some clusters most images are relevant, while in others most are irrelevant. However, relevant images of different classes may not be separated into different clusters; instead, similar classes are probably merged into one cluster.

We therefore remove irrelevant clusters first and deal with the diversity problem later. Using user information, we calculate the average user score of each cluster, and keep only the clusters whose average score is higher than the average score of the whole image set, filtering out the rest.
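The cluster-filtering rule above can be sketched as follows. This is a minimal illustration: each cluster is a list of (image_id, user_score) pairs with made-up scores, and the 30-way hierarchical clustering itself could be produced by, e.g., scikit-learn's AgglomerativeClustering [5].

```python
# Sketch of cluster-filtering: keep only the clusters whose average user
# score is higher than the average score of the whole image set.
# Each cluster is a list of (image_id, user_score) pairs; scores are made up.

def filter_clusters(clusters):
    scores = [s for cluster in clusters for _, s in cluster]
    global_avg = sum(scores) / len(scores)
    return [c for c in clusters
            if sum(s for _, s in c) / len(c) > global_avg]
```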
Note that the face detector used in pre-filtering is not absolutely accurate, and some images containing people are still relevant to the query topic, so the images we filter out are not all truly irrelevant; the filter is nevertheless of considerable accuracy.

2.4 Sorting

To sort the clusters, we calculate the distance between each image in a cluster and the corresponding nearest image in Wikipedia, and sort the clusters by their average distance.

Copyright is held by the author/owner(s). MediaEval 2015 Workshop, Sept. 14-15, 2015, Wurzen, Germany.

Table 1. Features used in the 5 runs

        | Visual features           | Text features | User credibility
Run 1   | CNN adaptive, CNN general | -             | -
Run 2   | -                         | TF-IDF        | -
Run 3   | CNN adaptive, CNN general | TF-IDF        | -
Run 4   | -                         | -             | Visual score
Run 5   | CNN adaptive, CNN general | TF-IDF        | Visual score

Table 2(a). Run performance on Development Set

        | P@20   | CR@20  | F1@20
Run 1   | 0.6840 | 0.4270 | 0.5197
Run 2   | 0.6678 | 0.4068 | 0.4988
Run 3   | 0.6831 | 0.4347 | 0.5264
Run 4   | 0.7564 | 0.3159 | 0.4373
Run 5   | 0.7995 | 0.4634 | 0.5866

Table 2(b). Run performance on Test Set, Single-topic

        | P@20   | CR@20  | F1@20
Run 1   | 0.6275 | 0.4053 | 0.4828
Run 2   | 0.6254 | 0.3980 | 0.4797
Run 3   | 0.6217 | 0.4086 | 0.4811
Run 4   | 0.6304 | 0.3245 | 0.4097
Run 5   | 0.6804 | 0.4521 | 0.5298

Table 2(c). Run performance on Test Set, Multi-topic

        | P@20   | CR@20  | F1@20
Run 1   | 0.6864 | 0.4253 | 0.5090
Run 2   | 0.6964 | 0.4419 | 0.5261
Run 3   | 0.6850 | 0.4626 | 0.5393
Run 4   | 0.7136 | 0.3143 | 0.4109
Run 5   | 0.7107 | 0.4350 | 0.5185

2.5 Distributing

After sorting, we need to determine the number of images to be extracted from each cluster, i.e., to distribute the image budget among the clusters. Here we distribute uniformly, choosing the same number of images from each cluster. If the budget cannot be divided evenly, the remainder is taken from the former (higher-ranked) clusters.
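The distributing step can be sketched as follows; this is a minimal illustration that ignores individual cluster sizes and assumes the clusters are already in sorted order, so the "former" clusters are the first ones.

```python
# Sketch of distributing: split the image budget evenly over the sorted
# clusters; when it cannot be divided evenly, the remainder goes to the
# former (higher-ranked) clusters, one extra image each.

def distribute(budget, n_clusters):
    base, rem = divmod(budget, n_clusters)
    return [base + 1 if i < rem else base for i in range(n_clusters)]
```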
2.6 Extracting

After the distributing step, we apply a greedy strategy to extract images from each cluster. We choose the image in the first cluster with the highest visual score as the first image. After that, at each step we select the image with the highest score that is not among the top 15 nearest neighbours of any previously chosen image.

Table 2(d). Run performance on Test Set

        | P@20   | CR@20  | F1@20
Run 1   | 0.6572 | 0.4154 | 0.4960
Run 2   | 0.6612 | 0.4201 | 0.5031
Run 3   | 0.6536 | 0.4358 | 0.5104
Run 4   | 0.6273 | 0.3139 | 0.4103
Run 5   | 0.6957 | 0.4435 | 0.5241

3. RESULTS AND DISCUSSION

3.1 Running Options

We submitted 5 runs, as follows:

Run 1. We use only image features in clustering and extracting; clusters are sorted by size without being filtered. We distribute evenly and extract using a greedy strategy.

Run 2. We use only text features in clustering and extracting; clusters are sorted by size without being filtered. We distribute evenly and extract using a greedy strategy.

Run 3. We use both image and text features; clusters are sorted by size without being filtered. We distribute evenly and extract using a greedy strategy. Features were weighted differently in clustering and extracting.
Run 4. We simply select the first fifty pictures sorted by user score.

Run 5. We use both image and text features. Clusters with a low average visual score are filtered out before being sorted by distance to the Wikipedia images (or by average visual score if no Wikipedia image is available). We distribute evenly and extract using a greedy strategy. Features were weighted differently in clustering and extracting.

The features used in each run are shown in Table 1 above.

3.2 Results

This section presents the experimental results on the development set (153 queries, 45,375 photos) and the test set (139 queries, 41,394 photos). P@20 is the number of relevant photos among the top 20 results, CR@20 is the number of covered classes among the top 20 results, and F1@20 is the harmonic mean of the two.

We obtained the best development-set result with Run 5: F1@20 of 0.5866, P@20 of 0.7995, and CR@20 of 0.4634. On the Single-topic test set, Run 5 gains the highest score on both relevance and diversity. On the Multi-topic test set, P@20 of Run 5 is not the highest but is close to Run 4, while its CR@20 and F1@20 are lower than those of Run 3. Overall, according to the average performance, Run 5 gets the best scores: P@20 of 0.6957, CR@20 of 0.4435 and F1@20 of 0.5241.

4. DISCUSSION

On the development set our approach achieved good results, while on the test set P@20 performs relatively poorly. We believe this is due to the method we used and two problems we observed in the experiments.

The first problem is that images of different classes may be clustered into the same cluster. We therefore use a greedy strategy and set different image/text weights in clustering and extraction, which made a significant improvement in the experiments.

The second problem is the common scenario in which users take multiple shots of one scene, so that irrelevant images may be clustered into several clusters containing few relevant images. Because of the difficulty of removing such irrelevant clusters, we use the user visual score to decide whether a cluster is irrelevant. This method relies on the accuracy of the user information: in our experiments it generally brings improvements, but it reduces the score at some test points and makes the approach more sensitive to the quality of user information. This problem has therefore not yet reached a satisfactory solution.

5. CONCLUSION

We proposed an approach for generating a brief image summary from Flickr that considers both relevance and diversity. Our experiments show that the approach performs well, and we identified two problems that restrict its performance. Further research will focus mainly on how to identify whether a similar picture is related to the topic or not.
ACKNOWLEDGMENT

This work is supported by the Peking University Education Foundation (URTP2015PKU003).

6. REFERENCES

[1] Ionescu, B., Gînscă, A.L., Boteanu, B., Popescu, A., Lupu, M., Müller, H.: Retrieving Diverse Social Images at MediaEval 2015: Challenge, Dataset and Evaluation. Working Notes Proceedings of the MediaEval 2015 Workshop, Wurzen, Germany, September 14-15, CEUR-WS.org, 2015.

[2] Spampinato, C., Palazzo, S.: PeRCeiVe Lab@UNICT at MediaEval 2014 Diverse Images: Random Forests for Diversity-based Clustering.

[3] Dang-Nguyen, D.-T., Piras, L., Giacinto, G., Boato, G., De Natale, F.: Retrieval of Diverse Images by Pre-filtering and Hierarchical Clustering.

[4] Karamti, H., Tmar, M., Gargouri, F.: MIRACL's Participation at MediaEval 2014 Retrieving Diverse Social Images Task.

[5] Pedregosa, F., et al.: Scikit-learn: Machine Learning in Python. JMLR 12, pp. 2825-2830, 2011.