
Retrieval of Diverse Images by Pre-filtering and Hierarchical Clustering


D.-T. Dang-Nguyen, L. Piras, G. Giacinto
DIEE - University of Cagliari, Piazza D'armi, 09123 Cagliari, Italy

G. Boato, F. G. B. De Natale
DISI - University of Trento, Via Sommarive 9, I-38123 Povo, Trento, Italy

2014


In this paper, we describe our approach and its results for the MediaEval 2014 Retrieving Diverse Social Images Task. The basic idea of our proposed method is to filter out non-relevant images at the beginning of the process and then construct a hierarchical tree, which allows the images to be clustered with different criteria on visual and textual features. Experimental results show that the method is stable and that its performance fluctuates little with the number of topics.

1. INTRODUCTION

In the MediaEval 2014 Retrieving Diverse Social Images task [2], participants are provided with sets of images retrieved from Flickr, where each set is related to a location. However, these sets are normally noisy and redundant. The goal of this task is therefore to refine the initial results by choosing a subset of images that are relevant to the queried location but show different views of the location, various perspectives, different times of day (e.g., night and day), etc.

The basic idea of the proposed method is to filter out the non-relevant images at the beginning, based on the rules of the task, and then to cluster the remaining images with the BIRCH algorithm [4], which builds a hierarchical structure where nodes are the images and edges represent the similarity between the linked nodes. This structure allows different clusters to be created according to the criteria used, and can also be used to remove outliers, i.e., non-relevant images that were not filtered out during the first step.

2. METHODOLOGY

The proposed method contains 4 steps (see Fig. 1):

Step 1. Pre-filtering: The goal of this step is to filter out outliers by removing images that are considered non-relevant. We consider an image non-relevant according to the following rules: (i) it contains people as the main subject; (ii) it was shot far away from the queried location; (iii) it received very few views on Flickr; and (iv) it is out of focus or blurred. Condition (i) can be detected from the proportion of the human face size with respect to the size of the image. In our method, Luxand FaceSDK is used as a face detector.

Step 2. Hierarchical clustering: In this step, we apply the BIRCH clustering algorithm [4] to the provided visual and textual features. BIRCH produces an initial clustering of large datasets at very low computational cost. After this step, images that are similar to each other in terms of global visual features and textual information are grouped into the same cluster or the same branch of the hierarchical tree.

Step 3. Tree refining: Thanks to the initial tree constructed in the previous step, isolated clusters can easily be removed or merged into other branches by updating the tree, without modifying the clusters.

Step 4. Result re-ranking: The clusters are sorted by the number of images, i.e., clusters that contain more images are ranked higher. In each cluster, the image uploaded by the user with the highest visual score is selected as the first image. If there is more than one image from that user, the image closest to the centroid is selected. The second image is the one that has the largest distance to the first image. The third image is the one with the largest distance to both of the first two images, and so on.
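The re-ranking in Step 4 can be illustrated with a minimal Python sketch. It is not the implementation used for the submitted runs: the function names, the dictionary-based inputs, and the use of Euclidean distance on a generic feature vector are assumptions made for illustration. In particular, "largest distance to the already selected images" is interpreted here as the largest minimum distance (a greedy farthest-first ordering), which is one possible reading of the description.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def centroid(vectors):
    """Component-wise mean of a non-empty list of feature vectors."""
    n = len(vectors)
    return tuple(sum(v[k] for v in vectors) / n for k in range(len(vectors[0])))


def rank_clusters(clusters):
    """Sort clusters so that clusters with more images come first."""
    return sorted(clusters, key=len, reverse=True)


def pick_first(features, owner, visual_score):
    """Pick the first image of a cluster.

    features: dict image_id -> feature vector (hypothetical layout)
    owner: dict image_id -> user id
    visual_score: dict user id -> visual score
    """
    # User with the highest visual score among the owners in this cluster.
    best_user = max({owner[i] for i in features}, key=lambda u: visual_score[u])
    candidates = [i for i in features if owner[i] == best_user]
    # If that user has several images, take the one closest to the centroid.
    c = centroid(list(features.values()))
    return min(candidates, key=lambda i: euclidean(features[i], c))


def order_within_cluster(features, first):
    """Greedy farthest-first ordering of one cluster, starting from `first`."""
    ordered = [first]
    remaining = [i for i in features if i != first]
    while remaining:
        # Assumption: the next image maximises its minimum distance
        # to every image that has already been selected.
        nxt = max(
            remaining,
            key=lambda i: min(euclidean(features[i], features[j]) for j in ordered),
        )
        ordered.append(nxt)
        remaining.remove(nxt)
    return ordered
```

For Run 1, where user credibility is not available, `pick_first` would be replaced by a plain closest-to-centroid selection; the ordering step stays the same.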

Several similarity measures and metrics are used: for the provided visual information we use the Euclidean distance, while for textual information we use cosine similarity. For geo-tagged images, the Haversine formula is used to compute the geographical distance between two locations.
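The three measures can be written down in a few lines of Python. The sketch below uses only the standard library; the function names and the mean Earth radius of 6371 km are assumptions for illustration and are not taken from the task data or from our implementation.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def euclidean(a, b):
    """Euclidean distance between two visual feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a, b):
    """Cosine similarity between two TF-IDF vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (latitude, longitude) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    # Clamp to 1.0 to guard against rounding slightly above the asin domain.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))
```

Note that cosine similarity grows with similarity while the other two are distances; when a clustering algorithm expects a distance, one common choice is to use one minus the cosine similarity.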

3. RESULTS AND DISCUSSION

In order to find the best combination of features and parameters (the number of clusters, the inner parameters of BIRCH, and the thresholds used to determine the outliers), we ran our model on all the provided features together with our own features. According to the results, we chose the best features and parameters for each run and applied them to the test set as follows:

Run 1 (Visual): Color naming (CNM), color descriptor (GCD), histogram of oriented gradients (HOG), and local binary pattern (GLBP) features are used. In Step 4, since we cannot exploit user credibility information, the centroid of each cluster is selected as the first image.

Run 2 (Text): The parameters are chosen similarly to Run 1, but we use only TF-IDF information and measure distances with cosine similarity.

Run 3 (Visual+Text): The method is applied to the combined features from Run 1 and Run 2, where TF-IDF is used first and the visual features with Euclidean distance are applied afterwards.

Run 4 (User credibility): Note that this run is allowed to use only the user credibility information, so the proposed method is not applied. In this run, we cluster the images by user. The clusters are ranked by visual score (i.e., the cluster belonging to the user with the highest visual score is selected first), then by face proportion, and so on through all the user credibility information. Within each cluster, images are selected by number of views, i.e., the image with the highest number of views is selected as the first image.

Run 5 (All features): All steps of the proposed method are applied in this run. In Step 1, outliers are detected as follows: (i) images in which the face size is larger than 10% of the size of the image, (ii) images that were shot farther than 15 km from the queried location, (iii) images that have fewer than 25 views, and (iv) images that have an f-score (focus measure) smaller than 20. In Step 2, a clustering similar to that of Run 3 is applied. For the visual features, we replace the provided HOG features with HOG2x2 as presented in [3].
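The Step 1 rules with the Run 5 thresholds can be sketched as a simple filter. The thresholds are the ones stated above; everything else is an assumption for illustration: the `ImageMeta` record and its field names are hypothetical, the face size is taken as a ratio of face area to image area, an image is treated as an outlier if any single rule fires, and images without GPS information are assumed to skip the distance rule.

```python
from dataclasses import dataclass
from typing import List, Optional

# Thresholds stated for Run 5.
MAX_FACE_RATIO = 0.10    # (i) face size larger than 10% of the image
MAX_DISTANCE_KM = 15.0   # (ii) shot farther than 15 km from the location
MIN_VIEWS = 25           # (iii) fewer than 25 views
MIN_FOCUS = 20.0         # (iv) focus measure smaller than 20


@dataclass
class ImageMeta:
    """Hypothetical per-image record; field names are illustrative only."""
    image_id: str
    face_ratio: float             # largest face size / image size, in [0, 1]
    distance_km: Optional[float]  # distance to the queried location, None if no GPS
    views: int                    # number of views on Flickr
    focus_score: float            # focus measure (f-score)


def is_outlier(img: ImageMeta) -> bool:
    """Return True if any of the four pre-filtering rules marks the image as non-relevant."""
    # Assumption: the distance rule is skipped when no GPS information is available.
    too_far = img.distance_km is not None and img.distance_km > MAX_DISTANCE_KM
    return (
        img.face_ratio > MAX_FACE_RATIO
        or too_far
        or img.views < MIN_VIEWS
        or img.focus_score < MIN_FOCUS
    )


def prefilter(images: List[ImageMeta]) -> List[ImageMeta]:
    """Keep only the images that pass all four rules, preserving input order."""
    return [img for img in images if not is_outlier(img)]
```

Images that pass this filter are the input to the clustering in Step 2; outliers that slip through can still be removed during tree refining.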

REFERENCES

[1] J.-T. Huang, C.-H. Shen, S.-M. Phoong, and H. Chen. Robust measure of image focus in the wavelet domain. In Intelligent Signal Processing and Communication Systems, pages 157-160, Dec 2005.

[2] B. Ionescu, A. Popescu, M. Lupu, A. L. Ginsca, and H. Müller. Retrieving diverse social images at MediaEval 2014: Challenge, dataset and evaluation. In MediaEval 2014 Workshop, Barcelona, Spain, 2014.

[3] J. Xiao, J. Hays, K. A. Ehinger, A. Oliva, and A. Torralba. SUN database: Large-scale scene recognition from abbey to zoo. In IEEE Conference on Computer Vision and Pattern Recognition, pages 3485-3492. IEEE, 2010.

[4] T. Zhang, R. Ramakrishnan, and M. Livny. BIRCH: An Efficient Data Clustering Method for Very Large Databases. In Proceedings of the 1996 ACM SIGMOD International Conference on Management of Data, pages 103-114, 1996.