<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
<journal-title>Proceedings of the MediaEval 2014 Multimedia Benchmark Workshop</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>MIRACL's participation at MediaEval 2014 Retrieving Diverse Social Images Task</article-title>
      </title-group>
      <contrib-group>
<contrib contrib-type="author">
          <string-name>Hanen Karamti</string-name>
          <email>karamti.hanen@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mohamed Tmar</string-name>
          <email>mohamedtmar@yahoo.fr</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Faiez Gargouri</string-name>
          <email>faiez.gargouri@fsegs.rnu.tn</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
<institution>MIRACL Laboratory, City ons, University of Sfax</institution>
          ,
          <addr-line>B.P.3021 Sfax</addr-line>
          <country country="TN">TUNISIA</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>MIRACL Laboratory, City ons, University of Sfax</institution>
          ,
          <addr-line>B.P.3021 Sfax</addr-line>
          <country country="TN">TUNISIA</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2014</year>
      </pub-date>
      <volume>1</volume>
      <issue>0</issue>
      <fpage>16</fpage>
      <lpage>17</lpage>
      <abstract>
<p>The 2014 MediaEval Retrieving Diverse Social Images Task tackles the problem of search result diversification of Flickr result sets formed from queries about geographic places and landmarks. In this paper we describe our approach using a neural network. This approach uses only the visual information from the images. The goal of this method is to put forward a new retrieval model, based on a neural network, which transforms any image retrieval process into a vector space model. We submitted two runs. The first run uses a simple method of content-based image retrieval. The second run applies query expansion with our new retrieval model. Both runs produced results with high precision.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. INTRODUCTION</title>
      <p>
        The diversification of search results is increasingly
becoming an important topic in the area of information retrieval.
The MediaEval 2014 Retrieving Diverse Social Images Task
addresses the problem of result diversification in the context
of social photo retrieval [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. The data consist of a
development set containing 30 locations (devset), a user annotation
credibility set containing information for ca. 300 locations
and 685 users (credibilityset) and a test set containing 123
locations (testset). The data were retrieved from Flickr using the
name of the location as the query. For each location, participants
receive a list of 300 photos retrieved from Flickr and ranked
with Flickr’s default relevance algorithm.
      </p>
<p>For each query, our strategy to induce diversity while
keeping relevance is based on five steps (a minimal sketch of the first two steps follows this list):
• Step 1: Recalculate the scores of the provided ranked
list using feature extraction;
• Step 2: Rerank the results to improve relevance;
• Step 3: Use the results (the score vector and the
feature vector for each image) to construct a new retrieval
model using a neural network;
• Step 4: Calculate the new scores of the images using our
new retrieval model;
• Step 5: Finally, rerank the results in descending order
of their scores.</p>
    </sec>
    <sec id="sec-2">
<title>2. APPROACHES</title>
<p>
        To obtain representative and diverse photos at the top of the ranking, we use a method based on the visual information extracted from the images. Run 1 uses a simple method of feature extraction [<xref ref-type="bibr" rid="ref2">2</xref>] and Run 5 uses a new vectorization method based on a neural network [<xref ref-type="bibr" rid="ref3">3</xref>]. In this section, we briefly explain both methods and the features used.
      </p>
    </sec>
    <sec id="sec-3">
<title>2.1 Run 1: Extraction of Visual Features</title>
<p>
        Data extraction and query processing are the two main functionalities supported by feature extraction [<xref ref-type="bibr" rid="ref3">3</xref>]:
      </p>
<p>The data extraction process is responsible for extracting
appropriate features from images and storing them in feature
vectors. This process is usually performed offline.
The architecture of this phase is described in Figure 1. A
feature vector V_I of an image I can be thought of as a list of
low-level features (C_1, C_2, ..., C_m), where m is the number
of features. We use three descriptors: the Color Layout
Descriptor (CLD), designed to capture the spatial distribution of
color in an image; the Edge Histogram Descriptor (EHD), a texture
descriptor proposed for MPEG-7 that expresses the local edge
distribution in the image; and the Scalable Color Descriptor (SCD),
a histogram generated by color-quantizing the image into 256 bins
in the HSV color space, with 16 bins for hue and 4 bins each for
saturation and value. Each C_i combines the CLD, SCD, and EHD
components (C_CLD, C_SCD, C_EHD) of feature i.</p>
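      <p>The SCD quantization just described is concrete enough to sketch in code. The following Python fragment is our own illustrative implementation using NumPy, not the authors' code; it builds the 256-bin HSV histogram with 16 hue bins and 4 bins each for saturation and value:</p>
      <preformat>
import numpy as np

def scd_histogram(hsv_image):
    """SCD-style descriptor: 256-bin histogram in HSV space
    (16 hue bins x 4 saturation bins x 4 value bins).

    hsv_image: float array of shape (height, width, 3), with
    H in [0, 360) and S, V in [0, 1].
    """
    h = hsv_image[..., 0].ravel()
    s = hsv_image[..., 1].ravel()
    v = hsv_image[..., 2].ravel()

    # Quantize each channel into its bin index.
    h_bin = np.clip((h / 360.0 * 16).astype(int), 0, 15)
    s_bin = np.clip((s * 4).astype(int), 0, 3)
    v_bin = np.clip((v * 4).astype(int), 0, 3)

    # Fuse the three indices into one of 16 * 4 * 4 = 256 bins.
    bins = h_bin * 16 + s_bin * 4 + v_bin
    hist = np.bincount(bins, minlength=256).astype(float)
    return hist / hist.sum()  # normalize across image sizes
      </preformat>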
<p>Query processing, in turn, extracts a feature vector
from the query and applies a metric (the Euclidean distance, see
Equation 1) to evaluate the similarity between the query
image and the database images.</p>
<p>$dist_{Euclidean}(V_I, V_{I_i}) = \sqrt{\sum_{j=1}^{m} (C_{I_j} - C_{I_{ij}})^2}$ (1)</p>
      <p>where V_I is the feature vector of an image I, V_{I_i} is the
feature vector of an image I_i, C_{I_j} is the low-level feature j
of V_I, and C_{I_{ij}} is the low-level feature j of V_{I_i}.</p>
<p>The similarity scores of the query results build a score
vector. A score vector S_I of an image I can be thought of
as a set of scores (S_1, S_2, ..., S_n), where n is the number
of database images.</p>
    </sec>
    <sec id="sec-4">
<title>2.2 Run 5: Vectorization Process with the Visual Features</title>
<p>Each query location q_i is represented by a feature vector
(F_{q_i1}, F_{q_i2}, ..., F_{q_im}), where F_{q_ij} is the value of
feature j in the query q_i. The feature vector F_{q_i} corresponding
to q_i is computed in the same way as a score vector S_I.</p>
<p>The image retrieval process provides a score vector (S_{q_i1},
S_{q_i2}, ..., S_{q_in}), where S_{q_ij} is the similarity score
between the query q_i and the image j, with j ∈ {1, ..., n}.
S_{q_i} represents the list of 300 photos retrieved from Flickr and
ranked with the query processing described in the previous
section.</p>
<p>This vector space model is characterized by an (m × n)
matrix W where, for each query q_i described by the feature
vector F_{q_i} and associated with the score vector S_{q_i}, we have:</p>
      <p>$F_{q_i} \times W = S_{q_i}$ (2)</p>
      <p>$W[i,j] = w_{ij}, \quad \forall (i,j) \in \{1,\dots,m\} \times \{1,\dots,n\}$ (3)</p>
      <p>$(F_{q_{i1}}, F_{q_{i2}}, \dots, F_{q_{im}}) \times \begin{pmatrix} w_{11} &amp; w_{12} &amp; \dots &amp; w_{1n} \\ w_{21} &amp; w_{22} &amp; \dots &amp; w_{2n} \\ \vdots &amp; \vdots &amp; \ddots &amp; \vdots \\ w_{m1} &amp; w_{m2} &amp; \dots &amp; w_{mn} \end{pmatrix} \cong (S_{q_{i1}}, S_{q_{i2}}, \dots, S_{q_{in}})$ (4)</p>
      <p>
        The w_{ij} values are calculated with a propagation algorithm
using a neural network [<xref ref-type="bibr" rid="ref3">3</xref>].
      </p>
    </sec>
    <sec id="sec-5">
<title>3. RESULTS</title>
<p>Performance is assessed for both diversity and
relevance. The performance of our approach is measured by the
following metrics (a small code sketch follows the list):
• Average P@20: precision, a measure that assesses the
number of relevant photos among the top 20 results;
• Average CR@20: cluster recall, a measure that
assesses how many different clusters from the ground
truth are represented among the top 20 results (only
relevant images are considered);
• Average F1@20: the F1-measure at 20, the harmonic
mean of the previous two.</p>
    </sec>
    <sec id="sec-6">
<title>4. CONCLUSIONS</title>
<p>In this paper we presented our first participation in
MediaEval 2014. Our results show that a purely visual approach can
lead to effective results. However, improved results could be
obtained by leveraging other types of information
(e.g., textual descriptions), if available, to further refine the
results. In the future, we plan to explore the integration of
social and visual cues in order to obtain more effective
diversification.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
<mixed-citation>
          [1]
          <string-name>
            <given-names>B.</given-names>
            <surname>Ionescu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Popescu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Lupu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. L.</given-names>
            <surname>Gînscă</surname>
          </string-name>
          , and
          <string-name>
            <given-names>H.</given-names>
            <surname>Müller</surname>
          </string-name>
          .
          <article-title>Retrieving diverse social images at MediaEval 2014: Challenge, dataset and evaluation</article-title>
          .
          <source>In Proceedings of the MediaEval 2014 Multimedia Benchmark Workshop</source>
          , Barcelona, Spain, October 16-17,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>H.</given-names>
            <surname>Karamti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Fakhfakh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Tmar</surname>
          </string-name>
          , and
          <string-name>
            <given-names>F.</given-names>
            <surname>Gargouri</surname>
          </string-name>
.
          <article-title>MIRACL at ImageCLEF 2014: Plant identification task</article-title>
          .
          <source>In Working Notes for CLEF 2014 Conference, Sheffield, UK, September 15-18</source>
          , pages
          <fpage>747</fpage>
          -
          <lpage>755</lpage>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>H.</given-names>
            <surname>Karamti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Tmar</surname>
          </string-name>
          , and
          <string-name>
            <given-names>F.</given-names>
            <surname>Gargouri</surname>
          </string-name>
          .
          <article-title>Vectorization of content-based image retrieval process using neural network</article-title>
          .
          <source>In ICEIS 2014 - Proceedings of the 16th International Conference on Enterprise Information Systems</source>
          , Volume
          <volume>2</volume>
, Lisbon, Portugal, 27-30 April, pages
          <fpage>435</fpage>
          -
          <lpage>439</lpage>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>