Profile Entropic Visual Features for Visual Concept Detection in the CLEF 2008 Campaign

Herve GLOTIN

glotin@univ-tln.fr

Zhongqiu ZHAO

zhongqiuzhao@gmail.com

UMR CNRS

Université Sud Toulon-Var, France

Introduction to Visual Concept Detection Clef2008 Task

In this task, we used only visual information to address the VCDT task. We defined and compared two simple projection operators: the harmonic and arithmetic means. We proposed a new kind of compact feature based on the entropy of pixel projections. These features, called Profile Entropy Features (PEF), were added to the usual color means and variances, and then fed to SVM classifiers for the detection of 17 visual concepts on the IAPR images during the CLEF 2008 campaign. The simple arithmetic mean projection ranked 4th in the official test, out of 53 runs from around 20 laboratories. We show that the harmonic projection gives complementary information, and that its simple early fusion with the arithmetic PEF yields the third-best system. As the runs of the other teams used state-of-the-art SIFT and color histogram visual features, it can be concluded that PEF are efficient. Moreover, PEF are fast to compute, at around 10 images per second on a standard Pentium.

Rank Fusion Image Retrieval

The VCDT task provides a training database of approximately 1,800 images, which are classified according to the concept hierarchy described in Figure 1. Only these data may be used to train retrieval models. Figure 2 shows examples for the 17 topics to find in the IAPR image database; the retrieval task thus contains 17 topics in total. The test database consists of 1,000 images, for each of which participating groups are required to determine the presence or absence of the concepts.

In this task, we use the least squares support vector machine (LS-SVM) to implement image retrieval; it is detailed in the Support Vector Machines section.

Feature Extraction

An important step in a content-based image retrieval (CBIR) system is the extraction of discriminant visual features that are fast to compute. Information theory and cognitive sciences can provide some inspiration for developing such features.

Among the many visual features that have been studied, the distribution of color pixels in an image is the most common. The standard representation of color for content-based indexing in image databases is the color histogram. A different color representation is based on the information-theoretic concept of entropy. Such an entropic feature can simply equal the entropy of the pixel distribution of the image, as proposed in [1]. A more theoretical presentation of this kind of image entropy feature, accompanied by a practical description of its merits and limitations compared to color histograms, has been given in [2].

We proposed in [3] a new feature equal to the entropy of the pixel 'profile'. A pixel profile can be a simple arithmetic mean in the horizontal (or vertical) direction. The advantage of such a feature is that it combines raw shape and texture representations at a low CPU cost. These features, combined with the mean and standard deviation of the colors, reached the second-best rank in the official ImagEval 2006 campaign (see www.imageval.org and [4]).

In this paper we extend these features using another projection to get the pixel profile: we propose to also use the harmonic mean of the pixels of each row or column. The idea is that the object or pixel region distribution, which is lost in the arithmetic mean projection, could be partly captured by the harmonic mean. These two projections are then expected to give complementary and/or concept-dependent information. We detail below the extraction algorithm of these Profile Entropy Features (PEF).

PEF Algorithm

Let I be an image, or any rectangular subpart of an image.

For each normalized color ($L = R + G + B$, $r = R/L$, and $g = G/L$), we first compute two orthogonal profiles by projecting the pixels of I. We consider two simple orthogonal projection axes: the horizontal axis X (projection denoted $\Pi_X$) and the vertical axis Y (projection denoted $\Pi_Y$). The projection operator is either the arithmetic mean (denoted 'Ar', giving the projection $\Pi_X^{Ar}$), as illustrated in Figure 3, or the harmonic mean of the pixels of each column or each row of I (denoted 'Ha', giving $\Pi_X^{Ha}$).

Then, we estimate the probability distribution function (pdf) of each profile according to [5]. Considering that the sources are ergodic, we finally compute each PEF as the normalized entropy $H(pdf) / \log(\#bins(pdf))$. We detail below each step of the PEF extraction.

Let op be the selected projection operator. For each color of I, with L(I) rows and C(I) columns:

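$$\Phi_X^{op}(I) = \widehat{pdf}\left(\Pi_X^{op}(I)\right), \quad \text{over } nbin_X(I) = \mathrm{round}\left(\sqrt{C(I)}\right) \text{ bins},$$

where $\Pi_X^{op}$ is the vertical projection with operator op,

$$PEF_X(I) = H\left(\Phi_X^{op}(I)\right) / \log\left(nbin_X(I)\right).$$

$$\Phi_Y^{op}(I) = \widehat{pdf}\left(\Pi_Y^{op}(I)\right), \quad \text{over } nbin_Y(I) = \mathrm{round}\left(\sqrt{L(I)}\right) \text{ bins},$$

$$PEF_Y(I) = H\left(\Phi_Y^{op}(I)\right) / \log\left(nbin_Y(I)\right).$$

We add to these PEF the usual entropic feature:

$$\widehat{pdf}(I) = \text{pdf of all the pixels of } I, \quad \text{over } nbin_{XY}(I) = nbin_X(I) \times nbin_Y(I) \text{ bins},$$

$$PEF_{\cdot}(I) = H\left(\widehat{pdf}(I)\right) / \log\left(nbin_{XY}(I)\right).$$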

Finally, we complete the PEF features with the usual mean and standard deviation of each normalized color of I.
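The extraction just described can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: a plain histogram stands in for the pdf estimator of [5], the small constant `eps` that protects the harmonic mean against zero-valued pixels is our own choice, and the function names (`normalized_colors`, `project`, `normalized_entropy`, `pef`) are hypothetical.

```python
import numpy as np

def normalized_colors(rgb):
    """Return (L, r, g) with L = R + G + B, r = R / L, g = G / L."""
    rgb = np.asarray(rgb, dtype=float)
    lum = rgb.sum(axis=2)
    safe = np.where(lum > 0, lum, 1.0)  # avoid division by zero on black pixels
    return lum, rgb[..., 0] / safe, rgb[..., 1] / safe

def project(channel, axis, op="Ar", eps=1e-6):
    """Arithmetic ('Ar') or harmonic ('Ha') mean of the pixels along `axis`."""
    if op == "Ar":
        return channel.mean(axis=axis)
    if op == "Ha":
        # eps is an assumption: it keeps 1/x finite for zero-valued pixels
        return channel.shape[axis] / (1.0 / (channel + eps)).sum(axis=axis)
    raise ValueError("op must be 'Ar' or 'Ha'")

def normalized_entropy(values, nbins):
    """Entropy of the histogram of `values`, divided by log(nbins)."""
    if nbins < 2:
        return 0.0
    # Assumption: a plain histogram replaces the estimator of [5]
    counts, _ = np.histogram(values, bins=nbins)
    p = counts[counts > 0] / counts.sum()
    return float(-(p * np.log(p)).sum() / np.log(nbins))

def pef(channel, op="Ar"):
    """Return (PEF_X, PEF_Y, PEF of all pixels) for one 2-D color channel."""
    n_rows, n_cols = channel.shape
    nbin_x = int(round(np.sqrt(n_cols)))
    nbin_y = int(round(np.sqrt(n_rows)))
    profile_x = project(channel, axis=0, op=op)  # one value per column
    profile_y = project(channel, axis=1, op=op)  # one value per row
    return (normalized_entropy(profile_x, nbin_x),
            normalized_entropy(profile_y, nbin_y),
            normalized_entropy(channel.ravel(), nbin_x * nbin_y))
```

Calling `pef(channel, "Ar")` and `pef(channel, "Ha")` on each of the three channels returned by `normalized_colors` gives the entropic part of the feature vector for one image or subimage; the color means and standard deviations are appended separately.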

[Figure: signal RGB with its X profile and Y profile; channels R/L, G/L, L.]

We can compute the PEF in three horizontal subimages, as illustrated in Figure 4. We denote such PEF by '='. We also compute the PEF in three vertical subimages; we denote these PEF by '||'.

For each, we have 3 bands and 3 different PEF for each of the 3 colors, plus their mean and variance; thus we have 3 × 3 × 3 + 3 × 3 × 2 = 45 dimensions for the '=' or for the '||' features. We denote by '+' the concatenation of the '=' and '||' features, which then has 90 dimensions. Considering the two types of mean, the PEF concatenation, without repetition of the color mean and standard deviation, is quite compact, with a total of 126 dimensions: 2 (subimage types '=' or '||') × 3 (bands per subimage type) × 3 (r, g, L) × [4 (= (X or Y) × (Ar or Ha)) + 1 (= H(I)) + 2 (= mean and std)] = 126.
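The dimension counts above can be checked with a short calculation. The helper name `pef_dimensions` and its parameters are hypothetical, introduced only for this check; per band and per color it counts two profile PEF (X and Y) for each projection operator, one whole-region entropy, and the mean and standard deviation.

```python
def pef_dimensions(n_layouts, n_operators, n_bands=3, n_colors=3):
    """Feature count for the given number of subimage layouts and operators."""
    # Per band and color: (X, Y) profile PEF per operator + H(I) + mean and std
    per_band_and_color = 2 * n_operators + 1 + 2
    return n_layouts * n_bands * n_colors * per_band_and_color

single_layout = pef_dimensions(n_layouts=1, n_operators=1)   # '=' or '||'
both_layouts = pef_dimensions(n_layouts=2, n_operators=1)    # '+'
both_operators = pef_dimensions(n_layouts=2, n_operators=2)  # Ar and Ha
print(single_layout, both_layouts, both_operators)
```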

Support Vector Machines

The support vector machine (SVM) [6, 7] first maps the data into a higher-dimensional feature space by means of a kernel function, and then learns a separating hyperplane that maximizes the margin. Because of its good generalization capability, this technique has been widely applied in many areas such as face detection and image retrieval [8, 9]. The SVM is typically based on an ε-insensitive cost function, meaning that approximation errors smaller than ε do not increase the cost function value. This results in a quadratic convex optimization problem. Instead of an ε-insensitive cost function, a quadratic cost function can be used. The least squares support vector machines (LS-SVM) [10] are reformulations of the standard SVMs that lead to solving linear KKT systems instead, which is computationally attractive.

In our experiments, the RBF kernel

$$K(x_1, x_2) = \exp\left(-\|x_1 - x_2\|^2 / \sigma^2\right)$$

is selected as the kernel function of our LS-SVM. There is thus a corresponding parameter, $\sigma$, to be tuned; a large value of $\sigma^2$ indicates stronger smoothing. Moreover, there is another parameter, $\gamma$, that needs tuning to find the tradeoff between minimizing the complexity of the model and fitting the training data points well.
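To make the role of the two parameters concrete, here is a minimal NumPy sketch of an LS-SVM classifier with an RBF kernel, following the linear KKT system of [10]. It is an illustration only: the experiments used the LS-SVMlab toolbox, not this code, and the function names (`rbf_kernel`, `lssvm_train`, `lssvm_decision`) and the parameter names `gamma` and `sigma2` are our own choices.

```python
import numpy as np

def rbf_kernel(a, b, sigma2):
    """K(x1, x2) = exp(-||x1 - x2||^2 / sigma2) for all row pairs of a and b."""
    sq = (a ** 2).sum(axis=1)[:, None] + (b ** 2).sum(axis=1)[None, :] - 2.0 * a @ b.T
    return np.exp(-np.maximum(sq, 0.0) / sigma2)

def lssvm_train(x, y, gamma, sigma2):
    """Solve the LS-SVM linear system; y must hold -1/+1 labels."""
    n = len(y)
    omega = np.outer(y, y) * rbf_kernel(x, x, sigma2)
    # System: [[0, y^T], [y, Omega + I / gamma]] [b; alpha] = [0; 1]
    a = np.zeros((n + 1, n + 1))
    a[0, 1:] = y
    a[1:, 0] = y
    a[1:, 1:] = omega + np.eye(n) / gamma
    rhs = np.concatenate(([0.0], np.ones(n)))
    solution = np.linalg.solve(a, rhs)
    return solution[0], solution[1:]  # bias b, multipliers alpha

def lssvm_decision(x_train, y_train, b, alpha, x_new, sigma2):
    """Decision value sum_k alpha_k y_k K(x, x_k) + b; its sign is the class."""
    return rbf_kernel(x_new, x_train, sigma2) @ (alpha * y_train) + b
```

A larger `sigma2` widens the kernel and smooths the decision function, while a larger `gamma` puts more weight on fitting the training points.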

We trained a total of 100 SVMs with different parameter values for each topic, and then selected the best SVM using the validation set. In the experiments, we used the LS-SVMlab1.5 toolbox, which can be downloaded from http://www.esat.kuleuven.ac.be/sista/lssvmlab/.

Experimental Results

The process we adopted to implement image retrieval in the VCDT task is shown in Figure 5. It can also be described by the following steps:

Step 1) Split the VCDT labeled image dataset into two sets, namely a training set and a validation set.

Step 2) Extract the visual features from the training images using our extraction method; train many SVMs (or, in the original run, Kernel Discriminant Analysis or MLP models) with different parameters.

Step 3) Use the validation set to select the best model.

Step 4) Extract the visual features from the VCDT test image database using our extraction method, and then use the best model to find the best discriminant feature.

Step 5) Sort the test images by their distances from the positive training images and produce the final rank result.
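The steps above can be sketched as a generic selection-and-ranking loop. Everything specific in this sketch is an assumption made for illustration: the 30% validation fraction, the accuracy criterion used to pick the best model, the use of the decision value as the ranking score in Step 5, and the small Parzen-style classifier (`fit_parzen`, `decide_parzen`), which merely stands in for the LS-SVM so that the example is self-contained.

```python
import numpy as np

def split_train_validation(n_items, validation_fraction, rng):
    """Step 1: random split of item indices into training and validation."""
    idx = rng.permutation(n_items)
    n_val = int(round(validation_fraction * n_items))
    return idx[n_val:], idx[:n_val]

def select_and_rank(x, y, x_test, param_grid, fit, decide, score, rng,
                    validation_fraction=0.3):
    """Steps 2 to 5: train one model per parameter set, keep the best, rank."""
    train_idx, val_idx = split_train_validation(len(y), validation_fraction, rng)
    best_score, best_model = -np.inf, None
    for params in param_grid:
        model = fit(x[train_idx], y[train_idx], **params)
        s = score(y[val_idx], decide(model, x[val_idx]))
        if s > best_score:
            best_score, best_model = s, model
    # Rank test items from most to least likely positive
    ranking = np.argsort(-decide(best_model, x_test))
    return ranking, best_score

# Stand-in classifier (NOT the LS-SVM of the paper): difference of mean
# RBF similarities to the positive and to the negative training items.
def fit_parzen(x, y, sigma2):
    return {"pos": x[y > 0], "neg": x[y <= 0], "sigma2": sigma2}

def decide_parzen(model, x_new):
    def mean_kernel(ref):
        d2 = ((x_new[:, None, :] - ref[None, :, :]) ** 2).sum(axis=2)
        return np.exp(-d2 / model["sigma2"]).mean(axis=1)
    return mean_kernel(model["pos"]) - mean_kernel(model["neg"])

def accuracy(y_true, scores):
    return float((np.sign(scores) == y_true).mean())
```

In the paper's setting, `param_grid` would hold the 100 parameter combinations tried for each topic, and `fit` and `decide` would wrap the LS-SVM training and decision functions.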

[Figure: gain for each topic, in three panels: gain( Ar= on Ar|| ), gain( Ar+ on Ha+ ), gain( [Ar+ U Ha+] on Ha+ ).]

Discussion and conclusion

Finally, we compared the PEF scores to those of the 4 best teams that participated in Clef VCDT 2008. Figure 7 gives, for each topic, the classification error (the complement of the usual area under the curve, i.e. 1 - Area Under the Curve). On average, the results of our 126 PEF features (denoted by 'LSIS') are at the third rank in the initial official campaign (the average of the 17 topic errors is given at index 18 in Fig. 7). The Xerox system is the best, certainly including SIFT features and a large reference image database (see the Xerox paper in this workshop). The usual perceptual color histogram features, of around 200 dimensions, which have been partly used by UPMC (see workshop note), seem similar to or slightly less discriminant than PEF.

[Figure 7: classification error for each topic; horizontal axis: topic number (18 = global).]
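For reference, the error measure used in this comparison, 1 - AUC, can be computed directly from binary labels and classifier scores. The sketch below uses the pairwise (Mann-Whitney) formulation of the area under the ROC curve; the function names are hypothetical and the code is not the official evaluation tool of the campaign.

```python
import numpy as np

def auc(labels, scores):
    """Area under the ROC curve from binary labels and real-valued scores."""
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[labels], scores[~labels]
    # Fraction of (positive, negative) pairs ranked correctly; ties count half
    wins = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (wins + 0.5 * ties) / (len(pos) * len(neg))

def classification_error(labels, scores):
    """Error as used in the comparison: 1 - AUC."""
    return 1.0 - auc(labels, scores)

# Three of the four positive/negative pairs are ordered correctly here,
# so the AUC is 0.75 and the error is 0.25.
print(classification_error([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))
```

The global score at index 18 is then the plain average of the 17 per-topic errors.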

Acknowledgment

This work was partially supported by the French National Agency of Research (ANR-06-MDCA002).

References

[1] M. Jagersand, Saliency maps and attention selection in scale and spatial coordinates: An information theoretic approach, in Proc. of 5th International Conference on Computer Vision, 1995.

[2] Zachary, Iyengar, S.S., and Barhen, Content based image retrieval and information theory: A generalized approach, in Special Topic Issue on Visual Based Retrieval Systems and Web Mining, Journal of the American Society for Information Science and Technology, 2001, pp. 841-853.

[3] Glotin, "Robust Information Retrieval and perception for a scaled Lego-Audio-Video multistructuration", Thesis of habilitation for research direction, University Sud Toulon-Var, 2007.

[4] Tollari and Glotin, Web image retrieval on imageval: Evidences on visualness and textualness concept dependency in fusion model, in ACM Int Conf on Image Video Retrieval, 2007.

[5] Moddemeijer, On estimation of entropy and mutual information of continuous distributions, Signal Processing, vol. 16, no. 3, pp. 233-246, 1989.

[6] Vapnik, V. 1995. The nature of statistical learning theory. Springer-Verlag, New York.

[7] Vapnik, V. 1998. Statistical learning theory. John Wiley, New York.

[8] Waring, C.A. and Liu, X. 2005. Face detection using spectral histograms and SVMs. IEEE Transactions on Systems, Man, and Cybernetics, Part B, 35, 3 (June 2005), 467-476.

[9] Tong, S. and Chang, E. 2001. Support vector machine active learning for image retrieval. In Proceedings of the ninth ACM international conference on Multimedia (Ottawa, Canada, 2001), 107-118.

[10] Suykens, J.A.K. and Vandewalle, J. 1999. Least Squares Support Vector Machine Classifiers. Neural Processing Letters, 9 (1999), 293-300.