<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Profil Entropic Visual Features for Visual Concept Detection in the CLEF 2008 Campaign</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Herve GLOTIN</string-name>
          <email>glotin@univ-tln.fr</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Zhongqiu ZHAO</string-name>
          <email>zhongqiuzhao@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>UMR CNRS, Université Sud Toulon-Var, France</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>In this work, we used only visual information to address the Visual Concept Detection Task (VCDT). We defined and compared two simple projection operators: the harmonic and arithmetic means. We proposed a new kind of compact features based on the entropy of pixel projections. These features, called Profil Entropy Features (PEF), were added to the usual color means and variances, and then fed to SVM classifiers for the detection of 17 visual concepts in the IAPR images during the CLEF 2008 campaign. The simple arithmetic mean projection ranked 4th in the official test, over 53 runs from around 20 laboratories. We show that the harmonic projection gives complementary information, and that its simple early fusion with the arithmetic PEF yields the third best system. As the runs of the other teams used state-of-the-art SIFT and color histogram visual features, it can be concluded that PEF are efficient. Moreover, PEF are fast to compute, at around 10 images per second on a standard Pentium processor.</p>
      </abstract>
      <kwd-group>
        <kwd>Rank Fusion</kwd>
        <kwd>Image Retrieval</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction to the Visual Concept Detection CLEF 2008 Task</title>
      <p>The VCDT task provides a training database of approximately 1,800 images, provided
along with their classification according to the concept hierarchy described in Figure 1.
Only these data may be used to train retrieval models. Figure 2 shows examples of the
17 topics to find in the IAPR image database; the retrieval task thus contains 17 topics in total.
The test database consists of 1,000 images, for each of which participating groups are required to
determine the presence or absence of the concepts.</p>
      <p>In this task, we use the least squares support vector machine (LS-SVM) to implement image retrieval,
as detailed in Section 3.</p>
    </sec>
    <sec id="sec-2">
      <title>Feature Extraction</title>
      <p>An important step in a content-based image retrieval (CBIR) system is the extraction of discriminant
visual features that are fast to compute. Information theory and cognitive sciences can provide
some inspiration for developing such features.</p>
      <p>
        Among the many visual features that have been studied, the distribution of color pixels in
an image is the most common. The standard representation of color for
content-based indexing in image databases is the color histogram. A different color representation
is based on the information-theoretic concept of entropy. Such an entropic feature can simply equal the
entropy of the pixel distribution of the image, as proposed in [1]. A more theoretical presentation
of this kind of image entropy feature, accompanied by a practical description of its merits and
limitations compared to color histograms, has been given in [
        <xref ref-type="bibr" rid="ref1">2</xref>
        ].
      </p>
      <p>
        We proposed in [
        <xref ref-type="bibr" rid="ref2">3</xref>
        ] a new feature equal to the entropy of the pixel 'profile'. A pixel profile can be a
simple arithmetic mean in the horizontal (or vertical) direction. The advantage of such a feature is that it
combines raw shape and texture representations at a low CPU cost. These features, combined
with the color mean and standard deviation, reached the second best rank in the official ImagEval 2006 campaign (see
www.imageval.org and [
        <xref ref-type="bibr" rid="ref3">4</xref>
        ]).
      </p>
      <p>In this paper we extend these features by using another projection to obtain the pixel profile:
we propose to also use the harmonic mean of the pixels of each row or column. The idea is that the
object or pixel region distribution, which is lost in the arithmetic mean projection, could be partly
captured by the harmonic mean. These two projections are therefore expected to give complementary
and/or concept-dependent information. We detail below the extraction algorithm of these Profil
Entropy Features (PEF).</p>
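      <p>As a small illustration of why the two projections can carry different information, the following Python sketch compares the arithmetic and harmonic means of two pixel rows. It is not taken from the original system: the two rows are made-up values, and the small constant eps, which avoids a division by zero on black pixels, is an assumption.</p>
      <preformat><![CDATA[
```python
import numpy as np

def arithmetic_mean(values):
    """Plain average of the pixel values."""
    return float(np.mean(values))

def harmonic_mean(values, eps=1e-6):
    """Harmonic mean; eps (an assumption) avoids dividing by zero."""
    values = np.asarray(values, dtype=float)
    return float(len(values) / np.sum(1.0 / (values + eps)))

# Two hypothetical rows with the same arithmetic mean (0.5):
flat_row = np.array([0.5, 0.5, 0.5, 0.5])    # uniform intensity
peaky_row = np.array([0.1, 0.1, 0.9, 0.9])   # dark region next to a bright region

print(arithmetic_mean(flat_row), arithmetic_mean(peaky_row))
print(harmonic_mean(flat_row), harmonic_mean(peaky_row))
```
]]></preformat>
      <p>The arithmetic mean cannot tell the two rows apart, whereas the harmonic mean is pulled towards the dark pixels of the second row, so it keeps some trace of how the intensity is distributed.</p>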
      <p><bold>PEF Algorithm</bold></p>
      <p>Let I be an image, or any rectangular subpart of an image.</p>
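      <p>For each normalized color (<inline-formula><tex-math>L = R + G + B</tex-math></inline-formula>, <inline-formula><tex-math>r = R/L</tex-math></inline-formula> and <inline-formula><tex-math>g = G/L</tex-math></inline-formula>), we first calculate
two orthogonal profiles by projecting the pixels of I. We consider two simple orthogonal
projection axes: the horizontal axis X (projection denoted <inline-formula><tex-math>\Pi_X</tex-math></inline-formula>) and the vertical axis Y (denoted <inline-formula><tex-math>\Pi_Y</tex-math></inline-formula>). The
projection operator is either the arithmetic mean (denoted 'Ar', the projection then being denoted <inline-formula><tex-math>\Pi_X^{Ar}</tex-math></inline-formula>),
as illustrated in Figure 3, or the harmonic mean of the pixels of each column or each row of I
(denoted 'Ha', which gives <inline-formula><tex-math>\Pi_X^{Ha}</tex-math></inline-formula>).</p>
      <p>
        Then, we estimate the probability distribution function (pdf) of each profile according to [
        <xref ref-type="bibr" rid="ref4">5</xref>
        ].
Considering that the sources are ergodic, we finally calculate each PEF as the normalized
entropy <inline-formula><tex-math>H(\mathrm{pdf}) / \log(\#\mathrm{bins}(\mathrm{pdf}))</tex-math></inline-formula>. We detail below each step of the PEF extraction.
      </p>
      <p>Let op be the selected projection operator. For each color of I, which has L(I) rows and C(I) columns:</p>
      <p><disp-formula><tex-math>\Phi_X^{op}(I) = \widehat{\mathrm{pdf}}\big(\Pi_X^{op}(I)\big) \quad \text{over } nbin_X(I) = \mathrm{round}\big(\sqrt{C(I)}\big) \text{ bins,}</tex-math></disp-formula>
where <inline-formula><tex-math>\Pi_X^{op}</tex-math></inline-formula> is the vertical projection with operator op, and
<disp-formula><tex-math>PEF_X(I) = H\big(\Phi_X^{op}(I)\big) / \log\big(nbin_X(I)\big).</tex-math></disp-formula></p>
      <p>Similarly, for the Y axis:
<disp-formula><tex-math>\Phi_Y^{op}(I) = \widehat{\mathrm{pdf}}\big(\Pi_Y^{op}(I)\big) \quad \text{over } nbin_Y(I) = \mathrm{round}\big(\sqrt{L(I)}\big) \text{ bins,}</tex-math></disp-formula>
<disp-formula><tex-math>PEF_Y(I) = H\big(\Phi_Y^{op}(I)\big) / \log\big(nbin_Y(I)\big).</tex-math></disp-formula></p>
      <p>We add to these PEF the usual entropic feature:
<disp-formula><tex-math>\widehat{\mathrm{pdf}}(I) = \text{pdf of all the pixels of } I \text{ over } nbin_{XY}(I) = nbin_X(I) \times nbin_Y(I) \text{ bins,}</tex-math></disp-formula>
<disp-formula><tex-math>PEF_{\cdot}(I) = H\big(\widehat{\mathrm{pdf}}(I)\big) / \log\big(nbin_{XY}(I)\big).</tex-math></disp-formula></p>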
      <p>Finally, we complete the PEF features with the usual mean and standard deviation of each
normalized color of I.</p>
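      <p>The extraction described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the pdf is estimated with a plain histogram rather than with the estimator of [5], the function names are hypothetical, the constant eps protects the harmonic mean against zero-valued pixels, and the random image only serves as input for the demonstration.</p>
      <preformat><![CDATA[
```python
import numpy as np

def normalized_entropy(values, nbins):
    """Entropy of a histogram pdf estimate (assumption), divided by log(nbins)."""
    hist, _ = np.histogram(values, bins=nbins)
    p = hist[hist > 0] / hist.sum()
    return float(-(p * np.log(p)).sum() / np.log(nbins))

def project(channel, axis, op, eps=1e-6):
    """Profile of a 2-D channel: mean along `axis` with operator 'Ar' or 'Ha'."""
    if op == "Ar":
        return channel.mean(axis=axis)
    if op == "Ha":
        return channel.shape[axis] / (1.0 / (channel + eps)).sum(axis=axis)
    raise ValueError("op must be 'Ar' or 'Ha'")

def pef_channel(channel, op):
    """[PEF_X, PEF_Y, whole-image entropy, mean, std] for one channel."""
    n_rows, n_cols = channel.shape
    nbin_x = max(2, int(round(np.sqrt(n_cols))))   # round(sqrt(C(I)))
    nbin_y = max(2, int(round(np.sqrt(n_rows))))   # round(sqrt(L(I)))
    x_profile = project(channel, axis=0, op=op)    # one value per column
    y_profile = project(channel, axis=1, op=op)    # one value per row
    return [
        normalized_entropy(x_profile, nbin_x),
        normalized_entropy(y_profile, nbin_y),
        normalized_entropy(channel.ravel(), nbin_x * nbin_y),
        float(channel.mean()),
        float(channel.std()),
    ]

def pef(rgb, op="Ar"):
    """PEF vector of an RGB array for the channels r = R/L, g = G/L and L."""
    rgb = np.asarray(rgb, dtype=float)
    lum = rgb.sum(axis=2)
    safe = np.where(lum > 0, lum, 1.0)             # avoid 0/0 on black pixels
    channels = [rgb[..., 0] / safe, rgb[..., 1] / safe, lum]
    return np.array([f for c in channels for f in pef_channel(c, op)])

rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(48, 64, 3))     # synthetic test image
features = pef(image, op="Ar")
print(features.shape)  # (15,)
```
]]></preformat>
      <p>Calling pef(image, op="Ha") gives the harmonic variant. The whole-image entropy, the mean and the standard deviation do not depend on the projection operator, which is why they need not be repeated when both variants are concatenated.</p>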
      <p>[Figure: an RGB signal with its X profile and Y profile, shown for the R/L, G/L and L channels.]</p>
      <p>
We can calculate the PEF in three horizontal subimages, as illustrated in Figure 4. We denote such
PEF by '='. We also calculate the PEF in three vertical subimages; we denote these PEF by '||'.</p>
      <p>For each subimage type, we have 3 bands and 3 different PEF for each of the 3 colors, plus their mean and
variance; thus we have 3 × 3 × 3 + 3 × 3 × 2 = 45 dimensions for the '=' or for the '||' features. We denote by
'+' the concatenation of the '=' and '||' features, which then has 90 dimensions. Considering
the two types of mean, the PEF concatenation, without repetition of the color mean and std, is quite
compact, with a total of 126 dimensions (= 2 (subimage types '=' or '||') × 3 (bands per subimage
type) × 3 (r, g, L) × (4 (PEF types = (X or Y) × (Ar or Ha)) + 1 (H(I)) + 2 (mean and std))).</p>
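      <p>The dimension counts above can be checked with a few lines of Python. The variable names are ours and purely illustrative; the grouping used for the total (4 projection entropies, 1 whole-image entropy, and 2 statistics per band and color) is our reading of the count given in the text.</p>
      <preformat><![CDATA[
```python
n_subimage_types = 2   # '=' (horizontal bands) and '||' (vertical bands)
n_bands = 3            # bands per subimage type
n_colors = 3           # r, g and L
n_axes = 2             # X and Y profiles
n_operators = 2        # arithmetic ('Ar') and harmonic ('Ha') means

# One subimage type, one operator: X, Y and whole-image entropies, plus mean and variance.
dims_one_type = n_bands * n_colors * (n_axes + 1) + n_bands * n_colors * 2

# '+' features: concatenation of the '=' and '||' features for one operator.
dims_plus = n_subimage_types * dims_one_type

# Both operators, with the shared entropy, mean and std counted only once.
dims_total = n_subimage_types * n_bands * n_colors * (n_axes * n_operators + 1 + 2)

print(dims_one_type, dims_plus, dims_total)  # 45 90 126
```
]]></preformat>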
    </sec>
    <sec id="sec-3">
      <title>Support Vector Machines</title>
      <p>
        The support vector machine (SVM) [
        <xref ref-type="bibr" rid="ref5 ref6">6,7</xref>
        ] first maps the data into a higher dimensional feature space
by means of a kernel function, and then learns a separating hyperplane that maximizes the margin.
Because of its good generalization capability, this technique has been widely applied in many
areas such as face detection and image retrieval [
        <xref ref-type="bibr" rid="ref7 ref8">8,9</xref>
        ]. The SVM is typically based on an
ε-insensitive cost function, meaning that approximation errors smaller than ε do not increase the
cost function value. This results in a convex quadratic optimization problem. Instead of
an ε-insensitive cost function, a quadratic cost function can be used: the least squares support
vector machines (LS-SVM) [
        <xref ref-type="bibr" rid="ref9">10</xref>
        ] are reformulations of the standard SVMs that lead to solving
linear KKT systems instead, which is computationally attractive.
      </p>
      <p>In our experiments, the RBF kernel</p>
      <p><disp-formula><tex-math>K(x_1, x_2) = \exp\left(-\|x_1 - x_2\|^2 / \sigma^2\right)</tex-math></disp-formula>
is selected as the kernel function of our LS-SVM. There is thus a corresponding parameter, σ, to be
tuned. A large value of σ² indicates a stronger smoothing. Moreover, there is another parameter,
γ, that needs tuning to find the tradeoff between minimizing the complexity of the model
and fitting the training data points well.</p>
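      <p>To make the linear KKT system concrete, here is a minimal NumPy sketch of an LS-SVM classifier with the RBF kernel, following the standard formulation of [10]. It is an illustration only: the experiments used the LS-SVMlab toolbox, not this code, and the function names, the toy two-cluster data and the parameter values gamma = 10 and sigma2 = 4 are assumptions.</p>
      <preformat><![CDATA[
```python
import numpy as np

def rbf_kernel(a, b, sigma2):
    """K(x1, x2) = exp(-||x1 - x2||^2 / sigma2) for all pairs of rows."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / sigma2)

def lssvm_train(x, y, gamma, sigma2):
    """Solve the LS-SVM linear KKT system for the bias b and multipliers alpha."""
    n = len(y)
    omega = np.outer(y, y) * rbf_kernel(x, x, sigma2)
    system = np.zeros((n + 1, n + 1))
    system[0, 1:] = y
    system[1:, 0] = y
    system[1:, 1:] = omega + np.eye(n) / gamma   # gamma: regularization tradeoff
    rhs = np.concatenate(([0.0], np.ones(n)))
    solution = np.linalg.solve(system, rhs)
    return solution[0], solution[1:]

def lssvm_decision(x_train, y_train, b, alpha, sigma2, x_new):
    """Real-valued output; its sign is the predicted class."""
    return rbf_kernel(x_new, x_train, sigma2) @ (alpha * y_train) + b

# Toy data (assumption): two well separated Gaussian clusters.
rng = np.random.default_rng(0)
x = np.vstack([rng.normal(+2.0, 0.5, size=(20, 2)),
               rng.normal(-2.0, 0.5, size=(20, 2))])
y = np.concatenate([np.ones(20), -np.ones(20)])

b, alpha = lssvm_train(x, y, gamma=10.0, sigma2=4.0)
scores = lssvm_decision(x, y, b, alpha, 4.0, x)
print("training accuracy:", (np.sign(scores) == y).mean())
```
]]></preformat>
      <p>Training reduces to a single call to a linear solver, in contrast with the quadratic program of the standard SVM; the price is that nearly all multipliers alpha are non-zero, so the solution is not sparse.</p>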
      <p>We trained a total of 100 SVMs with different parameter values for each topic, and then
selected the best SVM using the validation set. In the experiments, we used the LS-SVMlab1.5
toolbox, which can be downloaded from http://www.esat.kuleuven.ac.be/sista/lssvmlab/.</p>
    </sec>
    <sec id="sec-4">
      <title>Experimental Results</title>
      <p>The process we adopt for image retrieval in the VCDT task is shown in Figure 5. It
consists of the following steps:</p>
      <p>Step 1) Split the VCDT labeled image dataset into two sets: a training set and a
validation set.</p>
      <p>Step 2) Extract the visual features from the training images using our extraction method;
train many SVMs (or, in the original run, Kernel Discriminant Analysis or MLP models) with
different parameters.</p>
      <p>Step 3) Use the validation set to select the best model.</p>
      <p>Step 4) Extract the visual features from the VCDT test image database using our extraction
method, and then use the best model to find the best discriminant feature.</p>
      <p>Step 5) Sort the test images by their distances from the positive training images and produce
the final ranking.</p>
      <p>[Figure: gain for each topic (x axis: topic); panels: gain(Ar= on Ar||), gain(Ar+ on Ha+), gain([Ar+ U Ha+] on Ha+).]</p>
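      <p>The five steps above can be sketched end to end in Python. Everything specific here is an assumption made for illustration: the synthetic features stand in for the PEF of one topic, the classifier is a simplified RBF regularized least squares model (an LS-SVM without bias term), the 100 candidates come from a hypothetical 10 by 10 grid over the two parameters, the selection criterion is the validation AUC, and the test images are ranked by model score rather than by a distance to the positive training images.</p>
      <preformat><![CDATA[
```python
import numpy as np

def rbf(a, b, sigma2):
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / sigma2)

def fit_scores(x_tr, y_tr, x_new, gamma, sigma2):
    """Stand-in classifier: RBF regularized least squares, scores for x_new."""
    gram = rbf(x_tr, x_tr, sigma2) + np.eye(len(y_tr)) / gamma
    alpha = np.linalg.solve(gram, y_tr)
    return rbf(x_new, x_tr, sigma2) @ alpha

def auc(scores, labels):
    """Probability that a positive outscores a negative (ties count 1/2)."""
    pos, neg = scores[labels > 0], scores[labels < 0]
    diff = pos[:, None] - neg[None, :]
    return float(((diff > 0) + 0.5 * (diff == 0)).mean())

rng = np.random.default_rng(1)
x = rng.normal(size=(200, 5))                        # synthetic labeled features
y = np.where(x[:, 0] + 0.5 * x[:, 1] > 0, 1.0, -1.0)
x_test = rng.normal(size=(50, 5))                    # synthetic test features

# Step 1: split the labeled data into training and validation sets.
idx = rng.permutation(len(y))
tr, va = idx[:140], idx[140:]

# Steps 2 and 3: train 100 candidate models, keep the best one on validation.
grid = [(g, s) for g in np.logspace(-2, 3, 10) for s in np.logspace(-1, 3, 10)]
val_auc = [auc(fit_scores(x[tr], y[tr], x[va], g, s), y[va]) for g, s in grid]
best_gamma, best_sigma2 = grid[int(np.argmax(val_auc))]

# Steps 4 and 5: score the test images and sort them, best first.
test_scores = fit_scores(x[tr], y[tr], x_test, best_gamma, best_sigma2)
ranking = np.argsort(-test_scores)

print(len(grid), "candidates, validation error (1 - AUC):", 1.0 - max(val_auc))
```
]]></preformat>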
    </sec>
    <sec id="sec-5">
      <title>Discussion and conclusion</title>
      <p>We finally compared the PEF scores to those of the 4 best teams that participated in CLEF VCDT 2008. Figure
7 gives, for each topic, the classification error (the complement of the usual area under the
curve, i.e. 1 - AUC). On average, the results of our 126 PEF features (denoted
'LSIS') rank third in the initial official campaign (the average of the 17 topic errors
is given at index 18 in Figure 7). The Xerox system is the best; it most likely includes SIFT features
and a large reference image database (see the Xerox paper in this workshop). The usual perceptual
color histogram features, of around 200 dimensions, which have been partly used by UPMC (see
the workshop notes), seem similar to or slightly less discriminant than PEF.</p>
      <p>[Figure 7: classification error for each topic; x axis: topic number (18 = global).]</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgment</title>
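      <p>This work was partially supported by the French National Agency of Research
(ANR-06-MDCA002).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref0">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>M.</given-names>
            <surname>Jagersand</surname>
          </string-name>
          ,
          <article-title>Saliency maps and attention selection in scale and spatial coordinates: An information theoretic approach</article-title>
          ,
          <source>in Proc. of 5th International Conference on Computer Vision</source>
          ,
          <year>1995</year>
          .
        </mixed-citation>
      </ref>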
      <ref id="ref1">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>J.</given-names>
            <surname>Zachary</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.S.</given-names>
            <surname>Iyengar</surname>
          </string-name>
          and
          <string-name>
            <given-names>J.</given-names>
            <surname>Barhen</surname>
          </string-name>
          ,
          <article-title>Content based image retrieval and information theory: A generalized approach</article-title>
          , in Special Topic Issue on Visual Based Retrieval Systems and Web Mining,
          <source>Journal of the American Society for Information Science and Technology</source>
          ,
          <year>2001</year>
          , pp.
          <fpage>841</fpage>
          -
          <lpage>853</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>H.</given-names>
            <surname>Glotin</surname>
          </string-name>
          ,
          <article-title>"Robust Information Retrieval and perception for a scaled Lego-Audio-Video multistructuration"</article-title>
          , Thesis of habilitation for research direction, University Sud Toulon-Var,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>S.</given-names>
            <surname>Tollari</surname>
          </string-name>
          and
          <string-name>
            <given-names>H.</given-names>
            <surname>Glotin</surname>
          </string-name>
          ,
          <article-title>Web image retrieval on imageval: Evidences on visualness and textualness concept dependency in fusion model</article-title>
          ,
          <source>in ACM Int Conf on Image Video Retrieval</source>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>R.</given-names>
            <surname>Moddemeijer</surname>
          </string-name>
          ,
          <article-title>On estimation of entropy and mutual information of continuous distributions</article-title>
          ,
          <source>Signal Processing</source>
          , vol.
          <volume>16</volume>
          , no.
          <issue>3</issue>
          , pp.
          <fpage>233</fpage>
          -
          <lpage>246</lpage>
          ,
          <year>1989</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [6]
          <string-name>
            <surname>Vapnik</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          <year>1995</year>
          <article-title>The nature of statistical learning theory</article-title>
          . Springer-Verlag, New York.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [7]
          <string-name>
            <surname>Vapnik</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          <year>1998</year>
          <article-title>Statistical learning theory</article-title>
          . John Wiley, New York.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [8]
          <string-name>
            <surname>Waring</surname>
            ,
            <given-names>C.A.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          <year>2005</year>
          .
          <article-title>Face detection using spectral histograms and SVMs</article-title>
          .
          <source>IEEE Transactions on Systems, Man, and Cybernetics, Part B</source>
          ,
          <volume>35</volume>
          ,
          <issue>3</issue>
          (
          <year>June 2005</year>
          ),
          <fpage>467</fpage>
          -
          <lpage>476</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [9]
          <string-name>
            <surname>Tong</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Chang</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          <year>2001</year>
          .
          <article-title>Support vector machine active learning for image retrieval</article-title>
          .
          <source>In Proceedings of the ninth ACM international conference on Multimedia</source>
          , (Ottawa, Canada,
          <year>2001</year>
          ),
          <fpage>107</fpage>
          -
          <lpage>118</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [10]
          <string-name>
            <surname>Suykens</surname>
            ,
            <given-names>J.A.K.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Vandewalle</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <year>1999</year>
          .
          <article-title>Least squares support vector machine classifiers</article-title>
          .
          <source>Neural Processing Letters</source>
          ,
          <volume>9</volume>
          (
          <year>1999</year>
          ),
          <fpage>293</fpage>
          -
          <lpage>300</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>