<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>SIFT, BoW architecture and one-against-all Support vector machine</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Mohamed Issolah</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Diane Lingrand</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Frederic Precioso</string-name>
        </contrib>
      </contrib-group>
      <pub-date>
        <year>2000</year>
      </pub-date>
      <abstract>
        <p>For this first participation in the ImageClef Plant Identification task, we build on the reference Bag-of-Words (BoW) framework. We extract Points-of-Interest (PoI) in every image using the SIFT detector and describe each local feature with the SIFT descriptor. The visual dictionary is built by running K-means with 100 clusters on the local features. Each image is then represented by its histogram over the dictionary, using a hard-assignment strategy. We classify the images with as many binary one-against-all Support Vector Machines as there are plant classes per organ type. Our aim is to evaluate a classic baseline of multi-class image categorization on the plant identification task. Our first results illustrate how difficult this task is, and show that a framework which has become a standard baseline for classifying general image datasets is not immediately relevant on Plant Identification data.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-2">
      <title>Feature extraction and Image description</title>
      <p>We extract Points-of-Interest (PoI) in each image with the SIFT detector and
describe them with the SIFT descriptor. We extract about 1000 points per image,
using the standard settings of the OpenCV C++ library:
– Number of layers per octave: 3
– Minimum threshold for a point to be considered a PoI: 0.04
– σ of the Gaussian: 1.6</p>
      <p>We then build a visual dictionary with K-means, where K is set to 100.
Finally, we represent each image by the histogram obtained by hard assignment
of each local feature to a BoW cluster. The histogram is normalized by the
size of the BoW.</p>
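      <p>The hard-assignment step described above can be illustrated with a minimal Python sketch. It is not the authors' C++ code: the function names, the toy 2-D descriptors and the 3-word dictionary are assumptions made for illustration (real SIFT descriptors are 128-dimensional and the dictionary has K = 100 words). The normalization used here, division by the number of descriptors, is only one possible reading of "normalized by the size of the BoW".</p>
      <preformat>
```python
import math


def nearest_word(descriptor, dictionary):
    """Return the index of the closest visual word (Euclidean distance)."""
    distances = [math.dist(descriptor, word) for word in dictionary]
    return distances.index(min(distances))


def bow_histogram(descriptors, dictionary):
    """Hard-assignment histogram: each descriptor votes for exactly one word."""
    histogram = [0.0] * len(dictionary)
    for descriptor in descriptors:
        histogram[nearest_word(descriptor, dictionary)] += 1.0
    # Assumed normalization: divide by the number of descriptors (bins sum to 1).
    total = max(len(descriptors), 1)
    return [count / total for count in histogram]


# Toy data (hypothetical): four 2-D "descriptors" and a 3-word dictionary.
dictionary = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
descriptors = [(1.0, 1.0), (9.0, 0.5), (0.5, 9.0), (0.2, 0.1)]
histogram = bow_histogram(descriptors, dictionary)
print(histogram)  # [0.5, 0.25, 0.25]
```
</preformat>
      <p>Two of the four toy descriptors fall closest to the first word, so its bin receives half of the mass. With hard assignment, a descriptor lying between two words still contributes to only one bin.</p>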
    </sec>
    <sec id="sec-3">
      <title>Learning</title>
      <p>We build one-against-all SVMs for each plant class and for each organ type, and
we exploit the XML metadata provided by the Challenge during the training
phase. We thereby identify the type of content of each image in order to train
organ-oriented SVMs separately. The same SVM configuration is used for every organ:
– C = 100
– The kernel type is LINEAR
– Number of iterations = 10000</p>
      <p>In the end, we obtain one trained SVM per plant class and per organ type
among:
– Entire
– Stem
– Fruit
– Flower
– Leaf NaturalBackground
– Leaf SheetAsBackground</p>
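      <p>SVM outputs are organized into vectors, one vector per organ type, as
depicted in Figure 1:</p>
      <disp-formula>
        <tex-math>\begin{pmatrix} \mathrm{class}\,c_{1,1} \\ \mathrm{class}\,c_{1,2} \\ \mathrm{class}\,c_{1,3} \\ \vdots \\ \mathrm{class}\,c_{1,n} \end{pmatrix} \quad \begin{pmatrix} \mathrm{class}\,c_{2,1} \\ \mathrm{class}\,c_{2,2} \\ \mathrm{class}\,c_{2,3} \\ \vdots \\ \mathrm{class}\,c_{2,n} \end{pmatrix} \quad \cdots \quad \begin{pmatrix} \mathrm{class}\,c_{6,1} \\ \mathrm{class}\,c_{6,2} \\ \mathrm{class}\,c_{6,3} \\ \vdots \\ \mathrm{class}\,c_{6,n} \end{pmatrix}</tex-math>
      </disp-formula>
      <p>Here the first index runs over the six organ types and the second index over the n plant classes.</p>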
      <p>In the 2013 dataset, the maximum number of plant classes is n = 250.</p>
      <p>For a test image, our system performs keypoint extraction and description,
then extracts the organ and type information from the XML file associated with
the image. We then classify the test image histogram using all the SVMs
corresponding to that organ. For example, if the organ of the image is
Leaf and the type is SheetAsBackground, our system runs the set of n SVMs
of the corresponding Leaf vector. The final class is the one associated
with the SVM that returns the highest confidence score.</p>
      <p>For the first configuration, several parameters have to be set, but the
main one is K, the number of clusters. The higher the number of
clusters, the more discriminant the Bag-of-Words histogram representation.
We fix the number of clusters to 100 because the K-means configuration is shared
by all organs and plant classes: it must preserve the generalization
capability of the BoW representation while keeping the computation
time reasonable.</p>
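      <p>The test-time decision rule described above (run the n one-against-all SVMs of the image's organ type and keep the class with the highest score) can be sketched in Python as follows. This is a hedged illustration rather than the authors' implementation: the linear score w . x + b, the dictionary layout, the class names and the toy weights are all assumptions.</p>
      <preformat>
```python
def decision_score(weights, bias, histogram):
    """Signed score of one linear SVM: w . x + b."""
    return sum(w * x for w, x in zip(weights, histogram)) + bias


def classify(histogram, organ_type, svms_by_organ):
    """Run every one-against-all SVM of the organ type and keep the best class."""
    scores = {
        plant_class: decision_score(weights, bias, histogram)
        for plant_class, (weights, bias) in svms_by_organ[organ_type].items()
    }
    best_class = max(scores, key=scores.get)
    return best_class, scores


# Hypothetical toy models: two organ types, three plant classes, 3-bin histograms.
svms_by_organ = {
    "Leaf SheetAsBackground": {
        "class_1": ([1.0, -1.0, 0.0], 0.0),
        "class_2": ([-1.0, 1.0, 0.0], 0.1),
        "class_3": ([0.0, 0.0, 1.0], -0.2),
    },
    "Flower": {
        "class_1": ([0.0, 1.0, 0.0], 0.0),
        "class_2": ([1.0, 0.0, 0.0], 0.0),
        "class_3": ([0.0, 0.0, 1.0], 0.0),
    },
}

histogram = [0.5, 0.25, 0.25]
predicted, scores = classify(histogram, "Leaf SheetAsBackground", svms_by_organ)
print(predicted)  # class_1
```
</preformat>
      <p>Only the SVMs of the organ type read from the XML metadata are evaluated, so the same histogram can receive a different label under another organ type.</p>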
      <p>The same holds for the SVMs, which share common settings for all organs and all
classes in our baseline implementation. The larger the parameter C, the lower
the error rate on the training data. C is set to 100.</p>
      <p>Clustering is the most computationally demanding step.
To reduce its cost, only 100 of the 1000 points extracted in each
image are used for clustering. The keypoints to be discarded are
chosen randomly.</p>
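      <p>The random subsampling described above can be sketched as follows. This Python fragment is an illustrative assumption, not the authors' code: the function names, the fixed seed and the fake descriptors are hypothetical, and the K-means step itself is left out.</p>
      <preformat>
```python
import random


def subsample_descriptors(descriptors, keep=100, rng=None):
    """Randomly keep at most `keep` descriptors of one image, without replacement."""
    if rng is None:
        rng = random.Random()
    keep = min(keep, len(descriptors))
    return rng.sample(descriptors, keep)


def build_clustering_pool(descriptors_per_image, keep=100, seed=0):
    """Concatenate the per-image subsamples that would be handed to K-means."""
    rng = random.Random(seed)  # fixed seed only to make the sketch reproducible
    pool = []
    for descriptors in descriptors_per_image:
        pool.extend(subsample_descriptors(descriptors, keep, rng))
    return pool


# Toy stand-in (hypothetical): 3 images with 1000 fake 128-D descriptors each.
images = [[[float(i + j)] * 128 for j in range(1000)] for i in range(3)]
pool = build_clustering_pool(images)
print(len(pool))  # 300 descriptors instead of 3000
```
</preformat>
      <p>Keeping 100 of 1000 descriptors per image reduces the clustering input by a factor of ten, at the price of a dictionary estimated from a random subset of the local features.</p>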
    </sec>
    <sec id="sec-4">
      <title>Resources</title>
      <p>For the implementation we use the OpenCV library, which offers different types of
detectors and different learning methods. The implementation language is
C++, chosen for efficiency and speed. The LibXML library is used for XML parsing.
The program was run on a server with two Intel Xeon X5675 processors
(3.06 GHz, 6 cores) and 24 GB of DDR3-1333 RAM.</p>
    </sec>
    <sec id="sec-5">
      <title>Results</title>
      <p>The goal of our proposal is to evaluate a standard framework for image
categorization in multimedia databases, using the classic local feature SIFT (both for
the detector and the descriptor), a BoW architecture and as many
one-against-all SVMs as binary classifications are required.</p>
      <p>The results are presented in Figures 3, 4 and 5.</p>
      <table-wrap>
        <table>
          <thead>
            <tr><th>Run name</th><th>Run filename</th><th>Entire</th><th>Flower</th><th>Fruit</th><th>Leaf</th><th>Stem</th><th>NaturalBackground</th></tr>
          </thead>
          <tbody>
            <tr><td>I3S Run 1</td><td>1368034466828 new 100</td><td>0.017</td><td>0.023</td><td>0.041</td><td>0.038</td><td>0.025</td><td>0.026</td></tr>
            <tr><td>I3S Run 2</td><td>1368165605197 new2 100</td><td>0.017</td><td>0.023</td><td>0.041</td><td>0.038</td><td>0.025</td><td>0.026</td></tr>
          </tbody>
        </table>
      </table-wrap>
      <p>Fig. 5. SheetAsBackground scores.</p>
      <p>For our first participation in the ImageClef Plant Identification challenge, we have
implemented a standard framework that has proved to be powerful for image
categorization in multimedia databases. To do so, we used the SIFT algorithm
for both local feature detection and description, then represented each image
by its histogram over the visual dictionary. The resulting histograms
are classified by one-against-all SVMs, one SVM per plant class and per
organ. Despite the efficiency of such an architecture for image categorization, the
results are somewhat disappointing on the ImageClef Plant Identification task. We
are currently working on optimizing all the parameters of our method to
achieve better results.</p>
    </sec>
  </body>
  <back>
  </back>
</article>