<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Plant species recognition using Bag-Of-Word with SVM classifier in the context of the LifeCLEF challenge</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Issolah Mohamed</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Lingrand Diane</string-name>
          <email>lingrand@i3s.unice.fr</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Precioso Frédéric</string-name>
          <email>precioso@unice.fr</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Univ. Nice Sophia Antipolis Laboratory I3S, UMR 7271 UNS-CNRS 06900 Sophia Antipolis</institution>
          ,
          <country country="FR">France</country>
        </aff>
      </contrib-group>
      <fpage>738</fpage>
      <lpage>746</lpage>
      <abstract>
        <p>For the plant task of the LifeCLEF challenge, we adopted the reference Bag-of-Word framework (BoW) with local soft assignment. The points of interest (POI) are detected with the SIFT detector and described with the SIFT or OpponentColor SIFT descriptors. The parameters of the Bag-of-Word model are optimized through cross-validation and we present the results of different experiments. Support Vector Machines are trained with different strategies according to the organs and species of plants.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        For our 2014 participation in the LifeCLEF challenge [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], and more specifically
in the plant identification task [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], we build an image processing chain based
on the reference Bag-of-Word framework (BoW). We study the results obtained
with this framework when optimizing its parameters with respect to the different
organs. The goal of the plant identification task is to determine the plant species
from plant observations that may consist of one or more images and associated
meta-data.
      </p>
      <p>As a first step, we focus on plant species recognition from a single image
with the organ type as metadata. We have considered images of each organ
separately. There are 7 categories of organs to be considered this year: leaf with
natural background, leaf with uniform background, flower, fruit, branch, stem
and entire. From these categories of organs, we build 7 quite similar but
independent processing chains.</p>
      <p>
        We extract Points-of-Interest (PoI) using the SIFT detector in every image
and describe each local feature with the SIFT descriptor or Opponent Color
SIFT descriptor [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. The visual dictionary is built with a K-means algorithm
on the local features. Each image is then represented by its histogram onto the
dictionary using a local soft-assignment strategy [
        <xref ref-type="bibr" rid="ref1 ref2">1, 2</xref>
        ]. We classify the images with
as many binary one-against-all Support Vector Machines as the number of plant
classes per organ type. Considering 7 categories of organs by 500 species leads
to almost 3500 SVMs (not all organs are available for every species).
      </p>
      <p>This year's participation differs from our previous one by the use of local soft
assignment instead of hard assignment, color processing for flowers and the
optimization of two parameters in the clustering and SVM classification:
1. the number of clusters K for the K-means;
2. C, which weights the sum of error distances for the SVM.</p>
      <p>We now detail the different steps and discuss the results of the experiments.</p>
    </sec>
    <sec id="sec-2">
      <title>Image Processing Chain</title>
      <sec id="sec-2-1">
        <title>Feature Extraction and Image Description</title>
        <p>For each image, Points Of Interest (POIs) are extracted using the SIFT detector.
They are described using Opponent Color SIFT for flower and standard SIFT
for the other organs.</p>
        <p>About 1000 points are extracted in each image, with the standard settings:
1. number of layers per octave = 3
2. minimum threshold to consider a point as a POI = 0.04
3. sigma of the Gaussian = 1.6</p>
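        <p>A minimal sketch of this extraction step is given below; the paper does not state which SIFT implementation was used, so the OpenCV library and function names are assumptions.</p>
        <preformat>
# Minimal sketch of the POI extraction step (assumed OpenCV implementation).
import cv2

def extract_sift(image_path):
    """Detect and describe points of interest with SIFT, using the settings
    listed above: 3 layers per octave, threshold 0.04, Gaussian sigma 1.6."""
    img = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    sift = cv2.SIFT_create(nOctaveLayers=3,        # layers per octave
                           contrastThreshold=0.04,  # minimum threshold for a POI
                           sigma=1.6)               # sigma of the Gaussian
    keypoints, descriptors = sift.detectAndCompute(img, None)
    return keypoints, descriptors   # descriptors: (n_points, 128) float array
        </preformat>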
        <p>From these POIs, visual dictionaries (one specific per organ) are computed
using a K-means algorithm. K, the number of clusters, is cross-validated to be
set to different values: 4000 for leaf with uniform background, 2000 for leaf with
natural background and 500 for the other organs.</p>
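        <p>A possible sketch of the dictionary construction, assuming scikit-learn's K-means (the actual clustering implementation is not specified in the paper):</p>
        <preformat>
# Sketch of the per-organ visual dictionary built by K-means clustering.
import numpy as np
from sklearn.cluster import MiniBatchKMeans

def build_dictionary(descriptors, k):
    """descriptors: (n_descriptors, 128) array stacked over all training images
    of one organ; k: cross-validated number of clusters (4000 for leaf/uniform,
    2000 for leaf/natural, 500 for the other organs)."""
    kmeans = MiniBatchKMeans(n_clusters=k, batch_size=10000, random_state=0)
    kmeans.fit(descriptors.astype(np.float32))
    return kmeans.cluster_centers_   # the visual dictionary, shape (k, 128)
        </preformat>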
        <p>Finally, image features are encoded with a local soft assignment onto the
dictionary: each local feature contributes to its 5 nearest clusters in the
BoW histogram.</p>
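        <p>A minimal sketch of this local soft-assignment encoding follows; the weighting kernel and its beta parameter are assumptions, since the paper only states that each feature votes for its 5 nearest clusters.</p>
        <preformat>
# Sketch of the local soft-assignment BoW encoding (5 nearest visual words).
import numpy as np
from scipy.spatial.distance import cdist

def soft_assign_bow(descriptors, dictionary, n_neighbors=5, beta=1e-4):
    k = dictionary.shape[0]
    hist = np.zeros(k)
    dists = cdist(descriptors, dictionary, metric='sqeuclidean')
    for d in dists:                          # one row of distances per descriptor
        nn = np.argsort(d)[:n_neighbors]     # its 5 nearest clusters
        w = np.exp(-beta * d[nn])            # assumed Gaussian weighting
        hist[nn] += w / w.sum()              # normalized local votes
    hist /= np.linalg.norm(hist) + 1e-12     # L2-normalize the BoW histogram
    return hist
        </preformat>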
      </sec>
      <sec id="sec-2-2">
        <title>Training</title>
        <p>For each category (i.e. each organ type), linear binary Support Vector Machines
(SVMs) are learned on the training data, in a one-against-all strategy, in order to
predict the different plant species. The C parameter is set according to
the cross-validation results: 100 for leaf (uniform and natural background) and
0.5 for the other categories of organs.</p>
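        <p>A sketch of this training step, assuming scikit-learn's linear SVM; the explicit one-against-all loop mirrors the description above, with illustrative function and variable names.</p>
        <preformat>
# Sketch of the one-against-all training of one linear SVM per species.
from sklearn.svm import LinearSVC

def train_organ_svms(bow_vectors, species_labels, C):
    """bow_vectors: (n_images, K) BoW histograms of one organ category;
    species_labels: species id of each image;
    C: 100 for the two leaf categories, 0.5 for the other organs."""
    svms = {}
    for species in set(species_labels):
        y = [1 if label == species else 0 for label in species_labels]
        clf = LinearSVC(C=C)
        clf.fit(bow_vectors, y)
        svms[species] = clf
    return svms   # at most 500 SVMs; fewer if some species lack this organ
        </preformat>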
        <p>The SVMs are organized into 7 vectors according to the categories of organs
(see equation 1). Since images of the 7 categories of organs are not available for
all species, the size of each vector may be less than 500.</p>
        <p>
          \[
          \begin{bmatrix} class_{o_1,1} \\ class_{o_1,2} \\ class_{o_1,3} \\ \vdots \\ class_{o_1,n_1} \end{bmatrix}
          \begin{bmatrix} class_{o_2,1} \\ class_{o_2,2} \\ class_{o_2,3} \\ \vdots \\ class_{o_2,n_2} \end{bmatrix}
          \cdots
          \begin{bmatrix} class_{o_7,1} \\ class_{o_7,2} \\ class_{o_7,3} \\ \vdots \\ class_{o_7,n_7} \end{bmatrix}
          \quad (1)
          \]
        </p>
      </sec>
      <sec id="sec-2-3">
        <title>Query Image</title>
        <p>When all SVMs have been computed on the training data, each test image is
analyzed with the following steps:
1. get the category of organ from the XML file,
2. extract and describe points of interest with SIFT or OpponentColor SIFT,
depending on the organ category,
3. generate the BoW using the vocabulary specific to the considered organ,
4. test all the SVMs of this organ (≤ 500) and get a list of confidence scores on the
prediction of species.</p>
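        <p>These steps can be summarized by the following sketch, which reuses the helper functions sketched in the previous sections; all names are illustrative, not taken from the authors' code.</p>
        <preformat>
# Sketch of the test-time chain for a single query image.
def predict_species(image_path, organ, dictionaries, organ_svms):
    _, descriptors = extract_sift(image_path)                # step 2
    bow = soft_assign_bow(descriptors, dictionaries[organ])  # step 3
    scores = {}
    for species, clf in organ_svms[organ].items():           # step 4: up to 500 SVMs
        scores[species] = clf.decision_function([bow])[0]
    return scores   # raw signed distances to each SVM margin
        </preformat>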
      </sec>
      <sec id="sec-2-4">
        <title>Generation of runs</title>
        <p>The confidence score (d) obtained for a test image is the distance of
its BoW vector to the margin of the corresponding SVM. Thus, it is not possible
to directly compare scores coming from different SVMs. In order to overcome this problem,
the confidence scores of each species are normalized so that they can be compared with each
other. This step projects each confidence score into the interval [0, 1].</p>
        <p>All the confidence scores concerning the same plant observation are gathered.
Let S_d denote this set of confidence scores. All values are normalized using:
\[
S_n = \left\{ d_{norm} \;\middle|\; d_{norm} = \frac{d - \min S_d}{\max S_d - \min S_d},\; d \in S_d \right\} \quad (2)
\]
with:
S_n: new set of normalized confidence scores of the n-th plant observation,
d_{norm}: normalized confidence score,
d: confidence score obtained by the SVM,
\min S_d: minimum value in S_d,
\max S_d: maximum value in S_d.</p>
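        <p>A minimal sketch of the normalization of equation (2):</p>
        <preformat>
# Min-max normalization of the confidence scores of one plant observation.
def normalize_scores(scores_d):
    lo, hi = min(scores_d), max(scores_d)
    if hi == lo:                     # degenerate case: all scores identical
        return [0.0 for _ in scores_d]
    return [(d - lo) / (hi - lo) for d in scores_d]   # each score now in [0, 1]
        </preformat>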
        <p>After the normalization, the confidence scores obtained for all the single images
corresponding to the same plant observation are merged to generate the final run.
Two ways of merging have been tested:
run1 : the confidence scores Sn are sorted in descending order and the scores of the
same class are summed up.
run2 : the confidence scores Sn are sorted in descending order and only the highest
score of each class is kept.</p>
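        <p>The two merging strategies can be sketched as follows (illustrative code, assuming the normalized scores are given as species/score pairs):</p>
        <preformat>
# Sketch of the two merging strategies over the images of one observation.
from collections import defaultdict

def merge_run1(normalized):          # normalized: list of (species, score) pairs
    merged = defaultdict(float)
    for species, s in normalized:
        merged[species] += s         # run1: sum the scores of the same class
    return sorted(merged.items(), key=lambda kv: kv[1], reverse=True)

def merge_run2(normalized):
    merged = {}
    for species, s in normalized:
        merged[species] = max(s, merged.get(species, 0.0))  # run2: keep the maximum
    return sorted(merged.items(), key=lambda kv: kv[1], reverse=True)
        </preformat>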
        <p>Merging the confidence scores obtained on the individual images of a plant
observation increases the overall score.</p>
        <sec id="sec-2-4-1">
          <title>Experiments for the Optimization</title>
          <p>In order to tune our process, different experiments have been done on the 2013
ImageCLEF challenge dataset. First of all, two parameters have been optimized: the
number of clusters K for the K-means clustering algorithm and C for the SVM.</p>
          <p>This study has been focused on leaf with uniform or natural background and on
flower. Different values of the number of clusters K (100, 200, 500, 1000, 2000,
4000) and of C (0.1, 0.2, 0.5, 1, 2, 5, 10, 20, 50, 100) have been tested, and the best
pair was chosen with respect to the maximal value of the score, computed according
to the 2013 ImageCLEF challenge rules. A comparison has also been made with the
default values (K = 100 and C = 100) that were used in our 2013 submissions.</p>
          <p>For these experiments, the 2013 ImageCLEF training data were divided
according to table 1 and cross-validation was performed.</p>
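          <p>The sweep can be sketched as a plain grid search over the tested values, rebuilding the whole chain for every (K, C) pair; the scoring function is assumed to implement the 2013 ImageCLEF challenge metric and the helper functions are the illustrative ones sketched earlier.</p>
          <preformat>
# Sketch of the grid search over K and C on the 2013 train/validation split.
import numpy as np

K_VALUES = [100, 200, 500, 1000, 2000, 4000]
C_VALUES = [0.1, 0.2, 0.5, 1, 2, 5, 10, 20, 50, 100]

def sweep(train_descs, train_labels, val_descs, val_labels, challenge_score):
    """train_descs / val_descs: lists of per-image SIFT descriptor arrays."""
    best = (None, None, -1.0)
    for K in K_VALUES:
        dictionary = build_dictionary(np.vstack(train_descs), K)
        train_bows = np.array([soft_assign_bow(d, dictionary) for d in train_descs])
        val_bows = np.array([soft_assign_bow(d, dictionary) for d in val_descs])
        for C in C_VALUES:
            svms = train_organ_svms(train_bows, train_labels, C)
            score = challenge_score(svms, val_bows, val_labels)
            if score > best[2]:
                best = (K, C, score)
    return best   # (best K, best C, best score)
          </preformat>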
        </sec>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>We now detail experiments for each organ category.</title>
      <p>Tuning K and C for leaf with uniform background For all tested values
of K and C, scores have been reported in table 2. Blue colored values correspond
to the best scores and the maximum is reached with K = 4000 and C = 100.
Variations of scores in the neighborhood of the best score with respect to K
and C have been plotted in figure 1 and show that increasing K and C leads to
better results.</p>
      <p>Tuning K and C leads to an increase of 27%.</p>
      <p>Table 2 suggests that better results could be obtained by further increasing
the parameters, C and more specifically K. For this paper, we have tested a pre-defined
set of parameter values for all organs. High values of K are costly in terms of
computation, which is why values of K higher than 4000 have not been tested in this
paper but should be considered in a further study. Refining the quantification of the
parameters in the neighborhood of the optimal values should also be examined.</p>
      <p>Tuning K and C for leaf with natural background Scores are reported in
table 3 and the maximal score is reached for K = 2000 and C = 100. Increasing
the C value leads to a higher score while K should not be increased over 4000 (see
figure 2).</p>
      <p>Tuning K and C leads to an increase of 76%. However, the maximal score
(0.368) is almost half the maximal score of leaf with uniform background.
Segmenting the leaf should significantly improve the performance by removing the noise
introduced by the background.</p>
      <p>Tuning K and C for flower Scores are reported in table 4 and present a
maximal value for K = 500 and C = 0.5. The variations of the score in the neighborhood
of the maximal value are not similar to the ones observed for the 2 categories of
organs associated with leaves. Even if the discretization performed on the K and
C parameters may cause the global maximum to be missed, we expect to be close
enough to this maximal value. Tuning K and C leads to an increase of 43%.</p>
      <p>Tuning the K and C parameters significantly improves the performance.
However, the impact of the C parameter is less important than that of the K parameter.</p>
      <p>Tuning has been done using a set of predefined values that could be extended
and also refined by reducing the discretization steps of the different parameters in
the neighborhood of the optimal values.</p>
      <p>Further experiments should be done in order to refine the optimal K and C
parameters for these organ categories but also for the other categories. These experiments
are costly in terms of computation and have to be planned over a long period of
time.</p>
      <sec id="sec-3-1">
        <title>Description of points of interest for flower</title>
        <p>Two different descriptors of points of interest have been tested for the organ category flower: SIFT and
OpponentColor SIFT, in order to take the color into account. Using the optimal
parameters K = 500 and C = 0.5, the score increases from 0.31 to 0.49 (+58%):
not really surprisingly, color has to be taken into account for flowers.</p>
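        <p>The paper does not detail the OpponentColor SIFT computation; a rough sketch following van de Sande et al. [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ] is given below, where SIFT is computed on the three opponent-color channels at the keypoints detected on the intensity image and the three descriptors are concatenated. This is an assumed re-implementation, not the authors' code.</p>
        <preformat>
# Rough sketch of an OpponentColor SIFT descriptor (assumed re-implementation).
import cv2
import numpy as np

def opponent_sift(image_bgr):
    b, g, r = [c.astype(np.float32) for c in cv2.split(image_bgr)]
    o1 = (r - g) / np.sqrt(2)            # opponent color channels
    o2 = (r + g - 2 * b) / np.sqrt(6)
    o3 = (r + g + b) / np.sqrt(3)
    sift = cv2.SIFT_create(nOctaveLayers=3, contrastThreshold=0.04, sigma=1.6)
    keypoints = sift.detect(cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY), None)
    descs = []
    for channel in (o1, o2, o3):
        ch8 = cv2.normalize(channel, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
        _, d = sift.compute(ch8, keypoints)
        descs.append(d)
    return keypoints, np.hstack(descs)   # (n_points, 3 * 128) descriptors
        </preformat>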
      </sec>
      <sec id="sec-3-2">
        <title>Local soft assignment versus hard assignment</title>
        <p>Different species may present organs that are visually similar, such as leaves for instance. In order to consider
different species for one image to be tested, local soft assignment has been
compared with hard assignment on the leaf organ category with uniform background. The optimal parameters
were K = 4000 and C = 100. The score increases from 0.67 to 0.74 (+10%).
This assignment has therefore been used on all the categories of organs.</p>
        <sec id="sec-3-2-1">
          <title>Results Obtained</title>
          <p>The submission has been done with the K and C parameters tuned on the 2013 data,
SIFT for all categories of organs except flowers (OpponentColor SIFT) and local
soft assignment. Training has been done on the 2014 training data. The scores
obtained are: 0.091 for run1 and 0.089 for run2 (see figure 4). The scores for run1
are detailed according to organ categories:</p>
          <p>Compared to the 2013 results, the 2014 results have improved. However, we
were expecting better results given what we had obtained on the 2013 data sets,
especially on the leaf and flower categories. The parameters have been tuned and
the scores have been computed on the 2013 dataset, which was smaller, with half
the number of species.</p>
        </sec>
        <sec id="sec-3-2-2">
          <title>Conclusion</title>
          <p>Tuning parameters is not an intuitive task and its computation is time
consuming. However, it greatly increases the performance of the recognition. Using
local hard assignment can be beneficial for problems where more
discrimination is needed. Further studies will focus on refining our tuning process and on taking
into account the metadata.</p>
        </sec>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name><surname>van Gemert</surname>, <given-names>J.C.</given-names></string-name>,
          <string-name><surname>Geusebroek</surname>, <given-names>J.M.</given-names></string-name>,
          <string-name><surname>Veenman</surname>, <given-names>C.J.</given-names></string-name>,
          <string-name><surname>Smeulders</surname>, <given-names>A.W.M.</given-names></string-name>:
          <article-title>Kernel codebooks for scene categorization</article-title>.
          <source>In: Proceedings of the 10th European Conference on Computer Vision: Part III</source>,
          pp. <fpage>696</fpage>-<lpage>709</lpage>. ECCV 2008, Springer-Verlag, Berlin, Heidelberg (<year>2008</year>)
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name><surname>van Gemert</surname>, <given-names>J.C.</given-names></string-name>,
          <string-name><surname>Veenman</surname>, <given-names>C.J.</given-names></string-name>,
          <string-name><surname>Smeulders</surname>, <given-names>A.W.M.</given-names></string-name>,
          <string-name><surname>Geusebroek</surname>, <given-names>J.M.</given-names></string-name>:
          <article-title>Visual word ambiguity</article-title>.
          <source>IEEE Trans. Pattern Anal. Mach. Intell.</source>
          <volume>32</volume>(<issue>7</issue>),
          <fpage>1271</fpage>-<lpage>1283</lpage> (<year>Jul 2010</year>)
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name><surname>Goëau</surname>, <given-names>H.</given-names></string-name>,
          <string-name><surname>Joly</surname>, <given-names>A.</given-names></string-name>,
          <string-name><surname>Bonnet</surname>, <given-names>P.</given-names></string-name>,
          <string-name><surname>Molino</surname>, <given-names>J.F.</given-names></string-name>,
          <string-name><surname>Barthélémy</surname>, <given-names>D.</given-names></string-name>,
          <string-name><surname>Boujemaa</surname>, <given-names>N.</given-names></string-name>:
          <article-title>LifeCLEF plant identification task 2014</article-title>.
          <source>In: CLEF working notes 2014</source> (<year>2014</year>)
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name><surname>Joly</surname>, <given-names>A.</given-names></string-name>,
          <string-name><surname>Müller</surname>, <given-names>H.</given-names></string-name>,
          <string-name><surname>Goëau</surname>, <given-names>H.</given-names></string-name>,
          <string-name><surname>Glotin</surname>, <given-names>H.</given-names></string-name>,
          <string-name><surname>Spampinato</surname>, <given-names>C.</given-names></string-name>,
          <string-name><surname>Rauber</surname>, <given-names>A.</given-names></string-name>,
          <string-name><surname>Bonnet</surname>, <given-names>P.</given-names></string-name>,
          <string-name><surname>Vellinga</surname>, <given-names>W.P.</given-names></string-name>,
          <string-name><surname>Fisher</surname>, <given-names>B.</given-names></string-name>:
          <article-title>LifeCLEF 2014: multimedia life species identification challenges</article-title>.
          <source>In: Proceedings of CLEF 2014</source> (<year>2014</year>)
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name><surname>van de Sande</surname>, <given-names>K.E.A.</given-names></string-name>,
          <string-name><surname>Gevers</surname>, <given-names>T.</given-names></string-name>,
          <string-name><surname>Snoek</surname>, <given-names>C.G.M.</given-names></string-name>:
          <article-title>Evaluation of color descriptors for object and scene recognition</article-title>.
          <source>In: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition</source> (<year>June 2008</year>)
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>