<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>DAEDALUS at ImageCLEF 2011 Plant Identification Task: Using SIFT Keypoints for Object Detection</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Julio Villena-Román</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sara Lana-Serrano</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>José Carlos González-Cristóbal</string-name>
          <email>josecarlos.gonzalez@upm.es</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>DAEDALUS - Data, Decisions and Language, S.A.</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Universidad Carlos III de Madrid</institution>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Universidad Politécnica de Madrid</institution>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2011</year>
      </pub-date>
      <abstract>
        <p>This paper describes the participation of DAEDALUS at the ImageCLEF 2011 Plant Identification task. The task is evaluated as a supervised classification problem over 71 tree species from the French Mediterranean area used as class labels, based on visual content from scan, scan-like and natural photo images. Our approach to this task is to build a classifier based on the detection of keypoints in the images, extracted using Lowe's Scale-Invariant Feature Transform (SIFT) algorithm. Although our overall classification score is very low compared to those of other participating groups, the main conclusion that can be drawn is that SIFT keypoints seem to work significantly better for photos than for the other image types, so our approach may be a feasible strategy for the classification of this kind of visual content.</p>
      </abstract>
      <kwd-group>
        <kwd>Plant identification task</kwd>
        <kwd>image retrieval</kwd>
        <kwd>Scale-Invariant Feature Transform</kwd>
        <kwd>SIFT</kwd>
        <kwd>keypoints</kwd>
        <kwd>classifier</kwd>
        <kwd>training</kwd>
        <kwd>test</kwd>
        <kwd>Pl@ntLeaves</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        This paper describes the participation of the DAEDALUS research team at the Plant
Identification task [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], a new pilot task within ImageCLEF 2011 whose objective is to
investigate the application of image retrieval technologies to the identification of
plant species. Specifically, in this first year the focus is on tree species identification
based on leaf images. Leaves are easily observable and the most studied plant organ in
the computer vision community, although they are known not to be the only
discriminative feature for distinguishing tree species.
      </p>
      <p>
        The task is evaluated as a supervised classification problem over 71 tree species
from the French Mediterranean area used as class labels, based on visual content from
the Pl@ntLeaves dataset, published under a Creative Commons license within the
Pl@ntNet project [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], which contains 3070 leaf scans, 897 leaf pictures with a white
uniform background (referred to as scan-like pictures) and 2469 leaf pictures in
natural conditions (taken on the tree), provided by Tela Botanica [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], a French social network
of amateur and expert botanists.
      </p>
      <p>In addition to the image file itself, the dataset contains a series of metadata
attributes. Apart from the full taxon name (species, genus, family…) and French or
English vernacular names (common names), these include the acquisition type (scan,
pseudoscan or photograph), the content type (single leaf, single dead leaf or several
leaves on tree visible in the picture), the date, locality and GPS coordinates, and
information about the author, all encoded in XML files. An example is shown in
Figure 1.</p>
      <p>A part of Pl@ntLeaves dataset is provided as training data whereas the remaining
part is used later as test data. The training data finally results in 4004 images and the
test data results in 1432 images. The goal of the task is to associate the correct tree
species to each test image. Each participant was allowed to submit up to 3 runs built
from different methods. Any number of species could be associated with each test
image, sorted by decreasing confidence score.</p>
      <p>In the following sections we will describe our approach, the experiments that we
submitted, the results that we achieved on this task, and some preliminary
conclusions.
</p>
    </sec>
    <sec id="sec-2">
      <title>Our Approach</title>
      <p>
        We approach this task with the construction of a classifier based on keypoints that
represent objects within the images, extracted using Lowe’s Scale-Invariant Feature
Transform (SIFT) algorithm [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
      </p>
      <p>The fundamental idea of the SIFT algorithm is to extract interest points from a given
training image that model the objects depicted in it, so that those objects can later be
identified in a test image containing many other objects. To perform reliable
recognition, those features extracted from the training image must be detectable under
changes in image scale, noise and illumination. In addition, the relative positions
between these features in the original scene should not change from one image to
another. Such interest points usually lie in high-contrast regions of the image,
such as object edges.</p>
      <p>Our classifier is trained by first extracting SIFT keypoints from all images in the
training set. Each set of keypoints is stored in a database, associated with the tree
species of the corresponding training image.</p>
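      <p>A minimal sketch of such a keypoint database, in Python with NumPy, assuming the SIFT descriptors of each training image are already available as arrays (all names below are illustrative, not taken from our actual implementation):</p>

```python
from collections import defaultdict

import numpy as np

# Maps each tree species label to the list of descriptor arrays extracted
# from its training images (one array per image; standard SIFT descriptors
# have 128 dimensions per keypoint).
keypoint_db = defaultdict(list)

def add_training_image(species, descriptors):
    """Store the SIFT descriptors of one training image under its species label."""
    keypoint_db[species].append(np.asarray(descriptors, dtype=np.float32))

# Illustrative usage with random stand-in descriptors:
rng = np.random.default_rng(0)
add_training_image("Quercus ilex", rng.random((120, 128)))
add_training_image("Quercus ilex", rng.random((95, 128)))
add_training_image("Acer campestre", rng.random((80, 128)))
```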
      <p>
        The number of extracted keypoints can be controlled by scaling the image
resolution. The resolution does not need to be very high, as the larger-scale keypoints
are the most reliable, and processing smaller images is also much more efficient.
According to Lowe, an image of 500 by 500 pixels will typically yield over 1000
keypoints, depending on image content, which is plenty for most applications. For this
purpose, each training image is rescaled to a width of 200 pixels. Moreover, as
required by Lowe's implementation, which is used to obtain the
SIFT keypoints [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], images are converted to greyscale PGM format prior to the
extraction.
      </p>
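      <p>This preprocessing step can be sketched as follows. The sketch is an illustrative stand-in (our run used Lowe's own tools for the actual conversion), combining a simple nearest-neighbour downscale, Rec. 601 greyscale conversion and binary PGM output:</p>

```python
import numpy as np

def to_greyscale(rgb):
    """Convert an (H, W, 3) RGB uint8 array to greyscale via Rec. 601 luma weights."""
    return (rgb @ np.array([0.299, 0.587, 0.114])).astype(np.uint8)

def rescale_width(img, target_w):
    """Nearest-neighbour rescale of a 2-D greyscale image to a target width,
    preserving the aspect ratio."""
    h, w = img.shape
    target_h = max(1, round(h * target_w / w))
    rows = np.arange(target_h) * h // target_h
    cols = np.arange(target_w) * w // target_w
    return img[rows[:, None], cols]

def write_pgm(path, img):
    """Write a 2-D uint8 array as a binary (P5) PGM file."""
    h, w = img.shape
    with open(path, "wb") as f:
        f.write(f"P5\n{w} {h}\n255\n".encode("ascii"))
        f.write(img.tobytes())

# Illustrative run on a synthetic 300x600 RGB image, rescaled to width 200:
rgb = np.random.default_rng(1).integers(0, 256, (300, 600, 3), dtype=np.uint8)
small = rescale_width(to_greyscale(rgb), 200)
```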
      <p>Once the whole training set has been processed, an object is recognized in a test
image by individually comparing each feature from the test image to this database and
finding candidate matching features based on the Euclidean distance between their
feature vectors. Test images are also downscaled, in this case to a width of 400 pixels
so that more keypoints can be found, and then also converted to greyscale PGM
format.</p>
      <p>
        From the full set of matches, subsets of keypoints that agree on the object and its
location, scale and orientation in the new image are identified, in order to keep only
the good matches. The same criterion as proposed by Lowe is used [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], in which matches are
identified by finding the 2 nearest neighbours of each keypoint from the training
image among those in the test image, and only accepting a match if the distance to the
closest neighbour is less than 0.6 times the distance to the second closest neighbour.
This threshold can be raised to select more matches or lowered to select only the most
reliable ones.
      </p>
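      <p>The ratio test described above can be sketched as follows; the toy descriptors are 2-dimensional for readability (real SIFT descriptors are 128-dimensional), and the function name is illustrative:</p>

```python
import numpy as np

def ratio_test_matches(train_desc, test_desc, ratio=0.6):
    """Match each training descriptor against the test descriptors, accepting a
    match only if the nearest test descriptor is closer than `ratio` times the
    distance to the second nearest (Lowe's ratio test)."""
    matches = []
    for i, d in enumerate(train_desc):
        # Euclidean distances from this training descriptor to every test descriptor.
        dists = np.linalg.norm(test_desc - d, axis=1)
        nearest, second = np.argsort(dists)[:2]
        if dists[nearest] < ratio * dists[second]:
            matches.append((i, int(nearest)))
    return matches

# Toy 2-D descriptors: each training descriptor has one clearly closest test match.
train = np.array([[0.0, 0.0], [5.0, 5.0]])
test = np.array([[0.1, 0.0], [9.0, 9.0], [5.0, 5.1]])
matches = ratio_test_matches(train, test)
# matches == [(0, 0), (1, 2)]
```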
      <p>Then the probability that a particular set of features indicates the presence of an
object is computed, given the accuracy of the fit and the number of probable false
matches. Object matches that pass all these tests can be identified as correct with
high confidence.</p>
      <p>The output of the SIFT classifier is a list of training images sorted by
relevance. To map training images to classification labels, the relevance of the
top-ranked training image for each classification label is taken as the relevance of
that label.
</p>
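      <p>This label-scoring step can be sketched as follows, with hypothetical image names and relevance scores:</p>

```python
def label_relevances(ranked_results, image_labels):
    """Given the classifier output (training images sorted by decreasing
    relevance) and a mapping from training image to species label, assign
    each label the relevance of its top-ranked training image."""
    relevances = {}
    for image, relevance in ranked_results:
        label = image_labels[image]
        # The first (highest-relevance) occurrence of each label wins.
        if label not in relevances:
            relevances[label] = relevance
    return relevances

# Illustrative usage with made-up image names and scores:
ranked = [("img_a", 0.9), ("img_b", 0.7), ("img_c", 0.4)]
labels = {"img_a": "Quercus ilex", "img_b": "Quercus ilex", "img_c": "Acer campestre"}
scores = label_relevances(ranked, labels)
# scores == {"Quercus ilex": 0.9, "Acer campestre": 0.4}
```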
    </sec>
    <sec id="sec-3">
      <title>Experiments and Results</title>
      <p>Although we initially planned several experiments varying the image downscaling
and the object acceptance thresholds, due to lack of time we finally submitted just one
run for evaluation.</p>
      <p>For the same reason, we had to discard our initial idea to build three different
specific classifiers based on acquisition type.</p>
      <p>Apart from the image itself and the taxon name in the training set, no use of any
other metadata information was made.</p>
      <p>The primary metric used by the organizers to evaluate the submitted runs is a
classification rate on the 1st species returned for each test image. Each test image is
given a score of 1 if the 1st returned species is correct and 0 if it is wrong.
An average score is then computed over all test images. As a simple mean would
introduce some bias due to the different number of images of the same individual
plant and the number of pictures provided by each contributor to the Pl@ntLeaves
dataset, the final metric is defined as an average classification score S:
S = (1/U) · Σ_{u=1..U} (1/P_u) · Σ_{p=1..P_u} (1/N_{u,p}) · Σ_{n=1..N_{u,p}} S_{u,p,n}    (1)
where U is the number of users (who have at least one image in the test data), Pu is
the number of individual plants observed by the u-th user, Nu,p is the number of
pictures taken from the p-th plant observed by the u-th user, and Su,p,n is the
classification score (1 or 0) for the n-th picture taken from the p-th plant observed by
the u-th user. An average classification score S is computed separately for each type
(scan, scan-like or photo) to isolate and evaluate its impact.</p>
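      <p>The score S of equation (1) amounts to straightforward nested averaging; the data layout below is an assumption made for illustration:</p>

```python
def average_classification_score(scores):
    """Compute the average classification score S from nested
    per-user / per-plant / per-picture scores: scores[u][p] is the list of
    0/1 scores for the pictures of plant p observed by user u."""
    user_means = []
    for plants in scores.values():
        # Average over pictures of each plant, then over plants of each user.
        plant_means = [sum(pics) / len(pics) for pics in plants.values()]
        user_means.append(sum(plant_means) / len(plant_means))
    # Finally, average over users.
    return sum(user_means) / len(user_means)

# Illustrative usage: two users; the first observed two plants.
scores = {
    "user1": {"plant1": [1, 0], "plant2": [1, 1, 1]},  # plant means 0.5 and 1.0
    "user2": {"plant3": [0, 0]},                       # plant mean 0.0
}
s = average_classification_score(scores)
# user1 mean = (0.5 + 1.0) / 2 = 0.75; user2 mean = 0.0; S = 0.375
```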
      <p>The results achieved in our experiment are shown in Table 1.</p>
      <p>In general, those figures are very low and results are a bit disappointing. However,
an interesting point shown in the table is that the top values are achieved for natural
photos. As a preliminary interpretation, we think that this may be because SIFT
keypoints strongly rely on contrast changes in images (such as colour gradients or
edges), and natural pictures represent more realistic conditions.</p>
      <p>Furthermore, another possible explanation may be that the training and test
datasets are neither evenly balanced across the three acquisition types nor with
respect to each other, as shown in Table 2. Our conclusion is that we should have
built three different classifiers, one for each type of image.</p>
      <p>A detailed analysis considering more than the 1st result is presented in Table 3.
This table shows, for each classification label (tree species), the number of test images
where the label was returned (independently of its position in the result list) and the
average position of that label in the result list.</p>
      <p>Our classifier was able to find the valid label for 649 test images (45.1% of the
test set), in the 8.9th position on average. No test image was identified for the
following tree species: Alnus glutinosa, Fagus sylvatica, Fraxinus ornus and
Magnolia grandiflora.</p>
      <p>Finally, Figure 2 shows the comparison of all 21 runs submitted by all 8 groups.</p>
      <p>Our group is last in the overall ranking because of the low performance for
scans and especially for scan-like images. However, our results for natural photos
outperform the best-ranked experiments from two other groups, as shown in Figure 3.
This reinforces the idea that SIFT keypoints may be a valuable strategy for natural
photos.</p>
    </sec>
    <sec id="sec-4">
      <title>Conclusions and Future Work</title>
      <p>Despite the poor overall classification score, the main preliminary conclusion that
can be drawn is that SIFT keypoints seem to work better for natural photos than for
scan and scan-like images, and our experiment was able to outperform the best
experiments from other groups for this image type.</p>
      <p>For future participations, we will definitely build specific classifiers for each image
type. Moreover, we will try alternatives to SIFT that are computationally less
demanding and may handle colour images, such as SURF keypoints.</p>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgements</title>
      <p>This work has been partially supported by several Spanish research projects:
MA2VICMR: Improving the access, analysis and visibility of the multilingual and
multimedia information in web for the Region of Madrid (S2009/TIC-1542),
MULTIMEDICA: Multilingual Information Extraction in Health domain and
application to scientific and informative documents (TIN2010-20644-C03-01) and
BUSCAMEDIA: Towards a semantic adaptation of multi-network-multiterminal
digital media (CEN-20091026). The authors would like to thank all partners for their
knowledge and support.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Goëau</surname>
            , <given-names>Hervé</given-names>
          </string-name>
          ; Bonnet, Pierre; Joly, Alexis; Boujemaa, Nozha; Barthelemy, Daniel; Molino, Jean-François; Birnbaum, Philippe; Mouysset, Elise; Picard, Marie.
          <article-title>The CLEF 2011 plant image classification task</article-title>
          .
          <source>CLEF 2011 working notes</source>
          , Amsterdam, The Netherlands,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2. Pl@ntNet Project.
          <source>http://www.plantnet-project.org/</source>
          [online August
          <year>2011</year>
          ].
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3. Tela Botanica,
          <article-title>The French Botany Network</article-title>
          . http://www.tela-botanica.org/ [online August
          <year>2011</year>
          ].
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Lowe</surname>
          </string-name>
          , David G.
          <article-title>Object recognition from local scale-invariant features</article-title>
          .
          <source>Proceedings of the International Conference on Computer Vision</source>
          , vol
          <volume>2</volume>
          . pp.
          <fpage>1150</fpage>
          -
          <lpage>1157</lpage>
          ,
          <year>1999</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Lowe</surname>
          </string-name>
          , David G.
          <article-title>Distinctive image features from scale-invariant keypoints</article-title>
          .
          <source>International Journal of Computer Vision</source>
          ,
          <volume>60</volume>
          ,
          <issue>2</issue>
          , pp.
          <fpage>91</fpage>
          -
          <lpage>110</lpage>
          ,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <article-title>Demo Software: SIFT Keypoint Detector</article-title>
          . http://www.cs.ubc.ca/~lowe/keypoints/ [online August
          <year>2011</year>
          ].
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>