        ZhaoHFUT at ImageCLEF 2012 Plant
               Identification Task

             Peng Zheng1 , Zhong-Qiu Zhao1 , and Herve Glotin2,3
         1
           The College of Computer Science and Information Engineering,
                       Hefei University of Technology, China,
2
  Systems and Information Sciences lab, LSIS CNRS & Univ. of Sud-Toulon Var - La
                                Garde, 83957, France,
                  3
                     Institut Universitaire de France, Paris, France,
      nicknina2006@163.com, zhongqiuzhao@gmail.com, glotin@univ-tln.fr




      Abstract. This paper presents the contribution of the ZhaoHFUT group
      to the ImageCLEF 2012 plant identification task, which involves
      identifying various species of trees from images of their leaves. We
      adopted the main structure of the ScSPM model together with an
      extension of the Scale Invariant Feature Transform (SIFT) descriptor,
      the flip SIFT descriptor, and investigated their performance given
      their good results in object classification. Although our results are
      not as promising as those of other participating groups, the
      conclusions we reached can still guide our future work in this field.

      Keywords: Plant identification, feature extraction, ScSPM, flip SIFT
      descriptors



1   Introduction

There are about 400,000 plant species in the world, of which only 270,000
have been named and identified by botanists. Accurate knowledge of the
identity, geographic distribution and uses of plants is essential for
agricultural development and biodiversity. Although there is much research
in this field, the taxonomy of species remains a hard task for professionals
(such as farmers or wood exploiters) and even for botanists themselves: such
basic information is usually only partially available to professional
researchers; flowers and fruits are not available for study throughout the
year; and the leaves, which can be obtained most of the time, often lack
sufficient visible characteristics to differentiate between many species.
    Methods from computer vision, such as image retrieval technologies, can
help reduce this taxonomic gap, so evaluating recent advances of the IR
community on this challenging task might have a strong impact. The
ImageCLEF series follows this idea and proposes an ongoing campaign on
plant identification. The goal of the task is to retrieve the correct
species among the top k species of a ranked list of retrieved species for
each test image, and to provide a forum for researchers working on image
analysis and artificial intelligence to share their ideas and compare their
systems, in order to support the taxonomic process with leaf information
[1] [2].
    In the continuity of last year's pilot, the task focuses on tree species
identification based on leaf images. The main novelties compared to last
year are the following:
(1) More species: the number of species this year is 126, an important step
towards covering the entire flora of a given region.
(2) Plant species retrieval vs. pure classification: last year's task was
organized as a pure classification task over 70 species, whereas this year
the evaluation metric is slightly modified to consider a ranked list of
retrieved species rather than a single brute determination [1].
    So this year's task is much more challenging than last year's, and the
evaluation is also more reasonable.
    The following Section 2 describes the materials and methods used. In Section
3, we describe the results of our experiments. Finally, some conclusions reached
are presented in Section 4.


2     Material and Methods

2.1   Database

The task is based on the Pl@ntLeaves II dataset, which focuses on 126 tree
species from the French Mediterranean area. The database is maintained by
the French project Pl@ntNet (INRIA, CIRAD, Tela Botanica). It contains
11572 pictures subdivided into three kinds: scans (57%), scan-like photos
(24%) and free natural photos (19%) [1].
1. Scans: These images, collected using flatbed scanners, are oriented
vertically along the main natural axis and show the petiole.
2. Scan-like photos: These images have a uniform background and look
similar to the scans, but exhibit some luminance variations, optical
distortions, shadows and color deviations.
3. Free natural photos: These images are taken directly on the trees with
no acquisition protocol, which results in non-uniform backgrounds and
rotated or badly scaled images, among other problems.
    All data are published under a Creative Commons license. Each image has
an associated xml file that contains the date, the type (single leaf,
single dead leaf or foliage), the name of the author, the GPS coordinates
of the observation, etc. We do not make use of this information; all
features are extracted directly from the images without any segmentation,
so the experiments are fully automatic, without any human assistance.
    The training data consists of 8422 images (4870 scans, 1819 scan-like
photos, 1733 natural photos) with full xml files associated with them. The
test data consists of 3150 images (1760 scans, 907 scan-like photos, 483
natural photos) with purged xml files (i.e. without the taxon information
that has to be predicted) [1].
2.2   Methods
As the leaf is one kind of object, methods for object classification can be
adopted for plant identification. Sparse representation is a recent method
for this problem that achieves promising results [3]. Here we utilize the
ScSPM model proposed by Yang et al., an algorithm that combines spatial
pyramid matching and sparse representation and obtains good results in
object classification [4]. The algorithm has four stages, as follows.

Feature Extraction. Feature extraction strongly affects all subsequent
stages. Since SIFT descriptors are widely used in pattern recognition,
signal processing and computer vision and are well suited to
classification, we extract SIFT descriptors densely over the images and
normalize them to form the feature vectors [5]. Moreover, as leaves exhibit
symmetry, taking this symmetry into account during feature extraction may
yield better results. We therefore also experiment with an extended version
of the SIFT descriptor, the flip SIFT descriptor [6]. These descriptors can
be generated with the software package provided by the author; we modify it
to extract them densely over the images and to omit descriptors that do not
meet the requirements.
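The dense extraction step can be sketched as follows. This is a hypothetical simplification, not the authors' code: it samples patches on a regular grid and builds a single gradient-orientation histogram per patch, whereas the real SIFT descriptor uses a 4x4 subregion layout giving 128 dimensions.

```python
import numpy as np

def dense_descriptors(image, patch=16, step=8, bins=8):
    """Simplified dense SIFT-like extraction: on a regular grid, build a
    gradient-orientation histogram per patch and L2-normalize it."""
    gy, gx = np.gradient(image.astype(float))
    mag = np.hypot(gx, gy)                 # gradient magnitude
    ori = np.arctan2(gy, gx)               # gradient orientation in [-pi, pi]
    descs, locs = [], []
    h, w = image.shape
    for y in range(0, h - patch + 1, step):
        for x in range(0, w - patch + 1, step):
            m = mag[y:y + patch, x:x + patch].ravel()
            o = ori[y:y + patch, x:x + patch].ravel()
            hist, _ = np.histogram(o, bins=bins, range=(-np.pi, np.pi), weights=m)
            n = np.linalg.norm(hist)
            if n > 1e-8:                   # omit flat patches, as the text suggests
                descs.append(hist / n)
                locs.append((y, x))
    return np.array(descs), locs
```

The flip SIFT variant of [6] additionally makes the descriptor invariant to mirror reflection, which this sketch does not attempt.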

Sparse Coding. Since the introduction of sparse coding, many algorithms
have been proposed for this problem, such as MP and OMP, which solve the
l0-regularization problem:
                 \min_a \|y - Xa\|_2^2 + \lambda \|a\|_0 \quad \text{subject to } \|x_i\| \le 1 .    (1)

and LASSO, which solves the l1-regularization problem:
                 \min_a \|y - Xa\|_2^2 + \lambda \|a\|_1 \quad \text{subject to } \|x_i\| \le 1 .    (2)

Feature-sign search is an efficient method for solving this problem, so we
use it for sparse coding to save experiment time [7].
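The l1 problem of Eq. (2) can be solved with a few lines of iterative soft-thresholding (ISTA). This is a sketch under our own choice of solver, not the feature-sign search of [7], though both converge to the same LASSO solution:

```python
import numpy as np

def soft_threshold(v, t):
    """Element-wise soft-thresholding, the proximal operator of the l1 norm."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def sparse_code_ista(y, X, lam=0.1, iters=200):
    """Solve min_a ||y - X a||_2^2 + lam * ||a||_1 by ISTA
    (gradient step on the quadratic term, then soft-thresholding)."""
    L = 2.0 * np.linalg.norm(X, 2) ** 2    # Lipschitz constant of the gradient
    a = np.zeros(X.shape[1])
    for _ in range(iters):
        grad = 2.0 * X.T @ (X @ a - y)
        a = soft_threshold(a - grad / L, lam / L)
    return a
```

Feature-sign search is much faster in practice, which is why the paper adopts it; ISTA is shown here only because it makes the objective explicit.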

Spatial Pooling Scheme. Instead of the histogram representation of the
traditional SPM method [8], this scheme divides the image into several
scales and, at each scale, performs a pooling operation over the codes that
fall in each spatial patch.
    Three pooling mechanisms are commonly used in this model:
(1) Average pooling, which computes the mean of the absolute values in a
local patch.
(2) Maximum pooling, which takes the maximum absolute value in a local
patch.
(3) Energy pooling, which computes the square root of the mean of the
squared values in a patch.
    Since maximum pooling usually gives the best result, we only report its
results.
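Maximum pooling over the pyramid can be sketched as follows. This is our own illustration; the 1x1, 2x2 and 4x4 grid sizes follow the usual ScSPM setting, which we assume here:

```python
import numpy as np

def max_pool_pyramid(codes, locs, img_shape, levels=(1, 2, 4)):
    """Max-abs pooling of sparse codes over a spatial pyramid.
    codes: (n, k) array of codes, one per descriptor; locs: (y, x) positions.
    Returns a vector of length k * sum(l*l for l in levels)."""
    h, w = img_shape
    k = codes.shape[1]
    pooled = []
    for l in levels:                       # one grid per pyramid level
        for i in range(l):
            for j in range(l):
                cell = np.zeros(k)
                for c, (y, x) in zip(codes, locs):
                    # assign each descriptor to its grid cell at this level
                    if int(y * l / h) == i and int(x * l / w) == j:
                        cell = np.maximum(cell, np.abs(c))
                pooled.append(cell)
    return np.concatenate(pooled)
```

Average and energy pooling would replace the `np.maximum` update with a running mean of |c| or of c**2 (followed by a square root), respectively.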
Classification. The last step of this framework is training a classifier to
classify the unit-normalized test vectors obtained from the previous
stages. As linear classifiers achieve good results here, we use linear
Support Vector Machine (SVM) classifiers, with the SVM parameters tuned by
cross-validation. The ranked lists for the runs are then formed by
normalizing the degrees of attribution output by the different SVMs.
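The paper does not specify how the SVM attribution degrees are normalized; the sketch below assumes a simple min-max normalization followed by a sum-to-one rescaling to produce the ranked species list:

```python
import numpy as np

def ranked_species(decision_values, species):
    """Turn per-class SVM decision values into a normalized, ranked list.
    The min-max + sum-to-one scheme is our assumption, not the authors'
    documented method."""
    d = np.asarray(decision_values, dtype=float)
    d = (d - d.min()) / (d.max() - d.min() + 1e-12)   # map to [0, 1]
    d = d / d.sum()                                   # degrees of attribution
    order = np.argsort(-d)                            # best species first
    return [(species[i], d[i]) for i in order]
```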


3     Experiments and Results
3.1   Evaluation Metric
The primary metric for evaluating the submitted runs depends only on the
rank of the correct species in the list of retrieved species. Each test
image is assigned a score between 0 and 1 that decreases quickly as the
rank of the correct species increases. Rather than taking a simple mean
over all test images, and since we want to evaluate the ability of a system
to provide correct answers to all users based on a single plant
observation, the metric averages the classification rate per author and per
individual plant. The final metric is the average classification score S:

                 S = \frac{1}{U} \sum_{u=1}^{U} \frac{1}{P_u} \sum_{p=1}^{P_u} \frac{1}{N_{u,p}} \sum_{n=1}^{N_{u,p}} s_{u,p,n} .    (3)

where U is the number of users (who have at least one image in the test
data), P_u is the number of individual plants observed by the u-th user,
N_{u,p} is the number of pictures taken of the p-th plant observed by the
u-th user, and s_{u,p,n} is the classification score for the n-th picture
taken of the p-th plant observed by the u-th user. Since different image
acquisition types may have different effects, an average classification
score S is computed separately for each type [1].
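Eq. (3) can be computed directly. A minimal sketch, assuming the test scores arrive as one (user, plant, score) triple per picture:

```python
from collections import defaultdict

def average_classification_score(scores):
    """Compute the metric S of Eq. (3): average picture scores per plant,
    then per user, then over all users."""
    per_plant = defaultdict(list)                # (u, p) -> picture scores
    for user, plant, s in scores:
        per_plant[(user, plant)].append(s)
    per_user = defaultdict(list)                 # u -> per-plant averages
    for (user, _), vals in per_plant.items():
        per_user[user].append(sum(vals) / len(vals))
    user_means = [sum(v) / len(v) for v in per_user.values()]
    return sum(user_means) / len(user_means)     # mean over users
```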

3.2   The Results
According to the rules of the campaign, we submitted three runs to the
plant identification task. The details are as follows.
    In the first run, we extracted dense SIFT descriptors from the images,
used the ScSPM model to generate the feature vectors, and used the LS-SVM
package to build classifiers for the test images in the database [9].
    In the second run, we changed the descriptors to flip SIFT ones in the
hope of taking symmetry into account; everything else was kept the same as
in the first run.
    In the third and last run, we fused the classifiers of the two runs
above to form a new degree of attribution; normalizing these terms produced
the ranked list we submitted.
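The fusion rule for the third run is not detailed in the text; a plausible sketch is a weighted average of the two runs' normalized attribution degrees, where the weight w and the renormalization are our assumptions:

```python
import numpy as np

def fuse_runs(scores_a, scores_b, w=0.5):
    """Fuse the per-species attribution degrees of two runs by a weighted
    average, then renormalize so the fused degrees sum to one."""
    fused = w * np.asarray(scores_a) + (1 - w) * np.asarray(scores_b)
    return fused / fused.sum()
```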
    The results are shown in Table 1, from which we can see that the third
run achieves the best results for all kinds of pictures. That is to say,
the fusion scheme usually reaches a better result by combining the
advantages of different methods. The run with SIFT descriptors ranks in the
middle of the three, while the run with flip SIFT descriptors gets the
worst results, in contradiction with our expectation. In fact, we had
compared these two kinds of descriptors using the species with more than 30
images in the training database, with MAP as the evaluation measure, and
there the flip SIFT descriptors outperformed the original SIFT descriptors.
We therefore speculate that this phenomenon may be related to the species
with very few training images, and to the SVM parameters, which for
convenience and time limits were kept the same as in the first run.

                  Table 1. Results for all runs in plant database.

    Run name      Retrieval type run-type   scan pseudo-scan photograph avg score
 ZhaoHFUT run1        Visual     Automatic  0.30      0.24       0.09      0.21
 ZhaoHFUT run2        Visual     Automatic  0.24      0.17       0.09      0.17
 ZhaoHFUT run3        Visual     Automatic  0.32      0.26       0.11      0.23




    Also, among the three kinds of images, the results for the scan images
are the best, followed by the pseudo-scan images, with the photograph
images last. The difference between the first two kinds may be due to the
optical distortions and shadows in the images, which cannot be handled by
dense SIFT descriptors. The worst performance, on the photograph images,
arises because the model treats the natural background as part of the leaf,
which harms the subsequent identification; segmenting this kind of image
first may be a good choice.
    For lack of space, we do not present the full comparison with the other
participants here; it can be found on the homepage of the task [1]. The
results are divided into two parts, fully automatic runs and others. For
the first two kinds of images, our best run takes sixth place among the
automatic runs, while for the last kind our results fall behind the others,
in the last several positions. For the average classification score, we
also take sixth place among the automatic runs. That is to say, the results
are not too bad for our first participation in this task, and they could be
improved considerably by improving the classification of the photograph
images.


4   Conclusion

This is our first try at the plant identification task. Although the
results are not very promising, we reached some conclusions and gained some
experience.
(1) First, as some species have only a few images, the bias against these
species may be amplified, and combining some retrieval methods is a good
choice.
(2) The flip SIFT descriptors may not be suitable for this task unless the
parameters can be tuned individually and the descriptors extracted densely
in a proper way.
(3) Building different classifiers for different types of images may be
helpful; the ScSPM model probably does not meet the requirements of
classifying the photograph images, which suggests selecting other
algorithms for this type of image.
(4) The prediction lists of the SVMs may not be suitable to represent the
results directly, and there are some problems in the normalization scheme;
nonlinear classifiers may improve the results.
(5) Finally, the provided xml files were not used in our experiments,
though they might provide information complementary to the visual features.
    As this kind of system is quite helpful for research and agricultural
development, we will continue to work in this field. By attending to the
details and refining the methods used, we hope to improve our results in
future tasks.

Acknowledgements. This research has been supported by the National Natural
Science Foundation of China (61005007), the National 863 Program of China
(2012AA011005), the China Post-doctoral Science Foundation (20100470825,
201104324) and Cognilego ANR 2010-CORD-013. The authors would like to
thank Jianchao Yang for sharing the SIFT and ScSPM code.


References
1. ImageCLEF 2012 Plant identification task. http://www.imageclef.org/2012/
   plant
2. Hervé Goeau, Pierre Bonnet, Alexis Joly, Itheri Yahiaoui, Daniel Barthelemy, Nozha
   Boujemaa, Jean-François Molino, The ImageCLEF 2012 plant identification task,
   CLEF 2012 working notes, Rome, Italy, 2012.
3. Olshausen B A, Field D J.: Sparse coding with an overcomplete basis set: a strategy
   employed by V1? Vision research. vol.37(23), pp.3311-3325, DOI: 10.1016/S0042-
   6989(97)00169-7 (1997)
4. Yang Jianchao, Yu Kai, Gong Yihong, et al.: Linear Spatial Pyramid Matching
   Using Sparse Coding for Image Classification. IEEE Conference on Computer
   Vision and Pattern Recognition, pp. 1794-1801 (2009)
5. Lowe, David G.: Object recognition from local scale-invariant features. Proceedings
   of the International Conference on Computer Vision, vol 2, pp.1150-1157(1999)
6. lip-vireo package for flip SIFT descriptors. http://www.cs.cityu.edu.hk/~wzhao2/
7. H. Lee, A. Battle, R. Raina, and A. Y. Ng.: Efficient sparse coding algorithms.
   Advances in Neural Information Processing Systems (NIPS), pp. 801-808(2007).
8. S. Lazebnik, C. Schmid, and J. Ponce: Beyond bags of features: Spatial pyramid
   matching for recognizing natural scene categories. In CVPR(2006)
9. LS-SVMlab Matlab package. http://www.esat.kuleuven.be/sista/lssvmlab/