<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>A comparative study of fine-grained classification methods in the context of the LifeCLEF plant identification challenge 2015</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Julien Champ</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Titouan Lorieul</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Maximilien Servajean</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alexis Joly</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Inria ZENITH team</institution>
          ,
          <country country="FR">France</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>LIRMM</institution>
          ,
          <addr-line>Montpellier</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>This paper describes the participation of Inria in the plant identification task of the LifeCLEF 2015 challenge. The aim of the task was to produce a list of relevant species for a large set of plant observations related to 1000 species of trees, herbs and ferns living in Western Europe. Each plant observation contained several annotated pictures with organ/view tags: Flower, Leaf, Fruit, Stem, Branch, Entire, Scan (exclusively of leaf). To address this challenge, we experimented with two popular families of classification techniques, i.e. convolutional neural networks (CNN) on one side and Fisher vector-based discriminant models on the other side. Our results show that the CNN approach achieves much better performance than the Fisher vectors. Moreover, we show that the fusion of both techniques, based on Bayesian inference using the confusion matrix of each classifier, did not improve on the results of the CNN alone.</p>
      </abstract>
      <kwd-group>
        <kwd>LifeCLEF</kwd>
        <kwd>plant</kwd>
        <kwd>leaves</kwd>
        <kwd>leaf</kwd>
        <kwd>flower</kwd>
        <kwd>fruit</kwd>
        <kwd>bark</kwd>
        <kwd>stem</kwd>
        <kwd>branch</kwd>
        <kwd>species</kwd>
        <kwd>retrieval</kwd>
        <kwd>images</kwd>
        <kwd>collection</kwd>
        <kwd>species identification</kwd>
        <kwd>citizen-science</kwd>
        <kwd>fine-grained classification</kwd>
        <kwd>evaluation</kwd>
        <kwd>benchmark</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Content-based image retrieval and computer vision approaches are considered
one of the most promising solutions to help bridge the taxonomic gap, as
discussed in [
        <xref ref-type="bibr" rid="ref1 ref17 ref5">5,1,36,34,17</xref>
        ]. We therefore see an increasing interest in this
transdisciplinary challenge in the multimedia community (e.g. in [
        <xref ref-type="bibr" rid="ref10 ref12 ref2 ref20 ref25 ref26">26,10,2,25,20,12</xref>
        ].
Beyond the raw identification performance achievable by state-of-the-art
computer vision algorithms, recent visual search paradigms actually offer much more
efficient and interactive ways of browsing large floras than standard field guides or
online web catalogs ([
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]). Smartphone applications relying on such image-based
identification services are particularly promising for setting up massive
ecological monitoring systems involving thousands of contributors at a very low cost.
A first step in this direction was achieved by the US consortium behind
LeafSnap (http://leafsnap.com/), an iPhone application allowing the identification of 184 common American
plant species based on pictures of cut leaves on a uniform background (see [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ]
for more details). Then, the French consortium supporting Pl@ntNet ([
        <xref ref-type="bibr" rid="ref17">17</xref>
        ]) went
one step further by building an interactive image-based plant identification
application that is continuously enriched by the members of a social network
specialized in botany. Inspired by the principles of citizen science and participatory
sensing, this project quickly reached a large public, with more than 300K downloads
of the mobile applications ([
        <xref ref-type="bibr" rid="ref7 ref8">8,7</xref>
        ]). A related initiative is the plant identification
evaluation task organized since 2011 in the context of the international
evaluation forum CLEF (http://www.clef-initiative.eu/), based on the data collected within Pl@ntNet.
This paper presents the participation of the Inria ZENITH team in the 2015 edition
of this challenge [
        <xref ref-type="bibr" rid="ref19 ref9">9,19</xref>
        ].
      </p>
    </sec>
    <sec id="sec-2">
      <title>Related work</title>
      <p>
        From a computer vision and technological perspective, our work is more
generally related to image classification. The most popular methods for this problem are
typically based on the pooling of local visual features into global image
representations and on the use of powerful classifiers in the resulting high-dimensional
embedded space, such as linear support vector machines ([
        <xref ref-type="bibr" rid="ref24 ref28">24,28</xref>
        ]). The
Bag-of-Words representation (BoW) notably remains a key concept, although the raw
initial scheme of ([
        <xref ref-type="bibr" rid="ref33">33</xref>
        ]) is now outperformed by several newer alternative schemes
([
        <xref ref-type="bibr" rid="ref14 ref16 ref24 ref27 ref6">24,16,27,6,14</xref>
        ]). Its principle is to first train a so-called visual vocabulary by means of
an unsupervised clustering algorithm computed on a given training set of
local features. The resulting partition is then used to quantize the visual features
of a new image into visual words that are aggregated within a single
high-dimensional histogram. Partial geometry can be embedded in the image
representation by using the Spatial Pyramid Matching scheme of ([
        <xref ref-type="bibr" rid="ref24">24</xref>
        ]). As it
relies on vector quantization, the BoW representation is, however, affected by
quantization errors. Very similar visual features might be split across distinct
clusters, whereas more dissimilar ones might be assigned to the same visual word.
This results in both mismatches and potentially irrelevant matches. To alleviate
this problem, several improvements have been proposed in the literature. The
first one consists in expanding the assignment of a given local feature to its
nearest visual words ([
        <xref ref-type="bibr" rid="ref14 ref16 ref29 ref6">16,29,6,14</xref>
        ]). This reduces the number of mismatches
without significantly degrading the encoding time. Other researchers have investigated
alternative ways to avoid the vector quantization step, using sparse coding ([38])
or locality-constrained linear coding ([37]). Such methods optimize the
assignment of a given local feature to a small number of visual words thanks to sparsity
or locality constraints on the global representation. Another alternative is to
use aggregation-based models such as the improved Fisher Vector of [
        <xref ref-type="bibr" rid="ref27">27</xref>
        ] or the
VLAD encoding scheme ([
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]). Such methods encode not only the number
of occurrences of each visual word but also additional information about
the distribution of the descriptors, by aggregating the component-wise
differences. When used with discriminative linear classifiers, such high-dimensional
representations benefit from both generative and discriminative approaches,
leading to state-of-the-art performance on fine-grained classification
benchmarks ([
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]).
      </p>
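      <p>The difference between hard-assignment BoW and aggregation-based encodings can be illustrated with a minimal NumPy sketch. It assumes a visual vocabulary that has already been learned (for instance by k-means), uses plain nearest-centroid assignment, and leaves out soft assignment, spatial pyramids and the extra normalizations used in practice; the function names and toy values are illustrative and are not taken from any of the cited implementations.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import numpy as np


def nearest_words(descriptors, vocabulary):
    """Index of the closest visual word (squared Euclidean distance) for each descriptor."""
    # Broadcast (N, 1, d) - (1, K, d) to get an (N, K) matrix of squared distances
    d2 = ((descriptors[:, None, :] - vocabulary[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)


def bow_histogram(descriptors, vocabulary):
    """Hard-assignment Bag-of-Words: number of descriptors quantized to each visual word."""
    words = nearest_words(descriptors, vocabulary)
    return np.bincount(words, minlength=len(vocabulary)).astype(float)


def vlad(descriptors, vocabulary):
    """VLAD-style aggregation: per-word sum of residuals (descriptor - centroid), L2-normalized."""
    words = nearest_words(descriptors, vocabulary)
    residuals = np.zeros_like(vocabulary, dtype=float)
    for word, x in zip(words, descriptors):
        residuals[word] += x - vocabulary[word]
    v = residuals.ravel()
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v


# Toy example: a 2-word vocabulary in 2-D (illustrative values only)
vocabulary = np.array([[0.0, 0.0], [10.0, 10.0]])
descriptors = np.array([[1.0, 0.0], [0.0, 1.0], [9.0, 10.0]])
print(bow_histogram(descriptors, vocabulary))  # [2. 1.]
print(vlad(descriptors, vocabulary))
```
]]></preformat>
      <p>The BoW histogram only counts how many descriptors fall into each cell, whereas the VLAD-style vector also records where, relative to each centroid, those descriptors lie.</p>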
      <p>
        A radically different approach to image classification is the use of deep
convolutional neural networks. Rather than extracting features with
hand-tuned or psycho-vision-oriented filters, such methods work directly on the
image signal. The weights learned by the first convolutional layers automatically
build relevant image filters, whereas the intermediate layers are in
charge of pooling these raw responses into high-level visual patterns. The last
fully connected layers work, more traditionally, as a discriminative classifier
on the image representation produced by the previous layers. Deep
convolutional neural networks have recently been shown to achieve better results on
large-scale image classification datasets such as ImageNet ([
        <xref ref-type="bibr" rid="ref22">22</xref>
        ]) and attract
more and more interest in the computer vision and multimedia communities. A
known drawback of deep convolutional neural networks, however, is that they
require a lot of training data, mainly because of the huge number of parameters
to be learned. Their performance on fine-grained classification is consequently
more controversial, and they are still often outperformed by approaches based on local features,
as observed in the 2014 edition of this challenge. Besides, it is important to notice that
they have inspired the investigation of new deep learning models making use of more
traditional visual feature embedding methods (e.g. [
        <xref ref-type="bibr" rid="ref31">31</xref>
        ]).
      </p>
    </sec>
    <sec id="sec-3">
      <title>Evaluated fine-grained image classification systems</title>
      <p>
        We experimented with two families of image classification techniques that are known
to provide state-of-the-art classification performance, in particular in fine-grained
recognition challenges ([
        <xref ref-type="bibr" rid="ref11 ref18">11,18</xref>
        ]).
      </p>
      <p>3.1 Convolutional neural networks</p>
      <p>
Convolutional Neural Networks (CNNs) have mainly been used since the 1990s for
digit classification. In recent years, however, they have surpassed previous state-of-the-art methods
for large-scale image classification [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ]. In this work, we used Caffe [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ], a deep learning
framework that allows us to use CNN architectures and models from the literature.
From the Caffe Model Zoo, we chose the "GoogLeNet GPU implementation"
model, based on Google's winning architecture in the ImageNet 2014 Challenge
[35], and we fine-tuned this model on the LifeCLEF datasets.
      </p>
      <p>The GoogLeNet architecture is a 22-layer-deep network with a
softmax loss as the top classifier. It is composed of nine "inception modules"
stacked on top of each other and grouped into three stages. Two auxiliary classifiers are connected
to intermediate inception modules during training, so as to encourage discrimination in the
lower stages of the network, increase the gradient signal that gets propagated
back, and provide additional regularization. These auxiliary classifiers are only
used during training and are then discarded.</p>
      <p>Experimental setup. The GoogLeNet CNN described above takes square
images as input. For each image in the training and test sets, we therefore
cropped the largest central square and resized it to 256x256 pixels.
Instead of training our CNN from scratch on plant images only, and as
this was authorized in this year's challenge, we started from a CNN trained on
the popular generalist ImageNet dataset. We only removed its top layers (the
fully connected ones), changed the number of outputs, and trained this new
model on the target dataset. As implemented in the Caffe library, training
also makes use of a simple data augmentation technique, consisting in randomly cropping
a 224x224-pixel patch and randomly mirroring it horizontally.</p>
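      <p>The preprocessing and on-the-fly augmentation described above can be sketched with NumPy as follows. This is an illustrative reimplementation, not the Caffe code we used: in Caffe, the random crop and mirroring are configured through the <monospace>crop_size</monospace> and <monospace>mirror</monospace> transformation parameters of the data layer, and the nearest-neighbour resize below is a simplifying assumption standing in for a proper interpolating resize. The function names are hypothetical.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import numpy as np


def center_square_crop(img):
    """Keep the largest centered square of an (H, W, C) image."""
    h, w = img.shape[:2]
    side = min(h, w)
    top, left = (h - side) // 2, (w - side) // 2
    return img[top:top + side, left:left + side]


def resize_nearest(img, size):
    """Nearest-neighbour resize to (size, size); a stand-in for an interpolating resize."""
    h, w = img.shape[:2]
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    return img[rows][:, cols]


def random_crop_and_mirror(img, crop, rng):
    """Random crop x crop patch, flipped horizontally with probability 0.5."""
    h, w = img.shape[:2]
    top = int(rng.integers(0, h - crop + 1))
    left = int(rng.integers(0, w - crop + 1))
    patch = img[top:top + crop, left:left + crop]
    if rng.random() < 0.5:
        patch = patch[:, ::-1]  # horizontal mirror
    return patch


rng = np.random.default_rng(0)
photo = rng.integers(0, 256, size=(480, 640, 3), dtype=np.uint8)  # synthetic image
stored = resize_nearest(center_square_crop(photo), 256)  # offline preprocessing
sample = random_crop_and_mirror(stored, 224, rng)          # drawn anew at each training pass
print(stored.shape, sample.shape)
```
]]></preformat>
      <p>Keeping the stored images at 256x256 and cropping 224x224 patches on the fly gives the network slightly different views of each image at every epoch without multiplying the size of the stored dataset.</p>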
      <p>During our preliminary experiments, we tried several training strategies,
which are presented in Table 1.</p>
      <p>We tested all these configurations using the PlantCLEF 2014 data and
ground truth (500 species, 47,815 training images and 13,146 test images). The CNN1
configuration was the simplest and the first one we tested, but it was finally also
the one providing the best results. The data augmentation method proposed
for the CNN2 configuration significantly increased the number of training images, as we
generated 8 new images by applying rotations and a set of colorimetric
transformations with randomized parameters, i.e. brightness &amp; saturation modulation
in the HSL color space (multiplier factor randomized between 0.8 and 1.2), and
contrast modulation (multiplier factor randomized between 0.7 and 1.3). Even
with additional training iterations, results remained nearly the same
as those for CNN1. The CNN3 configuration consisted in training several
CNNs, one for each view type (thanks to the tags provided in the metadata).
On the one hand, as some species do not have images for all views, the number of outputs
of each CNN is lower than 1000, which could help to obtain better results
by reducing the risk of confusion. On the other hand, images
from a given view (Branch, for example) can really help to identify images
tagged with another view (Entire, for example). Results were slightly lower for
the Branch, Entire, Leaf, Fruit, and Flower views than those obtained with
the single CNN. This could be explained by the smaller number of
images available to train each network, and it suggests that images from a given view can help
when identifying an image tagged with another view. This conclusion does not hold
for the Stem and LeafScan views. The reason is probably that the LeafScan view
is specific, very different from the other views, and does not contain background
information, and that the Stem tag identifies a close-up view of the plant that is
rarely visible in other images.</p>
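      <p>A minimal sketch of the colorimetric part of the CNN2 augmentation is given below, using only Python's standard <monospace>colorsys</monospace> module on a list of RGB pixels. Several details are our assumptions rather than reported settings: brightness is taken as the HSL lightness, brightness and saturation receive independent random factors, contrast is modulated around the per-channel image mean, and the rotations are omitted. The function names are hypothetical.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import colorsys
import random


def clamp01(x):
    """Clip a channel value to the valid [0, 1] range."""
    return min(1.0, max(0.0, x))


def hsl_modulate(pixels, brightness, saturation):
    """Scale HSL lightness and saturation of RGB pixels given as floats in [0, 1]."""
    out = []
    for r, g, b in pixels:
        h, l, s = colorsys.rgb_to_hls(r, g, b)  # colorsys orders the triple as H, L, S
        out.append(colorsys.hls_to_rgb(h, clamp01(l * brightness), clamp01(s * saturation)))
    return out


def contrast_modulate(pixels, factor):
    """Stretch each channel around the image mean (assumed definition of contrast)."""
    n = len(pixels)
    means = [sum(p[c] for p in pixels) / n for c in range(3)]
    return [tuple(clamp01(means[c] + factor * (p[c] - means[c])) for c in range(3))
            for p in pixels]


def random_colorimetric(pixels, rng):
    """One random variant using the factor ranges reported for CNN2."""
    b = rng.uniform(0.8, 1.2)  # brightness (HSL lightness) multiplier
    s = rng.uniform(0.8, 1.2)  # saturation multiplier
    c = rng.uniform(0.7, 1.3)  # contrast multiplier
    return contrast_modulate(hsl_modulate(pixels, b, s), c)


rng = random.Random(0)
image = [(0.2, 0.4, 0.6), (0.8, 0.1, 0.3), (0.5, 0.5, 0.5)]  # toy RGB pixels in [0, 1]
variants = [random_colorimetric(image, rng) for _ in range(8)]  # rotations not shown
```
]]></preformat>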
      <p>Training parameters. For reference, here are the most important Caffe parameters
used to obtain our submitted run (CNN1). The base learning rate was
set to 10<sup>−5</sup> and divided by 10 every 60k iterations. Training stopped after
150k iterations, and the batch size was fixed to 32. All other
parameters were left unchanged.</p>
      <p>3.2 Fisher vectors &amp; Logistic Regression</p>
      <p>
        Fisher vectors (FV) were first introduced in image classification by [
        <xref ref-type="bibr" rid="ref27">27</xref>
        ] and later
proved to be very efficient in fine-grained classification tasks ([
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]).
According to recent surveys such as [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ], it is the best-performing pooling
strategy currently available. We only recall here the main steps used to extract
Fisher vectors; for detailed explanations of the theoretical derivation and for a
performance analysis, we refer the reader to [
        <xref ref-type="bibr" rid="ref30">30</xref>
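        ]. The pipeline for computing the Fisher vector describing an image consists of the following steps:
        <list list-type="order">
          <list-item>
            <p>Dense extraction of local features: descriptors, often SIFT descriptors, are extracted on densely sampled overlapping patches at several scales.</p>
          </list-item>
          <list-item>
            <p>PCA transformation: the descriptors are then decorrelated and compressed using a Principal Component Analysis.</p>
          </list-item>
          <list-item>
            <p>Feature space density estimation: the distribution of features is modeled as a Gaussian Mixture Model (GMM) learned with the popular Expectation-Maximization (EM) algorithm. We thus obtain a probability distribution of the form
              <disp-formula><tex-math>u(x) = \sum_{k=1}^{K} w_k \, u_k(x),</tex-math></disp-formula>
              where <inline-formula><tex-math>u_k \sim \mathcal{N}(\mu_k, \Sigma_k)</tex-math></inline-formula> is a Gaussian distribution of mean <inline-formula><tex-math>\mu_k</tex-math></inline-formula> and covariance matrix <inline-formula><tex-math>\Sigma_k</tex-math></inline-formula>, with <inline-formula><tex-math>\Sigma_k</tex-math></inline-formula> diagonal because the features are decorrelated (we write <inline-formula><tex-math>\sigma_k^2</tex-math></inline-formula> for its diagonal), and where <inline-formula><tex-math>w_k</tex-math></inline-formula> is the weight of the k-th Gaussian; these weights satisfy <inline-formula><tex-math>\sum_k w_k = 1</tex-math></inline-formula>.</p>
          </list-item>
          <list-item>
            <p>Encoding and pooling: the features <inline-formula><tex-math>x_1, \dots, x_N</tex-math></inline-formula> are encoded and pooled using
              <disp-formula><tex-math>\mathcal{G}_{\mu_k} = \frac{1}{\sqrt{w_k}} \sum_{i=1}^{N} \gamma_k(x_i) \, \frac{x_i - \mu_k}{\sigma_k},</tex-math></disp-formula>
              <disp-formula><tex-math>\mathcal{G}_{\sigma_k} = \frac{1}{\sqrt{2 w_k}} \sum_{i=1}^{N} \gamma_k(x_i) \left[ \frac{(x_i - \mu_k)^2}{\sigma_k^2} - 1 \right],</tex-math></disp-formula>
              where all divisions and squarings are element-wise operations and where
              <disp-formula><tex-math>\gamma_k(x) = \frac{w_k \, u_k(x)}{\sum_{k'=1}^{K} w_{k'} \, u_{k'}(x)}.</tex-math></disp-formula>
              These 2K vectors are concatenated to produce the final representation, of dimension 2dK, where d is the dimension of the PCA-reduced descriptors.</p>
          </list-item>
          <list-item>
            <p>Post-processing: the vectors are square-rooted element-wise using <inline-formula><tex-math>x \mapsto \operatorname{sign}(x)\sqrt{|x|}</tex-math></inline-formula> and then L2-normalized.</p>
          </list-item>
        </list>
      </p>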
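      <p>The encoding, pooling and post-processing steps above can be sketched in NumPy as follows, assuming the PCA projection and the diagonal GMM (weights, means and variances) have already been estimated; EM training is not shown, and the function name is hypothetical. No 1/N factor is applied, which is harmless here: the signed square root only rescales a vector by the square root of a positive factor, and the final L2 normalization removes any such global scale. Applying the signed square root before the L2 normalization follows the common improved-FV recipe and is our assumption about the order.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import numpy as np


def fisher_vector(x, w, mu, sigma2):
    """Improved Fisher vector of local descriptors x (N, d) under a diagonal GMM.

    w: (K,) mixture weights, mu: (K, d) means, sigma2: (K, d) diagonal variances.
    Returns a vector of dimension 2 * d * K.
    """
    sigma = np.sqrt(sigma2)
    # Standardized differences (x_i - mu_k) / sigma_k: shape (N, K, d)
    diff = (x[:, None, :] - mu[None, :, :]) / sigma[None, :, :]
    # Log-density of each descriptor under each diagonal Gaussian: shape (N, K)
    log_u = -0.5 * (diff ** 2).sum(axis=2) - 0.5 * np.log(2 * np.pi * sigma2).sum(axis=1)
    # Soft assignments gamma_k(x_i), computed with the log-sum-exp trick for stability
    log_p = np.log(w)[None, :] + log_u
    log_p -= log_p.max(axis=1, keepdims=True)
    gamma = np.exp(log_p)
    gamma /= gamma.sum(axis=1, keepdims=True)
    # Gradients with respect to the means and the standard deviations: (K, d) each
    g_mu = (gamma[:, :, None] * diff).sum(axis=0) / np.sqrt(w)[:, None]
    g_sigma = (gamma[:, :, None] * (diff ** 2 - 1)).sum(axis=0) / np.sqrt(2 * w)[:, None]
    fv = np.concatenate([g_mu.ravel(), g_sigma.ravel()])
    # Post-processing: signed square root, then L2 normalization
    fv = np.sign(fv) * np.sqrt(np.abs(fv))
    norm = np.linalg.norm(fv)
    return fv / norm if norm > 0 else fv


rng = np.random.default_rng(0)
d, K, N = 4, 3, 50  # toy sizes; the systems below use much larger d and K = 128
x = rng.normal(size=(N, d))
w = np.array([0.5, 0.3, 0.2])
mu = rng.normal(size=(K, d))
sigma2 = np.ones((K, d))
fv = fisher_vector(x, w, mu, sigma2)
print(fv.shape)  # (24,) = 2 * d * K
```
]]></preformat>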
      <p>
        Usually, the classification of Fisher Vectors is performed using a linear
classifier, as it has been shown that using kernelization techniques in such
high-dimensional spaces does not significantly improve performance. In our
experiments, we used the Logistic Regression classifier implemented within the
LibLinear library ([
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]). This method was preferred over Support Vector
Machines because it directly outputs probabilities, which can then be used for fusion
purposes.
      </p>
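      <p>The following sketch illustrates how class probabilities can be obtained from a linear model on Fisher vectors. It assumes a one-vs-rest scheme in which each binary logistic output is renormalized across classes, which, to our understanding, is how LIBLINEAR produces multi-class probability estimates for logistic regression; the weights are toy values, training is not shown, and the function name is hypothetical.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import numpy as np


def ovr_logistic_probabilities(weights, bias, x):
    """Per-class probabilities from C one-vs-rest logistic models (C >= 3 assumed).

    weights: (C, D), one row per class; bias: (C,); x: (D,), e.g. a Fisher vector.
    """
    scores = weights @ x + bias
    # Numerically stable sigmoid: 1 / (1 + exp(-s)) == 0.5 * (1 + tanh(s / 2))
    p = 0.5 * (1.0 + np.tanh(0.5 * scores))
    # Renormalize the C independent binary outputs so that they sum to 1
    return p / p.sum()


# Hypothetical weights for 3 species and 2-D inputs (illustrative values only)
W = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]])
b = np.zeros(3)
probs = ovr_logistic_probabilities(W, b, np.array([2.0, 0.0]))
print(probs.argmax())  # 0
```
]]></preformat>
      <p>Such normalized outputs can be merged directly by score-based rules such as the maximum, which is harder with raw SVM margins.</p>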
      <p>
        Here, we used two types of Fisher Vectors built on two different types of
descriptors. The first system was built with RootSIFT descriptors, i.e. L1-normalized
and square-rooted SIFT descriptors of 128 dimensions, which are then reduced
to 80 dimensions through PCA. The second one was based on a set of
complementary descriptors used in the Pl@ntNet application [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ]. It consists of the
concatenation of several basic descriptors such as Fourier histograms, Edge
Orientation Histograms (EOH), HSV histograms and Hough transform histograms.
This concatenation was then compressed and decorrelated using PCA. The
combination of descriptors used depends on the organ: for Branch, Entire, Leaf,
LeafScan and Stem, only Fourier, EOH and Hough histograms are used, resulting
in 44-dimensional descriptors compressed to 14 dimensions after PCA, while
Flower and Fruit add HSV histograms, giving 74-dimensional descriptors
reduced to 38 dimensions after compression. In both systems, the GMM used to estimate the
probability distribution of the features learns a codebook of 128 words.
      </p>
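      <p>As an illustration of the descriptor side, the sketch below applies the RootSIFT transform to raw SIFT vectors and computes the size 2dK of the resulting Fisher vectors for the PCA-reduced dimensions given above, assuming that the 128-word codebook corresponds to K = 128 Gaussians. The function names are hypothetical.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import numpy as np


def root_sift(sift, eps=1e-12):
    """RootSIFT: L1-normalize each SIFT descriptor (row), then take the element-wise square root."""
    sift = np.asarray(sift, dtype=float)
    l1 = np.abs(sift).sum(axis=1, keepdims=True)
    return np.sqrt(sift / (l1 + eps))  # each row then has (almost exactly) unit L2 norm


def fisher_dim(d, k):
    """Dimension 2dK of a Fisher vector built from d-dimensional descriptors and K Gaussians."""
    return 2 * d * k


# PCA-reduced descriptor sizes reported in this section, with a 128-Gaussian codebook
for name, d in [("RootSIFT", 80), ("Fourier+EOH+Hough", 14), ("Fourier+EOH+Hough+HSV", 38)]:
    print(name, fisher_dim(d, 128))
# RootSIFT 20480
# Fourier+EOH+Hough 3584
# Fourier+EOH+Hough+HSV 9728
```
]]></preformat>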
    </sec>
    <sec id="sec-4">
      <title>Fusion methods</title>
      <p>Combining multiple classifiers, or even multiple results (i.e. several images of a
single observation) from a single classifier, is a way to increase the classification
quality. This section presents the three main approaches we used to merge the various
results from our classifiers.</p>
      <p>4.1 Max and Borda</p>
      <p>Maximum and Borda Count are two approaches used to merge top-k lists. While
the maximum relies on the score of each class in the lists, Borda Count uses
their rank.</p>
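      <p>Both merging rules can be sketched in a few lines of Python. Each input list is assumed to come from one image of an observation (or from one classifier); the Borda variant assumes that a class at rank r (starting from 1) of a top-K list earns K − r points. The function names and class labels are placeholders.</p>
      <preformat preformat-type="code"><![CDATA[
```python
def max_fusion(score_lists):
    """Each class keeps the maximum score it reaches over all lists (dicts: class -> score)."""
    fused = {}
    for scores in score_lists:
        for cls, s in scores.items():
            fused[cls] = max(s, fused.get(cls, s))
    return sorted(fused, key=fused.get, reverse=True)


def borda_fusion(ranked_lists, k):
    """Borda count over top-k lists: a class at rank r (1-based) earns k - r points."""
    fused = {}
    for ranking in ranked_lists:
        for r, cls in enumerate(ranking[:k], start=1):
            fused[cls] = fused.get(cls, 0) + (k - r)
    return sorted(fused, key=fused.get, reverse=True)


# Toy observation made of several images (species names are placeholders)
per_image_scores = [{"a": 0.9, "b": 0.1}, {"b": 0.95, "c": 0.05}]
print(max_fusion(per_image_scores))  # ['b', 'a', 'c']
rankings = [["a", "b", "c"], ["b", "c", "a"], ["b", "a", "c"]]
print(borda_fusion(rankings, k=3))  # ['b', 'a', 'c']
```
]]></preformat>
      <p>The maximum rewards a single very confident image, whereas Borda Count rewards species that are consistently well ranked and ignores the scale of the scores.</p>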
      <p>More precisely, the maximum-based approach associates with each class the
maximum score it reaches among the different lists. In the Borda Count
approach, each class within a list is associated with a score that decreases as
its rank increases. More specifically, since we only retrieve the top-K most likely
classes, the score of a given species s is computed as follows:
        <disp-formula id="eq1"><label>(1)</label><tex-math>\mathrm{score}(s) = \sum_{c \in C} \left( K - r_c(s) \right)</tex-math></disp-formula>
where <inline-formula><tex-math>r_c(s)</tex-math></inline-formula> is the rank of species s returned by classifier c.</p>
      <p>4.2 Bayesian inference</p>
      <p>
        Framework presentation. This fusion method is inspired by what is done
in crowdsourcing multi-labeled classification tasks [
        <xref ref-type="bibr" rid="ref21 ref32">21,32</xref>
        ]. For this purpose, we used the Bayesian inference framework described in Figure 1.
      </p>
      <p>[Figure 1: Bayesian inference framework (graphical model with plates over classifiers k = 1, …, K and observations i = 1, …, N).]</p>
      <p>In this inference framework, we are given a set of classifiers <inline-formula><tex-math>k \in \{1, \dots, K\}</tex-math></inline-formula>, and a confusion matrix <inline-formula><tex-math>\pi^{(k)}</tex-math></inline-formula> is assigned to each of them. This matrix captures the classification quality of each classifier: more precisely, <inline-formula><tex-math>\pi^{(k)}_{i,j}</tex-math></inline-formula> is the probability that classifier k, given an image, answers class j while the correct class is i. The set of all confusion matrices is denoted Π. Note that, as shown in Figure 1, the confusion matrix <inline-formula><tex-math>\pi^{(k)}</tex-math></inline-formula> is directly derived from the parameter matrix <inline-formula><tex-math>\alpha^{(k)}</tex-math></inline-formula>; the set of all parameter matrices is denoted A. In parallel, each observation (i.e. the set of images of a single plant) is associated with a probability distribution, denoted <inline-formula><tex-math>t_i</tex-math></inline-formula> for the i-th observation. This distribution depends on the proportion of each species in the database, and we denote by κ the vector of these proportions. Finally, based on the probabilities <inline-formula><tex-math>t_i</tex-math></inline-formula> and on the confusion matrix of a given classifier k, we can infer the probability of that classifier's answer for the i-th observation, denoted <inline-formula><tex-math>c_i^{(k)}</tex-math></inline-formula>.</p>
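      <p>Therefore, the joint probability of this Bayesian framework is given by Equation (2):</p>
      <disp-formula id="eq2"><label>(2)</label><tex-math>p(\Pi, t, c \mid A, \kappa) = \prod_{i=1}^{N} \left\{ \kappa_{t_i} \prod_{k=1}^{K} \pi^{(k)}_{t_i, c_i^{(k)}} \right\} p(\Pi \mid A)</tex-math></disp-formula>
      <p>Once the classifiers' answers (i.e. the set of answers <inline-formula><tex-math>c_i^{(k)}</tex-math></inline-formula> for all k and i) are
known, the probabilities of A, Π and t can be updated, thus inferring the
correct class of each observation (i.e. the one with the highest probability in
<inline-formula><tex-math>t_i</tex-math></inline-formula>). In the following, we assume κ to be known, thanks to the very large size of the
training set.</p>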
      <p>
        Addressing the large dimensionality. In state-of-the-art
solutions, several approaches have been proposed to compute the posterior probabilities,
such as Gibbs sampling [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ] or Variational Bayes [
        <xref ref-type="bibr" rid="ref32">32</xref>
        ]. In our experiments, we had
to face the very large dimension of the problem, each confusion matrix being of
size 1000 × 1000. Classical methods are therefore intractable in our context. To
address this challenge, we used a single-shot approach: only <inline-formula><tex-math>p(t_i = j \mid \mathrm{rest})</tex-math></inline-formula> is
computed and used to update A (recall that κ is known and does not
need to be updated). Thus, the confusion matrix of each classifier evolves as
the number of identifications increases, and the quality of inference is progressively
refined.
      </p>
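      <p>The single-shot scheme can be sketched as follows under our reading of Equation (2): the confusion matrix of each classifier is taken as its row-normalized parameter matrix, the posterior of the true class j is proportional to κ<sub>j</sub> times the product, over classifiers, of the confusion entries of their answers, and each parameter matrix then receives this posterior as a soft count in the column of the returned class. The function names, the soft-count update and the toy sizes are illustrative assumptions, not a description of our exact implementation.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import numpy as np


def confusion_from_params(alpha):
    """Confusion matrix as the row-normalized parameter matrix (rows: true class)."""
    return alpha / alpha.sum(axis=1, keepdims=True)


def single_shot_update(alphas, kappa, answers):
    """Compute p(t_i = j | rest) for one observation, then update each alpha^(k) in place.

    alphas: list of (C, C) parameter matrices, one per classifier.
    kappa: (C,) known species proportions.
    answers: class index c_i^(k) returned by each classifier k.
    """
    log_post = np.log(kappa)  # new array, safe to modify
    for alpha, c in zip(alphas, answers):
        log_post += np.log(confusion_from_params(alpha)[:, c])
    post = np.exp(log_post - log_post.max())
    post /= post.sum()
    # Soft count: answer c is credited to every true class j with weight p(t_i = j | rest)
    for alpha, c in zip(alphas, answers):
        alpha[:, c] += post
    return post


C = 3
alphas = [np.full((C, C), 1.0) + 4.0 * np.eye(C) for _ in range(2)]  # two toy classifiers
kappa = np.full(C, 1.0 / C)
post = single_shot_update(alphas, kappa, answers=[1, 1])
print(post.argmax())  # 1
```
]]></preformat>
      <p>Only one posterior per observation is computed and each matrix is touched in a single column, so the cost per observation stays linear in the number of species, even with 1000 × 1000 matrices.</p>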
      <p>Experimental setup. In this subsection, we present three aspects of the setup:
parameter initialization, parameter refinement and refinement of each classifier's
confusion.</p>
      <p>An important part of the fusion is learning the confusion matrices (and their
parameters). To do so, we initialized each parameter matrix in A with a
value of S on the diagonal and S/(dimension − 1) in the other cells, meaning
that there is a 50% probability that the classifier will be correct and that, given
the correct class and a wrong one, the classifier is more likely to return
the correct one. In our experiments, the value of S was fixed to 5 (the best
choice among several runs).</p>
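      <p>The initialization can be checked numerically with the sketch below, which assumes that a confusion matrix is obtained by normalizing each row of its parameter matrix. Under that assumption, each row sums to 2S, which gives the 50% probability of a correct answer mentioned above. The function name is hypothetical.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import numpy as np


def init_params(num_classes, s=5.0):
    """Parameter matrix with S on the diagonal and S / (num_classes - 1) elsewhere."""
    alpha = np.full((num_classes, num_classes), s / (num_classes - 1))
    np.fill_diagonal(alpha, s)
    return alpha


alpha = init_params(1000)  # 1000 species, S = 5
confusion = alpha / alpha.sum(axis=1, keepdims=True)
# Each row sums to S + (dimension - 1) * S / (dimension - 1) = 2S, so the diagonal is about 0.5
print(confusion[0, 0])
# The correct class is about (dimension - 1) = 999 times more likely than any single wrong class
print(confusion[0, 0] / confusion[0, 1])
```
]]></preformat>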
      <p>Then, we tried to enhance the quality of the confusion matrices using the training
data. For each image of the training set, we asked the classifiers to produce a top-30
classification again and, given the correct class i, we added to each cell <inline-formula><tex-math>a_{i,j}</tex-math></inline-formula> of
the matrices A (j being a species of the top-30) a value inversely proportional to its rank in the top-30:
<inline-formula><tex-math>1/\mathrm{rank}</tex-math></inline-formula>.</p>
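      <p>This training-set refinement amounts to adding reciprocal ranks into the row of the correct class, as in the minimal sketch below; the function name and the toy sizes (5 species, a top-3 list instead of a top-30) are hypothetical.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import numpy as np


def refine_with_training_ranks(alpha, true_class, top_list):
    """Add 1/rank to alpha[true_class, j] for each species j of a classifier's top list.

    top_list: species indices ordered by decreasing score (rank 1 first), e.g. a top-30.
    """
    for rank, j in enumerate(top_list, start=1):
        alpha[true_class, j] += 1.0 / rank
    return alpha


alpha = np.ones((5, 5))  # toy prior counts for 5 species
refine_with_training_ranks(alpha, true_class=2, top_list=[2, 4, 0])
print(alpha[2])
```
]]></preformat>
      <p>A correct answer at rank 1 therefore strengthens the diagonal by a full unit, while confusions that only appear deep in the list add little weight.</p>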
      <p>Finally, to be as fine-grained as possible, each classifier was associated with
several confusion matrices, one for each plant organ. Thus, the system
knows the confusion of each classifier for all possible organs. In a way, we consider
each {organ, classifier} pair as a single classifier.</p>
      <p>5 Official Results</p>
      <p>5.1 Runs details</p>
      <p>Three runs were finally submitted to the LifeCLEF 2015 plant challenge:
        <list list-type="bullet">
          <list-item>
            <p>INRIA Zenith Run 1 is based on the results provided by the single
Convolutional Neural Network fine-tuned using all provided data (CNN1), described in 3.1. Observations composed of several images are combined
using a Max function to provide observation results.</p>
          </list-item>
          <list-item>
            <p>INRIA Zenith Run 2 is based on the Fisher Vectors described in 3.2. To obtain
observation results, we used the Borda Count algorithm.</p>
          </list-item>
          <list-item>
            <p>INRIA Zenith Run 3 is the combination of the results obtained by the previous
methods (CNN and Fisher Vectors) using the Bayesian inference method
described in 4.2.</p>
          </list-item>
        </list>
      </p>
      <p>If we compare the best run of each team, INRIA Zenith Run 1, the one
using the CNN, is ranked 3rd on observation results. We can note that all
of the 4 best teams used deep neural networks. Our second run, INRIA Zenith
Run 2, the one using Fisher Vectors, is disappointingly distanced by the CNN
runs: its final score is about half as high (0.3 instead of 0.609 for INRIA Zenith Run
1). In LifeCLEF 2014, the best performance was obtained by Fisher Vectors,
but the use of external training data was not allowed, which likely explains why CNNs
were not performing better.</p>
      <p>Our final run, INRIA Zenith Run 3, is the Bayesian inference fusion of the
previous runs. It was designed to benefit from both technologies.
Unfortunately, its results are slightly lower than those of the standalone
CNN of INRIA Zenith Run 1 (0.592 instead of 0.609). Two main reasons can
be put forward to explain this quality loss. First, the two classifiers are not
necessarily independent; thus, their combination does not necessarily yield a quality
gain. Second, building a confusion matrix for such a high-dimensional problem (i.e.
1000 × 1000) is very challenging, and the size of the test set is not sufficient to
learn an accurate confusion.</p>
    </sec>
    <sec id="sec-5">
      <title>Conclusion</title>
      <p>The Inria Zenith team submitted 3 runs using different strategies. The first run was
based on the well-known GoogLeNet CNN architecture, pre-trained on the ImageNet
dataset and fine-tuned on the LifeCLEF data, and used a max method to fuse image results into observation results.
Our second run did not use external data and was based on Fisher vectors, which
was last year's winning technology. The conclusion is that deep neural networks
outperform Fisher vectors for such classification tasks, particularly with a
large number of classes and when large training datasets are available. Our last
run consisted in trying a new fusion method, based on Bayesian inference, to
merge the results of the two previous runs. However, the results were not as good as
expected, probably because the first run is already two times better than the
second one.</p>
    </sec>
    <sec id="sec-6">
      <title>Appendix: Complementary Results</title>
      <p>34. Spampinato, C., Mezaris, V., van Ossenbruggen, J.: Multimedia analysis for
ecological data. In: Proceedings of the 20th ACM international conference on Multimedia.
pp. 1507–1508. ACM (2012)
35. Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D.,
Vanhoucke, V., Rabinovich, A.: Going deeper with convolutions. CoRR abs/1409.4842
(2014), http://arxiv.org/abs/1409.4842
36. Trifa, V.M., Kirschel, A.N.G., Taylor, C.E., Vallejo, E.E.: Automated species
recognition of antbirds in a Mexican rainforest using hidden Markov models. Journal of
The Acoustical Society of America 123 (2008)
37. Wang, J., Yang, J., Yu, K., Lv, F., Huang, T., Gong, Y.: Locality-constrained
linear coding for image classification. In: Computer Vision and Pattern Recognition
(CVPR), 2010 IEEE Conference on. pp. 3360–3367. IEEE (2010)
38. Yang, J., Yu, K., Gong, Y., Huang, T.: Linear spatial pyramid matching using
sparse coding for image classification. In: Computer Vision and Pattern
Recognition, 2009. CVPR 2009. IEEE Conference on. pp. 1794–1801. IEEE (2009)</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Cai</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ee</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pham</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Roe</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          , Zhang, J.:
          <article-title>Sensor network for the monitoring of ecosystem: Bird species recognition</article-title>
          .
          <source>In: Intelligent Sensors, Sensor Networks and Information</source>
          ,
          <year>2007</year>
          .
          <source>ISSNIP</source>
          <year>2007</year>
          . 3rd International Conference on. pp.
          <volume>293</volume>
          –
          <issue>298</issue>
          (Dec
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Cerutti</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tougne</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vacavant</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Coquin</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>A Parametric Active Polygon for Leaf Segmentation and Shape Estimation</article-title>
          .
          <source>In: 7th International Symposium on Visual Computing</source>
          . p.
          <fpage>1</fpage>
          .
          <string-name>
            <given-names>Las</given-names>
            <surname>Vegas</surname>
          </string-name>
          , United
          <string-name>
            <surname>States</surname>
          </string-name>
          (
          <year>Sep 2011</year>
          ), https://hal.archives-ouvertes.fr/hal-00622269
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Ellison</surname>
            ,
            <given-names>A.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Farnsworth</surname>
            ,
            <given-names>E.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chu</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kress</surname>
            ,
            <given-names>W.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Neill</surname>
            ,
            <given-names>A.K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Best</surname>
            ,
            <given-names>J.H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pickering</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stevenson</surname>
            ,
            <given-names>R.D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Courtney</surname>
            ,
            <given-names>G.W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>VanDyk</surname>
            ,
            <given-names>J.K.</given-names>
          </string-name>
          :
          <article-title>Next-generation field guides</article-title>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Fan</surname>
            ,
            <given-names>R.E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chang</surname>
            ,
            <given-names>K.W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hsieh</surname>
            ,
            <given-names>C.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>X.R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lin</surname>
            ,
            <given-names>C.J.</given-names>
          </string-name>
          :
          <article-title>LIBLINEAR: A library for large linear classification</article-title>
          .
          <source>The Journal of Machine Learning Research</source>
          <volume>9</volume>
          ,
          <fpage>1871</fpage>
          -
          <lpage>1874</lpage>
          (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Gaston</surname>
            ,
            <given-names>K.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>O'Neill</surname>
            ,
            <given-names>M.A.</given-names>
          </string-name>
          :
          <article-title>Automated species identification: why not?</article-title>
          <source>Philosophical Transactions of the Royal Society of London B: Biological Sciences</source>
          <volume>359</volume>
          (
          <issue>1444</issue>
          ),
          <fpage>655</fpage>
          -
          <lpage>667</lpage>
          (
          <year>2004</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>van Gemert</surname>
            ,
            <given-names>J.C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Veenman</surname>
            ,
            <given-names>C.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Smeulders</surname>
            ,
            <given-names>A.W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Geusebroek</surname>
            ,
            <given-names>J.M.</given-names>
          </string-name>
          :
          <article-title>Visual word ambiguity</article-title>
          .
          <source>IEEE Transactions on Pattern Analysis and Machine Intelligence</source>
          <volume>32</volume>
          (
          <issue>7</issue>
          ),
          <fpage>1271</fpage>
          -
          <lpage>1283</lpage>
          (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7. Goeau, H.,
          <string-name>
            <surname>Bonnet</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Joly</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Affouard</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bakic</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Barbe</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dufour</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Selmi</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yahiaoui</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vignau</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          , et al.:
          <article-title>Pl@ntNet mobile 2014: Android port and new features</article-title>
          .
          <source>In: Proceedings of International Conference on Multimedia Retrieval</source>
          . p.
          <fpage>527</fpage>
          . ACM
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8. Goeau, H.,
          <string-name>
            <surname>Bonnet</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Joly</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bakic</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Barbe</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yahiaoui</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Selmi</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Carre</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Barthelemy</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Boujemaa</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          , et al.:
          <article-title>Pl@ntNet mobile app</article-title>
          .
          <source>In: Proceedings of the 21st ACM international conference on Multimedia</source>
          . pp.
          <fpage>423</fpage>
          -
          <lpage>424</lpage>
          . ACM
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9. Goeau, H.,
          <string-name>
            <surname>Joly</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bonnet</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>LifeCLEF plant identification task 2015</article-title>
          . In: CLEF working notes 2015
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10. Goeau, H.,
          <string-name>
            <surname>Joly</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Selmi</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bonnet</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mouysset</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Joyeux</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>Visual-based plant species identification from crowdsourced data</article-title>
          .
          <source>In: MM'11 - ACM Multimedia 2011</source>
          . pp.
          <fpage>0</fpage>
          -
          <lpage>0</lpage>
          . ACM, Scottsdale, United States
          (
          <year>Nov 2011</year>
          ), https://hal.inria.fr/hal-00642236
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Gosselin</surname>
            ,
            <given-names>P.H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Murray</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jegou</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Perronnin</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>Revisiting the Fisher vector for fine-grained classification</article-title>
          .
          <source>Pattern Recognition Letters</source>
          <volume>49</volume>
          ,
          <fpage>92</fpage>
          -
          <lpage>98</lpage>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Hsu</surname>
            ,
            <given-names>T.H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>C.H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>L.H.</given-names>
          </string-name>
          :
          <article-title>An interactive flower image recognition system</article-title>
          .
          <source>Multimedia Tools Appl.</source>
          <volume>53</volume>
          (
          <issue>1</issue>
          ),
          <fpage>53</fpage>
          -
          <lpage>73</lpage>
          (May
          <year>2011</year>
          ), http://dx.doi.org/10.1007/s11042-010-0490-6
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Huang</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wu</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tan</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Feature coding in image classification: A comprehensive study</article-title>
          .
          <source>IEEE Transactions on Pattern Analysis and Machine Intelligence</source>
          <volume>36</volume>
          (
          <issue>3</issue>
          ),
          <fpage>493</fpage>
          -
          <lpage>506</lpage>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Jegou</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Perronnin</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Douze</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sanchez</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Perez</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schmid</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Aggregating local image descriptors into compact codes</article-title>
          .
          <source>IEEE Transactions on Pattern Analysis and Machine Intelligence</source>
          <volume>34</volume>
          (
          <issue>9</issue>
          ),
          <fpage>1704</fpage>
          -
          <lpage>1716</lpage>
          (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Jia</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shelhamer</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Donahue</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Karayev</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Long</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Girshick</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Guadarrama</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Darrell</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Caffe: Convolutional architecture for fast feature embedding</article-title>
          .
          <source>arXiv preprint arXiv:1408.5093</source>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Jiang</surname>
            ,
            <given-names>Y.G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ngo</surname>
            ,
            <given-names>C.W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yang</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Towards optimal bag-of-features for object categorization and semantic video retrieval</article-title>
          .
          <source>In: Proceedings of the 6th ACM international conference on Image and video retrieval</source>
          . pp.
          <fpage>494</fpage>
          -
          <lpage>501</lpage>
          . ACM
          (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Joly</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          , Goeau, H.,
          <string-name>
            <surname>Bonnet</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bakic</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Barbe</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Selmi</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yahiaoui</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Carre</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mouysset</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Molino</surname>
            ,
            <given-names>J.F.</given-names>
          </string-name>
          , et al.:
          <article-title>Interactive plant identification based on social image data</article-title>
          .
          <source>Ecological Informatics</source>
          <volume>23</volume>
          ,
          <fpage>22</fpage>
          -
          <lpage>34</lpage>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Joly</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          , Goeau, H.,
          <string-name>
            <surname>Glotin</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Spampinato</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bonnet</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vellinga</surname>
            ,
            <given-names>W.P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Planque</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rauber</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          , Fisher, R., Muller, H.:
          <article-title>Lifeclef 2014: multimedia life species identi cation challenges</article-title>
          .
          <source>In: Information Access Evaluation</source>
          . Multilinguality, Multimodality, and Interaction, pp.
          <volume>229</volume>
          {
          <fpage>249</fpage>
          . Springer (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <surname>Joly</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          , Muller, H., Goeau, H.,
          <string-name>
            <surname>Glotin</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Spampinato</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rauber</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bonnet</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vellinga</surname>
            ,
            <given-names>W.P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fisher</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>LifeCLEF 2015: multimedia life species identification challenges</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <surname>Kebapci</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yanikoglu</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Unal</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          :
          <article-title>Plant image retrieval using color, shape and texture features</article-title>
          .
          <source>Comput. J.</source>
          <volume>54</volume>
          (
          <issue>9</issue>
          ),
          <fpage>1475</fpage>
          -
          <lpage>1490</lpage>
          (Sep
          <year>2011</year>
          ), http://dx.doi.org/10.1093/comjnl/bxq037
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <surname>Kim</surname>
            ,
            <given-names>H.C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ghahramani</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          :
          <article-title>Bayesian classifier combination</article-title>
          .
          <source>In: International conference on artificial intelligence and statistics</source>
          . pp.
          <fpage>619</fpage>
          -
          <lpage>627</lpage>
          (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          22.
          <string-name>
            <surname>Krizhevsky</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sutskever</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hinton</surname>
            ,
            <given-names>G.E.</given-names>
          </string-name>
          :
          <article-title>ImageNet classification with deep convolutional neural networks</article-title>
          .
          <source>In: Advances in neural information processing systems</source>
          . pp.
          <fpage>1097</fpage>
          -
          <lpage>1105</lpage>
          (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          23.
          <string-name>
            <surname>Kumar</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Belhumeur</surname>
            ,
            <given-names>P.N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Biswas</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jacobs</surname>
            ,
            <given-names>D.W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kress</surname>
            ,
            <given-names>W.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lopez</surname>
            ,
            <given-names>I.C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Soares</surname>
            ,
            <given-names>J.V.</given-names>
          </string-name>
          :
          <article-title>Leafsnap: A computer vision system for automatic plant species identification</article-title>
          .
          <source>In: Computer Vision - ECCV 2012</source>
          , pp.
          <fpage>502</fpage>
          -
          <lpage>516</lpage>
          . Springer (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          24.
          <string-name>
            <surname>Lazebnik</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schmid</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ponce</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories</article-title>
          .
          <source>In: Computer Vision and Pattern Recognition, 2006 IEEE Computer Society Conference on</source>
          . vol. 2, pp.
          <fpage>2169</fpage>
          -
          <lpage>2178</lpage>
          . IEEE
          (
          <year>2006</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          25.
          <string-name>
            <surname>Mouine</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yahiaoui</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Verroust-Blondet</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Advanced shape context for plant species identification using leaf image retrieval</article-title>
          . In: Ip, H.H.S., Rui, Y. (eds.)
          <source>ICMR '12 - 2nd ACM International Conference on Multimedia Retrieval</source>
          . ACM, Hong Kong, China (Jun
          <year>2012</year>
          ), https://hal.inria.fr/hal-00726785
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          26.
          <string-name>
            <surname>Nilsback</surname>
            ,
            <given-names>M.E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zisserman</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Automated flower classification over a large number of classes</article-title>
          .
          <source>In: Computer Vision, Graphics &amp; Image Processing, 2008. ICVGIP '08. Sixth Indian Conference on</source>
          . pp.
          <fpage>722</fpage>
          -
          <lpage>729</lpage>
          (Dec
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          27.
          <string-name>
            <surname>Perronnin</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dance</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Fisher kernels on visual vocabularies for image categorization</article-title>
          .
          <source>In: Computer Vision and Pattern Recognition, 2007. CVPR'07. IEEE Conference on</source>
          . pp.
          <fpage>1</fpage>
          -
          <lpage>8</lpage>
          . IEEE
          (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          28.
          <string-name>
            <surname>Perronnin</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sanchez</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mensink</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Improving the Fisher kernel for large-scale image classification</article-title>
          .
          <source>In: Computer Vision - ECCV 2010</source>
          , pp.
          <fpage>143</fpage>
          -
          <lpage>156</lpage>
          . Springer (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          29.
          <string-name>
            <surname>Philbin</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chum</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Isard</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sivic</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zisserman</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Lost in quantization: Improving particular object retrieval in large scale image databases</article-title>
          .
          <source>In: Computer Vision and Pattern Recognition, 2008. CVPR 2008. IEEE Conference on</source>
          . pp.
          <fpage>1</fpage>
          -
          <lpage>8</lpage>
          . IEEE
          (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          30.
          <string-name>
            <surname>Sanchez</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Perronnin</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mensink</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Verbeek</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Image classification with the Fisher vector: Theory and practice</article-title>
          .
          <source>International Journal of Computer Vision 105(3)</source>
          ,
          <volume>222</volume>
          –
          <fpage>245</fpage>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          31.
          <string-name>
            <surname>Simonyan</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vedaldi</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zisserman</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Deep Fisher networks for large-scale image classification</article-title>
          .
          <source>In: Advances in Neural Information Processing Systems</source>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          32.
          <string-name>
            <surname>Simpson</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Roberts</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Psorakis</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Smith</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Dynamic Bayesian Combination of Multiple Imperfect Classifiers</article-title>
          . In:
          <article-title>Decision Making with Imperfect Decision Makers</article-title>
          . Springer (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          33.
          <string-name>
            <surname>Sivic</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zisserman</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Video Google: A text retrieval approach to object matching in videos</article-title>
          .
          <source>In: Computer Vision</source>
          ,
          <year>2003</year>
          . Proceedings. Ninth IEEE International Conference on. pp.
          <volume>1470</volume>
          –
          <fpage>1477</fpage>
          .
          <string-name>
            <surname>IEEE</surname>
          </string-name>
          (
          <year>2003</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>