<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Plant Recognition by Inception Networks with Test-time Class Prior Estimation</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Center for Machine Perception</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Dept. of Cybernetics</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Faculty of Electrical</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Dept. of Cybernetics, Faculty of Applied Sciences, University of West Bohemia</institution>
          ,
          <addr-line>Pilsen</addr-line>
          ,
          <country country="CZ">Czech Republic</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Engineering, Czech Technical University in Prague</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>The paper describes an automatic system for recognition of 10,000 plant species from one or more images. The system finished 1st in the ExpertLifeCLEF 2018 plant identification challenge with 88.4% accuracy and performed better than 5 of the 9 participating plant identification experts. The system is based on the Inception-ResNet-v2 and Inception-v4 Convolutional Neural Network (CNN) architectures. Performance improvements were achieved by: adjusting the CNN predictions according to the estimated change of the class prior probabilities, replacing network parameters with their running averages, and test-time data augmentation.</p>
      </abstract>
      <kwd-group>
        <kwd>Plant Recognition</kwd>
        <kwd>Plant Identification</kwd>
        <kwd>Computer Vision</kwd>
        <kwd>Convolutional Neural Networks</kwd>
        <kwd>Machine Learning</kwd>
        <kwd>Class Prior Estimation</kwd>
        <kwd>Fine-grained Classification</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        The ExpertLifeCLEF [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] plant identification challenge is organized in
connection with the LifeCLEF 2018 workshop [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] at the Conference and Labs of the
Evaluation Forum. The goal of the challenge is to assess the quality of automatic,
machine-learned recognition systems and to compare their accuracy with that of human
experts in plant sciences. For practical reasons, the experts are evaluated on a
small subset of the test data.
      </p>
      <p>
        The data provided for the challenge cover 10 000 species of plants (herbs,
trees and ferns) and consist of:
- PlantCLEF 2017 EOL: 256K images from the Encyclopedia of Life<sup>3</sup>, provided
in the 2017 challenge [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] as the "trusted" training set.
- PlantCLEF 2017 web: 1.4M images automatically retrieved by web search
engines, provided in the 2017 challenge [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] as the "noisy" training set.
- PlantCLEF 2017 test set: 25K test images from the 2017 challenge [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], now
available with ground truth label annotations.
- PlantCLEF 2016 subset: 64K images from the PlantCLEF 2016 [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] challenge
training and test sets, covering only 717 of the 10k species. The remaining
classes from the 2016 challenge do not exactly match the
2017/2018 list of species taxonomically.
- ExpertLifeCLEF 2018 test set: 6 892 unlabeled images used for evaluation
of the submitted methods. Examples from the set are displayed in Figure 1.
3 http://www.eol.org
      </p>
      <p>The proposed classification system builds upon state-of-the-art
Convolutional Neural Network (CNN) architectures, described in Section 2.1. Section 2.3
discusses the use of running averages of the trained network parameters instead
of the values from the last training step, which noticeably increased the accuracy of
our models.</p>
      <p>
        The class frequencies in the training data follow a long-tailed distribution.
It is reasonable to expect that the training data, a significant majority of which was
downloaded from the web, have different class prior probabilities than the test
set. In Section 2.4 we consider the problem of different class prior probability
distributions and implement an existing method [
        <xref ref-type="bibr" rid="ref5 ref6">5,6</xref>
        ] to improve the CNN
predictions by estimating the test-time priors.
      </p>
      <p>Section 3 describes the 5 submissions we made. Results of the challenge are
presented in Section 4. One of the submitted plant recognition methods achieved
the best accuracy among automated systems, placing 1st in the challenge,
and outperformed 5 of the 9 human experts.</p>
      <p>Methodology</p>
      <p>Convolutional Neural Networks</p>
      <p>
        The proposed method is based on two architectures, Inception-ResNet-v2 and
Inception-v4 [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], and their ensembles described in Section 3. The
TensorFlow-Slim API was used to adjust and fine-tune the networks from the publicly
available ImageNet-pretrained checkpoints<sup>4</sup>. All networks in our experiments shared
the optimizer settings enumerated in Table 1. Batch size, input resolution and
random crop area range were set differently for each network, as listed in Table 2.
The following image pre-processing was used for training:
      </p>
      <p>- Random crop, with aspect ratio range (0.75, 1.33) and with different area
ranges listed in Table 2,
- Random left-right flip,
- Brightness and saturation distortion.</p>
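      <p>The training-time pre-processing above can be illustrated with a minimal sketch that only samples the augmentation parameters. It is not the authors' TensorFlow-Slim pipeline: the function names, the uniform sampling of the aspect ratio, and the brightness and saturation strengths are assumptions made for illustration, and the area range must be supplied by the caller because the per-network values are given only in Table 2.</p>
      <preformat><![CDATA[
```python
import math
import random


def sample_crop_box(width, height, area_range, aspect_range=(0.75, 1.33),
                    rng=random, max_tries=100):
    """Sample a random crop box (left, top, crop_w, crop_h).

    area_range is the (min, max) fraction of the image area to keep.
    The aspect ratio (width / height) is drawn uniformly, which is an
    assumption of this sketch.
    """
    for _ in range(max_tries):
        area = rng.uniform(area_range[0], area_range[1]) * width * height
        aspect = rng.uniform(aspect_range[0], aspect_range[1])
        crop_w = int(round(math.sqrt(area * aspect)))
        crop_h = int(round(math.sqrt(area / aspect)))
        if 0 < crop_w <= width and 0 < crop_h <= height:
            left = rng.randint(0, width - crop_w)
            top = rng.randint(0, height - crop_h)
            return left, top, crop_w, crop_h
    # Fall back to the full image if no valid box was found.
    return 0, 0, width, height


def sample_augmentation(width, height, area_range, rng=random):
    """Draw one set of training-time augmentation parameters."""
    return {
        "crop": sample_crop_box(width, height, area_range, rng=rng),
        "flip_left_right": rng.random() < 0.5,
        # Distortion strengths are illustrative, not taken from the paper.
        "brightness_delta": rng.uniform(-0.125, 0.125),
        "saturation_factor": rng.uniform(0.5, 1.5),
    }
```
]]></preformat>
      <p>A call such as sample_augmentation(640, 480, (0.3, 1.0)) returns one crop box, a flip decision and two distortion values; the area range (0.3, 1.0) is a placeholder, not a value from Table 2.</p>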
      <p>At test time, 14 predictions per image are generated by using 7 crops and their
mirrored versions:
- 1x full image,
- 1x central crop covering 80% of the original image,
- 1x central crop covering 60% of the original image,
- 4x corner crops covering 60% of the original image.</p>
      <p>4 https://github.com/tensorflow/models/tree/master/research/slim#Pretrained</p>
      <p>Networks #1,..,#6, initialized from the ImageNet pre-trained checkpoints, were
first trained on PlantCLEF data from previous years (PlantCLEF 2017 EOL +
PlantCLEF 2017 web + PlantCLEF 2016 subset). The PlantCLEF 2017 test set was
used for validation.</p>
      <p>Another set of networks, denoted as #1clean,..,#6clean, was fine-tuned from
models #1,..,#6 without using the noisy PlantCLEF 2017 web set. For this
fine-tuning, we also added most of the PlantCLEF 2017 test set, keeping only 1 000
observations (1 403 images) as a min-val set.</p>
      <p>Preliminary experiments, using the 2017 test set for validation, showed a
significant improvement in accuracy when using running averages of the trained
variables instead of the values from the last training step. Specifically, we used
exponential decay with a decay rate of 0.999.</p>
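      <p>A minimal sketch of such running averages is given below, using plain Python floats in place of network tensors. The class and method names are hypothetical; only the decay rate of 0.999 comes from the text. TensorFlow offers comparable functionality in tf.train.ExponentialMovingAverage, although the paper does not state which implementation was used.</p>
      <preformat><![CDATA[
```python
class ParameterAverager:
    """Keeps exponential moving averages (shadow copies) of named parameters."""

    def __init__(self, params, decay=0.999):
        self.decay = decay
        # Initialise the averages with the current parameter values.
        self.shadow = dict(params)

    def update(self, params):
        """Call after every training step with the freshly updated parameters."""
        d = self.decay
        for name, value in params.items():
            self.shadow[name] = d * self.shadow[name] + (1.0 - d) * value
        return self.shadow
```
]]></preformat>
      <p>At evaluation time the averaged values in shadow are used instead of the parameters from the last training step, so a single noisy mini-batch changes them by at most a fraction 1 - decay of the step.</p>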
      <p>
        In this task, where the majority of the training data is noisy, we interpret the benefit of
running averages as keeping a stable version of the variables, since mini-batches with noisy samples
may produce large gradients pointing away from the local optimum. Another
possible interpretation is that the learning rate was still too high. Unfortunately, we
did not have the computational time to experiment with different learning rate
schedules.
      </p>
      <p>
        In many computer vision tasks, the class prior probabilities are assumed to be
the same for the training data and test data. In ExpertLifeCLEF, however, it is
reasonable to assume that class priors change: the largest part of the training
set comes from the web, where the class frequencies may not correspond to
the test-time priors (depending on the species incidence, the interest of users,
etc.). The problem of adjusting CNN outputs to the change in class prior
probabilities was discussed in [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], where it was proposed to recompute the posterior
probabilities (predictions) p(c_k|x_i) by Equation 1.
      </p>
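      <p>
        <disp-formula>
          <label>(1)</label>
          <tex-math><![CDATA[
p_e(c_k|x_i) = p(c_k|x_i)\,\frac{p_e(c_k)\,p(x_i)}{p(c_k)\,p_e(x_i)}
= \frac{p(c_k|x_i)\,\dfrac{p_e(c_k)}{p(c_k)}}{\sum_{j=1}^{K} p(c_j|x_i)\,\dfrac{p_e(c_j)}{p(c_j)}}
\propto p(c_k|x_i)\,\frac{p_e(c_k)}{p(c_k)}
          ]]></tex-math>
        </disp-formula>
      </p>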
      <p>The subscript e denotes probabilities on the evaluation/test set. Equation 1
follows from Bayes' rule under the assumption that the class-conditional densities
p(x_i|c_k) are the same for the training and test data. The
posterior probabilities p(c_k|x_i) are estimated by the Convolutional Neural Network
outputs, since the network was trained with the cross-entropy loss. For the class priors p(c_k)
we have an empirical estimate: the class frequencies in the training set. The
evaluation/test set priors p_e(c_k) are, however, unknown.</p>
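      <p>For intuition, the re-weighting of Equation 1 can be sketched for a single sample under the simplifying assumption that the test-time priors are already known. The function name and all numbers in the toy example are made up for illustration.</p>
      <preformat><![CDATA[
```python
def adjust_posteriors(posteriors, train_priors, test_priors):
    """Apply Equation 1 to one sample: re-weight by p_e(c_k) / p(c_k), then renormalise."""
    weighted = [p * pe / pt
                for p, pe, pt in zip(posteriors, test_priors, train_priors)]
    total = sum(weighted)
    return [w / total for w in weighted]


# Toy example with K = 3 classes (illustrative numbers only).
cnn_output = [0.6, 0.3, 0.1]       # p(c_k | x_i)
train_priors = [0.7, 0.2, 0.1]     # p(c_k): class frequencies in the training set
test_priors = [1 / 3, 1 / 3, 1 / 3]  # p_e(c_k): assumed known here
adjusted = adjust_posteriors(cnn_output, train_priors, test_priors)
```
]]></preformat>
      <p>In this toy case the first class is heavily over-represented in training, so after the adjustment the most probable class changes from the first to the second.</p>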
      <p>
        We follow the proposal from [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] to use an existing EM algorithm [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] for
estimating the test set priors by maximizing the likelihood of the test
observations. The E and M steps are described by Equation 2, where the superscripts
(s) and (s + 1) denote the step of the EM algorithm.
      </p>
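      <p>
        <disp-formula>
          <label>(2)</label>
          <tex-math><![CDATA[
\begin{aligned}
p_e^{(s)}(c_k|x_i) &= \frac{p(c_k|x_i)\,\dfrac{p_e^{(s)}(c_k)}{p(c_k)}}{\sum_{j=1}^{K} p(c_j|x_i)\,\dfrac{p_e^{(s)}(c_j)}{p(c_j)}}, \\
p_e^{(s+1)}(c_k) &= \frac{1}{N}\sum_{i=1}^{N} p_e^{(s)}(c_k|x_i)
\end{aligned}
          ]]></tex-math>
        </disp-formula>
      </p>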
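      <p>The EM iteration of Equation 2 can be sketched in a few lines of plain Python. This is an illustrative sketch rather than the authors' code: the function name, the stopping tolerance and the maximum number of steps are assumptions, and so is the initialisation with the training priors.</p>
      <preformat><![CDATA[
```python
def estimate_test_priors(posteriors, train_priors, n_steps=100, tol=1e-8):
    """EM estimate of the test-set class priors (Equation 2).

    posteriors:   N rows, each holding the K network outputs p(c_k | x_i).
    train_priors: K training-set class frequencies p(c_k), all non-zero.
    Returns (estimated test priors, adjusted posteriors from the last E step).
    """
    n_classes = len(train_priors)
    priors = list(train_priors)  # s = 0: start from the training priors
    adjusted = [list(row) for row in posteriors]
    for _ in range(n_steps):
        # E step: re-weight each prediction by p_e^(s)(c_k) / p(c_k), normalise.
        adjusted = []
        for row in posteriors:
            weighted = [p * pe / pt
                        for p, pe, pt in zip(row, priors, train_priors)]
            total = sum(weighted)
            adjusted.append([w / total for w in weighted])
        # M step: the new prior is the mean adjusted posterior over the test set.
        new_priors = [sum(row[k] for row in adjusted) / len(adjusted)
                      for k in range(n_classes)]
        change = max(abs(a - b) for a, b in zip(new_priors, priors))
        priors = new_priors
        if change < tol:
            break
    return priors, adjusted
```
]]></preformat>
      <p>For example, with equal training priors [0.5, 0.5] and four test predictions, three of them [0.9, 0.1] and one [0.1, 0.9], the estimated test priors converge to approximately [0.8125, 0.1875], the maximum-likelihood solution for this toy case.</p>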
      <p>
        In our submissions, we estimated the class prior probabilities for the whole
test set. However, one may also consider estimating different class priors for
different locations, based on the GPS coordinates of the observations. Moreover,
as discussed in [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], one may use this procedure even in cases where new
test samples arrive sequentially.
      </p>
      <p>Submissions</p>
      <p>In the challenge, each team was allowed to submit up to 5 different run-files with
predictions. We used this opportunity to evaluate the following 5 submissions:</p>
      <p>CMP Run 1 is an ensemble of 6 CNNs: #1clean,..,#6clean described in Section
2.2. This submission used the automatic test set class-prior estimation from
the CNN outputs, discussed in Section 2.4.</p>
      <p>CMP Run 2 was predicted by the ensemble from Run 1 without class prior
estimation on the test data.</p>
      <p>CMP Run 3 is an ensemble of 12 CNNs: #1,..,#6 described in Section 2.1
and #1clean,..,#6clean described in Section 2.2. This submission used the
automatic test set class-prior estimation.</p>
      <p>CMP Run 4 is an ensemble of 6 CNNs: #1,..,#6 described in Section 2.1. This
submission used the automatic test set class-prior estimation.</p>
      <p>CMP Run 5 is a single Inception-v4 model, denoted as CNN #4clean, using
the automatic test set class-prior estimation.</p>
      <p>In all runs, the predictions (optionally improved by the class prior estimation)
for all crops of the test image are averaged to compute the final image prediction.
Moreover, for observations with several images (connected by the ObservationID
values in the provided data), the final classification decision is based on
the average of all corresponding image predictions.</p>
      <p>The official results of the challenge are displayed in Figure 2. Our system achieved
the best results among automatic methods: 88.4% accuracy on the full test set.
The best-scoring submission was the largest ensemble, CMP Run 3, using all
12 models. Results of all CMP submissions are listed in Table 3.</p>
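      <p>The two-level averaging used in all runs (first over the crops of an image, then over the images sharing an ObservationID) can be sketched as follows. The function names and the dictionary-based data layout are assumptions made for illustration.</p>
      <preformat><![CDATA[
```python
from collections import defaultdict


def mean_vector(vectors):
    """Element-wise mean of equally long probability vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def classify_observations(crop_predictions, observation_ids):
    """Average crop predictions per image, then image predictions per observation.

    crop_predictions: {image_id: [prediction vector for each crop]}
    observation_ids:  {image_id: observation_id}
    Returns {observation_id: index of the top-scoring class}.
    """
    per_observation = defaultdict(list)
    for image_id, crops in crop_predictions.items():
        per_observation[observation_ids[image_id]].append(mean_vector(crops))
    decisions = {}
    for obs_id, image_preds in per_observation.items():
        scores = mean_vector(image_preds)
        decisions[obs_id] = max(range(len(scores)), key=scores.__getitem__)
    return decisions
```
]]></preformat>
      <p>Averaging per image first gives every image of an observation the same weight, regardless of how many crops were evaluated for it.</p>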
      <p>When evaluated against human experts in plant sciences, the system (both
CMP Run 3 and CMP Run 4) outperformed 5 of the 9 tested human experts. This
means that in the task of plant recognition from images, machine learning
systems have reached human expert performance, achieving better accuracy than the
median human expert. The detailed results are displayed in Figure 3.</p>
      <p>Interestingly, while fine-tuning on "clean" data slightly improved the
recognition accuracy on the full test set, it significantly decreased the accuracy on
the test subset for human experts. Similarly, test-time prior estimation on the
full test set noticeably improved the accuracy, but had the opposite effect on
the subset. We assume that the test subset selected for human experts was too
small to provide a representative, identically distributed sample of the full test
set. Therefore, the results on the test subset for human experts may be biased
towards the small number of species contained in it.</p>
      <p>Conclusions</p>
      <p>The proposed machine-learning system for recognition of 10 000 plant species
achieved an accuracy of 88.4% in the ExpertLifeCLEF 2018 challenge,
scoring 1st among automated systems.</p>
      <p>The ensemble of Convolutional Neural Networks benefited from the following
improvements:
1. Adjusting the CNN predictions according to the estimated change of the
class prior probabilities.
2. Replacing network parameters by their running averages with exponential
decay.
3. Test-time data augmentation.</p>
      <p>The experiment with human experts shows that machine learning has reached
expert-level performance in plant recognition: our system scored better than the
median human expert, achieving a better recognition rate
than 5 of the 9 evaluated human experts.</p>
      <p>MS was supported by the CTU student grant SGS17/185/OHK3/3T/13 and by
the Electrolux Student Support Programme. JM was supported by The Czech
Science Foundation Project GACR P103/12/G084. LP was
supported by the UWB project No. SGS-2016-039.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1. Goeau, H.,
          <string-name>
            <surname>Bonnet</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Joly</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Plant identification in an open-world (LifeCLEF 2016)</article-title>
          .
          <source>In: CLEF working notes 2016</source>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2. Goeau, H.,
          <string-name>
            <surname>Bonnet</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Joly</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Plant identification based on noisy web data: the amazing performance of deep learning (LifeCLEF</article-title>
          <year>2017</year>
          ).
          <source>CEUR Workshop Proceedings</source>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3. Goeau, H.,
          <string-name>
            <surname>Bonnet</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Joly</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Overview of ExpertLifeCLEF 2018: how far automated identification systems are from the best experts?</article-title>
          <source>In: CLEF working notes 2018</source>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Joly</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          , Goeau, H.,
          <string-name>
            <surname>Botella</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Glotin</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bonnet</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Planque</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vellinga</surname>
            ,
            <given-names>W.P.</given-names>
          </string-name>
          , Muller, H.:
          <article-title>Overview of LifeCLEF 2018: a large-scale evaluation of species identification and recommendation algorithms in the era of AI</article-title>
          .
          <source>In: Proceedings of CLEF</source>
          <year>2018</year>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Saerens</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Latinne</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Decaestecker</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Adjusting the outputs of a classifier to new a priori probabilities: a simple procedure</article-title>
          .
          <source>Neural Computation</source>
          <volume>14</volume>
          (
          <issue>1</issue>
          ),
          <fpage>21</fpage>
          –
          <lpage>41</lpage>
          (
          <year>2002</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Sulc</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Matas</surname>
          </string-name>
          , J.:
          <article-title>Improving CNN classifiers by estimating test-time priors</article-title>
          . arXiv preprint arXiv:1805.08235
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Szegedy</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          , Ioffe, S.,
          <string-name>
            <surname>Vanhoucke</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          :
          <article-title>Inception-v4, inception-resnet and the impact of residual connections on learning</article-title>
          .
          <source>CoRR abs/1602.07261</source>
          (
          <year>2016</year>
          ), http://arxiv.org/abs/1602.07261
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>