<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Learning with Noisy and Trusted Labels for Fine-Grained Plant Recognition</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Milan Sulc</string-name>
          <email>sulcmila@cmp.felk.cvut.cz</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jiri Matas</string-name>
          <email>matas@cmp.felk.cvut.cz</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Center for Machine Perception, Dept. of Cybernetics, Faculty of Electrical Eng., Czech Technical University in Prague</institution>
          ,
          <country country="CZ">Czech Republic</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2016</year>
      </pub-date>
      <abstract>
        <p>The paper describes the deep learning approach to automatic visual recognition of 10 000 plant species submitted to the PlantCLEF 2017 challenge. We evaluate modifications and extensions of the state-of-the-art Inception-ResNet-v2 CNN architecture, including maxout, bootstrapping for training with noisy labels, and filtering the data with noisy labels using a classifier pre-trained on the trusted dataset. The final pipeline consists of a set of CNNs trained with different modifications on different subsets of the provided training data. With the proposed approach, we were ranked as the third best team in the LifeCLEF 2017 challenge.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        The plant identification challenge PlantCLEF 2017 [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] is a part of the LifeCLEF
activity [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] organized within CLEF 2017 – The Conference and Labs of the
Evaluation Forum. The task of the challenge is automatic plant identification
using computer vision. A similar task has been the subject of previous challenges
[
        <xref ref-type="bibr" rid="ref3 ref4">3,4</xref>
        ], yet PlantCLEF 2017 aims at a significantly larger scale: recognizing plants
from 10 000 species.
      </p>
      <p>Two sets of training data, with different properties and sources but both
covering the same 10 000 plant species, were provided by the organizers:
1. A set based on the online collaborative Encyclopedia Of Life (EoL),
containing 256 287 images and corresponding XML files with meta-information. An
important field in the meta-information is the "Observation ID", an
identifier connecting images of the same specimen (object of observation).
This dataset is considered "trusted", i.e. the ground truth labels should all be
assigned correctly.
2. A noisy training set built using web crawlers, or more precisely, obtained by
Google and Bing image search. It thus contains images not related to the given
plant species. This set is provided in the form of a list of more than 1442k
image URLs. We obtained nearly 1405k images from the list; the remaining
images failed to download.</p>
      <p>The evaluation is performed on a test set containing 25 170 images of 13 471
observations (specimens).</p>
      <p>The rest of the paper is structured as follows: the deep learning approach and
all proposed modifications are described in Section 2. Preliminary experiments
are described and their evaluation is discussed in Section 3. Post-processing steps
are described in Section 4. The run files submitted to PlantCLEF are listed in Section 5.
Conclusions are drawn in Section 6.</p>
    </sec>
    <sec id="sec-2">
      <title>The Proposed Methods</title>
      <p>
        In recent years, deep Convolutional Neural Networks (CNNs) have become the
core of state-of-the-art solutions to many computer vision tasks, especially those
related to recognition and detection of objects. This is also the case for plant
recognition, where in the previous PlantCLEF challenges 2015 [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] and 2016 [
        <xref ref-type="bibr" rid="ref3 ref5">5,3</xref>
        ]
the deep learning submissions [
        <xref ref-type="bibr" rid="ref10 ref11 ref12 ref6 ref7 ref8 ref9">6,7,8,9,10,11,12</xref>
        ] outperformed combinations of
hand-crafted methods significantly.
      </p>
      <sec id="sec-2-1">
        <title>Inception-ResNet-v2</title>
        <p>The submitted model is based on the state-of-the-art convolutional neural
network architecture, the Inception-ResNet-v2 model [13], which introduced residual
Inception modules, i.e. Inception modules with residual connections. Both the
paper [13] and our preliminary experiments show that this network architecture
leads to superior results compared with other state-of-the-art CNN
architectures. The publicly available TensorFlow model (https://github.com/tensorflow/models/blob/master/slim/README.md#pre-trained-models) pretrained on ImageNet was
used for the initial values of the network parameters. The main hyperparameters were
set as follows:</p>
      </sec>
      <sec id="sec-2-2">
        <title>Hyperparameters</title>
        <p>Optimizer: RMSProp with momentum 0.9 and decay 0.9. Weight decay: 0.00004.
Learning rate: starting LR 0.01, exponential decay with factor 0.94, ending LR 0.0001.
Batch size: 32.</p>
      </sec>
      <sec id="sec-2-6">
        <title>MaxOut</title>
        <p>We experimented with adding maxout to the end of the network, which was
helpful in our submission to PlantCLEF 2016: an additional fully-connected
(FC) layer was added on top of the network, before the classification FC layer.
The activation function in the added layer is maxout [14], the maximum over slices
of the layer:
h_i(x) = max_{j ∈ [1,k]} z_ij,   (1)
where z_ij = x^T W_:ij + b_ij is a standard FC layer with parameters W ∈ R^(d×m×k),
b ∈ R^(m×k).</p>
        <p>One can understand maxout as a piecewise linear approximation to a convex
function, specified by the weights of the previous layer. This is illustrated in
Figure 1.</p>
        <p>We added an FC layer with 4096 units. The maxout activation operates over
k = 4 linear pieces of the FC layer, i.e. m = 1024. Dropout with a keep
probability of 80% is applied before the FC layers. The final layer is a 10000-way softmax
classifier corresponding to the number of plant species to be recognized.</p>
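        <p>The maxout activation of equation (1) can be sketched in a few lines. The following is an illustrative NumPy sketch with toy dimensions, not the authors' TensorFlow implementation:</p>

```python
import numpy as np

def maxout(x, W, b, k):
    """Maxout FC layer: pre-activations z = x^T W + b are reshaped to
    (m, k) and the output is the maximum over the k linear pieces,
    h_i = max_j z_ij (equation (1))."""
    z = x @ W + b            # (m*k,) pre-activations of the FC layer
    z = z.reshape(-1, k)     # (m, k): m maxout units, k pieces each
    return z.max(axis=1)     # max over the k pieces per unit

# Toy example with d=8 inputs, m=4 maxout units, k=4 pieces
# (the paper uses m=1024, k=4 on a 4096-unit FC layer)
rng = np.random.default_rng(0)
d, m, k = 8, 4, 4
x = rng.standard_normal(d)
W = rng.standard_normal((d, m * k))
b = rng.standard_normal(m * k)
h = maxout(x, W, b, k)
```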
        <p>We observed that the additional FC layer has to be batch normalized
[15]. Without normalization, the architecture becomes unstable with the default
setting of hyperparameters, leading to an unexpected drop in accuracy.</p>
        <p>In order to improve learning from noisy labels, Reed et al. [16] proposed a simple
consistency objective, which does not require explicit information about the
noise distribution.</p>
        <p>Intuitively, the new objectives take into account the current predictions of
the network, lowering the damage done by incorrect labels. Reed et al. propose two
variants of the objective, denoted as bootstrapping, for consistency in multi-class
prediction:</p>
        <p>Soft bootstrapping uses the probabilities q_k estimated by the network
(softmax):
L_soft(q, t) = Σ_{k=1..N} [β t_k + (1 − β) q_k] log q_k.   (2)
Reed et al. [16] point out that this objective is equivalent to softmax
regression with minimum entropy regularization, which was previously studied in
[17]; it encourages high confidence in predicting labels.</p>
        <p>Hard bootstrapping uses the strongest prediction, z_k = 1 if k = argmax_i q_i
and z_k = 0 otherwise:
L_hard(q, t) = Σ_{k=1..N} [β t_k + (1 − β) z_k] log q_k.   (3)</p>
        <p>The experiments of [16] show that the two objectives improve learning in the
case of label noise, achieving the best accuracy with hard bootstrapping. We
decided to follow the result of [16] and use hard bootstrapping with β = 0.8
in our experiments. The search for the optimal value of β was omitted for
computational reasons and the limited time for the competition, yet the dependence
between the amount of label noise and the optimal setting of the hyperparameter β
is an interesting topic for future work.</p>
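        <p>The hard bootstrapping objective with β = 0.8 can be sketched as follows. This is an illustrative NumPy reading of [16] for a single example, written as a cross-entropy to be minimized; it is not the authors' TensorFlow training code:</p>

```python
import numpy as np

def hard_bootstrap_loss(q, t, beta=0.8):
    """Per-example hard bootstrapping loss, following Reed et al. [16].

    q: (N,) softmax probabilities; t: (N,) one-hot (possibly noisy) label.
    The target mixes the given label with the network's strongest prediction.
    """
    z = np.zeros_like(q)
    z[np.argmax(q)] = 1.0                       # strongest current prediction z_k
    target = beta * t + (1.0 - beta) * z        # bootstrapped target
    return -np.sum(target * np.log(q + 1e-12))  # cross-entropy against the mix

# Toy example: the network is confident in class 2, the (noisy) label says class 0
q = np.array([0.2, 0.1, 0.7])
t = np.array([1.0, 0.0, 0.0])
loss = hard_bootstrap_loss(q, t)
```

        <p>With β = 1 the loss reduces to the usual cross-entropy against the given label; with β = 0 it trusts only the network's own prediction.</p>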
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Experiments</title>
      <p>We used a subset of the test data from the previous year's PlantCLEF 2016
challenge to thoroughly evaluate the proposed methods. We only used 2583 images
from the previous year's dataset, for which we found species correspondences in
the 2017 task. This small validation set covers only a small subset of the classes,
but should be sufficient for an approximate evaluation of the method.</p>
      <p>The sections below describe the experiments and the corresponding design choices.</p>
      <sec id="sec-3-1">
        <title>Fine-tuning vs. Training from Scratch</title>
        <p>The first issue tested was whether the network should be trained from scratch,
or fine-tuned from an ImageNet-pretrained model. We compared the two
scenarios by training only on the "trusted" dataset. As illustrated in Figure 2,
training from scratch converges very slowly. After 150k iterations (mini-batches),
fine-tuning leads to 65.1% accuracy, while training from scratch only gets to
44.5%. For illustration, 150k training iterations take ca. 65 hours on an NVIDIA
Titan X GPU. Therefore, we decided in favor of fine-tuning.</p>
      </sec>
      <sec id="sec-3-2">
        <title>Training on Trusted and Noisy Data</title>
        <p>We fine-tuned the system with the different settings described in Section 2 on the
"trusted" (EOL) data only, as well as on the combination of both "trusted"
and "noisy" data (EOL+WEB). Soft and hard bootstrapping were used
for training with the "noisy" data. Figure 3 shows that after 200k iterations, the
networks trained only on the "trusted" data performed slightly better. The two
best performing networks trained on the "trusted" (EOL) dataset will be used
in the follow-up experiments.</p>
        <p>In order to filter out wrongly labeled examples from the "noisy" part of the
training set, we used the network pretrained on the "trusted" set (from Section
3.2) to predict the labels from the images. Only images where the network prediction
was equal to the label were kept in the "filtered noisy" dataset. This reduced
the size of the "noisy" set from ca. 1405k images to ca. 425k images.</p>
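        <p>The filtering step can be sketched as follows. This is illustrative Python; `predict_class` is a hypothetical stand-in for the top-1 prediction of the network pre-trained on the trusted set:</p>

```python
def filter_noisy(samples, predict_class):
    """Keep only noisy-set samples whose given label matches the top-1
    prediction of a classifier pre-trained on the trusted set.

    samples: iterable of (image, label) pairs from the noisy set.
    predict_class: callable image -> predicted class id (hypothetical
    stand-in for the model fine-tuned on the trusted EOL data).
    """
    return [(img, lbl) for img, lbl in samples if predict_class(img) == lbl]

# Toy example: a "model" that always predicts class 0 keeps only class-0 samples
samples = [("img_a", 0), ("img_b", 1), ("img_c", 0)]
kept = filter_noisy(samples, lambda img: 0)
```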
        <p>Let us denote the two networks fine-tuned on the "trusted" (EOL) dataset
in Section 3.2 as follows:
– Net #1: Fine-tuned on the "trusted" (EOL) set without maxout for 200k
iterations.
– Net #2: Fine-tuned on the "trusted" (EOL) set with maxout for 200k
iterations.</p>
        <p>
          Further fine-tuning was performed from these models pre-trained (fine-tuned)
on the "trusted" set. In order to perform bagging from several networks, we
divide the data into 3 disjoint folds. Then each setting is used to further
fine-tune three networks, each on a different 2 of the 3 folds. Each network is further
fine-tuned for 50k iterations.
– Net #3, #4, #5: Fine-tuned from #1 for 50k iterations on the "trusted"
dataset.
– Net #6, #7, #8: Fine-tuned from #2 for 50k iterations on the "trusted"
dataset, with maxout.
As shown by the previous year's challenge winner [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ] and confirmed by the
experiments described in this report, averaging the predictions over images of
the same observation (specimen) increases accuracy significantly. Therefore we
also average the scores per observation in all submitted run files.
Given the fact that we are evaluating the whole test set of images, we decided
to experiment with adjusting the prediction distribution over the test set. Some
plant species are certainly much rarer to observe than others. We assumed that
the species in the test set might not follow the same distribution as the species
in the training set. We computed the prior p(K) for each class K among the
observations in the "trusted" dataset, and estimated the prior p_t(K) on the
test set. Let q(K|X) be the prediction confidence for class K, given input image
X. The final prediction, taking into account the possible shift in the distributions,
was:
q'(K|X) = q(K|X) · sqrt( p(K) / p_t(K) ),   (4)
where the square root is used to make the adjustment less severe.
        </p>
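        <p>The adjustment of equation (4) can be sketched as follows. This is an illustrative NumPy sketch with toy priors; the renormalization at the end is an added assumption for readability, not stated in the text:</p>

```python
import numpy as np

def adjust_predictions(q, p_train, p_test):
    """Re-weight per-class confidences q by sqrt(p_train / p_test), as in
    equation (4), then renormalize so the scores sum to one again
    (the renormalization is an assumption of this sketch)."""
    q_adj = q * np.sqrt(p_train / p_test)
    return q_adj / q_adj.sum()

# Toy example with 3 classes: class 0 is rarer in the test set than in training,
# so its confidence gets boosted by sqrt(0.5 / 0.25)
q = np.array([0.5, 0.3, 0.2])
p_train = np.array([0.5, 0.25, 0.25])
p_test = np.array([0.25, 0.5, 0.25])
q_new = adjust_predictions(q, p_train, p_test)
```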
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Description of the Submitted Run Files</title>
      <p>In PlantCLEF 2017, each participant is allowed to submit up to four run files
with the results. We submitted the following run files:
– CMP Run 1 combines all 17 networks by summing their results.
– CMP Run 2 uses the prediction distribution adjustment from Section 4.2 on
top of the results from the first run file.
– CMP Run 3 combines only the networks trained on the "trusted" data.
– CMP Run 4 again adds the prediction distribution adjustment on top of the
results from the third run file.
The difficulties of the challenge lie in the high number of classes, high intra-class
variations, small inter-class variations, and learning from noisy data downloaded
by web crawlers.</p>
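      <p>The Run 1 style combination of networks amounts to summing their per-class scores. A minimal illustrative NumPy sketch (not the submission code):</p>

```python
import numpy as np

def combine_runs(score_matrices):
    """Sum per-class scores from several networks (CMP Run 1 style) and
    rank classes by the combined score.

    score_matrices: list of (n_images, n_classes) arrays, one per network.
    Returns the combined scores and the top-1 class id per image.
    """
    total = np.sum(score_matrices, axis=0)   # elementwise sum over networks
    return total, np.argmax(total, axis=1)   # combined scores, top-1 classes

# Toy example: two "networks", 2 images, 3 classes
a = np.array([[0.6, 0.3, 0.1], [0.2, 0.5, 0.3]])
b = np.array([[0.1, 0.7, 0.2], [0.3, 0.4, 0.3]])
total, top1 = combine_runs([a, b])
```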
      <p>To overcome these difficulties, we employed a state-of-the-art deep learning
architecture and compared a number of approaches to increase the accuracy of
very fine-grained classification when learning from noisy data. The results of the
challenge are depicted in Figure 5. Based on our evaluation, the following steps
increase the classification accuracy:
– Maxout [14] with batch normalisation [15] of the added FC layer.
– Filtering the noisy data using a model trained on a trusted database.
– Bagging of several networks fine-tuned under different conditions.
Adjusting the species distribution on the test set, on the other hand,
decreased the recognition accuracy noticeably.</p>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgements</title>
      <p>Milan Sulc was supported by the Electrolux Student Support Programme and by
CTU student grant SGS17/185/OHK3/3T/13. Jiri Matas was supported by the
Czech Science Foundation project GACR P103/12/G084.</p>
      <p>13. Christian Szegedy, Sergey Ioffe, Vincent Vanhoucke, and Alex Alemi. Inception-v4,
Inception-ResNet and the impact of residual connections on learning. arXiv preprint
arXiv:1602.07261, 2016.
14. Ian J. Goodfellow, David Warde-Farley, Mehdi Mirza, Aaron Courville, and Yoshua
Bengio. Maxout networks. arXiv preprint arXiv:1302.4389, 2013.
15. Sergey Ioffe and Christian Szegedy. Batch normalization: Accelerating
deep network training by reducing internal covariate shift. arXiv preprint
arXiv:1502.03167, 2015.
16. Scott Reed, Honglak Lee, Dragomir Anguelov, Christian Szegedy, Dumitru
Erhan, and Andrew Rabinovich. Training deep neural networks on noisy labels with
bootstrapping. arXiv preprint arXiv:1412.6596, 2014.
17. Yves Grandvalet and Yoshua Bengio. Entropy regularization. Semi-Supervised
Learning, pages 151–168, 2006.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1. Herve Goeau, Pierre Bonnet, and
          <string-name>
            <given-names>Alexis</given-names>
            <surname>Joly</surname>
          </string-name>
          .
          <article-title>Plant identification based on noisy web data: the amazing performance of deep learning (lifeclef 2017)</article-title>
          .
          <source>In CLEF working notes</source>
          <year>2017</year>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>Alexis</given-names>
            <surname>Joly</surname>
          </string-name>
          , Herve Goeau, Herve Glotin, Concetto Spampinato, Pierre Bonnet,
          <string-name>
            <surname>Willem-Pier</surname>
            <given-names>Vellinga</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jean-Christophe</surname>
            <given-names>Lombardo</given-names>
          </string-name>
          , Robert Planque, Simone Palazzo, and Henning Muller.
          <article-title>Lifeclef 2017 lab overview: multimedia species identification challenges</article-title>
          .
          <source>In Proceedings of CLEF</source>
          <year>2017</year>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>Alexis</given-names>
            <surname>Joly</surname>
          </string-name>
          , Herve Goeau, Herve Glotin, Concetto Spampinato, Pierre Bonnet,
          <string-name>
            <surname>Willem-Pier</surname>
            <given-names>Vellinga</given-names>
          </string-name>
          , Julien Champ, Robert Planque, Simone Palazzo, and Henning Muller.
          <source>Lifeclef</source>
          <year>2016</year>
          :
          <article-title>multimedia life species identification challenges</article-title>
          .
          <source>In Proceedings of CLEF</source>
          <year>2016</year>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4. Herve Goeau, Pierre Bonnet, and
          <string-name>
            <given-names>Alexis</given-names>
            <surname>Joly</surname>
          </string-name>
          .
          <article-title>Lifeclef plant identification task 2015</article-title>
          . In Working Notes of CLEF 2015 -
          <article-title>Conference and Labs of the Evaluation forum</article-title>
          , Toulouse, France, September 8-
          <issue>11</issue>
          ,
          <year>2015</year>
          . CEUR-WS,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5. Herve Goeau, Pierre Bonnet, and
          <string-name>
            <given-names>Alexis</given-names>
            <surname>Joly</surname>
          </string-name>
          .
          <article-title>Plant identification in an open-world (lifeclef 2016)</article-title>
          .
          <source>In CLEF working notes</source>
          <year>2016</year>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>Sungbin</given-names>
            <surname>Choi</surname>
          </string-name>
          .
          <article-title>Plant identification with deep convolutional neural network: Snumedinfo at lifeclef plant identification task 2015</article-title>
          . In Working Notes of CLEF 2015 -
          <article-title>Conference and Labs of the Evaluation forum</article-title>
          , Toulouse, France, September 8-
          <issue>11</issue>
          ,
          <year>2015</year>
          . CEUR-WS,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>ZongYuan</given-names>
            <surname>Ge</surname>
          </string-name>
          ,
          <string-name>
            <surname>Chris</surname>
            <given-names>McCool</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>Condrad</given-names>
            <surname>Sanderson</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Peter</given-names>
            <surname>Corke</surname>
          </string-name>
          .
          <article-title>Content specific feature learning for fine-grained plant classification</article-title>
          .
          <source>In Working Notes of CLEF</source>
          <year>2015</year>
          <article-title>- Conference and Labs of the Evaluation forum</article-title>
          , Toulouse, France, September 8-
          <issue>11</issue>
          ,
          <year>2015</year>
          . CEUR-WS,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <given-names>Julien</given-names>
            <surname>Champ</surname>
          </string-name>
          , Titouan Lorieul, Maximilien Servajean, and
          <string-name>
            <given-names>Alexis</given-names>
            <surname>Joly</surname>
          </string-name>
          .
          <article-title>A comparative study of fine-grained classification methods in the context of the lifeclef plant identification challenge 2015</article-title>
          . In Working Notes of CLEF 2015 -
          <article-title>Conference and Labs of the Evaluation forum</article-title>
          , Toulouse, France, September 8-
          <issue>11</issue>
          ,
          <year>2015</year>
          . CEUR-WS,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Angie</surname>
            <given-names>K.</given-names>
          </string-name>
          <string-name>
            <surname>Reyes</surname>
          </string-name>
          ,
          <string-name>
            <surname>Juan C. Caicedo</surname>
            , and
            <given-names>Jorge E.</given-names>
          </string-name>
          <string-name>
            <surname>Camargo</surname>
          </string-name>
          .
          <article-title>Fine-tuning deep convolutional networks for plant recognition</article-title>
          .
          <source>In Working Notes of CLEF</source>
          <year>2015</year>
          <article-title>- Conference and Labs of the Evaluation forum</article-title>
          , Toulouse, France, September 8-
          <issue>11</issue>
          ,
          <year>2015</year>
          . CEUR-WS,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Milan</surname>
            <given-names>Sulc</given-names>
          </string-name>
          , Dmytro Mishkin, and
          <string-name>
            <given-names>Jir</given-names>
            <surname>Matas</surname>
          </string-name>
          .
          <article-title>Very deep residual networks with maxout for plant identification in the wild</article-title>
          .
          <source>In Working notes of CLEF 2016 conference</source>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11. Mostafa Mehdipour Ghazi, Berrin Yanikoglu, and
          <string-name>
            <given-names>Erchan</given-names>
            <surname>Aptoula</surname>
          </string-name>
          .
          <article-title>Open-set plant identification using an ensemble of deep convolutional neural networks</article-title>
          .
          <source>Working notes of CLEF</source>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12. Siang Thye Hang, Atsushi Tatsuma, and
          <string-name>
            <given-names>Masaki</given-names>
            <surname>Aono</surname>
          </string-name>
          .
          <article-title>Bluefield (kde tut) at lifeclef 2016 plant identification task</article-title>
          .
          <source>In Working notes of CLEF 2016 conference</source>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>