<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Plant Identification with Large Number of Species: SabanciU-GebzeTU System in PlantCLEF 2017</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Sara Atito</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Berrin Yanikoglu</string-name>
          <email>berring@sabanciuniv.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Erchan Aptoula</string-name>
          <email>eaptoula@gtu.edu.tr</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Faculty of Engineering and Natural Sciences, Sabanci University</institution>
          ,
          <addr-line>Istanbul</addr-line>
          ,
          <country country="TR">Turkey</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Institute of Information Technologies, Gebze Technical University</institution>
          ,
          <addr-line>Kocaeli</addr-line>
          ,
          <country country="TR">Turkey</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>We describe the plant identification system that was submitted to the LifeCLEF plant identification campaign in 2017 [1], as a collaboration of Sabanci University and Gebze Technical University in Turkey. Similar to our system that took a very close second place in 2016, we fine-tuned two well-known deep learning architectures (VGGNet and GoogLeNet) that were pre-trained on the object recognition dataset of ILSVRC 2012 and used an ensemble of 4-9 networks combined at score level for the submitted systems. Our best system was obtained with a classifier fusion of 9 networks that differed in training (network architecture, data, or initialization), achieving an average inverse rank of 0.634 on the official test data, while the first-place system achieved an impressive score of 0.92.</p>
      </abstract>
      <kwd-group>
        <kwd>plant identification</kwd>
        <kwd>deep learning</kwd>
        <kwd>convolutional neural networks</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Automatic plant identification addresses the identification of the plant species
in a given photograph. The plant identification challenge within the Conference and
Labs of the Evaluation Forum (CLEF) [
        <xref ref-type="bibr" rid="ref1 ref2 ref3 ref4 ref5 ref6 ref7">1,2,3,4,5,6,7</xref>
        ] is the most well-known
annual event that benchmarks content-based image retrieval of plants. The
campaign has been run since 2011, with the number of plant species and training images
almost doubling every year, reaching 10,000 classes in the 2017 evaluation.
Given the very high similarity between species and the large variety of imaging
and plant conditions, the problem is rather challenging.
      </p>
      <p>
        Our team participated in the PlantCLEF 2017 campaign under the name of
SabanciU-GebzeTU. In all of our runs, we used an ensemble of 4-9 convolutional
networks, with different classifier combination criteria. The base networks were
pre-trained deep convolutional neural networks of GoogLeNet [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] and VGGNet
[
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] that were fine-tuned with plant images. The campaign organizers provided
two separate data sets: the main training set consisted of 256,203 images with
clean labels (collected from the Encyclopedia of Life (EOL)), while the web-crawled
data consisted of around 1.6 million images with noisy labels. The test set was
sequestered until a few weeks before the results submission. Details of the campaign
can be found in [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
      </p>
      <p>The rest of this paper is organized as follows. Section 2 describes our approach,
based on fine-tuning GoogLeNet and VGGNet models for plant identification
and applying score-level classifier fusion. Section 3 describes the data sets and
experimental results. The paper concludes in Section 4 with a summary and
discussion of the utilized methods and obtained results.</p>
    </sec>
    <sec id="sec-2">
      <title>Approach</title>
      <p>
        Our approach was based on fine-tuning and fusing two successful deep learning
models, namely GoogLeNet [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] and VGGNet[
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], using the implementations provided
in the Caffe deep learning framework [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. These models are, respectively, the
first-ranked and second-ranked architectures of the ImageNet Large-Scale
Visual Recognition Challenge (ILSVRC) 2014, both trained on the ILSVRC 2012
dataset with 1.2 million labeled images of 1,000 object classes.
      </p>
      <p>
        In this work, we fine-tuned the GoogLeNet and VGGNet models starting
from the learned weights of our PlantCLEF2016 system [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. In the first network,
we used only the training portion of EOL with internal augmentation (at each
training iteration, a random crop of the image is taken and randomly
mirrored horizontally), to get some quick results. This network was the VGGNet
architecture with all but the last layer of weights fixed. In fact, in all of
the experiments, we could only fine-tune the last 1-2 layers, as learning was very
slow otherwise. This network achieved 41% accuracy.
      </p>
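      <p>
        As an illustration of this setup, the minimal sketch below shows a fine-tuning
configuration in which all weights are frozen except a newly added last layer, together
with the random-crop and horizontal-mirror augmentation. The original system used
Caffe; this PyTorch analogue, with the assumed constant NUM_SPECIES, is only meant
to make the idea concrete.
      </p>
      <preformat>
# Hedged PyTorch analogue of the fine-tuning setup (the actual system used Caffe).
import torch
import torch.nn as nn
from torchvision import models, transforms

NUM_SPECIES = 10000  # PlantCLEF 2017 class count

model = models.vgg16(pretrained=True)      # ImageNet-pretrained backbone
for param in model.parameters():           # freeze every layer ...
    param.requires_grad = False
model.classifier[-1] = nn.Linear(4096, NUM_SPECIES)  # ... except a new last layer

# Only the new layer's parameters are handed to the optimizer.
optimizer = torch.optim.SGD(model.classifier[-1].parameters(), lr=1e-3, momentum=0.9)

# "Internal" augmentation: random crop and horizontal mirror at every iteration.
train_transform = transforms.Compose([
    transforms.Resize(256),
    transforms.RandomCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
])
      </preformat>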
      <p>After getting the base system running, we started using 8-fold external
augmentation for training and later we started to incorporate images from the noisy
data set into the training data: as the web-crawled data is not reliable, we tested
200,000 images from the noisy data set using the best networks we had thus far
and kept only those images for which the prediction matched the ground-truth label.</p>
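      <p>
        A minimal sketch of this filtering step is given below; predict_scores and
noisy_samples are assumed placeholders for the current best network and the
web-crawled data, not part of the original system.
      </p>
      <preformat>
def filter_noisy(noisy_samples, predict_scores):
    """Keep a web-crawled image only if the current best network agrees with its label.

    noisy_samples: iterable of (image, claimed_label) pairs
    predict_scores: callable returning a per-class score vector for an image
    """
    kept = []
    for image, claimed_label in noisy_samples:
        scores = predict_scores(image)
        if int(scores.argmax()) == claimed_label:   # top-1 prediction matches the label
            kept.append((image, claimed_label))
    return kept
      </preformat>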
      <p>We also tried the VGGNet architecture with Batch Normalization and the GoogLeNet
architecture, with roughly similar performance. In both of these networks, all of the layers
were fixed except for the last one due to scarce computing resources. Another
network concentrated on the 1,000 most common species; while this network only
achieved 27% accuracy, it helped improve the performance of the ensemble, like
all the other networks. In this fashion, each successive network (for a total of 9
different ones) was trained for either more iterations, with new data added, or
with a different network architecture. Finally, we trained one of the previous
networks with all available training data, merging the validation set into the
training set. This was done for only one network given the limited time.</p>
      <p>
        Score-level averaging is applied to combine the prediction scores assigned to
each of the augmented patches within a single network. As for the final systems,
the obtained scores from all networks are combined using Borda count [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] or
based on the maximum score across the different classifiers.
      </p>
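      <p>
        The sketch below illustrates these fusion rules with NumPy: averaging patch
scores within one network, Borda-count fusion across networks with the mean score
as tie-breaker, and the maximum-score alternative. Array shapes and helper names
are assumptions made for the example.
      </p>
      <preformat>
import numpy as np

def fuse_patches(patch_scores):
    """Score-level averaging over the augmented patches of a single network.

    patch_scores: array of shape (num_patches, num_classes)
    """
    return patch_scores.mean(axis=0)

def borda_fuse(network_scores):
    """Borda count across networks, with the mean score breaking ties.

    network_scores: list of per-network score vectors, each of shape (num_classes,)
    Returns class indices ordered from best to worst.
    """
    scores = np.asarray(network_scores)
    ranks = scores.argsort(axis=1).argsort(axis=1)   # rank 0 = worst class per network
    borda = ranks.sum(axis=0)                        # summed ranks = Borda count
    mean_score = scores.mean(axis=0)                 # confidence used as tie-breaker
    order = np.lexsort((mean_score, borda))[::-1]    # sort by count, then confidence
    return order

def max_fuse(network_scores):
    """Alternative fusion: each class keeps its maximum score over all networks."""
    return np.asarray(network_scores).max(axis=0)
      </preformat>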
      <p>Our main problem was computational resources, given the very large
number of classes and the large amount of data. Only 60,000 images from the noisy
data set were verified (checking that the prediction and label match) and added to
the training set. All training and testing was run on a Linux system with a Tesla
K40c GPU with 12GB of video memory, and in most cases training a network took 2-3
days.</p>
    </sec>
    <sec id="sec-3">
      <title>Experimental Results</title>
      <p>For training and validating our system, we used the EOL data consisting of
256,203 images of different plant organs, belonging to 10,000 species. Specifically,
we randomly divided the training portion of the dataset into two subsets
for training and validation, with 174,280 and 81,923 images respectively. The
test portion of the dataset consists of a separate set of 25,170 images that was
sequestered by the organizers, until the last weeks of the campaign. We will call
these three subsets train, validation and test subsets respectively in the
remainder of this paper.</p>
      <p>The base accuracy of the networks trained with all of the 10,000 classes
ranged from 41% to 48.4% and the combined accuracy was 61.03%, on the
validation subset. The combination was helpful even with highly correlated networks,
and removing the less successful networks from the ensemble always reduced the
performance. The most successful individual network, based on the accuracy on the
validation set, was the VGGNet using the largest training set (the train subset
and around 60,000 samples from noisy data) and with a large batch size (60).</p>
      <p>
        The submitted runs are described below and the results (mean inverse rank)
released by the campaign organizers are given in [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] and shown in Figure 1.
Detailed scores and ranking of the best runs from the top teams are shown in
Table 1.
- Run 1. In this run, the combination was done based on the Borda count, with
classifier confidence used to break ties.
- Run 2. This ensemble only used base systems trained with EOL data.
- Run 3. This system was the same as System 4, except for using a combination
based on the maximum confidence.
- Run 4. This system was the same as System 1, except for the classifier
combination weights.
      </p>
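      <p>
        For reference, the sketch below computes a simplified per-image version of the
mean inverse rank (mean reciprocal rank) used to score the runs; the official
PlantCLEF score is defined by the organizers and may aggregate differently, so
this is only illustrative.
      </p>
      <preformat>
def mean_inverse_rank(ranked_predictions, true_labels):
    """Simplified mean reciprocal rank.

    ranked_predictions: list of class lists, best class first, one per test image
    true_labels: list of ground-truth class ids, one per test image
    """
    total = 0.0
    for ranked, truth in zip(ranked_predictions, true_labels):
        if truth in ranked:
            total += 1.0 / (ranked.index(truth) + 1)  # reciprocal of the correct rank
    return total / len(true_labels)
      </preformat>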
    </sec>
    <sec id="sec-4">
      <title>Conclusions</title>
      <p>
        The main objective was to preserve the high scores we obtained in 2016, despite
the 10-fold increase in the number of classes [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. Unfortunately, the large number
of classes and the limited computational power made it impossible to successfully
fine-tune the networks or use most of the images from the noisy data set. While
our results were significantly below the best performing system this year, they
are not too far from our results last year, despite the 10-fold increase in the
number of classes. Overall, it was a challenging exercise to deal with a large
real-life problem.
      </p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Joly</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          , Goeau, H.,
          <string-name>
            <surname>Glotin</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Spampinato</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bonnet</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vellinga</surname>
            ,
            <given-names>W.P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lombardo</surname>
            ,
            <given-names>J.C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Planque</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Palazzo</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          , Muller, H.:
          <article-title>LifeCLEF 2017 lab overview: multimedia species identification challenges</article-title>
          .
          <source>In: CLEF 2017 Proceedings, Springer Lecture Notes in Computer Science (LNCS)</source>
          .
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2. Goeau, H.,
          <string-name>
            <surname>Bonnet</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Joly</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Boujemaa</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Barthelemy</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Molino</surname>
            ,
            <given-names>J.F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Birnbaum</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mouysset</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Picard</surname>
            ,
            <given-names>M.:</given-names>
          </string-name>
          <article-title>The CLEF 2011 plant images classification task</article-title>
          . In: CLEF (Notebook Papers/Labs/Workshop). (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3. Goeau, H.,
          <string-name>
            <surname>Bonnet</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Joly</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yahiaoui</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Barthelemy</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Boujemaa</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Molino</surname>
            ,
            <given-names>J.F.</given-names>
          </string-name>
          :
          <article-title>The ImageCLEF 2012 plant identification task</article-title>
          . In: CLEF (Online Working Notes/Labs/Workshop). (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4. Goeau, H.,
          <string-name>
            <surname>Bonnet</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Joly</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bakic</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Barthelemy</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Boujemaa</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Molino</surname>
            ,
            <given-names>J.F.</given-names>
          </string-name>
          :
          <article-title>The ImageCLEF 2013 plant identification task</article-title>
          .
          <source>In: CLEF (Working Notes)</source>
          . (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5. Goeau, H.,
          <string-name>
            <surname>Joly</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bonnet</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Selmi</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Molino</surname>
            ,
            <given-names>J.F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Barthelemy</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Boujemaa</surname>
          </string-name>
          , N.:
          <article-title>LifeCLEF plant identification task 2014</article-title>
          . In: CLEF (Working Notes). (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6. Goeau, H.,
          <string-name>
            <surname>Bonnet</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Joly</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>LifeCLEF plant identification task 2015</article-title>
          . In: CLEF (Working Notes). (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7. Goeau, H.,
          <string-name>
            <surname>Bonnet</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Joly</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Plant identification in an open-world (LifeCLEF 2016)</article-title>
          .
          <source>In: CLEF (Working Notes)</source>
          . (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Szegedy</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jia</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sermanet</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Reed</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Anguelov</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Erhan</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vanhoucke</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rabinovich</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Going deeper with convolutions</article-title>
          .
          <source>In: IEEE Conference on Computer Vision and Pattern Recognition</source>
          . (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Simonyan</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zisserman</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Very deep convolutional networks for large-scale image recognition</article-title>
          .
          <source>Computing Research Repository (CoRR)</source>
          (
          <year>2014</year>
          ) arXiv:
          <fpage>1409</fpage>
          .
          <fpage>1556</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Jia</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shelhamer</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Donahue</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Karayev</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Long</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Girshick</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Guadarrama</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Darrell</surname>
          </string-name>
          , T.:
          <article-title>Caffe: Convolutional architecture for fast feature embedding</article-title>
          .
          <source>In: Proceedings of the 22nd ACM International Conference on Multimedia.</source>
          (
          <year>2014</year>
          )
          <fpage>675</fpage>
          -
          <lpage>678</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Mehdipour-Ghazi</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yanikoglu</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Aptoula</surname>
          </string-name>
          , E.:
          <article-title>Open-set plant identification using an ensemble of deep convolutional neural networks</article-title>
          .
          <source>In: Working Notes of CLEF</source>
          <year>2016</year>
          <article-title>- Conference and Labs of the Evaluation forum</article-title>
          , Evora, Portugal,
          <fpage>5</fpage>
          -
          <lpage>8</lpage>
          September,
          <year>2016</year>
          . (
          <year>2016</year>
          )
          <fpage>518</fpage>
          -
          <lpage>524</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Erp</surname>
            ,
            <given-names>M.V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schomaker</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>Variants of the Borda count method for combining ranked classifier hypotheses</article-title>
          .
          <source>In: Proceedings of the seventh international workshop on frontiers in handwriting recognition</source>
          . (
          <year>2000</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>