<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Plant Species Identification Using Transfer Learning - PlantCLEF 2020</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Nanda H Krishna</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Rakesh M</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ram Kaushik R</string-name>
          <email>ramkaushik17125g@cse.ssn.edu.in</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer Science and Engineering SSN College of Engineering</institution>
          ,
          <addr-line>Kalavakkam, Chennai</addr-line>
          ,
          <country country="IN">India</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Automated prediction of plant species from images is immensely useful to conservationists, especially in the case of data-scarce regions of biodiversity. The PlantCLEF 2020 Challenge provides a platform for the creation of a classifier to identify plant species from a large collection of labelled images. The aim of the challenge is to identify which methods work best on the same data, and hence accelerate research progress in the field. In this paper, we discuss the submissions made by our team to the challenge, based on transfer learning. For our submissions, we trained our models on Cloud TPUs and TPU Pods available on Google Cloud Platform. All our models, which were initially trained on the ImageNet Dataset, were fine-tuned on the PlantCLEF 2020 Dataset using transfer learning. With our ResNet-50 models, we achieved an overall MRR of 0.008 in the testing phase. For specifically chosen classes with fewer training samples, we achieved an MRR of 0.003.</p>
      </abstract>
      <kwd-group>
        <kwd>Species Identification</kwd>
        <kwd>Transfer Learning</kwd>
        <kwd>TPU</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Automated plant species identification from images is immensely useful in data-scarce regions of biodiversity, to identify and record the flora present. With the advent of Deep Learning and novel model architectures, performance in this task has improved considerably over the years. However, classification with a very large number of classes is still a tough task with considerable scope for improvement.</p>
      <p>
        It is with the objective of building a reliable plant species identification system that the PlantCLEF 2020 Challenge [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] was organised, as part of LifeCLEF
2020 [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. The challenge provides a platform for the evaluation of different methods on the same dataset, in an effort to identify the best-performing algorithm for the task. A large labelled dataset of plant images was provided by the organisers, wherein the images exhibit great inter- and intra-class diversity, mimicking the real world.
      </p>
      <p>Evaluation of the submissions to the challenge was based on the Mean Reciprocal Rank (MRR) metric. Additionally, the MRR for the classification of specific species with fewer training samples was considered as a secondary metric. In this paper, we will discuss our team's submissions to the PlantCLEF 2020 Challenge in detail. We will first discuss the methodology we employed to solve the task. Next, we will outline the resources used to build our models. Finally, we will discuss the results obtained by our submissions, along with an analysis and a note on future work.</p>
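      <p>For reference, the Mean Reciprocal Rank over a set of queries is the average of the reciprocal of the rank at which the correct species appears in each ranked prediction list. A minimal sketch of this computation in Python is given below; the variable names are illustrative.</p>
      <preformat>
def mean_reciprocal_rank(ranked_predictions, true_labels):
    """ranked_predictions[i] is a list of class labels, ordered from most to
    least likely, for query i; true_labels[i] is the ground-truth label."""
    total = 0.0
    for ranked, truth in zip(ranked_predictions, true_labels):
        if truth in ranked:
            total += 1.0 / (ranked.index(truth) + 1)  # reciprocal of the rank
    return total / len(true_labels)
      </preformat>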
    </sec>
    <sec id="sec-2">
      <title>Methodology</title>
      <sec id="sec-2-1">
        <title>Data Preprocessing</title>
        <p>
          First, we normalised the pixel values to the range [0, 1]. All the images were then resized to 224 × 224 × 3, due to limited computational resources. To do so, we first resize the smaller of the two spatial dimensions (H or W) to 224 pixels, and then extract the center crop of size 224 × 224 (H × W). The disadvantage of this preprocessing step is the possible removal of salient information present in pixels that were discarded. An alternative approach could be the use of multiple 224 × 224 crops extracted from the image with a certain stride, to ensure all details in the original image are present in a subset of the extracted images. However, this would increase the number of images in the dataset and thus the computation time.
        </p>
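        <p>As an illustration, the following is a minimal sketch of this resize-and-center-crop step using TensorFlow image operations; the function name and the exact ops are illustrative and may differ from the code in our notebooks.</p>
        <preformat>
import tensorflow as tf

def preprocess_image(image):
    """Scale pixel values to [0, 1], resize the smaller spatial dimension
    to 224 pixels, then take the central 224 x 224 crop."""
    image = tf.cast(image, tf.float32) / 255.0                  # normalise to [0, 1]
    shape = tf.shape(image)
    height, width = shape[0], shape[1]
    scale = 224.0 / tf.cast(tf.minimum(height, width), tf.float32)
    new_h = tf.cast(tf.round(tf.cast(height, tf.float32) * scale), tf.int32)
    new_w = tf.cast(tf.round(tf.cast(width, tf.float32) * scale), tf.int32)
    image = tf.image.resize(image, tf.stack([new_h, new_w]))    # smaller side becomes 224
    image = tf.image.resize_with_crop_or_pad(image, 224, 224)   # centre crop
    return image                                                # shape (224, 224, 3)
        </preformat>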
        <p>We applied augmentations to the training images to improve the generalisation performance of our models. Images were randomly zoomed in or out by up to 20% of the image width, rotated by an angle in the range [-45°, 45°] and flipped about their vertical axis. The objective of these augmentations is to make the models learn robust features, invariant to scale or rotation.</p>
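        <p>One way to express these augmentations is with the Keras ImageDataGenerator API, as sketched below; the in-memory arrays x_train and y_train are assumed purely for illustration, and our notebooks may realise the same augmentations differently.</p>
        <preformat>
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Random zoom of up to 20%, rotation by an angle in [-45, 45] degrees,
# and a horizontal flip (i.e. a flip about the vertical axis).
datagen = ImageDataGenerator(
    zoom_range=0.2,        # zoom in or out by up to 20%
    rotation_range=45,     # rotate by an angle drawn from [-45, 45] degrees
    horizontal_flip=True,  # flip about the vertical axis
)

# x_train: float images of shape (N, 224, 224, 3); y_train: one-hot labels.
train_flow = datagen.flow(x_train, y_train, batch_size=32)
        </preformat>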
      </sec>
      <sec id="sec-2-2">
        <title>Models Used</title>
        <p>
          VGG-16 &amp; VGG-19: We first experimented with the VGG-16 and VGG-19
architectures [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ], pre-trained on the ImageNet Dataset [
          <xref ref-type="bibr" rid="ref2 ref8">2,8</xref>
          ]. The pre-trained
models were used as non-trainable feature extractors, and their output for every
image was passed to a shallow ANN of 2 Dense layers (each having 4096 units),
followed by a Fully-Connected layer (with softmax activation). This method did
not perform well on our validation set, and we did not make any submissions
based on this method.
        </p>
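        <p>A minimal sketch of this setup with the Keras applications API is given below; wiring the frozen backbone and the classifier head into a single model, flattening the backbone output and using ReLU activations in the Dense layers are illustrative assumptions, as these details are not specified above.</p>
        <preformat>
from tensorflow.keras import Model, layers
from tensorflow.keras.applications import VGG16

NUM_CLASSES = 998

# Pre-trained VGG-16 used as a frozen (non-trainable) feature extractor.
backbone = VGG16(weights="imagenet", include_top=False,
                 input_shape=(224, 224, 3))
backbone.trainable = False

# Shallow ANN head: two Dense layers of 4096 units, then a softmax layer.
x = layers.Flatten()(backbone.output)
x = layers.Dense(4096, activation="relu")(x)
x = layers.Dense(4096, activation="relu")(x)
outputs = layers.Dense(NUM_CLASSES, activation="softmax")(x)

vgg_model = Model(inputs=backbone.input, outputs=outputs)
        </preformat>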
        <p>
          ResNet-50: Both our final submissions to the challenge were made using the
ResNet-50 [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ] models. We used the ResNet-50 architecture, again pre-trained on
the ImageNet Dataset [
          <xref ref-type="bibr" rid="ref2 ref8">2,8</xref>
          ], as a trainable feature extractor network. Following
the layers of the ResNet, we added 2 Dense layers of 2048 units each, followed
by a Fully-Connected layer with softmax activation.
        </p>
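        <p>A corresponding sketch for the ResNet-50 models is shown below; the use of global average pooling on the backbone output and of ReLU activations in the Dense layers are assumptions made for illustration.</p>
        <preformat>
from tensorflow.keras import Model, layers
from tensorflow.keras.applications import ResNet50

NUM_CLASSES = 998

# Pre-trained ResNet-50 used as a trainable feature extractor.
backbone = ResNet50(weights="imagenet", include_top=False,
                    pooling="avg", input_shape=(224, 224, 3))
backbone.trainable = True

# Two Dense layers of 2048 units each, followed by the softmax output layer.
x = layers.Dense(2048, activation="relu")(backbone.output)
x = layers.Dense(2048, activation="relu")(x)
outputs = layers.Dense(NUM_CLASSES, activation="softmax")(x)

resnet_model = Model(inputs=backbone.input, outputs=outputs)
        </preformat>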
        <p>All models were trained with Adam as the optimiser and categorical cross-entropy
as the loss function. The number of training epochs was set to 8. As there were
training examples for only 998 classes, all our models had an output vector of
998 probabilities.
</p>
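        <p>The corresponding Keras training configuration is sketched below; train_flow and val_flow stand in for the actual input pipelines and are illustrative.</p>
        <preformat>
# Adam optimiser, categorical cross-entropy loss, 8 training epochs.
resnet_model.compile(optimizer="adam",
                     loss="categorical_crossentropy",
                     metrics=["accuracy"])

# Labels are one-hot encoded over the 998 classes with training examples.
resnet_model.fit(train_flow,
                 epochs=8,
                 validation_data=val_flow)
        </preformat>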
      </sec>
      <sec id="sec-2-3">
        <title>Prediction</title>
        <p>A single species could have one or more images associated with the same submission ID. Two techniques were used to make predictions for the individual species: predictions were first made for the individual images, and then either the average or the maximum of the predicted probabilities was used to rank the species and generate the final predictions.</p>
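        <p>The following sketch illustrates this aggregation with NumPy; the function and variable names are illustrative rather than taken from our code.</p>
        <preformat>
import numpy as np
from collections import defaultdict

def rank_species(image_probs, submission_ids, mode="mean"):
    """Group per-image probability vectors by submission ID and rank the
    998 species by either the mean or the maximum pooled probability."""
    grouped = defaultdict(list)
    for probs, sub_id in zip(image_probs, submission_ids):
        grouped[sub_id].append(probs)

    ranked = {}
    for sub_id, prob_list in grouped.items():
        stacked = np.stack(prob_list)                  # shape (n_images, 998)
        pooled = stacked.mean(axis=0) if mode == "mean" else stacked.max(axis=0)
        ranked[sub_id] = np.argsort(-pooled)           # class indices, best first
    return ranked
        </preformat>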
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Source Code and Resources Used</title>
      <p>
        Our models were built using the Keras [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] Deep Learning framework, and the
code was written in Python 3. Jupyter Notebooks containing our code have been
made publicly available in our GitHub repository at https://github.com/nandahkrishna/PlantCLEF2020.
      </p>
      <p>All of the work was performed on the Google Cloud Platform (GCP). Tensor Processing Units (TPUs) are processors developed by Google specifically for the purpose of accelerating tasks that involve computation on tensors. A TPU Pod is a collection of TPUs connected by a high-speed network. GCP provides access to Cloud TPUs and Cloud TPU Pods. Both individual TPUs (version 2 with 8 cores and version 3 with 8 cores) and a TPU Pod (version 3 with 256 cores) were used in the training phase.</p>
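      <p>With a recent TensorFlow 2.x, training a Keras model on a Cloud TPU typically goes through tf.distribute.TPUStrategy, as sketched below; the TPU name and the build_resnet50_classifier helper are placeholders, not taken from our code.</p>
      <preformat>
import tensorflow as tf

# Connect to the Cloud TPU; "plantclef-tpu" is a placeholder node name.
resolver = tf.distribute.cluster_resolver.TPUClusterResolver(tpu="plantclef-tpu")
tf.config.experimental_connect_to_cluster(resolver)
tf.tpu.experimental.initialize_tpu_system(resolver)
strategy = tf.distribute.TPUStrategy(resolver)

# Build and compile the model inside the strategy scope so that its
# variables are replicated across the 8 (or 256) TPU cores.
with strategy.scope():
    model = build_resnet50_classifier()  # e.g. the ResNet-50 model sketched earlier
    model.compile(optimizer="adam", loss="categorical_crossentropy")
      </preformat>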
      <p>For all other tasks, including generating predictions, an n1-highmem VM instance was used, with 8 CPU cores and 52 GB of RAM. The storage drive used was a 150 GB SSD attached to the instance.</p>
      <p>Despite the power of the computational resources available, we were unable to
experiment with more powerful models or approaches due to limited availability
of credits. Furthermore, the TPUs allocated to us were present in regions other
than that of our VM instance, causing delays and increased compute time due to
network operations. The training of our ResNet model, for instance, took around
2 hours per epoch, highlighting the importance of computational resources in
data-intensive tasks.</p>
    </sec>
    <sec id="sec-4">
      <title>Results</title>
      <p>The results obtained by our models can be seen in Table 1. We have included
the categorical cross-entropy loss for all our models as well as the MRR score for
our submitted models. With both our submissions, we obtained a Testing MRR
score of 0.008, with an MRR of 0.003 on the species with fewer training samples.</p>
      <p>Overall, our submissions received the 44th and 45th ranks on the challenge
leaderboard. The scores obtained by all submissions to the challenge can be
visualised in Fig. 1. The top scores were obtained by teams ITCR PlantNet
(Overall MRR 0.180) and Neuon AI (Overall MRR 0.121). On the species with
limited training examples, the ITCR PlantNet team obtained a poorer score
(MRR 0.062) than the Neuon AI team (MRR 0.108).</p>
    </sec>
    <sec id="sec-5">
      <title>Conclusion and Future Work</title>
      <p>
        Despite the resources available, training our models was computationally expensive as the dataset was very large. We were unable to train our models for a larger number of epochs due to the constraints of our cloud computing resources. Increasing the number of training epochs could yield larger gains in classification accuracy or MRR, as indicated by Fig. 2.
Additionally, the lower scores obtained by our methods could be attributed to the low similarity between the dataset used for pre-training and the task-specific dataset. In such a setting, using homogeneous domain adaptation techniques such as MMD [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] or approaches based on an architecture criterion [
        <xref ref-type="bibr" rid="ref7">7</xref>
          ] could improve performance on the target domain by accounting for the difference in data distribution. We consider only the supervised setting, as the PlantCLEF dataset contains a sufficient number of examples to facilitate this. An alternative would be the use of heterogeneous domain adaptation techniques based on generative models, as discussed in [
        <xref ref-type="bibr" rid="ref10">10</xref>
          ], which could help improve performance on a target domain that is very different from the source domain, as in our case.
In the future, we would still consider a transfer-learning-based approach, but with pre-training on a large dataset more similar to the target domain, which would considerably improve model performance. In addition, we would like to consider using extreme classification methods to better handle the large number of output classes.
      </p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgements</title>
      <p>We would like to acknowledge the support received in the form of Cloud TPUs
and a Cloud TPU v3 Pod from TensorFlow Research Cloud (TFRC).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>1. Chollet, F., et al.: Keras. https://github.com/fchollet/keras (2015)</mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>2. Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., Fei-Fei, L.: ImageNet: A Large-Scale Hierarchical Image Database. In: CVPR09 (2009)</mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>3. Goëau, H., Bonnet, P., Joly, A.: Overview of the LifeCLEF 2020 plant identification task. In: CLEF Working Notes 2020, CLEF: Conference and Labs of the Evaluation Forum, Sep. 2020, Thessaloniki, Greece (2020)</mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>4. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 770-778. IEEE Computer Society (2016). https://doi.org/10.1109/CVPR.2016.90</mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>5. Joly, A., Deneu, B., Kahl, S., Goëau, H., Ruiz De Castañeda, R., Champ, J., Eggel, I., Cole, E., Bonnet, P., Botella, C., Dorso, A., Glotin, H., Lorieul, T., Servajean, M., Stöter, F.R., Vellinga, W.P., Müller, H.: LifeCLEF 2020: Biodiversity identification and prediction challenges. In: Proceedings of CLEF 2020, CLEF: Conference and Labs of the Evaluation Forum, Sep. 2020, Thessaloniki, Greece (2020)</mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>6. Long, M., Zhu, H., Wang, J., Jordan, M.I.: Deep transfer learning with joint adaptation networks. In: Precup, D., Teh, Y.W. (eds.) Proceedings of the 34th International Conference on Machine Learning, ICML 2017, Sydney, NSW, Australia, 6-11 August 2017. Proceedings of Machine Learning Research, vol. 70, pp. 2208-2217. PMLR (2017), http://proceedings.mlr.press/v70/long17a.html</mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>7. Rozantsev, A., Salzmann, M., Fua, P.: Beyond sharing weights for deep domain adaptation. IEEE Trans. Pattern Anal. Mach. Intell. 41(4), 801-814 (2019). https://doi.org/10.1109/TPAMI.2018.2814042</mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>8. Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., Huang, Z., Karpathy, A., Khosla, A., Bernstein, M., Berg, A.C., Fei-Fei, L.: ImageNet Large Scale Visual Recognition Challenge. International Journal of Computer Vision (IJCV) 115(3), 211-252 (2015), https://doi.org/10.1007/s11263-015-0816-y</mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>9. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015), http://arxiv.org/abs/1409.1556</mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>10. Taigman, Y., Polyak, A., Wolf, L.: Unsupervised cross-domain image generation. In: 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24-26, 2017, Conference Track Proceedings. OpenReview.net (2017), https://openreview.net/forum?id=Sk2Im59ex</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>