<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>LifeClef 2017 Plant Identification Challenge: Classifying Plants Using Generic-Organ Correlation Features</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Sue Han Lee</string-name>
          <email>leesuehan@siswa.um.edu.my</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Yang Loong Chang</string-name>
          <email>yangloong@siswa.um.edu.my</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Chee Seng Chan</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Centre of Image &amp; Signal Processing, Fac. Comp. Sci. &amp; Info. Tech., University of Malaya</institution>
          ,
          <country country="MY">Malaysia</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>This paper describes our proposal for the multi-organ plant identification task (LifeClef 2017 challenge [8]). The objective of the challenge is to evaluate to what extent machine learning and computer vision can learn from noisy data compared to trusted data. To address the challenge, we employ our recently proposed hybrid generic-organ convolutional neural network, abbreviated HGO-CNN [11], to train on different compositions of plant datasets. Overall, all the submitted runs obtained comparable results in the LifeClef 2017 plant classification task.</p>
      </abstract>
      <kwd-group>
        <kwd>Plant classification</kwd>
        <kwd>deep learning</kwd>
        <kwd>convolutional neural network</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Plant classification has received particular attention in the computer vision field
[
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] due to its important implications in agriculture automation and
environmental conservation. Along with the recent advances in science and technology,
automatic plant species recognition has become possible to assist botanists
in plant identification tasks [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. For example, the development of an efficient plant
recognition system using the Local Binary Pattern [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] allows the classification
of medicinal plants [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. Robotic weed control systems drive studies on automatic
plant identification in agronomic research aimed at crop improvement through
the recognition of crop plants and the elimination of weeds [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. Despite these efforts, automatic plant
recognition, a foundational capability in this context, is still in its
early stages.
      </p>
      <p>
        In 2013, the LifeClef challenge [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] provided the first multi-organ plant dataset.
This was the first multi-organ plant classification benchmark for the computer
vision community. This year, LifeClef 2017 [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] offered a larger amount of plant
biodiversity data [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. The objective is to identify 10000 species from images of
plants collected through two different channels: a "trusted" training set and a
"noisy" training set. The trusted training set is collected from the online
collaborative Encyclopedia Of Life (EoL) such as Wikipedia, iNaturalist and Flickr,
while the noisy training set is collected from the Google and Bing image
search results. In this challenge, we employ our recently proposed convolutional
neural network (CNN) architecture, namely the HGO-CNN [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], with small
refinements. Specifically, it integrates both the generic and organ-specific
information for the multi-organ plant classification task.
      </p>
      <p>The rest of this working note is organized as follows. In Section 2, we present
the methodology of our proposed architecture. Section 3 illustrates its training
scheme. Section 4 shows the experiments and results on both the validation and
testing sets. Lastly, Section 5 presents the conclusion and future work.</p>
    </sec>
    <sec id="sec-2">
      <title>Method Description</title>
      <p>
        Unlike previous approaches [
        <xref ref-type="bibr" rid="ref1 ref2">1, 2</xref>
        ] that trained a CNN to capture solely the generic
representation of plant images, the HGO-CNN [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] is able to encapsulate
both the organ and the generic information prior to plant classification. We
consider features from the organ because plant organs, in general, are known
prior to the exploration of their characteristics. For instance, when botanists study
a leaf, they focus on leaf characters such as margin or venation, and, when
they study a flower, they focus on the characteristics of petals, sepals and stamens
to identify the plant species. Hence, we believe that a better recognition method for
plant species requires prior information about their respective organs.
      </p>
      <p>
        The proposed HGO-CNN comprises four layers or components: (i) a shared
layer, (ii) an organ layer, (iii) a generic layer, and (iv) a species layer. We
introduce a shared layer for both the generic and organ components. The reasons are
threefold. First, [
        <xref ref-type="bibr" rid="ref16 ref17">17, 16</xref>
        ] demonstrated that the preceding layers in deep networks
respond to low-level features such as corners and edges. Since both the higher
level generic and organ components require low-level features to build higher
level features, we introduce shared preceding layers for these components.
Second, according to [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ], the shared layer may reduce the floating point operations
and memory footprint of the network execution, which are of importance for
real world applications. Lastly, using a shared layer helps to reduce the number
of training parameters, which is beneficial to the architecture's computational
efficiency. Fig. 1 depicts the configuration of our proposed model. The input to our
proposed model is a color image of 224 × 224 pixels. For the convolutional layers,
we utilise 3 × 3 convolution filters with the spatial resolution preserved using stride
1. Max pooling is performed using a 2 × 2 pixel window with stride 2. Three
fully connected layers, which have 4096, 4096 and 10000 channels respectively,
follow the stacks of convolutional layers. Finally, the HGO-CNN output
is fed into a softmax layer to produce the softmax output. Note that for the
q-th class, the softmax output is defined as P_n(q) = e^{s_q} / \sum_{m=1}^{M} e^{s_m}, where M stands
for the total number of classes and s stands for the class prediction score. After
performing the softmax operation, the softmax loss L is computed as follows:
L = -\frac{1}{B} \sum_{n=1}^{B} \log(P_n(T))   (1)
where B is the batch size and T is the ground truth class label for the n-th input
image.
      </p>
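      <p>To make the softmax output and Eq. (1) concrete, the following minimal NumPy sketch computes the class probabilities and the batch softmax loss from a matrix of class prediction scores. This is our own illustration only, not the Caffe implementation used in the experiments; the variable names scores and labels are assumptions.</p>
      <preformat>
import numpy as np

def softmax_loss(scores, labels):
    """Softmax cross-entropy loss over a mini-batch.

    scores: (B, M) array of class prediction scores s.
    labels: (B,) array of ground-truth class indices T.
    Returns the average negative log-probability of the true class.
    """
    # Subtract the per-row maximum for numerical stability.
    shifted = scores - scores.max(axis=1, keepdims=True)
    exp_scores = np.exp(shifted)
    probs = exp_scores / exp_scores.sum(axis=1, keepdims=True)  # P_n(q)
    B = scores.shape[0]
    # Eq. (1): L = -(1/B) * sum_n log P_n(T_n)
    return -np.mean(np.log(probs[np.arange(B), labels]))

# Toy usage: a batch of 2 images and 5 classes.
scores = np.array([[2.0, 0.5, 0.1, -1.0, 0.3],
                   [0.2, 1.5, 0.0,  0.7, 0.1]])
labels = np.array([0, 1])
print(softmax_loss(scores, labels))
      </preformat>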
      <p>
        In this challenge, we refine some of the configurations of the original
HGO-CNN architecture: (1) the data layer normalization technique called Batch
Normalization (BN) [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] is included. We added BN from the last convolution layer
of both the generic and organ components respectively until the fully connected
layers. This is to enhance the correlation of representation learning between the
two components, so that it is more robust to non-linearities. (2) During
feature fusion, feature summation is performed instead of concatenation to further
amplify the correspondences of these features.
      </p>
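      <p>As a rough sketch of the Batch Normalization step described above, the NumPy code below normalises a batch of feature maps per channel and applies the learnable scale and shift. This illustrates the standard BN operation of [6] under an assumed NHWC layout; it is not our exact network configuration.</p>
      <preformat>
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """Per-channel batch normalisation for NHWC feature maps.

    x: (B, H, W, Z) batch of feature maps.
    gamma, beta: (Z,) learnable scale and shift parameters.
    """
    mean = x.mean(axis=(0, 1, 2), keepdims=True)
    var = x.var(axis=(0, 1, 2), keepdims=True)
    x_hat = (x - mean) / np.sqrt(var + eps)   # zero mean, unit variance per channel
    return gamma * x_hat + beta

# Toy usage on random feature maps with Z = 4 channels.
x = np.random.randn(8, 7, 7, 4)
out = batch_norm(x, gamma=np.ones(4), beta=np.zeros(4))
      </preformat>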
    </sec>
    <sec id="sec-3">
      <title>Training</title>
      <p>
        Pre-Training Two-Path CNN We design a two-path CNN, as shown in Fig.
2, for the purpose of training two different components: the generic and organ-specific
features. This two-path CNN is initially pre-trained using the ImageNet
challenge dataset [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ].
      </p>
      <p>Organ layer After we obtain the pre-trained two-path CNN, one of the CNN
paths is repurposed for the organ task. This organ layer is trained together with
the shared layer, using seven kinds of predefined organ labels. We obtain
organ-based feature maps, x^{org} \in \mathbb{R}^{H \times W \times Z}, where H, W and Z are the height, width
and number of channels of the respective feature maps. Since the PlantClef2017
dataset does not provide organ information for every plant image, we train the
organ layer based on the previous PlantClef2015 training set.</p>
      <p>Generic layer After training the organ layer, the other CNN path is
repurposed for the generic task. This generic layer is trained using the species
labels, regardless of organ information. We obtain generic-based feature maps,
x^{gen} \in \mathbb{R}^{H \times W \times Z}. To allow both the organ and generic layers to share the
common preceding layers, we keep the shared layer's weights consistent. This
is achieved by setting their learning rate to zero.</p>
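      <p>The freezing mechanism can be sketched as a per-layer learning-rate multiplier of zero for the shared layer, so that an SGD step leaves its weights untouched while the generic layer is still updated. The layer names and the update function below are hypothetical, shown only for illustration.</p>
      <preformat>
import numpy as np

# Hypothetical per-layer learning-rate multipliers: a multiplier of 0 freezes
# the shared layer so its weights stay unchanged while the generic layer trains.
lr_mult = {"shared_conv": 0.0, "generic_conv": 1.0}

def update(weights, grads, base_lr=0.01):
    """Plain SGD step with per-layer learning-rate multipliers."""
    return {name: w - base_lr * lr_mult[name] * grads[name]
            for name, w in weights.items()}

weights = {"shared_conv": np.ones(4), "generic_conv": np.ones(4)}
grads = {"shared_conv": np.full(4, 0.5), "generic_conv": np.full(4, 0.5)}
new_w = update(weights, grads)
# shared_conv is unchanged (frozen); generic_conv has been updated.
      </preformat>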
      <p>Species layer To introduce correlation between the organ and generic
components, a fusion function y = g(x^{org}, x^{gen}) is employed at stage L (after
the last convolutional layer of both components, as shown in Fig. 1) to produce
the organ and generic correlation feature maps, y \in \mathbb{R}^{H \times W \times Z}. In our model, g
performs a summation of these two sets of features:
y_{i,j,k} = x^{org}_{i,j,k} + x^{gen}_{i,j,k}   (2)
where 1 \le i \le H, 1 \le j \le W, 1 \le k \le Z. The feature maps y will
then go through two convolution layers to learn the combined representation of
generic and organ features. Since these two convolution layers are newly and
randomly initialised, we set their learning rate to be 10 times higher than that of the
other layers during training.</p>
    </sec>
    <sec id="sec-4">
      <title>Experiments and Results</title>
      <p>
        Our architecture is trained using the Caffe [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] framework. The networks are
trained with back-propagation, using stochastic gradient descent [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. For the
training parameter setting, we employed the fixed learning policy. We set the
learning rate to 0.01, and then decreased it by a factor of 10 when the validation
set accuracy stopped improving. The momentum was set to 0.9 and the weight
decay to 0.0001. We ran the experiments using an NVIDIA K40 graphics card.
      </p>
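      <p>For reference, the update rule implied by these settings (SGD with momentum 0.9 and weight decay 0.0001) can be sketched as follows; this is a generic illustration of the technique, not the Caffe solver itself.</p>
      <preformat>
import numpy as np

def sgd_momentum_step(w, grad, velocity, lr=0.01, momentum=0.9, weight_decay=0.0001):
    """One SGD update with momentum and L2 weight decay."""
    grad = grad + weight_decay * w            # add the weight-decay term
    velocity = momentum * velocity - lr * grad
    return w + velocity, velocity

# Toy usage on a single weight vector.
w = np.zeros(3)
v = np.zeros(3)
w, v = sgd_momentum_step(w, grad=np.array([0.5, -0.2, 0.1]), velocity=v)
      </preformat>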
      <sec id="sec-4-1">
        <title>Data Preparation</title>
        <p>For the trusted training set, we first downloaded all 256287 images. We then
randomly selected 208878 images for training and 47409 images for validation.
To increase the robustness of the system in recognising multi-organ plant images,
multi-scale training was adopted. We isotropically rescaled the training images
to three different sizes: 256, 385 and 512, then randomly cropped 224 × 224 pixels
from the rescaled images to feed into the network for training. By doing this, a
crop from the larger scaled images corresponds to a small part of the image,
particularly a subpart of an organ, while a crop from the smaller scaled
images corresponds to the global structure of a plant. Besides that, we also
increased the data size by mirroring the input images during training. After this
data augmentation, we obtained 626634 training images and a validation set of
142227 images.</p>
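        <p>A minimal Pillow sketch of this multi-scale augmentation is given below: each training image is isotropically rescaled so that its shorter side is 256, 385 or 512 pixels, a random 224 × 224 crop is taken, and a mirrored copy is added. The function and its name are our own illustration, not the exact pipeline used in training.</p>
        <preformat>
import random
from PIL import Image, ImageOps

def augment(path, scales=(256, 385, 512), crop=224):
    """Return a random 224x224 crop from a randomly rescaled image plus its mirror."""
    img = Image.open(path).convert("RGB")
    scale = random.choice(scales)
    w, h = img.size
    ratio = scale / min(w, h)                 # isotropic rescale: shorter side = scale
    img = img.resize((int(round(w * ratio)), int(round(h * ratio))))
    w, h = img.size
    left = random.randint(0, w - crop)
    top = random.randint(0, h - crop)
    patch = img.crop((left, top, left + crop, top + crop))
    return patch, ImageOps.mirror(patch)      # original crop + mirrored copy
        </preformat>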
        <p>For the noisy dataset, however, we only managed to crawl 918216
images, which is about 60% of the total number of images from the web,
due to resource limitations. We then separated them into 738716 images for training
and 179500 images for validation. We performed the same data augmentation
to produce another training set that contains 2216148 images and a validation
set of 538500 images. For the testing set, all 25170 images were downloaded and
similarly augmented.</p>
      </sec>
      <sec id="sec-4-2">
        <title>Experimental results on validation set</title>
        <p>
          For the evaluation on our validation set, the softmax output from our CNN
model for each image was first collected. An averaging fusion was then used to
combine the softmax scores of the augmented validation set. In this experiment,
we computed the top-1 classification result (Acc) to infer the robustness of the
system. We compared our method to a generic network, the VGG-16 net [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ].
        </p>
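        <p>The averaging fusion over augmented copies and the top-1 accuracy can be sketched as below; the assumed layout is that probs holds the softmax outputs of the augmented versions of one validation image. This is an illustration of the evaluation procedure, not the exact evaluation script.</p>
        <preformat>
import numpy as np

def fuse_and_predict(probs):
    """Average the softmax outputs of the augmented copies of one image
    and return the top-1 class index.

    probs: (A, M) array, A augmented copies, M classes.
    """
    return int(np.argmax(probs.mean(axis=0)))

def top1_accuracy(per_image_probs, labels):
    """Top-1 accuracy over a list of per-image softmax score arrays."""
    preds = np.array([fuse_and_predict(p) for p in per_image_probs])
    return float(np.mean(preds == np.asarray(labels)))
        </preformat>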
        <p>Table 1 shows the comparison of the performance results. We can observe that
the VGG-16 net performed better on the noisy dataset while our proposed
HGO-CNN performed better on the trusted dataset. There are two possible reasons:
(1) the organ layer in the HGO-CNN, which was trained on the previous PlantClef2015
dataset, might not be robust enough to model such a huge and diverse dataset; (2)
the noisy dataset in this case is better modelled using generic features regardless of
the organ information, as the generic features might include many independent
plant features that help the classification performance.</p>
      </sec>
      <sec id="sec-4-3">
        <title>Experimental Results on Test set</title>
        <p>We submitted four runs to the LifeClef 2017 challenge. We fine-tuned all models
using both the training and validation sets to increase the robustness of the
models. To obtain the observation-level predictions, an averaging fusion method
was employed to combine the results of the testing images that have the same
observation id. Their performance was evaluated based on the Mean Reciprocal
Rank (MRR). The characteristics of each run are stated below (see the sketch after this list):
- UM Run 1: The proposed HGO-CNN trained with the trusted set only.
- UM Run 2: The VGG-16 net trained with the noisy set only.
- UM Run 3: Combined results of UM Run 1 and UM Run 2 based on averaging
fusion at image level.
- UM Run 4: Combined results of UM Run 1 and UM Run 2 based on max
voting at image level.
Fig. 3 shows the overall results of the LifeClef2017 multi-organ plant
classification task. We observed that Run 2, which is ranked 12th out of a total of
28 runs, is the best among the submitted runs, while Run 1, which is ranked
19th, shows the lowest result. Hence, we deduce that the fusion
model HGO-CNN that we currently trained is not generalised enough to predict
unseen testing images. Run 3 (ranked 13th) and Run 4 (ranked 15th),
both of which combine the results of Run 1 and Run 2, are ranked lower
than Run 2. This is clearly due to the poor performance of the Run 1
model. Furthermore, Run 3 is ranked higher than Run 4, an indication that the
averaging fusion method performs better than the max voting method.</p>
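        <p>Observation-level prediction and the Mean Reciprocal Rank metric can be sketched as follows. The image-to-observation grouping and the ranking convention shown here are assumptions for illustration, not the official evaluation code.</p>
        <preformat>
import numpy as np
from collections import defaultdict

def observation_scores(image_probs, obs_ids):
    """Average image-level softmax scores that share the same observation id."""
    grouped = defaultdict(list)
    for probs, obs in zip(image_probs, obs_ids):
        grouped[obs].append(probs)
    return {obs: np.mean(p, axis=0) for obs, p in grouped.items()}

def mean_reciprocal_rank(obs_scores, obs_labels):
    """MRR: average of 1 / rank of the true species over all observations."""
    rr = []
    for obs, scores in obs_scores.items():
        order = np.argsort(-scores)               # classes sorted by score, best first
        rank = int(np.where(order == obs_labels[obs])[0][0]) + 1
        rr.append(1.0 / rank)
    return float(np.mean(rr))
        </preformat>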
        <p>Next, we compared the results of our submitted runs based on different
compositions of the dataset. Table 2 shows the results on the trusted training
set (EoL). We observe that our UM Run 1 provides a comparable result, where
it is ranked 5th out of a total of 11 runs. We believe that the performance
could have been better if the organ layer in the HGO-CNN had been trained with the
latest PlantClef2017 dataset. However, this was restricted, as the organ information
is not provided for most of the images. In Table 3, we have the lowest rank, but
we used only 60% of the noisy WEB dataset. Furthermore, there are only two
participants in this category, which hardly allows a thorough comparison. Lastly, in
Table 4, we observe comparable results for our submitted runs. However, we
believe that our performance can be improved. In the current experiments, we
separately trained the CNN models using the two different datasets and inferred
the results using average fusion. It is possible that the performance could have
been better if both of the datasets were used to train one single end-to-end CNN
model without the need for external fusion to infer the species. Moreover,
training on 100% of the noisy images might further boost the classification
performance.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Conclusions and Future work</title>
      <p>This working note explains the implementation of the HGO-CNN for the
PlantClef2017 challenge. We described the methodology of our proposed architecture
and analysed the results on both the validation and testing sets. We observed
that our current HGO-CNN model is not generalised enough to predict unseen
testing images. This might be due to the lack of robustness of the organ layer,
which was trained using the previous PlantClef2015 dataset. In the future, we will revise
our proposed model to increase its robustness by re-training all layers using the
latest datasets that incorporate both trusted and noisy images.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgement</title>
      <p>This research is supported by the Postgraduate (PPP) Grant PG007-2016A
from the University of Malaya; the NVIDIA K40 GPU used in this work was donated
by NVIDIA Corporation.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Champ</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lorieul</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Servajean</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Joly</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>A comparative study of fine-grained classification methods in the context of the lifeclef plant identification challenge 2015</article-title>
          .
          <source>In: CLEF 2015</source>
          . vol.
          <volume>1391</volume>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Choi</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Plant identification with deep convolutional neural network: Snumedinfo at lifeclef plant identification task 2015</article-title>
          . In: Working notes of CLEF 2015 conference (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3. Goeau, H.,
          <string-name>
            <surname>Bonnet</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Joly</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Plant identification based on noisy web data: the amazing performance of deep learning (lifeclef 2017)</article-title>
          .
          <source>In: CLEF working notes 2017</source>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4. Goeau, H.,
          <string-name>
            <surname>Bonnet</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Joly</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bakic</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Barthelemy</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Boujemaa</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Molino</surname>
            ,
            <given-names>J.F.</given-names>
          </string-name>
          :
          <article-title>The imageclef 2013 plant identification task</article-title>
          .
          <source>In: CLEF</source>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Haug</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Michaels</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Biber</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ostermann</surname>
          </string-name>
          , J.:
          <article-title>Plant classification system for crop/weed discrimination without segmentation</article-title>
          .
          <source>In: Applications of Computer Vision (WACV)</source>
          ,
          <source>2014 IEEE Winter Conference on</source>
          . pp.
          <volume>1142</volume>
          –
          <fpage>1149</fpage>
          .
          <string-name>
            <surname>IEEE</surname>
          </string-name>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6. Ioffe, S.,
          <string-name>
            <surname>Szegedy</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Batch normalization: Accelerating deep network training by reducing internal covariate shift</article-title>
          .
          <source>In: Proceedings of the 32nd International Conference on Machine Learning (ICML-15)</source>
          . pp.
          <volume>448</volume>
          –
          <fpage>456</fpage>
          . JMLR Workshop and Conference Proceedings (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Jia</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shelhamer</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Donahue</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Karayev</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Long</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Girshick</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Guadarrama</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Darrell</surname>
          </string-name>
          , T.:
          <article-title>Caffe: Convolutional architecture for fast feature embedding</article-title>
          .
          <source>In: Proc. of the ACM International Conference on Multimedia</source>
          . pp.
          <volume>675</volume>
          –
          <issue>678</issue>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Joly</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          , Goeau, H.,
          <string-name>
            <surname>Glotin</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Spampinato</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bonnet</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vellinga</surname>
            ,
            <given-names>W.P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lombardo</surname>
            ,
            <given-names>J.C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Planque</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Palazzo</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          , Muller, H.:
          <article-title>Lifeclef 2017 lab overview: multimedia species identification challenges</article-title>
          .
          <source>In: CLEF 2017 Proceedings, Springer Lecture Notes in Computer Science (LNCS)</source>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Krizhevsky</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sutskever</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hinton</surname>
          </string-name>
          , G.E.:
          <article-title>Imagenet classification with deep convolutional neural networks</article-title>
          .
          <source>In: Advances in neural information processing systems</source>
          . pp.
          <volume>1097</volume>
          –
          <issue>1105</issue>
          (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>S.H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chan</surname>
            ,
            <given-names>C.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mayo</surname>
            ,
            <given-names>S.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Remagnino</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>How deep learning extracts and learns leaf features for plant classification</article-title>
          .
          <source>Pattern Recognition</source>
          <volume>71</volume>
          ,
          <issue>1</issue>
          –
          <fpage>13</fpage>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>S.H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chan</surname>
            ,
            <given-names>C.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Remagnino</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>HGO-CNN: Hybrid generic-organ convolutional neural network for multi-organ plant classification</article-title>
          .
          <source>In: ICIP</source>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Naresh</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nagendraswamy</surname>
          </string-name>
          , H.:
          <article-title>Classification of medicinal plants: an approach using modified lbp with symbolic representation</article-title>
          .
          <source>Neurocomputing</source>
          <volume>173</volume>
          ,
          <issue>1789</issue>
          –
          <fpage>1797</fpage>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Ojala</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pietikainen</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Maenpaa</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Multiresolution gray-scale and rotation invariant texture classification with local binary patterns</article-title>
          .
          <source>TPAMI</source>
          <volume>24</volume>
          (
          <issue>7</issue>
          ),
          <volume>971</volume>
          –
          <fpage>987</fpage>
          (
          <year>2002</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Russakovsky</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Deng</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Su</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Krause</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Satheesh</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ma</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Huang</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Karpathy</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Khosla</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bernstein</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          , et al.:
          <article-title>Imagenet large scale visual recognition challenge</article-title>
          .
          <source>International Journal of Computer Vision</source>
          <volume>115</volume>
          (
          <issue>3</issue>
          ),
          <volume>211</volume>
          –
          <fpage>252</fpage>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Simonyan</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zisserman</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Very deep convolutional networks for large-scale image recognition</article-title>
          .
          <source>CoRR abs/1409.1556</source>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Yan</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          , Zhang, H.,
          <string-name>
            <surname>Piramuthu</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jagadeesh</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>DeCoste</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Di</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yu</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>HD-CNN: Hierarchical deep convolutional neural networks for large scale visual recognition</article-title>
          .
          <source>In: Proc. of the IEEE International Conference on Computer Vision</source>
          . pp.
          <volume>2740</volume>
          –
          <issue>2748</issue>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Zeiler</surname>
            ,
            <given-names>M.D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fergus</surname>
          </string-name>
          , R.:
          <article-title>Visualizing and understanding convolutional networks</article-title>
          .
          <source>In: Computer Vision – ECCV</source>
          <year>2014</year>
          , pp.
          <volume>818</volume>
          –
          <fpage>833</fpage>
          . Springer (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>