<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Improving Model Performance for Plant Image Classification With Filtered Noisy Images</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Andreas R. Ludwig</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Helga Piorek</string-name>
          <email>h.piorek@stud.fh-dortmund.de</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Andreas H. Kelch</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>David Rex</string-name>
          <email>darex@live.de</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sven Koitka</string-name>
          <email>sven.koitka@fh-dortmund.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Christoph M. Friedrich</string-name>
          <email>christoph.friedrich@fh-dortmund.de</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>TU Dortmund University, Department of Computer Science</institution>
          ,
          <addr-line>Otto-Hahn-Str. 14, 44227 Dortmund</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University of Applied Sciences and Arts Dortmund (FHDO), Department of Computer Science</institution>
          ,
          <addr-line>Emil-Figge-Strasse 42, 44227 Dortmund</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>The training of convolutional neural networks for image recognition usually requires large image datasets to produce favorable results. Such large datasets can be acquired by web crawlers that accumulate images based on keywords. Due to the nature of data on the web, these image sets display a broad variation in quality across the contained items. In this work, a filtering approach for noisy datasets is proposed that utilizes a smaller trusted dataset: a convolutional neural network is trained on the trusted dataset and then used to construct a filtered subset of the noisy dataset. The methods described in this paper were applied to plant image classification, and the created models were submitted to the PlantCLEF 2017 competition.</p>
      </abstract>
      <kwd-group>
        <kwd>convolutional neural networks</kwd>
        <kwd>plant image classification</kwd>
        <kwd>plantCLEF</kwd>
        <kwd>oversampling</kwd>
        <kwd>transfer learning</kwd>
        <kwd>filtering noisy datasets</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        The LifeCLEF plant identification task (PlantCLEF) [
        <xref ref-type="bibr" rid="ref5 ref7">5,7</xref>
        ] is an annual
competition and part of the LifeCLEF evaluation campaign.
      </p>
      <p>
        This year’s PlantCLEF task dataset comprised 10,000 classes but
contained at most around 1,250 samples per class; some classes were only
represented by a handful of samples. A second, noisy dataset was
derived from the results of Bing and Google search queries. These results partly
represent images other than plants or are labeled incorrectly.
Up to now, image classifiers have rarely been trained on datasets with more
than 1,000 classes. In most of these cases, such datasets were only used to test the
effectiveness of large distributed learning systems. In [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] a deep
autoencoder that was trained on an ImageNet-10K [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] dataset comprising 10,000
classes reached a top-1 accuracy of 19.2 %. It was also tested on the
ImageNet-21K dataset comprising 21,841 classes, where it reached a top-1 accuracy of
15.8 %. These results were later improved by [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], where a top-1 accuracy of
29.8 % on ImageNet-21K was reached with AlexNet [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. With Nearest Class
Mean classification a top-1 accuracy of 23.9 % has been achieved by [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] on the
ImageNet-10K dataset. By using Fisher vectors [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] a top-1 accuracy of 19.1 %
was achieved by [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] on the same dataset.
      </p>
      <p>
        This paper aims to show that modern architectures for convolutional neural
networks, such as InceptionV4 or InceptionResNetV2 [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ], can achieve
state-of-the-art results on the given dataset. Furthermore, it was assessed whether the
training results can be improved by training on a subset of the noisy dataset. The subset was
created by filtering the noisy dataset with an already trained neural network.
      </p>
      <p>
        As seen in [
        <xref ref-type="bibr" rid="ref14 ref2 ref3">2,3,14</xref>
        ] transfer learning can improve the training results of neural
networks. Therefore, the models in this work were trained with an approach similar
to the two-phase fine-tuning approach described in [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. At the beginning, the
weights of the output layer were randomly initialized and then trained with a
small learning rate for a few epochs. Afterward, the entire network was trained.
The methodology of this paper is further described in Section 2. The training
of the neural networks as well as the overall results are described
in Section 3. The conclusion can be found in Section 4.
      </p>
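      <p>The two-phase schedule can be sketched as follows. This is a minimal NumPy
illustration on a toy linear network, not the actual TF-slim training code used
in this work; all sizes, learning rates, and step counts are made up for the
example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear "network": a pretrained body W1 and a new output layer W2.
# All shapes and rates below are illustrative assumptions.
W1 = rng.normal(size=(8, 16))            # pretrained weights
W2 = rng.normal(size=(16, 10)) * 0.01    # randomly initialized output layer

X = rng.normal(size=(32, 8))             # dummy batch of inputs
Y = rng.normal(size=(32, 10))            # dummy regression targets

def loss():
    return float(np.mean((X @ W1 @ W2 - Y) ** 2))

loss_before = loss()

# Phase 1: train only the freshly initialized output layer, small learning rate.
for _ in range(50):
    H = X @ W1
    W2 -= 1e-4 * (2 * H.T @ (H @ W2 - Y) / len(X))

# Phase 2: afterward train the entire network (here: both weight matrices).
for _ in range(50):
    H = X @ W1
    E = H @ W2 - Y
    W2 -= 1e-3 * (2 * H.T @ E / len(X))
    W1 -= 1e-3 * (2 * X.T @ (E @ W2.T) / len(X))

loss_after = loss()
```

The point of the schedule is that the randomly initialized head is settled
gently before gradients are allowed to disturb the pretrained body.</p>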
    </sec>
    <sec id="sec-2">
      <title>Methodology</title>
      <p>Dataset Split
In order to be able to evaluate the quality of the created classifiers, the data of
the Encyclopedia of Life (EOL) dataset were randomly split into a training and
a validation set. The training set consisted of 90 % of the EOL data and the
validation set consisted of the remaining 10 %. In this way, the pretrained models’
performance could be estimated during development. For the final submission,
the models were trained on the complete EOL dataset in order to take advantage
of the full training set.</p>
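      <p>The random 90/10 split can be sketched as follows; the dummy file names
and the seed are made up for illustration.

```python
import random

def split_dataset(samples, train_fraction=0.9, seed=42):
    """Randomly split a list of samples into a training and a validation set."""
    rng = random.Random(seed)
    shuffled = samples[:]          # copy, so the input list stays untouched
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Example: 1,000 dummy image identifiers split 90/10.
train, val = split_dataset([f"eol_{i:04d}.jpg" for i in range(1000)])
print(len(train), len(val))  # 900 100
```
</p>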
      <p>Regular Training of the Model
The general approach to the task at hand was to utilize pretrained models
provided on the Tensorflow Slim Git page (maintainers: Nathan Silberman and
Sergio Guadarrama; https://github.com/tensorflow/models/tree/master/slim; last
access: 29.06.2017) as a starting point for the training.
Since the utilized models had been trained to recognize one thousand
different classes, many of which are irrelevant for the PlantCLEF competition,
these models were fine-tuned using the training set. As a result of the fine-tuning,
models were produced containing only the lower-level filters of the
pretrained model and output layers primed to the 10,000 classes of the PlantCLEF
competition. Subsequent to the fine-tuning process, the models were trained
further utilizing the training set and filtered web datasets. The general workflow is
displayed in Figure 1.</p>
      <p>[Figure 1: General workflow: a pretrained Tensorflow Slim model is
fine-tuned on the training set (90 % of EOL) of the PlantCLEF 2017 dataset,
the evaluation model is assessed on the validation set (10 % of EOL) via
classification with oversampling, and the finished model is obtained by
training on the complete EOL dataset.]</p>
      <p>
        The pretrained models on the Tensorflow Slim Git page were trained on the
ILSVRC-2012-CLS dataset [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ]. The model checkpoints were published with the
accuracy that could be reached on the corresponding test set. Since the
InceptionResNetV2 and InceptionV4 [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ] architectures evidently produce
the best accuracies (80.4 % top-1 for InceptionResNetV2 and 80.2 % top-1 for
InceptionV4), they were chosen for this work. The training was conducted with
a fine-tuning procedure similar to
the two-step fine-tuning approach described in [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. The standard TF-slim data
augmentation functions were used for training.
      </p>
      <p>Filtered Web Datasets
After visually examining the web dataset, it became apparent that apart from
plant images in varying qualities, there was also a large number of images that
displayed unrelated items such as postage stamps or musical instruments.</p>
      <p>Due to the noisy nature of the web dataset, the data were filtered to reduce
the possibility that the classification results of the existing models would be
weakened. For this purpose, the whole web dataset was classified using a model
that showed good estimated performance. This model had been trained on the whole
EOL set and was submitted as Run 1. Ultimately, all the pictures whose annotated
class was within the top-5 predictions were compiled into a web-top-5 dataset
containing 556,584 samples. One could assume that using the top-5 web subset for
further training improves the accuracy of the trained model solely with regard
to the classes that have already been learned sufficiently. It would be interesting
to investigate whether filtering different proportions of the predicted classes (e.g.
top-100) could improve the accuracy. However, due to limitations of processing time,
only the filtering approach described above was pursued. Figure 2 visualizes the
procedure used for filtering the web dataset.</p>
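      <p>The top-5 filtering step can be sketched as follows; the scores, class
counts, and the helper name filter_top_k are illustrative, not the code used
for the actual runs.

```python
import numpy as np

def filter_top_k(probs, labels, k=5):
    """Indices of samples whose annotated label is among the top-k
    predicted classes; probs has shape (n_samples, n_classes)."""
    top_k = np.argsort(-probs, axis=1)[:, :k]   # k best classes per sample
    return [i for i, lab in enumerate(labels) if lab in top_k[i]]

# Toy scores over 6 classes; the label of sample 1 is ranked last,
# so that sample is dropped from the filtered subset.
probs = np.array([
    [0.50, 0.10, 0.10, 0.10, 0.10, 0.10],
    [0.00, 0.30, 0.20, 0.20, 0.15, 0.15],
    [0.10, 0.10, 0.10, 0.40, 0.15, 0.15],
])
labels = [0, 0, 3]
print(filter_top_k(probs, labels, k=5))  # [0, 2]
```
</p>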
      <p>The plots in Figure 3 show the number of occurrences of distinct classes
within the training dataset and the filtered dataset, sorted in descending order.
The number of occurrences for every class is plotted on a logarithmic scale. The
filtered subset is missing any samples for 696 classes, which is a general problem
of the presented filtering approach. Training data for classes that are already
well trained can be selected from the noisy dataset, but all the images of classes
that cannot be classified within a margin will be removed along with the noise.
The objective of further research would be to find a top-x which maximizes noise
filtering without dropping too many classes.</p>
      <p>
        Oversampling
To improve the results of the classification, multiple crops of an image were used
to calculate an average classification per sample, as described in [
        <xref ref-type="bibr" rid="ref17 ref6 ref8">6,8,17</xref>
        ]. For
each sample, ten crops were created: one at each corner, one centered, and a
mirrored version of each of these five crops. The final prediction was computed
as the average over the ten crops of every image.
      </p>
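      <p>The ten-crop scheme can be sketched as follows; the helper names and the
callable model are illustrative assumptions, not the code used for the runs.

```python
import numpy as np

def ten_crops(image, c):
    """Four corner crops and one centre crop of size c x c, plus the
    horizontally mirrored version of each of the five."""
    h, w = image.shape[:2]
    t, l = (h - c) // 2, (w - c) // 2
    crops = [
        image[:c, :c], image[:c, w - c:],          # top-left, top-right
        image[h - c:, :c], image[h - c:, w - c:],  # bottom-left, bottom-right
        image[t:t + c, l:l + c],                   # centre
    ]
    crops += [np.fliplr(v) for v in crops]         # mirrored versions
    return np.stack(crops)

def predict_oversampled(model, image, c):
    """Average the per-crop class predictions into one prediction."""
    return np.mean([model(v) for v in ten_crops(image, c)], axis=0)
```

Here `model` stands for any callable that maps a crop to a vector of class
scores.</p>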
    </sec>
    <sec id="sec-3">
      <title>Results</title>
      <p>The estimated performance metrics were calculated using models that had not
yet been trained with the complete EOL dataset.</p>
      <p>[Figure 2: Filtering procedure: the model trained on the complete EOL
dataset classifies the noisy web dataset with oversampling, the filtered web
dataset is constructed from the top-5 predictions, and the finished model is
obtained by training on the filtered web dataset and then on the EOL dataset.]</p>
      <p>[Figure 3: Number of occurrences per class (log scale) in the training set
and the top-5 filtered set, sorted in descending order over the 10,000 classes.]</p>
      <p>These models were used to
receive an estimate of the performance through classification on the validation
set. It is important to note that in the EOL dataset, every image belongs to
a unique observation. In contrast, the official test set contains a large number
of observations with multiple images, allowing the model to predict the label
of an image based on an average over multiple images. The official MRR score
was calculated based on observations, whereas the presented validation MRR
scores were estimated on single images. Table 1 shows the training error of the
finished models and the ensemble. The model performance and accuracy of the
first three runs are presented in Table 2. Run 4 was excluded from both tables
as it is considered broken.</p>
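      <p>The image-wise validation MRR can be sketched as follows; the scores are
made up, and for the official score the reciprocal ranks would be computed per
observation rather than per image.

```python
import numpy as np

def mean_reciprocal_rank(probs, labels):
    """Mean reciprocal rank of the annotated class over ranked predictions;
    probs has shape (n_samples, n_classes)."""
    rr = []
    for row, lab in zip(probs, labels):
        order = np.argsort(-row)                 # class indices, best first
        rank = int(np.where(order == lab)[0][0]) + 1
        rr.append(1.0 / rank)
    return float(np.mean(rr))

probs = np.array([[0.7, 0.2, 0.1],   # true class 0 ranked first  -> 1
                  [0.1, 0.3, 0.6]])  # true class 1 ranked second -> 1/2
print(mean_reciprocal_rank(probs, [0, 1]))  # 0.75
```
</p>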
      <p>FHDO_BCSG Run 1 - InceptionResNetV2 Trained on EOL Dataset
The first submission was trained on the aforementioned pretrained model. For
the fine-tuning training step, the logits and auxiliary logits were randomly
initialized instead of being copied from the utilized checkpoint. These two layers
had been trained using a relatively small learning rate of 0.0045 for five epochs.
The two layers were chosen based on the assumption that the lower level layers
have already been suitably prepared for classification tasks while the topmost
layers needed to be primed onto the 10,000 different classes of the PlantCLEF
competition.</p>
      <p>Subsequently, all the layers of this model were trained on the training set for
fifty epochs with the parameters shown in Table 3.</p>
      <p>[Table 3: training parameters (Run 1 / Run 2): mini-batch size 32 / 32;
steps 50 epochs (334,700 steps) / 30 epochs (200,820 steps); initial learning
rate 0.045 / 0.045; optimizer RMSProp; exponential learning rate decay with
factor 0.47 every 2 epochs; weight decay 0.00004.]</p>
      <p>The model that was trained
for fifty epochs yielded an MRR of 0.4661 on the validation set and was therefore
chosen as the model for the first submitted run. To finish the training the net
was trained for another five epochs using the complete EOL dataset. Table 4
shows the effect of oversampling on the estimated MRR scores for Run 1.</p>
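      <p>The exponential decay schedule from Table 3 can be sketched as follows;
whether the decay is applied step-wise (staircase) is an assumption about the
TF-slim configuration.

```python
def decayed_learning_rate(initial_lr, decay_factor, epochs_per_decay, epoch,
                          staircase=True):
    """Exponentially decayed learning rate after `epoch` epochs."""
    exponent = epoch / epochs_per_decay
    if staircase:
        exponent = int(exponent)                 # step-wise (staircase) decay
    return initial_lr * decay_factor ** exponent

# Table 3 values: initial rate 0.045, factor 0.47, decay every 2 epochs.
for epoch in (0, 2, 4, 10):
    print(epoch, decayed_learning_rate(0.045, 0.47, 2, epoch))
```
</p>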
      <p>FHDO_BCSG Run 2 - InceptionResNetV2 Trained on EOL
and Web-Top-5 Datasets
The second submission was trained analogously to the first submission. As
mentioned before, a pretrained model was used and only the logits and auxiliary
logits layers were trained for five epochs with a small learning rate of 0.0045.
Eventually, all layers of the model were trained for thirty epochs using the
parameters shown in Table 3.</p>
      <p>In order to examine the effect of the web filtering approach, a web subset was
created using the completely trained Run 1 model. Images were added to this
set if the annotated class of the noisy training set was in the top-5 predictions
of the model trained for Run 1. With this, a web subset finally consisting of
556,584 images was assembled. Following the filter process, the Run 2 model
was trained on the web-top-5 dataset for five epochs and afterwards trained for
another five epochs on the training dataset. The parameters for this training
were chosen analogously to the training of Run 1 with a starting learning rate of
0.000275 and five epochs. The learning rate was adopted from the last training
epoch of the previous training step. To finish up the training for this model, the
net was trained for another five epochs on the complete EOL dataset, analogous
to the procedure used for Run 1. Before executing the final training step, the
MRR scores were estimated on the validation set, leading to a result of 0.503.
This value could have been biased though, due to the web-top-5 subset being
compiled with a network that was trained on the complete EOL dataset. Table
5 shows the effects of oversampling on the estimated MRR scores of Run 2.
Oversampling improved the MRR scores slightly at the cost of higher computing
demands.</p>
      <p>
        In order to investigate the effectiveness of the filtering approach, the training
procedure of Run 2 was modified and two more models were trained. The starting
point for both models was an InceptionResNetV2 trained on the training set for
thirty epochs. The training for the first evaluation model was conducted once
again with 556,584 images from the web dataset. Instead of being selected by
the filtering approach, they were chosen randomly. The training for the second
evaluation model was conducted with the complete web dataset. The first model
was trained for five epochs (86,966 steps) and the other model with an equivalent
number of steps (86,966 steps). The training parameters were chosen analogously
to the training of the Run 2 model. Afterwards the models were trained on the
training dataset for another five epochs. The results on the validation set showed
that using the filtered data for training improved the estimated MRR scores
compared to using a randomly drawn dataset or the whole noisy dataset. The
results of this analysis are displayed in Table 6.
Composing an ensemble run from multiple prediction models can improve the
overall accuracy in comparison to a single prediction model [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. In order to
create an ensemble, the mean values of all the class predictions of the different
models for every image were calculated and then assembled into a new set of
predictions as shown in [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. The reasoning behind this is that one model might
be better at predicting certain classes while being worse than the other models
on other classes and vice versa. Presumably, the ensemble would be better if it
was composed from a number of accurate and diverse runs.
      </p>
      <p>Since Run 2 produced the higher MRR value of 0.503 on the validation set,
compared to the value of 0.471 for Run 1, the models were
weighted for the ensemble. In this way, the presumably better model would have
a higher influence on the final prediction. Run 1 contributed 1/3 and Run 2
contributed 2/3 to Run 3.
Run 4 was based on an InceptionV4 architecture but only achieved modest
estimated MRR scores due to misconfigurations.</p>
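      <p>The weighted ensemble can be sketched as follows; the class scores are made
up, and only the 1/3 and 2/3 weights come from the text.

```python
import numpy as np

def weighted_ensemble(predictions, weights):
    """Weighted mean of per-model class predictions; `predictions` is a
    list of (n_samples, n_classes) arrays, one per model."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                              # normalize the weights
    return np.tensordot(w, np.stack(predictions), axes=1)

run1 = np.array([[0.6, 0.4]])   # made-up class scores from Run 1
run2 = np.array([[0.3, 0.7]])   # made-up class scores from Run 2
# Run 1 weighted 1/3 and Run 2 weighted 2/3, as for Run 3.
print(weighted_ensemble([run1, run2], [1, 2]))  # approximately [[0.4 0.6]]
```
</p>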
    </sec>
    <sec id="sec-4">
      <title>Conclusion</title>
      <p>
        The utilized filtering approach improved the predictions of the resulting model
on the validation set as well as the official score, at the cost of increased
computation. Recent studies suggest [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] that neural networks can only
recognize samples of unknown classes to a certain extent. Since there were only
few samples available for some classes and since some classes were very similar
to one another, there is a chance that samples belong to a known class other
than the one they were labeled with.
      </p>
      <p>
        The use of oversampling leads to a minimal increase in estimated MRR scores,
as shown in [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ]. Since multiple crops increase the processing time during
classification, the oversampling method is not suited for every scenario.
      </p>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgement</title>
      <p>The authors gratefully acknowledge the support of NVIDIA Corporation with
the donation of the Titan X Pascal GPU, which supported this research.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>Abhijit</given-names>
            <surname>Bendale</surname>
          </string-name>
          and
          <string-name>
            <given-names>Terrance E.</given-names>
            <surname>Boult</surname>
          </string-name>
          <article-title>: Towards Open Set Deep Networks</article-title>
          .
          <source>In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR</source>
          <year>2016</year>
          ). pp.
          <fpage>1563</fpage>
          -
          <lpage>1572</lpage>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Agrawal</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Girshick</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Malik</surname>
          </string-name>
          , J.:
          <article-title>Analyzing the Performance of Multilayer Neural Networks for Object Recognition</article-title>
          .
          <source>In: Proceedings of the 13th European Conference Computer Vision (ECCV</source>
          <year>2014</year>
          ), Zurich, Switzerland, September 6-
          <issue>12</issue>
          ,
          <year>2014</year>
          , Part VII. pp.
          <fpage>329</fpage>
          -
          <lpage>344</lpage>
          . Springer International Publishing (
          <year>2014</year>
          ), http://dx.doi.org/10.1007/978-3-319-10584-0_22
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Branson</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Horn</surname>
            ,
            <given-names>G.V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Belongie</surname>
            ,
            <given-names>S.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Perona</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Bird species categorization using pose normalized deep convolutional nets</article-title>
          .
          <source>In: CoRR</source>
          . vol.
          <source>abs/1406</source>
          .2952 (
          <year>2014</year>
          ), http://arxiv.org/abs/1406.2952
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Chilimbi</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Suzue</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Apacible</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kalyanaraman</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>Project Adam: Building an Efficient and Scalable Deep Learning Training System</article-title>
          .
          <source>In: Proceedings of the 11th USENIX Symposium on Operating Systems Design and Implementation</source>
          (OSDI
          <year>2014</year>
          ). pp.
          <fpage>571</fpage>
          -
          <lpage>582</lpage>
          . USENIX Association, Broomfield, CO (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Goëau</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bonnet</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Joly</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Plant identification based on noisy web data: the amazing performance of deep learning (LifeCLEF 2017)</article-title>
          .
          <source>In: Working Notes of CLEF 2017 - Conference and Labs of the Evaluation Forum</source>
          , Dublin, Ireland,
          <fpage>11</fpage>
          -
          <lpage>14</lpage>
          September,
          <year>2017</year>
          .
          <source>CEUR-WS Proceedings Notes</source>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>He</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ren</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sun</surname>
          </string-name>
          , J.:
          <article-title>Deep Residual Learning for Image Recognition</article-title>
          .
          <source>In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR</source>
          <year>2016</year>
          )
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Joly</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Goëau</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Glotin</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Spampinato</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bonnet</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vellinga</surname>
            ,
            <given-names>W.P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lombardo</surname>
            ,
            <given-names>J.C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Planqué</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Palazzo</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Müller</surname>
          </string-name>
          , H.:
          <article-title>LifeCLEF 2017 Lab Overview: multimedia species identification challenges</article-title>
          .
          <source>In: Experimental IR Meets Multilinguality, Multimodality, and Interaction. Proceedings of the 8th International Conference of the CLEF Association, CLEF</source>
          <year>2017</year>
          , Dublin, Ireland,
          <source>September 11-14</source>
          ,
          <year>2017</year>
          .
          <source>Lecture Notes of Computer Science (LNCS)</source>
          , vol.
          <volume>10456</volume>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Koitka</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Friedrich</surname>
            ,
            <given-names>C.M.</given-names>
          </string-name>
          :
          <article-title>Optimized Convolutional Neural Network Ensembles for Medical Subfigure Classification</article-title>
          . In:
          <article-title>Experimental IR Meets Multilinguality, Multimodality, and Interaction</article-title>
          .
          <source>Proceedings of the 8th International Conference of the CLEF Association, CLEF</source>
          <year>2017</year>
          , Dublin, Ireland,
          <source>September 11-14</source>
          ,
          <year>2017</year>
          .
          <source>Lecture Notes of Computer Science (LNCS)</source>
          , vol.
          <volume>10456</volume>
          . Springer Verlag (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Krizhevsky</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sutskever</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hinton</surname>
          </string-name>
          , G.E.:
          <article-title>ImageNet Classification with Deep Convolutional Neural Networks</article-title>
          . In: Pereira,
          <string-name>
            <given-names>F.</given-names>
            ,
            <surname>Burges</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.J.C.</given-names>
            ,
            <surname>Bottou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            ,
            <surname>Weinberger</surname>
          </string-name>
          ,
          <string-name>
            <surname>K.Q</surname>
          </string-name>
          . (eds.)
          <source>Advances in Neural Information Processing Systems</source>
          <volume>25</volume>
          , pp.
          <fpage>1097</fpage>
          -
          <lpage>1105</lpage>
          . Curran Associates, Inc. (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Kuncheva</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>Combining Pattern Classifiers: Methods and Algorithms</article-title>
          . Wiley, 2nd edn. (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Le</surname>
            ,
            <given-names>Q.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ranzato</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Monga</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Devin</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Corrado</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dean</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ng</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Building high-level features using large scale unsupervised learning</article-title>
          . In:
          <string-name>
            <surname>Langford</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pineau</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          (eds.)
          <source>Proceedings of the 29th International Conference on Machine Learning (ICML 2012)</source>
          , pp.
          <fpage>81</fpage>
          -
          <lpage>88</lpage>
          . New York, USA (
          <year>July 2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Mensink</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Verbeek</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Perronnin</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Csurka</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          :
          <article-title>Distance-Based Image Classification: Generalizing to New Classes at Near-Zero Cost</article-title>
          .
          <source>IEEE Transactions on Pattern Analysis and Machine Intelligence</source>
          <volume>35</volume>
          (
          <issue>11</issue>
          ),
          <fpage>2624</fpage>
          -
          <lpage>2637</lpage>
          (
          <year>Nov 2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Perronnin</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Akata</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Harchaoui</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schmid</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Towards Good Practice in Large-Scale Learning for Image Classification</article-title>
          .
          In:
          <source>Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2012)</source>
          , pp.
          <fpage>3482</fpage>
          -
          <lpage>3489</lpage>
          . IEEE (Jun
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Reyes</surname>
            ,
            <given-names>A.K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Caicedo</surname>
            ,
            <given-names>J.C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Camargo</surname>
            ,
            <given-names>J.E.</given-names>
          </string-name>
          :
          <article-title>Fine-tuning deep convolutional networks for plant recognition</article-title>
          . In:
          <string-name>
            <surname>Cappellato</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ferro</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jones</surname>
            ,
            <given-names>G.J.F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>SanJuan</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          (eds.)
          <source>CLEF (Working Notes)</source>
          .
          <source>CEUR Workshop Proceedings</source>
          , vol.
          <volume>1391</volume>
          . CEUR-WS.org (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Russakovsky</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Deng</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Su</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Krause</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Satheesh</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ma</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Huang</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Karpathy</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Khosla</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bernstein</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Berg</surname>
            ,
            <given-names>A.C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fei-Fei</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>ImageNet Large Scale Visual Recognition Challenge</article-title>
          .
          <source>International Journal of Computer Vision</source>
          (IJCV)
          <volume>115</volume>
          (
          <issue>3</issue>
          ),
          <fpage>211</fpage>
          -
          <lpage>252</lpage>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Sánchez</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Perronnin</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mensink</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Verbeek</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Image Classification with the Fisher Vector: Theory and Practice</article-title>
          .
          <source>International Journal of Computer Vision</source>
          <volume>105</volume>
          (
          <issue>3</issue>
          ),
          <fpage>222</fpage>
          -
          <lpage>245</lpage>
          (
          <year>2013</year>
          ), http://dx.doi.org/10.1007/s11263-013-0636-x
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Szegedy</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ioffe</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vanhoucke</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Alemi</surname>
            ,
            <given-names>A.A.</given-names>
          </string-name>
          :
          <article-title>Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning</article-title>
          . In:
          <source>International Conference on Learning Representations 2016 Workshop (ICLR 2016)</source>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>