<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean
Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael Bernstein, Alexander
C. Berg, and Li Fei-Fei. ImageNet Large Scale Visual Recognition Challenge.
International Journal of Computer Vision (IJCV)</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>NovaSearch at ImageCLEFmed 2016 Subfigure Classification Task</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>David Semedo</string-name>
          <email>df.semedo@campus.fct.unl.pt</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>João Magalhães</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>NOVA LINCS, Department of Computer Science Faculty of Science and Technology Universidade NOVA de Lisboa</institution>
          ,
          <country country="PT">Portugal</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2016</year>
      </pub-date>
      <volume>115</volume>
      <issue>3</issue>
      <fpage>16</fpage>
      <lpage>21</lpage>
      <abstract>
        <p>This paper describes the participation of the NovaSearch team in the subfigure classification subtask of the ImageCLEF 2016 Medical Task. Deep learning techniques have proved to be very effective in automatic representation learning and classification tasks with general data. More specifically, convolutional neural networks (CNNs) have surpassed human-level performance in the ImageNet classification task, making them a promising model for the task of medical modality classification. We assess how such models behave when dealing with medical images by developing three different models, with different depths and components, and analyse the impact of these factors on performance. One of the key ingredients for the effectiveness of CNNs (and deep learning in general) is the use of large amounts of training data. The scenario of this subtask is completely different: the dataset is small, implying a significant risk of overfitting. We apply state-of-the-art techniques developed to reduce overfitting in these networks to our models and evaluate their effectiveness. Our best model achieves 65.31% accuracy on the test set using only the training data provided.</p>
      </abstract>
      <kwd-group>
        <kwd>Medical Modality Classification</kwd>
        <kwd>Deep Learning</kwd>
        <kwd>Convolutional Neural Networks</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        This paper describes the NovaSearch team submissions, from the Faculty of
Science and Technology of Universidade Nova de Lisboa, to the ImageCLEF
2016 [16] Medical task [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. This task consists of five subtasks: compound figure
detection, multi-label classification, figure separation, subfigure classification and
caption prediction. We addressed the subfigure classification task, which aims to
classify medical images within a given set of modalities.
      </p>
      <p>We were interested in evaluating deep learning methods, namely
convolutional neural networks (CNNs), in the specific scenario of this subtask. More
concretely, we wanted to evaluate their effectiveness when dealing with medical
images, which possess very distinct characteristics compared to the general images
on which CNNs have recently proved to be very effective.</p>
      <p>The remainder of this paper is organised as follows. In section 2 we describe
our approach. More specifically, we present and discuss in detail the models
and techniques used in our submitted runs. The configuration and results of
each run are presented and discussed in section 3. Additionally, we compare our
classifier results with the results achieved by the winning team. Finally, we draw
conclusions and present future work perspectives in section 4.</p>
    </sec>
    <sec id="sec-2">
      <title>Methodology</title>
      <p>In medical diagnosis and research, effective medical image retrieval
systems (MIRS) can be a valuable tool for aiding clinicians. Furthermore, images
in biomedical literature are usually compound figures with multiple panels, each
containing an image from a given modality (e.g. figure 1). This promotes human
readability and interpretation by grouping correlated images, but makes automatic
retrieval more difficult.</p>
      <p>ImageCLEFmed promotes research in this direction by proposing a set of
tasks that result from splitting the problem of building a MIRS. Concretely,
an effective MIRS must be capable of identifying and separating compound figures
from biomedical articles and of classifying each subfigure with a given modality.</p>
      <p>
        Knowing the modality of a medical image has been shown to be important to
improve the performance of MIRS [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. The subfigure task aims at classifying each
subfigure, from a collection of figures from compound images found in biomedical
articles from PubMed Central<sup>1</sup>, into 30 modalities structured hierarchically (the
hierarchy is defined in [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]).
      </p>
      <p>
        We tackled the subfigure task with deep neural network classifiers. Deep
neural networks have recently achieved very good results in representation
learning and classification of images [
        <xref ref-type="bibr" rid="ref4 ref5 ref9">9, 11, 14, 5, 4</xref>
        ]. Regarding image classification,
Convolutional Neural Networks (CNNs), a type of neural network suited to images
that by design learns high-level representations automatically, have proved to be
very effective. Since this task involves classifying images, CNNs were a natural
choice.
      </p>
      <p>[Figure: Training Set Class Distribution. Axes: # Samples and Normalized # Samples per class.]</p>
      <p>1 PubMed Central (http://www.ncbi.nlm.nih.gov/pmc/) is a subset of PubMed, one of
the major research databases containing scholarly articles that have been published
within the biomedical and life sciences journal literature.</p>
      <p>The dataset is highly unbalanced, especially towards the CFIG (Statistical figures,
graphs, charts) class, which represents 42% of the examples.</p>
      <p>Deep learning methods have achieved great results using very large datasets
like the ImageNet challenge dataset [10], which contains roughly 1.2 million
images from 1000 classes, with approximately 1000 images per class. In the present
task, however, the small size of the dataset poses a big challenge,
especially for deep learning methods, which tend to overfit small datasets.
The fact that the dataset is also highly unbalanced makes this a very challenging
task.</p>
      <p>We intend to evaluate how CNNs behave in such a scenario, which is clearly
different from the scenarios in which they excelled, and to experiment with
state-of-the-art techniques used to improve their performance.</p>
      <sec id="sec-2-1">
        <title>Convolutional Neural Networks</title>
        <p>Convolutional neural networks are a type of neural network with very
interesting characteristics. These networks are able to automatically learn
high-level and hierarchical representations from data, which eliminates the need
to select by hand a good set of low-level and high-level features to describe
each image.</p>
        <p>
          By definition, CNNs make some assumptions (which are correct in general)
regarding the stationarity of statistics and locality of pixel dependencies, allowing
for a reduction in the number of connections and consequently the number of
parameters to learn [
          <xref ref-type="bibr" rid="ref1 ref9">1, 9</xref>
          ]. Through depth and breadth one can control their
capacity of identifying high-level data representations and relationships between
the input and the output. Thanks to this reduction in the number of parameters and
to the processing power of current GPUs, the task of training deep convolutional
networks is feasible. The motivation for building and training deep CNNs is
that, in principle, as the number of layers (depth) of the network increases,
so does its capacity to detect higher-level details.
        </p>
      </sec>
      <sec id="sec-2-2">
        <title>Dealing with Unbalanced and Small Datasets</title>
        <p>Although this task involves classifying medical images, which have
a set of characteristics of their own, working with
unbalanced and small datasets is a general problem in machine learning. In this
section we describe a set of techniques that we applied to our models in order
to take into account the characteristics of the training dataset.</p>
        <p>
          Large networks tend to overfit. Traditional techniques to address the overfitting
problem consist of stopping the training procedure as soon as the validation
error starts increasing (early stopping) or of using regularisation techniques, such as
adding an extra term to the function to be minimised or limiting the complexity
of the model [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ].
        </p>
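        <p>As an illustration of early stopping, the following minimal sketch selects the epoch whose weights would be kept. The function name, the <monospace>patience</monospace> parameter and the validation errors are hypothetical and serve only to show the mechanism; they are not taken from the experiments reported here.</p>
        <preformat><![CDATA[
```python
def early_stopping_epoch(val_errors, patience=0):
    """Return the index of the epoch whose weights would be kept.

    Training stops once the validation error has failed to improve for
    more than `patience` consecutive epochs (patience is an assumed knob).
    """
    best_epoch, best_error, waited = 0, float("inf"), 0
    for epoch, error in enumerate(val_errors):
        if error < best_error:
            # New best validation error: remember it and reset the counter.
            best_epoch, best_error, waited = epoch, error, 0
        else:
            waited += 1
            if waited > patience:
                break
    return best_epoch


# Hypothetical validation errors, for illustration only.
errors = [0.60, 0.48, 0.41, 0.43, 0.40, 0.44, 0.47]
stop_immediately = early_stopping_epoch(errors, patience=0)
stop_with_patience = early_stopping_epoch(errors, patience=2)
```
]]></preformat>
        <p>With no patience the procedure stops at the first increase of the validation error, whereas a small patience tolerates noisy error curves.</p>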
        <p>Recently, a technique named Dropout was proposed in [12], which has been shown
to be very effective in reducing overfitting in large networks, leading to
performance improvements. Dropout can be interpreted as a stochastic regularisation
technique which essentially consists of randomly dropping neurons and their
respective connections during training. The idea is to prevent neurons from
co-adapting too much to the data, making overfitting less likely.</p>
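        <p>The mechanism can be sketched in a few lines of plain Python. This is a minimal illustration using the common "inverted" variant, in which the surviving activations are rescaled during training; this variant, the function name and the example values are assumptions made for the sketch and do not describe the implementation used in our runs.</p>
        <preformat><![CDATA[
```python
import random


def dropout(activations, drop_prob, rng, training=True):
    """Inverted dropout applied to a list of activations."""
    if not training or drop_prob == 0.0:
        # At test time every unit is kept and nothing is rescaled.
        return list(activations)
    keep_prob = 1.0 - drop_prob
    # Each unit is kept with probability keep_prob; survivors are divided
    # by keep_prob so that the expected activation is unchanged.
    return [a / keep_prob if rng.random() < keep_prob else 0.0
            for a in activations]


rng = random.Random(0)
hidden = [1.0] * 8
dropped = dropout(hidden, drop_prob=0.5, rng=rng)
unchanged = dropout(hidden, drop_prob=0.5, rng=rng, training=False)
```
]]></preformat>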
        <p>Another technique we used to address overfitting and the class imbalance
problem was data augmentation. More concretely, we performed real-time data
augmentation, in the sense that new images are generated from sample images
each time a training batch is constructed. The following operations are applied randomly
to sampled images: horizontal/vertical shifting and horizontal/vertical flipping.</p>
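        <p>The operations listed above can be sketched on an image stored as a list of rows. This is a minimal sketch under stated assumptions: the maximum shift, the flip probability of 0.5 and the zero padding of uncovered pixels are illustrative choices, and the helper names are hypothetical.</p>
        <preformat><![CDATA[
```python
import random


def flip_horizontal(img):
    """Mirror the image left to right."""
    return [row[::-1] for row in img]


def flip_vertical(img):
    """Mirror the image top to bottom."""
    return [row[:] for row in img[::-1]]


def shift(img, dx, dy, fill=0.0):
    """Shift by dx columns (right) and dy rows (down), padding with fill."""
    h, w = len(img), len(img[0])
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w:
                out[ny][nx] = img[y][x]
    return out


def augment(img, rng, max_shift=1):
    """Apply a random shift, then each flip with probability 0.5."""
    dx = rng.randint(-max_shift, max_shift)
    dy = rng.randint(-max_shift, max_shift)
    out = shift(img, dx, dy)
    if rng.random() < 0.5:
        out = flip_horizontal(out)
    if rng.random() < 0.5:
        out = flip_vertical(out)
    return out


image = [[1.0, 2.0], [3.0, 4.0]]
augmented = augment(image, random.Random(0))
```
]]></preformat>
        <p>Calling <monospace>augment</monospace> whenever a sample is drawn for a batch yields a new variant of the image each time, which is what makes the augmentation "real time".</p>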
        <p>For classes which have few examples, the network may not be able to learn
discriminative properties and may fail to generalise. Furthermore, in section 2.1
we pointed out that some classes (e.g. CFIG) dominate the dataset, which means
that the majority of the weight updates during the training phase will be based
on examples of these classes. It is therefore important to avoid focusing learning
on some classes, otherwise we run the risk that the network will assign images from
classes with few examples to a dominating class. We attempt to address this problem by
modifying the loss function and making it a weighted loss function. We assign
a weight to each class such that it is worse to misclassify an image from a class
with few examples.</p>
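        <p>Let <inline-formula><tex-math>P_w</tex-math></inline-formula> be the ideal number of examples of each class <inline-formula><tex-math>c_i</tex-math></inline-formula> assuming that the
dataset is perfectly balanced. Then, <inline-formula><tex-math>P_w</tex-math></inline-formula> is obtained as follows:</p>
        <disp-formula><label>(1)</label><tex-math>P_w = \frac{N}{N_C}</tex-math></disp-formula>
        <p>where <inline-formula><tex-math>N</tex-math></inline-formula> is the dataset size and <inline-formula><tex-math>N_C</tex-math></inline-formula> is the total number of classes. The weight
<inline-formula><tex-math>w_i</tex-math></inline-formula> for a class <inline-formula><tex-math>c_i</tex-math></inline-formula> is computed using the following expression:</p>
        <disp-formula><label>(2)</label><tex-math>w_i = \frac{P_w}{|S_i|}</tex-math></disp-formula>
        <p>where <inline-formula><tex-math>|S_i|</tex-math></inline-formula> is the cardinality of the set of samples of class <inline-formula><tex-math>c_i</tex-math></inline-formula> from the training
dataset. After computing the weights, each <inline-formula><tex-math>w_i</tex-math></inline-formula> is normalised such that <inline-formula><tex-math>w_i \in [0, 1]</tex-math></inline-formula>.
By computing the weights with the expression above, we somehow simulate
training with a perfectly balanced dataset. This is based on the fact that the
values of the gradient to be back-propagated will be amplified for classes with
few examples and reduced for the remaining ones.</p>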
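        <p>The computation of the class weights can be sketched as follows. The class counts are hypothetical, and dividing by the largest weight is only one possible way of normalising the weights to [0, 1]; the normalisation is an assumption of this sketch.</p>
        <preformat><![CDATA[
```python
def class_weights(class_counts):
    """Compute w_i = P_w / |S_i| and normalise the weights to [0, 1]."""
    n_samples = sum(class_counts.values())   # N
    n_classes = len(class_counts)            # N_C
    ideal = n_samples / n_classes            # P_w, equation (1)
    raw = {label: ideal / count              # w_i, equation (2)
           for label, count in class_counts.items()}
    # Assumed normalisation: divide by the largest weight.
    largest = max(raw.values())
    return {label: w / largest for label, w in raw.items()}


# Hypothetical class counts, for illustration only.
counts = {"CFIG": 600, "class_b": 300, "class_c": 100}
weights = class_weights(counts)
```
]]></preformat>
        <p>In this toy example the dominating class receives a weight of 1/6 and the rarest class a weight of 1, so errors on rare classes contribute more to the loss.</p>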
        <p>In our first experiments this technique did not yield any performance
improvement. In fact, it deteriorated the performance of our models. After some
experiments we concluded that the network was severely
misclassifying images from dominating classes because the respective
weights were very small. To address this issue we introduced a lower bound on
the values of the weights.</p>
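        <p>A lower bound on the weights and its effect on a weighted cross-entropy loss can be illustrated as follows. This is a minimal sketch: the bound of 0.3, the weights and the predicted probabilities are hypothetical values, and the per-example loss shown is the standard weighted cross-entropy, assumed here for illustration.</p>
        <preformat><![CDATA[
```python
import math


def apply_lower_bound(weights, lower_bound):
    """Clip every class weight from below."""
    return {label: max(w, lower_bound) for label, w in weights.items()}


def weighted_cross_entropy(probabilities, true_label, weights):
    """Loss of one example: -w_c * log(p_c), with c the true class."""
    return -weights[true_label] * math.log(probabilities[true_label])


# Hypothetical normalised weights and lower bound, for illustration only.
weights = {"dominant": 0.02, "rare": 1.0}
bounded = apply_lower_bound(weights, lower_bound=0.3)

# The same predicted probability is penalised differently per class.
prediction = {"dominant": 0.5, "rare": 0.5}
loss_dominant = weighted_cross_entropy(prediction, "dominant", bounded)
loss_rare = weighted_cross_entropy(prediction, "rare", bounded)
```
]]></preformat>
        <p>Without the bound, errors on the dominating class would be scaled by 0.02 and would barely influence the weight updates.</p>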
      </sec>
      <sec id="sec-2-3">
        <title>CNN Developed Models</title>
        <p>Our submitted runs are based on essentially three different CNN models, which
we describe in the following sections and which will be denoted from now on by
VGG1-CNN, VGG2-CNN and PReLU-CNN.
For all the network models, the input consists of a 224x224 matrix.</p>
        <p>The last layer of the networks has the softmax activation function with
dimension 30 (number of modalities of the medical classification subtask). The
softmax function enforces the constraint that outputs must lie between 0 and 1,
and the sum of all output values is equal to 1, allowing the outputs of the network
to be interpreted as posterior probabilities for categorical target variables.</p>
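        <p>The two properties of the softmax function mentioned above can be checked with a short sketch. The input scores are hypothetical, and subtracting the maximum score is a standard numerical-stability measure that does not change the result.</p>
        <preformat><![CDATA[
```python
import math


def softmax(scores):
    """Map a list of real-valued scores to a probability distribution."""
    largest = max(scores)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical output scores for three classes.
scores = [2.0, 1.0, 0.1]
probs = softmax(scores)
```
]]></preformat>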
        <p>
          For the special case of medical images, and considering the ImageCLEFmed
hierarchy, images may share characteristics which in principle can help the
classifier discriminate between modalities (e.g. all Radiology images certainly share
some characteristics). However, taking this into account would require a different
and possibly more complex approach. By using softmax we do not model the
fact that the modalities are structured hierarchically, and the final model is not
a hierarchical classifier, but an unstructured one. This relaxation has proved
to be effective while not increasing the conceptual complexity of the model [
          <xref ref-type="bibr" rid="ref9">9, 11, 14</xref>
          ].
        </p>
        <p>VGG-like models. Both VGG1-CNN and VGG2-CNN models are inspired by
the VGG model proposed in [11], which achieved top results in the ImageNet
Large Scale Visual Recognition Challenge 2014 (ILSVRC 2014).</p>
        <p>[Figure 3: architectures of the VGG1-CNN and VGG2-CNN models.]</p>
        <p>
          This model consists of a deep network with several identically parametrised
convolutional layers using small receptive fields (3x3). By using small receptive
fields a deeper network can be achieved while keeping the number of
parameters equivalent to more shallow networks. Max-pooling is performed after some
convolutional layers. For all hidden layers the activation function used is the
Rectified Linear Unit [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ] (ReLU), which has been shown to yield faster training since
it does not saturate, unlike the tanh and sigmoid activation functions.
        </p>
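        <p>The argument about small receptive fields can be made concrete with a short calculation. The sketch below assumes stride-1 convolutions with equal input and output channel counts and ignores biases; the channel width of 64 is an illustrative value and is not taken from the models described here.</p>
        <preformat><![CDATA[
```python
def conv_weights(kernel, in_ch, out_ch, layers=1):
    """Weights (biases ignored) in a stack of square convolutional layers."""
    return layers * kernel * kernel * in_ch * out_ch


def stacked_receptive_field(kernel, layers):
    """Receptive field of `layers` stride-1 convolutions of equal kernel size."""
    return layers * (kernel - 1) + 1


channels = 64  # illustrative channel width
two_3x3 = conv_weights(3, channels, channels, layers=2)
one_5x5 = conv_weights(5, channels, channels)
field = stacked_receptive_field(3, layers=2)
```
]]></preformat>
        <p>Two stacked 3x3 layers cover the same 5x5 region as a single 5x5 layer while using 18 instead of 25 weights per pair of channels, and they add one more non-linearity.</p>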
        <p>We performed some modifications mainly due to computational constraints
but also due to overfitting. The original best performing models have 16 and
19 weight layers respectively, and roughly 140 million parameters each. This
induces large training times and requires a large dataset such that the network
can effectively learn all the parameters and generalize. The medical subfigures
dataset is very small compared to the ILSVRC dataset and overfitting becomes
a serious issue. Therefore, we reduced the number of convolutional layers and
removed one of the fully connected layers. Additionally, we reduced the number
of channels in each convolutional layer, in a way that the new values are still
proportional to the original model. All the remaining components and
characteristics of the architecture like the max-pooling (and its shape), dropout and
activation functions are preserved.</p>
        <p>Figure 3 presents both model architectures. The difference between the two
models is in the depth (VGG2-CNN has greater depth). Just like in the original
VGG model, dropout is used between the last convolutional layer and the first
fully connected layer, and between this and the softmax layer.</p>
      </sec>
      <sec id="sec-2-4">
        <title>PReLU with Batch Normalization Model</title>
        <p>The ReLU activation function has a reduced likelihood of suffering from vanishing
gradients. This is due to the fact that derivatives through the rectifier remain
large whenever the unit is active, unlike other activation functions like sigmoid
and tanh. However, when the value of the argument of the ReLU function is not
positive, the gradient will be 0 and the unit will not be able to learn.</p>
        <p>
          Recently, the Parametric ReLU (PReLU) was proposed in [
          <xref ref-type="bibr" rid="ref5">5</xref>
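          ] which consists
in changing the slope of the ReLU function for inputs in <inline-formula><tex-math>\mathbb{R}_{\leq 0}</tex-math></inline-formula>, by multiplying them
by a coefficient <inline-formula><tex-math>a_i</tex-math></inline-formula>. The coefficient <inline-formula><tex-math>a_i</tex-math></inline-formula> is treated as a learnable parameter. The
final expression is:
          <disp-formula><label>(3)</label><tex-math>f(x_i) = \max(0, x_i) + a_i \min(0, x_i)</tex-math></disp-formula>
With this expression, when the inputs of the function are in <inline-formula><tex-math>\mathbb{R}_{\leq 0}</tex-math></inline-formula>, the gradient
will not be 0.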
        </p>
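        <p>The PReLU expression and its derivative can be written directly in Python. This is a minimal sketch for a single unit; the coefficient value 0.25 is only an illustrative choice, and the derivative at exactly 0 is taken from the negative side by convention.</p>
        <preformat><![CDATA[
```python
def prelu(x, a):
    """PReLU activation: max(0, x) + a * min(0, x)."""
    return max(0.0, x) + a * min(0.0, x)


def prelu_grad(x, a):
    """Derivative of the activation with respect to its input x."""
    return 1.0 if x > 0 else a


slope = 0.25  # illustrative value of the learnable coefficient
positive = prelu(2.0, slope)
negative = prelu(-2.0, slope)
grad_negative = prelu_grad(-2.0, slope)
```
]]></preformat>
        <p>Setting the coefficient to 0 recovers the plain ReLU, for which the gradient on negative inputs vanishes.</p>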
        <p>
          In deep neural networks, despite the fact that in pre-processing steps the
network inputs are normalised (shifted to zero mean and unit variance), the
distribution of each layer's inputs changes during training as a function of the
parameters of the previous layers. This problem is referred to as internal
covariate shift [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ]. Consequently, one needs to use lower learning rates and carefully
initialise the parameters, slowing down training.
        </p>
        <p>[Figure 4: architecture of the PReLU-CNN model; recoverable block labels: PReLU, Batch Normalization, Max-Pooling 2x2.]</p>
        <sec id="sec-2-4-6">
          <p>
            To address these problems in [
            <xref ref-type="bibr" rid="ref6">6</xref>
            ] the Batch Normalisation technique is
proposed. It essentially consists of performing mini-batch normalisation at
intermediate nodes of the network.
          </p>
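          <p>The normalisation step applied to one feature over a mini-batch can be sketched as follows. This is a training-time illustration only: the learnable scale <monospace>gamma</monospace> and shift <monospace>beta</monospace> are kept at their neutral values, the running statistics used at test time are omitted, and the batch values are hypothetical.</p>
          <preformat><![CDATA[
```python
import math


def batch_norm(values, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalise one feature over a mini-batch, then scale and shift it."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    # eps guards against division by zero for constant batches.
    return [gamma * (v - mean) / math.sqrt(var + eps) + beta for v in values]


# Hypothetical activations of one unit over a mini-batch of four images.
batch = [1.0, 2.0, 3.0, 4.0]
normalised = batch_norm(batch)
```
]]></preformat>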
          <p>We developed a CNN model (PReLU-CNN) which uses both PReLUs and
batch normalisation. The model can be seen in figure 4. Both these
techniques increase the computational requirements during training. Therefore,
we had to reduce the network depth to be able to train the network in feasible
time for the submission.</p>
        </sec>
      </sec>
      <sec id="sec-2-5">
        <title>Networks Hyperparameters</title>
        <p>The three proposed models are trained using stochastic gradient descent with
Nesterov momentum [13], with learning rate decay applied. The training
hyperparameter values are the same for all models and can be seen in Table 1. These values were
defined empirically.</p>
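        <p>One common formulation of this update rule is sketched below on a one-dimensional toy objective. The momentum and decay values, the number of steps and the time-based decay schedule are assumptions made for illustration and are not the values of Table 1; only the learning rate of 0.005 appears in the text.</p>
        <preformat><![CDATA[
```python
def nesterov_sgd(grad_fn, w, lr=0.005, momentum=0.9, decay=1e-4, steps=200):
    """Minimise a 1-D function with Nesterov momentum and learning rate decay."""
    velocity = 0.0
    for t in range(steps):
        step_lr = lr / (1.0 + decay * t)  # assumed time-based decay
        g = grad_fn(w)
        velocity = momentum * velocity - step_lr * g
        # Nesterov look-ahead: the momentum step is applied once more
        # on top of the plain gradient step.
        w = w + momentum * velocity - step_lr * g
    return w


# Toy objective f(w) = (w - 3)^2, whose gradient is 2 (w - 3).
w_final = nesterov_sgd(lambda w: 2.0 * (w - 3.0), w=0.0)
```
]]></preformat>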
        <p>
          Our model implementation allows enabling or disabling dropout. In section 3,
in which we describe our submitted runs, we point out which ones use it. The
probability of dropping units between the last convolutional layer and the first
fully connected layer is 0.25, and between the first fully connected layer and the
softmax layer it is 0.5. The Xavier initialisation algorithm [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ] was used for initialising
all the weights of our models.
        </p>
        <p>
As stated in the previous section, although the architecture of our models is
based on the VGG model, which takes as input an RGB image (the image matrix
has one dimension for each color channel), we only used one channel, to reduce
the training time. All images are resized to 224x224. Additionally, the elements
of each image matrix were scaled to the interval [0, 1].
        </p>
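        <p>The weight initialisation and the input scaling described above can be sketched as follows. The sketch uses the uniform variant of Xavier initialisation and assumes 8-bit pixel values (maximum 255); the layer sizes and the image are hypothetical, and resizing and channel selection are omitted.</p>
        <preformat><![CDATA[
```python
import math
import random


def xavier_uniform(fan_in, fan_out, rng):
    """Sample a fan_in x fan_out weight matrix from U(-limit, limit)."""
    limit = math.sqrt(6.0 / (fan_in + fan_out))
    return [[rng.uniform(-limit, limit) for _ in range(fan_out)]
            for _ in range(fan_in)]


def scale_pixels(image, max_value=255.0):
    """Scale a single-channel image (list of rows) to the interval [0, 1]."""
    return [[pixel / max_value for pixel in row] for row in image]


# Hypothetical layer sizes and image, for illustration only.
weights = xavier_uniform(fan_in=4, fan_out=2, rng=random.Random(0))
scaled = scale_pixels([[0, 51], [102, 255]])
```
]]></preformat>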
        <p>We split the training data into training and validation splits of 70% and 30%, respectively.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Submitted Runs and Results</title>
      <p>All our models were trained using exclusively the training dataset provided. The
dataset has the same sample images as the set defined by the merge of the
training and test sets from the 2015 edition of the subfigure task. Therefore, for
fixing parameters we used the 2015 training/test split since it will have a similar
distribution.</p>
      <p>Training CNNs is a very computationally demanding task. Recently, several
efficient libraries that take advantage of GPU computing capabilities to speed up
computations have been developed. We implemented our models using the Keras<sup>2</sup>
and Theano [15] Python libraries. Keras is a neural networks
library that features a rich set of components for developing neural network
models, and Theano is an efficient numerical computation library with GPU
support.</p>
      <p>We submitted four runs for the subfigure classification task. Table 2 depicts
the configuration of each run and the epoch in which the lowest validation error was
obtained.</p>
      <p>2 Keras python library: https://github.com/fchollet/keras</p>
      <p>[...] is not enabled. We verified that, except for run #1, which achieves the lowest
validation error at roughly half of the total number of epochs, the remaining runs
achieve it in late epochs.</p>
      <p>The results (accuracy on validation and test datasets) for each run are
presented in table 3. Our models are not on a par with the best results achieved for
the task in the current edition.</p>
      <p>A first observation is that the performance of our models on the validation and
test sets is very similar, which indicates that the training splits were suitable, i.e.,
they have a distribution similar to that of the test set. The deeper model with dropout
(run #3) achieved the best result (65.31% accuracy), taking 6 hours to train.
The exact same model with the same configuration but without dropout (run
#2) took almost 6 hours to train and achieved approximately 3% lower accuracy.
Therefore, it is clear that dropout does contribute to performance improvements.</p>
      <p>From the results table, we can also see that run #1 achieved almost as good
results as run #3 even though its model is not as deep, taking 4 hours
to train. This is an indication that we are probably observing overfitting in both
models.</p>
      <p>To assess this issue we focused on the best performing run (#3). We
plotted the error obtained on the training and validation splits (y-axis) across
the 500 epochs (x-axis) for the model used in this run. The plot is
shown in figure 5. The error curves are very noisy, but it is visible that the model
is able to gradually converge towards smaller errors until it starts overfitting.
We believe that using a low learning rate (0.005) was crucial to achieve this
convergence.</p>
      <p>Although this model uses both dropout and data augmentation techniques,
which help regularising the model, it is clear that overfitting behaviour can be
verified from epoch 100. The training dataset class skewness and size contribute
to this behaviour. Therefore, a different approach than the one we used for this
subtask is needed to deal with overfitting and help the model generalise. The
remaining models also suffer from overfitting.</p>
      <p>[Figure 5: training and validation error per epoch for the model of run #3; x-axis: # Epoch.]</p>
      <p>Regarding run #4, which uses the PReLU-CNN model, it achieved the third
best result on the test set, with an accuracy of 63.8%, outperforming model
VGG1-CNN despite being a shallower model, although it took significantly more
time to train (approximately 21 hours). This is very likely a consequence of using both
PReLUs and batch normalisation, although confirming this requires
additional experiments using a network with the same depth as
models VGG1-CNN and VGG2-CNN, such that the models are
comparable.</p>
    </sec>
    <sec id="sec-4">
      <title>Conclusions and Future Work</title>
      <p>In this paper we developed three deep learning models for tackling the subfigure
medical modality classification subtask. We performed a set of experiments using
CNNs, which excel with general images in scenarios where large amounts of data
are available but not with small and highly unbalanced datasets. Our objective
was to assess their effectiveness in classifying medical images in the challenging
scenario of this task. A set of state-of-the-art techniques for dealing with
overfitting in CNNs was used in the models developed. Our best model achieved
an accuracy of 65.31% using only the training data provided, which we believe is
a good result given the challenge, making CNNs a promising tool for medical
image modality classification.</p>
      <p>We observed severe overfitting in our models despite our efforts to reduce
it. Therefore, a different approach is needed. One possible technique is to
modify the batch construction and ensure that batches are constructed by
sampling from the dataset with all 30 classes being uniformly distributed. The
key idea is that the probability of each batch having a sample from a given class
is the same for all classes, possibly reducing overfitting on the most dominant
classes.</p>
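      <p>The batch construction proposed above could be sketched as follows. This is one possible realisation under stated assumptions: the class is drawn uniformly first and a sample of that class is then drawn with replacement; the function name, class labels and file names are hypothetical.</p>
      <preformat><![CDATA[
```python
import random


def balanced_batch(samples_by_class, batch_size, rng):
    """Build a batch in which every class is equally likely to appear."""
    classes = sorted(samples_by_class)
    batch = []
    for _ in range(batch_size):
        label = rng.choice(classes)                   # uniform over classes
        sample = rng.choice(samples_by_class[label])  # with replacement
        batch.append((sample, label))
    return batch


# Hypothetical, highly unbalanced toy dataset.
dataset = {
    "dominant": ["img_%d" % i for i in range(900)],
    "medium": ["img_%d" % i for i in range(900, 990)],
    "rare": ["img_%d" % i for i in range(990, 1000)],
}
batch = balanced_batch(dataset, batch_size=3000, rng=random.Random(0))
rare_count = sum(1 for _, label in batch if label == "rare")
```
]]></preformat>
      <p>Although the rare class holds only 1% of the toy dataset, it accounts for roughly one third of each batch. Samples of rare classes are repeated often, so this scheme would normally be combined with data augmentation.</p>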
      <p>A better hyperparameter tuning protocol must be used so that we
understand the true influence of each hyperparameter on the results achieved.
Additionally, we may be losing important properties of images by not using all
three color channels. Therefore, our experiments should be repeated with the
input of all the models being an RGB image.</p>
      <p>Acknowledgements This research was partially supported by the project
NOVA LINCS Ref. UID/CEC/04516/2013. We gratefully acknowledge the
support of NVIDIA Corporation with the donation of the Titan X GPU used for
this research.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>Ian</given-names>
            <surname>Goodfellow</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Yoshua</given-names>
            <surname>Bengio</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Aaron</given-names>
            <surname>Courville</surname>
          </string-name>
          .
          <source>Deep learning</source>
          . Book in preparation for MIT Press,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>Alba</given-names>
            <surname>García Seco de Herrera</surname>
          </string-name>
          , Roger Schaer, Stefano Bromuri, and
          <string-name>
            <given-names>Henning</given-names>
            <surname>Müller</surname>
          </string-name>
          .
          <article-title>Overview of the ImageCLEF 2016 medical task</article-title>
          .
          <source>In Working Notes of CLEF 2016 (Cross Language Evaluation Forum)</source>
          , September
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>Xavier</given-names>
            <surname>Glorot</surname>
          </string-name>
          and
          <string-name>
            <given-names>Yoshua</given-names>
            <surname>Bengio</surname>
          </string-name>
          .
          <article-title>Understanding the difficulty of training deep feedforward neural networks</article-title>
          .
          <source>In Proceedings of the International Conference on Artificial Intelligence and Statistics (AISTATS'10)</source>
          .
          <source>Society for Artificial Intelligence and Statistics</source>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>Kaiming</given-names>
            <surname>He</surname>
          </string-name>
          , Xiangyu Zhang, Shaoqing Ren, and
          <string-name>
            <given-names>Jian</given-names>
            <surname>Sun</surname>
          </string-name>
          .
          <article-title>Deep residual learning for image recognition</article-title>
          .
          <source>arXiv preprint arXiv:1512.03385</source>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>Kaiming</given-names>
            <surname>He</surname>
          </string-name>
          , Xiangyu Zhang, Shaoqing Ren, and
          <string-name>
            <given-names>Jian</given-names>
            <surname>Sun</surname>
          </string-name>
          .
          <article-title>Delving deep into rectifiers: Surpassing human-level performance on imagenet classification</article-title>
          .
          <source>In 2015 IEEE International Conference on Computer Vision</source>
          , ICCV 2015, Santiago, Chile, December 7-13,
          <year>2015</year>
          , pages
          <fpage>1026</fpage>
          -
          <lpage>1034</lpage>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>Sergey</given-names>
            <surname>Ioffe</surname>
          </string-name>
          and
          <string-name>
            <given-names>Christian</given-names>
            <surname>Szegedy</surname>
          </string-name>
          .
          <article-title>Batch normalization: Accelerating deep network training by reducing internal covariate shift</article-title>
          .
          <source>In Proceedings of the 32nd International Conference on Machine Learning</source>
          , ICML
          <year>2015</year>
          , Lille, France, 6-11 July 2015
          , pages
          <fpage>448</fpage>
          -
          <lpage>456</lpage>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>Jayashree</given-names>
            <surname>Kalpathy-Cramer</surname>
          </string-name>
          , Alba García Seco de Herrera, Dina Demner-Fushman, Sameer Antani, Steven Bedrick, and
          <string-name>
            <given-names>Henning</given-names>
            <surname>Müller</surname>
          </string-name>
          .
          <article-title>Evaluating performance of biomedical image retrieval systems - an overview of the medical image retrieval task at ImageCLEF 2004-2014</article-title>
          .
          <source>Computerized Medical Imaging and Graphics</source>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <given-names>Jayashree</given-names>
            <surname>Kalpathy-Cramer</surname>
          </string-name>
          and
          <string-name>
            <given-names>William R.</given-names>
            <surname>Hersh</surname>
          </string-name>
          .
          <article-title>Automatic image modality based classification and annotation to improve medical image retrieval</article-title>
          .
          <source>In MEDINFO 2007 - Proceedings of the 12th World Congress on Health (Medical) Informatics - Building Sustainable Health Systems</source>
          , 20-24 August
          <year>2007</year>
          , Brisbane, Australia, pages
          <fpage>1334</fpage>
          -
          <lpage>1338</lpage>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <given-names>Alex</given-names>
            <surname>Krizhevsky</surname>
          </string-name>
          , Ilya Sutskever, and
          <string-name>
            <given-names>Geoffrey E</given-names>
            <surname>Hinton</surname>
          </string-name>
          .
          <article-title>Imagenet classification with deep convolutional neural networks</article-title>
          .
          <source>In Advances in neural information processing systems</source>
          ,
          <source>NIPS 2012</source>
          , pages
          <fpage>1097</fpage>
          -
          <lpage>1105</lpage>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>