<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Investigating the Robustness of Pre-trained Networks on OCT-Dataset</article-title>
      </title-group>
      <contrib-group>
        <aff id="aff0">
          <label>0</label>
          <institution>Dipl.-Inf. Holger Langner Professorship Media Informatics, University of Applied Sciences Mittweida Technikumplatz 17</institution>
          ,
          <addr-line>Mittweida</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Prof. Dr. Marc Ritter Professorship Media Informatics, University of Applied Sciences Mittweida Technikumplatz 17</institution>
          ,
          <addr-line>Mittweida</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Prof. Dr. Maximilian Eibl Chair of Media Informatics, Chemnitz University of Technology Chemnitz</institution>
          ,
          <addr-line>Germany D-09111</addr-line>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>Rama Hasan M.Sc. Chair of Media Informatics, Chemnitz University of Technology Chemnitz</institution>
          ,
          <addr-line>Germany D-09111</addr-line>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2019</year>
      </pub-date>
      <fpage>200</fpage>
      <lpage>207</lpage>
      <abstract>
        <p>Convolutional Neural Networks (CNNs) are among the model families that have proven highly effective in high-level tasks such as image classification. Pre-trained networks are models introduced in the ILSVRC (ImageNet Large Scale Visual Recognition Challenge) that have been trained for hundreds of hours on powerful GPUs and are applicable to new application domains. The aim of this work is to investigate the effectiveness of applying pre-trained models from natural (non-medical) images to images from the OCT (optical coherence tomography) domain in ophthalmology. The experiments show the robustness of a series of such models without the need to train a model from scratch, which in effect reduces training times and computational costs.</p>
      </abstract>
      <kwd-group>
        <kwd>Pre-trained CNN</kwd>
        <kwd>Transfer Learning</kwd>
        <kwd>Deep Learning</kwd>
        <kwd>OCT dataset</kwd>
        <kwd>Ophthalmology</kwd>
        <kwd>ILSVRC</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 Introduction</title>
      <p>
        In recent years, Convolutional Neural Networks (CNNs) have been widely used as a powerful tool to solve
machine-learning tasks in domains such as natural language processing, speech recognition, and computer vision [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], as well as
semantic segmentation [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] and object detection [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. CNNs became markedly more effective after the
ImageNet Large Scale Visual Recognition Challenge (ILSVRC) competition in 2010, which triggered a revolution through the
efficient use of graphics processing units (GPUs), rectified linear units, dropout regularization, and effective data
augmentation [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. This success was achieved primarily by deep CNNs: the depth of a network makes it more robust
and allows it to extract discriminating features at multiple levels of abstraction. Training a deep CNN from
scratch, however, requires a huge amount of labeled training data, which poses a major challenge in domains such as medical image
classification and detection, since in many use cases and applications it is not easy to obtain such large numbers of labeled
samples. It also demands extensive computing and storage resources to cope with the long
training times. Pre-trained neural networks introduced in ILSVRC, by contrast, have been trained on a large
benchmark dataset of natural images [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] for hundreds of hours on powerful GPUs to solve a problem similar to
the one addressed in the remainder of this paper. They can therefore serve as a starting point for a new training
problem without the need to train a network from scratch, in particular by adapting the already trained convolutional
layers to the new problem via fine-tuning and transfer learning [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. Despite the significant differences between
natural and medical images, natural-image descriptors such as the scale-invariant feature transform (SIFT) [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] and the
histogram of oriented gradients (HOG) [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] have been widely used for object detection and segmentation in medical image
analysis. More recently, several studies have employed transfer learning to solve medical diagnosis problems.
      </p>
      <p>
        Azizpour et al. [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] suggest that the success of knowledge transfer depends on the contrast, or difference, between the dataset
on which a CNN is trained and the dataset to which the knowledge is to be transferred. The study shows that it is possible
to transfer knowledge from networks trained on natural (non-medical) images to medical images. In Bar et al. [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ],
pre-trained CNNs are used as feature generators for chest pathology identification. Van Ginneken et al. [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] suggest that
integrating CNN-based features with handcrafted features improves performance. Chen et al. [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] used a
fine-tuned pre-trained network to localize standard planes in ultrasound images. Tajbakhsh et al. [13] show that
fine-tuned CNNs and fully trained CNNs outperform the corresponding handcrafted alternatives in medical imaging
applications.
      </p>
      <p>The aim of this work is to investigate the effectiveness of applying pre-trained (natural-image) models to
specifically chosen foreign domains in order to determine the degree of transferability. OCT (optical coherence
tomography) images are used in the following study. The experiments make use of two of the most
widely used pre-trained CNNs, VGG16 and ResNet50 [14,15]. In addition, and as a contrast, a handcrafted CNN architecture
has been built and trained from scratch on our OCT image set. To better understand how convolutional neural
networks make their decisions, we apply the Gradient-weighted Class Activation Mapping (Grad-CAM) [16] visualization
method to a pre-trained ResNet50.</p>
      <p>The remainder of this study is organized as follows: Section 2 describes the OCT dataset. An overview
of pre-trained convolutional networks is given in Section 3. The methodology and the applied network architectures are briefly
outlined in Section 4. Section 5 comprises our experimental study and presents our results. Finally, our findings are
briefly summed up in Section 6.</p>
    </sec>
    <sec id="sec-2">
      <title>2 OCT Dataset</title>
      <p>“Optical coherence tomography (OCT) is an optical analog of ultrasound imaging that uses low coherence interferometry to
produce cross-sectional images of the retina. It captures optical scattering from the tissue to decode spatial details of tissue
microstructures. It uses infrared light from a super-luminescent diode that is divided into two parts: one of which is
reflected from a reference mirror and the other is scattered from the biological tissue. The two reflected beams of light are
made to produce interference patterns to obtain the echo time delay and their amplitude information that makes up an
A-Scan. A-Scans that are captured at adjacent retinal locations by a transverse scanning mechanism are combined to produce a
2-dimensional image.” [17].</p>
      <p>Our dataset consists of real clinical images acquired over ten years of practice at the Eye Center of the
Medical Center of the University of Freiburg in Germany, between 2007 and 2018. It contains ophthalmological data for
about 3,600 patients. Each patient suffers from Age-Related Macular Degeneration (AMD) [18] or a related disease such as
diabetic retinopathy or retinal vein occlusion. The data for each patient had been collected during a long-term application
of anti-VEGF therapy [19], and it remains unfiltered, i.e. patients may also suffer from other eye diseases (e.g. glaucoma or
cataract). Figure (1.a) shows a healthy macula; the retinal pigment epithelium in the middle appears almost as a
straight and smooth line. In contrast, the presence of drusen represents an optical marker for dry AMD (see Figure
(1.b)).</p>
      <p>Typical signs of wet AMD, namely newly grown but abnormal blood vessels (choroidal neovascularizations), are
shown in Figure (1.c). Their leakiness leads to an aggregation of fluid, i.e. intraretinal or subretinal edema, and eventually
to a scar (fibrosis), as shown in Figures (1.d and 1.e), respectively. We faced several OCT data training problems. In this
work, our experiments focus on the Visual Acuity Performance (VP) classification problem, in which only a few patients can
be safely assigned to a specific performance class: given the very first OCT finding of a patient, and
after a long period of time and therapy, the performance class should be predictable with significant confidence.</p>
    </sec>
    <sec id="sec-3">
      <title>3 Pre-Trained Networks</title>
      <p>VGG-16 is a deep CNN pre-trained on more than one million images retrieved from the
ImageNet dataset. The network is distinguished by its simplicity, employing only 3×3 convolution layers stacked on
top of each other at increasing depth, with max pooling reducing the volume size. Two fully connected layers (of
4,096 neurons each) are followed by a softmax classifier. VGG-16 contains 16 weight layers and is capable of classifying images into
1,000 classes such as mouse, keyboard, pencil, and many animals. Consequently, the network has learned rich feature
representations for a wide variety of images. The network expects an input image size of 224×224 pixels.</p>
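      <p>To get a sense of this capacity: with a 224×224 input, the final convolution stage of VGG-16 outputs a 7×7×512 volume, so the two fully connected layers alone carry the bulk of the parameters. A quick back-of-the-envelope count (plain Python, our own sketch):</p>

```python
# Parameter counts of the two fully connected (FC) layers in VGG-16.
# The last conv stage of VGG-16 yields a 7x7x512 volume for a 224x224 input.
flat = 7 * 7 * 512               # 25,088 features after flattening
fc1 = flat * 4096 + 4096         # weights + biases of the first FC layer
fc2 = 4096 * 4096 + 4096         # weights + biases of the second FC layer
print(fc1 + fc2)                 # roughly 119.5 million parameters in the FC layers alone
```

      <p>This is why training such a network from scratch is so costly, and why reusing the already learned weights is attractive.</p>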
      <p>ResNet-50 is a pre-trained convolutional neural network that was likewise trained on more than one million images from
the ImageNet dataset. ResNet-50 employs deep residual learning across 50 layers and, like VGG-16, classifies
objects into 1,000 classes while also expecting an input image size of 224×224 pixels.</p>
    </sec>
    <sec id="sec-4">
      <title>4 Methodology</title>
      <p>In the following, we practically investigate the robustness of pre-trained natural-image CNNs on the OCT domain. The
biggest challenge arises from the difficulty of correctly classifying these kinds of medical images, even for ophthalmologists.
We examine the transferability of the knowledge embedded in pre-trained CNNs to this type of medical image. We also
employ the Grad-CAM technique to visualize the regions of the input image that are important for the predictions of these
pre-trained CNNs, in order to gain a better understanding of how these networks reach their decisions. Our experiments are
conducted on our classification problem of Visual Acuity Performance (VP), which reflects the outcome quality of the therapy.
The VP problem set contains three classes:
 1-decreasing: the visual acuity of the patient drops after a period of time.
 2-stable: the visual acuity of the patient stabilizes after a long period; however, it requires consistent therapy.
 3-increasing: the patient's visual acuity increases immediately as a result of therapy.</p>
      <p>We used the VGG16 pre-trained network [20], keeping the weights and filters of the early layers of the network, which identify
simple features like edges, lines, and corners, and retrained the last four layers. Then, we added a fully connected layer
followed by a Softmax activation [21] with a number of outputs corresponding to the number of classes in each OCT image
set. The same procedure was applied to the ResNet50 pre-trained network. We apply Grad-CAM to all of these networks in
order to highlight the specific discriminative image regions detected by the pre-trained CNNs. The annotated ground
truth labels are converted and forwarded to the last layer in order to calculate the appropriate class scores.</p>
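      <p>The Softmax activation at the output turns the raw class scores into a probability distribution over the VP classes; a minimal numpy sketch (the function name is our own):</p>

```python
import numpy as np

def softmax(scores):
    """Map raw class scores to probabilities that sum to 1."""
    e = np.exp(scores - np.max(scores))  # subtract the max for numerical stability
    return e / e.sum()

# Three raw scores for the classes decreasing / stable / increasing:
probs = softmax(np.array([2.0, 1.0, 0.1]))
```

      <p>The predicted class is then simply the index of the largest probability.</p>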
      <p>The workflow of Grad-CAM is shown in Figure 2: for all classes the gradient is set to zero, except for the true
class, which is set to 1. The error signal is then backpropagated to the feature map of interest, where Grad-CAM
uses the gradients of the target class flowing into the final convolutional layer to create a coarse localization
map that highlights the parts of the image important for predicting the respective class [16].</p>
      <p>Three convolutional neural networks (pre-trained VGG16, pre-trained ResNet50, and a handcrafted CNN) are used in our
experiments. Since the lower layers only detect general and simple localized features like edges and lines, while the
network's complexity increases in the higher layers, we decided to retrain the last four layers of the pre-trained VGG16 and
leave the others frozen (a frozen layer does not change during training). Then, we added a fully connected
layer with 512 neurons and ReLU activation [22], followed by a dropout layer in order to avoid overfitting, and
an output layer with a number of neurons matching the number of classes, followed by Softmax activation. We employed the
Adam optimizer with a learning rate of 0.0001 [23]. Within the ResNet50 pre-trained network, the first 41 layers are likewise
frozen; a flatten layer is followed by a dense layer with a number of neurons matching the number of classes, using Softmax
activation. The SGD optimizer is used with a learning rate of 0.01 [24]. Our handcrafted CNN consists of four convolutional layers
with filter weights of sizes (5×5×32), (5×5×64), (7×7×64), and (7×7×128), respectively. Each convolutional layer is followed by
(2×2) max pooling and ReLU activation. In addition, a fully connected layer with 512 neurons is followed by a final output
layer with a number of neurons equal to the number of classes and Softmax activation.</p>
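      <p>The Grad-CAM computation described above reduces to three numpy operations once the feature maps of the final convolutional layer and their gradients are available; a minimal sketch, assuming both arrive as (H, W, K) arrays (the function name is our own):</p>

```python
import numpy as np

def grad_cam(feature_maps, gradients):
    """Coarse class-discriminative localization map.

    feature_maps: (H, W, K) activations of the final conv layer
    gradients:    (H, W, K) gradients of the target class score
                  with respect to those activations
    """
    # Global-average-pool the gradients: one importance weight per channel.
    weights = gradients.mean(axis=(0, 1))                       # shape (K,)
    # Weighted sum of the K feature maps ...
    cam = np.tensordot(feature_maps, weights, axes=([2], [0]))  # shape (H, W)
    # ... followed by ReLU, keeping only positively contributing regions.
    return np.maximum(cam, 0.0)
```

      <p>Upsampling the resulting (H, W) map to the input resolution and overlaying it on the OCT image yields the heat maps we inspect.</p>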
      <p>Our dataset consists of 8,434 OCT images; 20% of the samples are used as the validation set and 20% as the test set. After
training each of our three networks (VGG16, ResNet50, and our handcrafted model) ten times, we obtained
validation accuracies in the ranges of 92.20%&#8211;94.54%, 92.57%&#8211;94.72%, and 75.11%&#8211;76.90%, respectively.
Thus, as shown in Figure 3, which plots the training and validation accuracy as well as the training and
validation loss of these three CNNs in the last run, the two pre-trained models outperformed our
handcrafted model, which had been trained solely from scratch on our three-class OCT image set.</p>
      <p>Figure 3 – Training and validation accuracy and loss of the last runs for our modified models of: (a) VGG16, (b) ResNet50, (c) Handcrafted model.</p>
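      <p>The split described above (60% training, 20% validation, 20% test) can be sketched in plain Python (the helper name and the fixed seed are our own; the paper does not state how the split was drawn):</p>

```python
import random

def split_indices(n, val_frac=0.2, test_frac=0.2, seed=42):
    """Shuffle sample indices and split them into train/val/test parts."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = int(n * test_frac)
    n_val = int(n * val_frac)
    # The remaining indices form the training set.
    return idx[n_test + n_val:], idx[n_test:n_test + n_val], idx[:n_test]

train, val, test = split_indices(8434)  # the 8,434 OCT images of this study
```

      <p>Shuffling before splitting matters here, since consecutive scans of the same patient would otherwise leak between the sets; a per-patient split would be even stricter.</p>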
      <p>For a more robust evaluation of each classification network, we calculated the classification accuracy of each
CNN over the OCT test set (test accuracy), which comprises 20% of our OCT image samples, after each of the ten training runs. The
results confirm the superior performance of the pre-trained CNNs over our handcrafted CNN. Not only in validation and test
accuracy but also in the number of epochs does our handcrafted CNN perform worse, requiring 50 epochs to fit the
training samples compared to 20 and 10 epochs for VGG16 and ResNet50, respectively. As shown in Table 1,
the best test accuracy was obtained by the pre-trained VGG16, with an average test accuracy of 88.814%. ResNet50 performed
even better in terms of the number of epochs, requiring only 10.</p>
      <p>[Table 1 – Test accuracies and training epochs of the three CNNs: VGG16, ResNet50, and the handcrafted model.]</p>
      <sec id="sec-4-2">
        <title>6 Conclusion</title>
        <p>In this paper, we presented an experimental study investigating the robustness of pre-trained neural networks
on a special kind of medical image (the OCT images described in Section 2) without the need to retrain these networks
from scratch. We explored the outcome of keeping the first few layers of the pre-trained networks frozen while retraining the
later layers in order to fit our classification problem. The experimental results show the superior performance
of the pre-trained networks over a handcrafted network built and trained from scratch, which supports the
concept of knowledge transfer despite the large difference between the natural and medical image domains. We also applied
the Grad-CAM visualization method to the pre-trained ResNet50 to gain a better understanding of which features appear relevant to
the CNNs when distinguishing between the different medical image classes. Future work encompasses the investigation
of semi-automated and active-learning algorithms to address the massive annotation problem, since these algorithm classes
are capable of bridging the gap between labelled and unlabelled data while, ideally, only querying samples that would
lead to an increase in precision or accuracy. In addition, it is essential to enhance the current framework and tool chain to
address at least a wider variety of real-world ophthalmological challenges.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>7 Acknowledgement</title>
      <p>We would like to acknowledge that Prof. Dr. Andreas Stahl and the collaborators of the TOPOs project provided the OCT image
data that was used in this study, as well as the ophthalmological background. TOPOs (“Therapievorhersage durch Analyse
von Patientendaten in der Ophthalmologie”, “therapy prediction through analysis of patient data in ophthalmology”) is a collaborative project funded by the BMBF (“Bundesministerium für
Bildung und Forschung”, Federal Ministry of Education and Research) (FKZ: 13GW0170B) from March 2017 to January 2020. The European Social Fund (ESF) also
funded this work within the Innovative PhD Scholarship entitled “Aggregation, Visualisierung und Optimierung von
überwachten Deep Learning-Technologien mit Hilfe der virtuellen und erweiterten Realität”.</p>
      <p>[13] N. Tajbakhsh et al., “Convolutional Neural Networks for Medical Image Analysis: Full Training or Fine Tuning?,” IEEE Trans. Med. Imaging, vol. 35, no. 5, pp. 1299–1312, 2016.</p>
      <p>[14] K. Simonyan and A. Zisserman, “Very Deep Convolutional Networks for Large-Scale Image Recognition,” arXiv preprint arXiv:1409.1556, 2015.</p>
      <p>[15] K. He, X. Zhang, S. Ren, and J. Sun, “Deep Residual Learning for Image Recognition,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778, 2016.</p>
      <p>[16] R. R. Selvaraju, M. Cogswell, A. Das, R. Vedantam, D. Parikh, and D. Batra, “Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization,” in Proceedings of the IEEE International Conference on Computer Vision, pp. 618–626, 2017.</p>
      <p>[17] M. Bhende, S. Shetty, M. Parthasarathy, and S. Ramya, “Optical coherence tomography: A guide to interpretation of common macular diseases,” Indian J. Ophthalmol., vol. 66, no. 1, p. 20, 2018.</p>
      <p>[18] “Age-Related Macular Degeneration (AMD) | National Eye Institute.” [Online]. Available: https://nei.nih.gov/health/maculardegen. [Accessed: 09-Jul-2019].</p>
      <p>[19] T. Y. Y. Lai, C. M. G. Cheung, and W. F. Mieler, “Ophthalmic Application of Anti-VEGF Therapy,” Asia-Pacific J. Ophthalmol., vol. 6, no. 6, pp. 479–480, 2017.</p>
      <p>[20] K. Simonyan and A. Zisserman, “Very Deep Convolutional Networks for Large-Scale Image Recognition,” arXiv preprint arXiv:1409.1556, 2014.</p>
      <p>[21] Y. Tang, “Deep Learning using Linear Support Vector Machines,” arXiv preprint arXiv:1306.0239, 2013.</p>
      <p>[22] C. Nwankpa, W. Ijomah, A. Gachagan, and S. Marshall, “Activation Functions: Comparison of Trends in Practice and Research for Deep Learning,” arXiv preprint arXiv:1811.03378, 2018.</p>
      <p>[23] D. P. Kingma and J. L. Ba, “Adam: A Method for Stochastic Optimization,” arXiv preprint arXiv:1412.6980, 2014.</p>
      <p>[24] H. Robbins and S. Monro, “A Stochastic Approximation Method,” The Annals of Mathematical Statistics, pp. 400–407, 1951.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>A.</given-names>
            <surname>Voulodimos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Doulamis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Doulamis</surname>
          </string-name>
          , and E. Protopapadakis, “
          <article-title>Deep Learning for Computer Vision: A Brief Review,”</article-title>
          <source>Comput. Intell. Neurosci.</source>
          , vol.
          <year>2018</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>13</lpage>
          , Feb.
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>J.</given-names>
            <surname>Long</surname>
          </string-name>
          , E. Shelhamer, and T. Darrell, “
          <article-title>Fully Convolutional Networks for Semantic Segmentation</article-title>
          ,
          <source>2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)</source>
          , pp.
          <fpage>3431</fpage>
          -
          <lpage>3440</lpage>
          , Boston, MA,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>A.</given-names>
            <surname>Diba</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Sharma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Pazandeh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Pirsiavash</surname>
          </string-name>
          ,
          <string-name>
            <surname>L. Van Gool</surname>
          </string-name>
          ,
          and K. Leuven, “
          <article-title>Weakly Supervised Cascaded Convolutional Networks</article-title>
          .”,
          <source>2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)</source>
          , pp.
          <fpage>5131</fpage>
          -
          <lpage>5139</lpage>
          , Honolulu,
          <string-name>
            <surname>HI</surname>
          </string-name>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>M.</given-names>
            <surname>Thoma</surname>
          </string-name>
          , “
          <article-title>Analysis and Optimization of Convolutional Neural Network Architectures,”</article-title>
          <source>arXiv preprint arXiv:1707.09725</source>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5] “ImageNet.” [Online]. Available: http://www.image-net.org/. [Accessed: 12-Jul-2019].
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>T.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Huan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Research</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Zhu</surname>
          </string-name>
          , “
          <article-title>Instance-based Deep Transfer Learning</article-title>
          .”,
          <source>2019 IEEE Winter Conference on Applications of Computer Vision (WACV)</source>
          , pp.
          <fpage>367</fpage>
          -
          <lpage>375</lpage>
          ,
          <string-name>
            <surname>Waikoloa</surname>
            <given-names>Village</given-names>
          </string-name>
          ,
          <string-name>
            <surname>HI</surname>
          </string-name>
          , USA,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>T.</given-names>
            <surname>Lindeberg</surname>
          </string-name>
          , “Scale Invariant Feature Transform,” Scholarpedia, vol.
          <volume>7</volume>
          , no.
          <issue>5</issue>
          , p.
          <fpage>10491</fpage>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>N.</given-names>
            <surname>Dalal</surname>
          </string-name>
          and
          <string-name>
            <given-names>B.</given-names>
            <surname>Triggs</surname>
          </string-name>
          , “
          <article-title>Histograms of Oriented Gradients for Human Detection</article-title>
          ,” in
          <source>2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05)</source>
          , vol.
          <volume>1</volume>
          , pp.
          <fpage>886</fpage>
          -
          <lpage>893</lpage>
          ,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>H.</given-names>
            <surname>Azizpour</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. S.</given-names>
            <surname>Razavian</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Sullivan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Maki</surname>
          </string-name>
          , and
          <string-name>
            <surname>S. Carlsson,</surname>
          </string-name>
          “
          <article-title>From Generic to Specific Deep Representations for Visual Recognition</article-title>
          .”,
          <source>2015 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)</source>
          , pp.
          <fpage>36</fpage>
          -
          <lpage>45</lpage>
          , Boston, MA,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Bar</surname>
          </string-name>
          ,
          <string-name>
            <surname>I. Diamant</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Wolf</surname>
          </string-name>
          , and
          <string-name>
            <given-names>H.</given-names>
            <surname>Greenspan</surname>
          </string-name>
          , “
          <article-title>Deep learning with non-medical training used for chest pathology identification</article-title>
          .”,
          <source>2015 IEEE 12th International Symposium on Biomedical Imaging (ISBI)</source>
          , pp.
          <fpage>294</fpage>
          -
          <lpage>297</lpage>
          , New York, NY,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <surname>B. van Ginneken</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. A. A.</given-names>
            <surname>Setio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Jacobs</surname>
          </string-name>
          , and
          <string-name>
            <given-names>F.</given-names>
            <surname>Ciompi</surname>
          </string-name>
          , “
          <article-title>Off-the-shelf convolutional neural network features for pulmonary nodule detection in computed tomography scans</article-title>
          ,” in
          <source>2015 IEEE 12th International Symposium on Biomedical Imaging (ISBI)</source>
          ,
          <year>2015</year>
          , pp.
          <fpage>286</fpage>
          -
          <lpage>289</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>H.</given-names>
            <surname>Chen</surname>
          </string-name>
          et al.,
          <source>“Standard Plane Localization in Fetal Ultrasound via Domain Transferred Deep Neural Networks,” IEEE J. Biomed. Heal. Informatics</source>
          , vol.
          <volume>19</volume>
          , no.
          <issue>5</issue>
          , pp.
          <fpage>1627</fpage>
          -
          <lpage>1636</lpage>
          , Sep.
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>