<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Full training versus fine tuning for radiology images concept detection task for the ImageCLEF 2019 challenge</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Priyanshu Sinha</string-name>
          <email>sinha@outlook.com</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Saptarshi Purkayastha</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Judy Gichoya</string-name>
          <email>gichoya@ohsu.edu</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Indiana University Purdue University</institution>
          ,
          <addr-line>Indianapolis, IN 46202</addr-line>
          <country country="US">USA</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Mentor Graphics India Pvt. Ltd.</institution>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Oregon Health &amp; Science University</institution>
          ,
          <addr-line>Portland, OR 97239</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Concept detection from medical images remains a challenging task that limits implementation of clinical ML/AI pipelines because of the scarcity of highly trained experts to annotate images. There is a need for automated processes that can extract concrete textual information from image data. ImageCLEF 2019 provided us with a set of images labeled with UMLS concepts. We participated for the first time in the concept detection task, using transfer learning. Our approach involved an experiment of layerwise fine tuning versus full training, based on previously reported recommendations for training classification, detection and segmentation tasks in medical imaging. We ranked number 9 in this year's challenge, with an F1 score of 0.05 after three entries. We had a poor result from layerwise fine tuning (F1 score of 0.014), which is consistent with previous authors who have described the benefit of full training for transfer learning. However, when a radiologist reviewed the results, the terms did not make clinical sense. We hypothesize that we can achieve better performance by using medically pretrained image models, for example PathNet, and by utilizing a hierarchical training approach, which is the basis of our future work on this dataset.</p>
      </abstract>
      <kwd-group>
        <kwd>Transfer Learning</kwd>
        <kwd>Layerwise Fine Tuning</kwd>
        <kwd>Deep Learning in Radiology</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>INTRODUCTION</title>
      <p>
        Concept detection from medical images remains a challenging task that limits
implementation of clinical ML/AI pipelines because of the scarcity of highly
trained experts to annotate images. ImageCLEF is an annual challenge, now in its
third year, that seeks contributions providing techniques to map visual
information to condensed textual descriptions. The process of automatically extracting
high-level concepts from low-level features is difficult when the images have
occlusion, background clutter, intra-class variation, and pose and lighting changes [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
      </p>
      <p>
        Participants from past challenges in 2017 and 2018 noted a broad range of
content and hence the 2019 [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] challenge was narrowed down in focus to only
radiology images [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. The focus on concept detection in the 2019 challenge is
important because it is the first step of automatic image captioning, while also
providing metadata to support context-based image retrieval.
      </p>
      <p>
        This was our first time participating in the ImageCLEF challenge. The
challenge is a multi-label classification problem, where one radiology image can have
multiple labels. Previous participants had good performance when using transfer
learning, hence we focused on optimizing the ResNet50 [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] network, which had
the best performance compared to VGG19 [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], Xception Net [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] and
InceptionResNetV2 [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. We ranked number 9 in this year's challenge, with an F1 score of
0.05 after three entries. We had a poor result from layerwise fine tuning
(F1 score of 0.014), which is consistent with previous authors who have described
the benefit of full training during transfer learning. However, when a radiologist
reviewed the results, the terms did not make clinical sense, and we hypothesize
that we can achieve better performance by using medically pretrained image
models, for example PathNet, which is the basis of our future work on this dataset.
We describe our approach in detail in the remaining sections of this paper.
      </p>
    </sec>
    <sec id="sec-2">
      <title>DATASET</title>
      <p>
        A total of 6,031,814 image-caption pairs were extracted from PubMed Open
Access and, after processing, were reduced to 72,187 radiology images from
various modalities. This dataset included archived images from February 2018 to
February 2019 [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Table 1 shows a summary of the images in the training, test
and validation sets. We did not use additional radiology training data for the
purpose of our submission to this challenge. Each label is a UMLS concept provided
in a CSV file. Table 2 shows a representative sample of the data, with images
from the training set (first row), validation set (second row) and test set (third row).
      </p>
      <table-wrap id="tab1">
        <label>Table 1</label>
        <caption>
          <p>Number of images in the training, validation and test sets.</p>
        </caption>
        <table>
          <thead>
            <tr>
              <th>Set</th>
              <th>No of Images</th>
            </tr>
          </thead>
          <tbody>
            <tr>
              <td>Training</td>
              <td>56629</td>
            </tr>
            <tr>
              <td>Validation</td>
              <td>14157</td>
            </tr>
            <tr>
              <td>Test</td>
              <td>10000</td>
            </tr>
          </tbody>
        </table>
      </table-wrap>
    </sec>
    <sec id="sec-3">
      <title>STUDY EXPERIMENT</title>
      <p>
        Data Analysis.
The ImageCLEF images were formatted to the ImageNet directory style, where
the directory name is the UMLS label. This was because our approach was
mainly based on transfer learning, and this layout made repeat experiments easy to
perform. Summary statistics of the dataset found 5217 unique UMLS label
concepts. There was image imbalance, with approximately 90% of the labels
containing fewer than 100 images and 30% of labels containing only a single image. Table 3
shows the top 10 concepts occurring at the highest frequency in the training set.
Analysis of the top 25 labels (summarized in Figure 1) shows that there is
persistent data imbalance, with one label containing more than 6500 images (C0441633,
"Scanning") and one label containing fewer than 2000 images (C0006104,
"Brain"). We therefore discarded labels containing fewer than 1000 images and
used the class weight technique from sklearn for balancing our training data [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ].
Each input image was resized to 224x224 pixels without cropping. We used a
batch size of 32 with a learning rate of 0.0001. The batches were formed by randomly
shuffling the dataset. Optimization was performed using the Adam optimizer with
default beta 1 (0.9) and beta 2 (0.999). Image augmentation during training was
performed using the Keras ImageDataGenerator. Augmentations performed
include rescaling, rotation, zooming, shearing and horizontal flipping. A total of
100 epochs were executed. We split the data into an 85% training set and a 15%
validation set. The network was trained using the Keras framework with TensorFlow
as the backend, running on an NVIDIA Quadro P6000 GPU.
      </p>
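      <p>
        To make the pipeline concrete, the following is a minimal sketch of the class
weighting and augmentation setup described above, using the Keras
ImageDataGenerator and sklearn's compute_class_weight; the directory path and the
augmentation magnitudes are illustrative assumptions, not our exact configuration.
      </p>
      <preformat>
# Sketch of the data pipeline described above (path and augmentation
# magnitudes are hypothetical).
import numpy as np
from sklearn.utils.class_weight import compute_class_weight
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Augmentations named in the text: rescaling, rotation, zoom, shear,
# horizontal flip; plus the 85%/15% train/validation split.
datagen = ImageDataGenerator(
    rescale=1.0 / 255,
    rotation_range=15,
    zoom_range=0.1,
    shear_range=0.1,
    horizontal_flip=True,
    validation_split=0.15,
)

# ImageNet-style layout: one sub-directory per UMLS concept label.
train_gen = datagen.flow_from_directory(
    "data/train",              # hypothetical path
    target_size=(224, 224),    # resize without cropping
    batch_size=32,
    shuffle=True,
    subset="training",
)

# Balance the imbalanced labels with sklearn's class weight technique.
weights = compute_class_weight(
    class_weight="balanced",
    classes=np.unique(train_gen.classes),
    y=train_gen.classes,
)
class_weight = dict(enumerate(weights))  # pass to model.fit(..., class_weight=...)
      </preformat>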
      <p>
        We treated this as a multi-label classification problem and limited our training
to the top 25 labels. Our base model was ResNet50, from which we removed the
fully connected top layers and added our own auxiliary convolutional layer along
with dense layers. To prevent overfitting, we used dropout between the dense layers.
After evaluating our performance with fine tuning of the last layers, and reviewing
the literature on fine tuning versus full training [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], we embarked on layerwise
fine tuning using ResNet50 (run 2). In the second run we sequentially trained each
layer while freezing the others. For this approach we decreased the learning rate for
the higher layers and fine tuned layer by layer, unfreezing the layers below a particular
layer.
      </p>
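      <p>
        As a sketch of this architecture (the layer sizes and the unfreezing cutoff
below are illustrative assumptions, not our exact configuration), the pretrained
ResNet50 has its fully connected top removed, an auxiliary convolutional layer
and dense layers with dropout are added, and a 25-way sigmoid output handles
the multi-label problem:
      </p>
      <preformat>
# Sketch of the transfer-learning model described above (sizes illustrative).
from tensorflow.keras import Model, layers, optimizers
from tensorflow.keras.applications import ResNet50

base = ResNet50(weights="imagenet", include_top=False,
                input_shape=(224, 224, 3))
base.trainable = False  # run 1: keep the pretrained layers frozen

x = layers.Conv2D(512, 3, padding="same", activation="relu")(base.output)
x = layers.GlobalAveragePooling2D()(x)
x = layers.Dense(512, activation="relu")(x)
x = layers.Dropout(0.5)(x)                        # dropout between dense layers
out = layers.Dense(25, activation="sigmoid")(x)   # one sigmoid per label

model = Model(base.input, out)
model.compile(
    optimizer=optimizers.Adam(learning_rate=1e-4, beta_1=0.9, beta_2=0.999),
    loss="binary_crossentropy",
)

# Run 2 (layerwise fine tuning): unfreeze the base from a chosen layer
# upward while the rest stays frozen, and recompile with a lower
# learning rate before continuing training.
def unfreeze_from(model, base, layer_name, lr=1e-5):
    trainable = False
    for layer in base.layers:
        if layer.name == layer_name:
            trainable = True
        layer.trainable = trainable
    model.compile(optimizer=optimizers.Adam(learning_rate=lr),
                  loss="binary_crossentropy")
      </preformat>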
    </sec>
    <sec id="sec-4">
      <title>Evaluation and Analysis</title>
      <p>
        Tajbakhsh et al. [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] performed the most comprehensive experiment evaluating
the approach of fine tuning a network versus training a network from scratch.
In their review of classification, detection and segmentation tasks using
multiple imaging modalities, including radiology, colonoscopy and ultrasound, they
demonstrated better performance with layerwise fine tuning. Our attempt to
replicate their superior performance on the concept detection task on the
ImageCLEF 2019 dataset led to lower performance with layerwise fine tuning
(F1 score of 0.014) versus fine tuning the network as a whole (F1 score of
0.05), summarized in Table 4. Our poor comparative performance may be due to
poor selection of hyperparameters for fine tuning the network.
      </p>
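      <p>
        For reference, the following is a minimal sketch of the scoring as we understand
it: an F1 score computed per image between the predicted and ground-truth
UMLS concept sets, averaged over all images; the image IDs and concept codes in
the example are hypothetical.
      </p>
      <preformat>
# Per-image F1 between predicted and ground-truth concept sets, averaged.
def f1(pred: set, truth: set) -> float:
    if not pred and not truth:
        return 1.0
    tp = len(pred.intersection(truth))
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(truth) if truth else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def mean_f1(predictions: dict, ground_truth: dict) -> float:
    scores = [f1(predictions.get(image_id, set()), concepts)
              for image_id, concepts in ground_truth.items()]
    return sum(scores) / len(scores)

# Hypothetical example: one of two ground-truth concepts detected.
gt = {"img1": {"C0441633", "C0006104"}}
pred = {"img1": {"C0441633"}}
print(mean_f1(pred, gt))  # 0.666...
      </preformat>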
      <p>Our approach included a clinical review of some of the sample output by a
radiologist who is one of the authors of this paper, and we noticed a large
discrepancy in the utility of the generated concepts (Table 5). For example, the first
row demonstrates a chest x-ray with a pneumoperitoneum, and our model does
not generate terms closely related to the actual radiograph interpretation. We
hypothesize that a stepwise approach to training in which ontology hierarchies, for
example laterality and anatomy, are maintained may generate superior
performance that is clinically meaningful.</p>
    </sec>
    <sec id="sec-5">
      <title>CONCLUSION</title>
      <p>Despite previous documentation of superior performance with layerwise fine
tuning on medical image tasks, we had poor performance with this approach
for concept detection. There is an opportunity to improve on layerwise fine
tuning for such tasks. We advance the challenge by reviewing the clinical relevance
of the output: despite ranking number 9 in the challenge, we
found that the clinical utility of the detected concepts was low, and we hypothesize
that we can achieve better performance and improved clinical utility using a
hierarchical approach to training.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>1. D. Katsios and E. Kavallieratou, <article-title>"Concept Detection on Medical Images using Deep Residual Learning Network,"</article-title> <source>CLEF</source>, <year>2017</year>.</mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>2. B. Ionescu, H. Müller, R. Péteri, D.-T. Dang-Nguyen, L. Piras, M. Riegler, M.-T. Tran, M. Lux, C. Gurrin, Y. Dicente Cid, V. Liauchuk, V. Kovalev, A. Ben Abacha, S. A. Hasan, V. Datla, J. Liu, D. Demner-Fushman, O. Pelka, C. M. Friedrich, J. Chamberlain, A. Clark, A. García Seco de Herrera, N. Garcia, E. Kavallieratou, C. R. del Blanco, C. Cuevas Rodríguez, N. Vasillopoulos, and K. Karampidis, <article-title>"Overview of ImageCLEF 2019: Challenges, Datasets and Evaluation,"</article-title> in Experimental IR Meets Multilinguality, Multimodality, and Interaction, <source>Proceedings of the Tenth International Conference of the CLEF Association (CLEF 2019)</source>, <year>2019</year>.</mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>3. O. Pelka, S. Koitka, J. Rückert, F. Nensa, and C. M. Friedrich, <article-title>"Radiology objects in context (ROCO): A multimodal image dataset,"</article-title> in <source>Intravascular Imaging and Computer Assisted Stenting and Large-Scale Annotation of Biomedical Data and Expert Label Synthesis: 7th Joint International Workshop, CVII-STENT 2018 and Third International Workshop, LABELS 2018, Held in Conjunction with MICCAI 2018, Granada, Spain, September 16, 2018, Proceedings</source>, vol. <volume>11043</volume>, D. Stoyanov, Z. Taylor, S. Balocco, R. Sznitman, A. Martel, L. Maier-Hein, L. Duong, G. Zahnd, S. Demirci, S. Albarqouni, S.-L. Lee, S. Moriconi, V. Cheplygina, D. Mateus, E. Trucco, E. Granger, and P. Jannin, Eds. Cham: Springer International Publishing, <year>2018</year>, pp. <fpage>180</fpage>-<lpage>189</lpage>.</mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>4. K. He, X. Zhang, S. Ren, and J. Sun, <article-title>"Deep residual learning for image recognition,"</article-title> in <source>Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)</source>, <year>2016</year>, pp. <fpage>770</fpage>-<lpage>778</lpage>.</mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>5. K. Simonyan and A. Zisserman, <article-title>"Very Deep Convolutional Networks for Large-Scale Image Recognition,"</article-title> <source>arXiv</source>, Sep. <year>2014</year>.</mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>6. F. Chollet, <article-title>"Xception: Deep Learning with Depthwise Separable Convolutions,"</article-title> in <source>2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)</source>, <year>2017</year>, pp. <fpage>1800</fpage>-<lpage>1807</lpage>.</mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>7. C. Szegedy, S. Ioffe, V. Vanhoucke, and A. Alemi, <article-title>"Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning,"</article-title> <source>arXiv</source>, Feb. <year>2016</year>.</mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>8. "sklearn.utils.class_weight.compute_class_weight," <source>scikit-learn 0.21.2 documentation</source>. [Online]. Available: https://scikit-learn.org/stable/modules/generated/sklearn.utils.class_weight.compute_class_weight.html. [Accessed: 27-May-2019].</mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>9. N. Tajbakhsh, J. Y. Shin, S. R. Gurudu, R. T. Hurst, C. B. Kendall, M. B. Gotway, and J. Liang, <article-title>"Convolutional neural networks for medical image analysis: Full training or fine tuning?,"</article-title> <source>IEEE Trans. Med. Imaging</source>, vol. <volume>35</volume>, no. <issue>5</issue>, pp. <fpage>1299</fpage>-<lpage>1312</lpage>, Mar. <year>2016</year>.</mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>10. O. Pelka, C. M. Friedrich, A. García Seco de Herrera, and H. Müller, <article-title>"Overview of the ImageCLEFmed 2019 Concept Prediction Task,"</article-title> <source>CLEF2019 Working Notes, CEUR Workshop Proceedings (CEUR-WS.org)</source>, ISSN 1613-0073, http://ceur-ws.org/Vol-2380/, <year>2019</year>.</mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>11. B. Ionescu, H. Müller, R. Péteri, Y. Dicente Cid, V. Liauchuk, V. Kovalev, D. Klimuk, A. Tarasau, A. Ben Abacha, S. A. Hasan, V. Datla, J. Liu, D. Demner-Fushman, D.-T. Dang-Nguyen, L. Piras, M. Riegler, M.-T. Tran, M. Lux, C. Gurrin, O. Pelka, C. M. Friedrich, A. García Seco de Herrera, N. Garcia, E. Kavallieratou, C. R. del Blanco, C. Cuevas Rodríguez, N. Vasillopoulos, K. Karampidis, J. Chamberlain, A. Clark, and A. Campello, <article-title>"ImageCLEF 2019: Multimedia Retrieval in Medicine, Lifelogging, Security and Nature,"</article-title> in Experimental IR Meets Multilinguality, Multimodality, and Interaction, <source>Proceedings of the Tenth International Conference of the CLEF Association (CLEF 2019)</source>, <year>2019</year>.</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>