<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>AUEB NLP Group at ImageCLEFmed Caption 2019</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Vasiliki Kougia</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>John Pavlopoulos</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ion Androutsopoulos</string-name>
          <email>iong@aueb.gr</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Informatics, Athens University of Economics and Business</institution>
          ,
          <country country="GR">Greece</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>We present the systems that AUEB's NLP Group used to participate in the ImageCLEFmed 2019 Caption task. The goal of this task is to automatically select medical concepts related to each image, as a first step towards generating image captions or medical reports, or to help in medical diagnosis. We participated with four systems, all using CNN image encoders. The encoder of each system is combined with an image retrieval method or a feed-forward neural network to predict concepts. Our systems were ranked 1st, 2nd, 3rd, and 5th.</p>
      </abstract>
      <kwd-group>
        <kwd>Medical Images</kwd>
        <kwd>Concept Detection</kwd>
        <kwd>Image Retrieval</kwd>
        <kwd>Multi-label Classification</kwd>
        <kwd>Image Captioning</kwd>
        <kwd>Machine Learning</kwd>
        <kwd>Deep Learning</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 Introduction</title>
      <p>
        Deep learning methods are being developed to automatically interpret biomedical
images in order to help clinicians who examine large numbers of images daily [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. The
ImageCLEFmed Caption task [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] is part of ImageCLEF 2019 [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. ImageCLEF is a
campaign that suggests novel challenges and develops benchmarking resources for the
evaluation of systems operating on images. The ImageCLEFmed Caption Task ran for
the 3rd year in 2019. It included a Concept Detection sub-task, where the goal was to
perform multi-label classification of medical images by automatically selecting medical
concepts that should be assigned to each image. The concepts come from the Unified
Medical Language System (UMLS). Selecting the appropriate concepts per image can
be a first step towards automatically generating image captions or longer medical reports,
and can also assist, more generally, in computer-assisted diagnosis [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. In the two
previous years, ImageCLEFmed also included a Caption Prediction (generation) sub-task
[
        <xref ref-type="bibr" rid="ref2 ref4">2, 4</xref>
        ], which was not included this year.
      </p>
      <p>
        This paper presents the four Concept Detection systems that AUEB’s NLP Group
used to participate in ImageCLEFmed 2019 Caption. The systems were ranked 1st, 2nd,
3rd, and 5th. The system that was ranked 3rd consists of a DenseNet-121 [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]
Convolutional Neural Network (CNN) image encoder and a k-Nearest Neighbors (k-NN)
retrieval component that uses the encoding of the image being classified to retrieve similar
training images with known concepts; these are then used to assign concepts to the new
image. The top-ranked system is a re-implementation of CheXNet [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ], with
modifications for ImageCLEFmed Caption 2019. CheXNet also uses the DenseNet-121 encoder
[
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], combined with a feed-forward neural network (FFNN) that performs multi-label
classification. The second-best system is an ensemble combining concept probability
scores obtained from the CheXNet-based system and image similarity scores produced
by k-NN retrieval of similar training images. The system ranked 5th uses the VGG-19
image encoder [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ], which was also used by Jing et al. [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], combined with a FFNN for
multi-label classification.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2 Data</title>
      <p>
        The ImageCLEFmed Caption 2019 dataset is a subset of the Radiology Objects in
COntext (ROCO) dataset [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. It consists of medical images extracted from open access
biomedical journal articles of PubMed Central (https://www.ncbi.nlm.nih.gov/pmc/). Each image was extracted along with
its caption. The caption was processed using QuickUMLS [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] to produce the gold
UMLS concept unique identifiers (CUIs). An image can be associated with multiple
CUIs (Figure 1). Each CUI is accompanied by its corresponding UMLS term.
In ImageCLEFmed Caption 2017 [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] and 2018 [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], the datasets were noisy. They
included generic and compound images, covering a wide diversity of medical images;
there was also a large total number of concepts (111,155), and some of them were too
generic and did not appropriately describe the images [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ]. In the ROCO dataset,
compound and non-radiology images were filtered out using a CNN model. This led to
80,786 radiology images in total, of which 56,629 images were provided as the training
set, 14,157 as the validation set, and the remaining 10,000 images were used for testing.
In ImageCLEFmed Caption 2019, the total number of UMLS concepts was reduced to
5,528, with 6 concepts assigned to each training image on average. The minimum
number of concepts per training image is 1, and the maximum is 72. Table 1 shows the 6
most frequent concepts of the training set and how many training images they were
assigned to, according to the gold annotations. We note that 312 of the 5,528 total
concepts are not assigned to any training image; and 1,530 concepts are assigned to only
one training image.
      </p>
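      <table-wrap id="tab-1">
        <label>Table 1</label>
        <caption>
          <p>The 6 most frequent concepts (CUIs) of the training set and the number of training images they were assigned to, according to the gold annotations.</p>
        </caption>
        <table>
          <thead>
            <tr><th>CUI</th><th>UMLS term</th><th>Images</th></tr>
          </thead>
          <tbody>
            <tr><td>C0441633</td><td>diagnostic scanning</td><td>6,733</td></tr>
            <tr><td>C0043299</td><td>x-ray procedure</td><td>6,321</td></tr>
            <tr><td>C1962945</td><td>radiogr</td><td>6,318</td></tr>
            <tr><td>C0040395</td><td>tomogr</td><td>6,235</td></tr>
            <tr><td>C0034579</td><td>pantomogr</td><td>6,127</td></tr>
            <tr><td>C0817096</td><td>thoracics</td><td>5,981</td></tr>
          </tbody>
        </table>
      </table-wrap>
      <p>We randomly selected 20% of the training images and used them as our development
set (11,326 images along with their gold concepts). The models we used to produce the
submitted results were trained on the entire training set. The validation set was used for
hyper-parameter tuning and early stopping.</p>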
    </sec>
    <sec id="sec-3">
      <title>3 Methods</title>
      <p>This section describes the four methods we developed for ImageCLEFmed Caption
2019.</p>
      <sec id="sec-3-1">
        <title>3.1 System 1: DenseNet-121 Encoder + k-NN Image Retrieval (Ranked 3rd)</title>
        <p>
          In this system, we followed a retrieval approach, extending the 1-NN baseline of our
previous work on biomedical image captioning [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ]. Given a test image, the previous
1-NN baseline returned the caption of the most similar training image, using a CNN
encoder to map each image to a dense vector. For ImageCLEFmed Caption 2019, we
retrieve the k most similar training images and use their concepts, as described below.
        </p>
        <p>
          We use the DenseNet-121 [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ] image encoder, a CNN with 121 layers, where all
layers are directly connected to each other, improving information flow and avoiding
vanishing gradients. We started with DenseNet-121 pre-trained on ImageNet [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ] and
fine-tuned it on ImageCLEFmed Caption 2019 training images (we used the
implementation of https://keras.io/applications/#densenet). The fine-tuning was
performed as when training DenseNet-121 in System 2, including data augmentation
(Section 3.2). Without fine-tuning, the performance of the pre-trained encoder was
worse. ImageCLEFmed Caption 2019 images were rescaled to 224 × 224 and
normalized with the mean and standard deviation of ImageNet to match the requirements of
DenseNet-121 and how it was pre-trained on ImageNet. Having fine-tuned
DenseNet-121, we used it to obtain dense vector encodings, called image embeddings, of all
training images. The image embeddings are extracted from the last average pooling layer of
DenseNet-121. Given a test image (Fig. 2), we again use the fine-tuned DenseNet-121 to
obtain the image’s embedding. We then retrieve the k training images with the highest
cosine similarity (computed on image embeddings) to the test image, and return the r
concepts that are most frequent among the concepts of the k images. We set r to the
average number of concepts per image of the particular k retrieved images. We tuned
the value of k in the range from 1 to 200 using the validation set, which led to k = 199.
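Additional fine-tuning may improve performance further. This system ranked 3rd.
        </p>
      </sec>
      <sec id="sec-3-1b">
        <title>3.2 System 2: CheXNet-based, DenseNet-121 Encoder + FFNN (Ranked 1st)</title>
        <p>
This system, which is based on CheXNet [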
          <xref ref-type="bibr" rid="ref14">14</xref>
          ], achieved the best results in
ImageCLEFmed Caption 2019. In its original form, CheXNet maps X-rays of the
ChestX-ray14 dataset [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ] to 14 labels. It uses DenseNet-121 [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ] to encode images, adding a FFNN
to assign one or more of the 14 labels (classes) to each image.
        </p>
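        <p>The retrieval step of System 1 (the k training images most similar to the test image by cosine similarity, then the r most frequent concepts among them) can be illustrated with a minimal sketch in plain Python. This is not the code used for the submission: the function names, the toy embeddings, and the rounding of r to the nearest integer are assumptions made here for illustration, and in practice the embeddings would come from the fine-tuned DenseNet-121.</p>
        <preformat>
```python
import math
from collections import Counter


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def knn_concepts(test_embedding, train_embeddings, train_concepts, k):
    """Return the r most frequent concepts among the k nearest training images.

    r is the average number of concepts of the k retrieved images;
    rounding it to an integer with round() is an assumption of this sketch.
    """
    sims = [cosine_similarity(test_embedding, e) for e in train_embeddings]
    # Indices of the k training images with the highest cosine similarity.
    top_k = sorted(range(len(sims)), key=lambda i: sims[i], reverse=True)[:k]
    counts = Counter(c for i in top_k for c in train_concepts[i])
    r = round(sum(len(train_concepts[i]) for i in top_k) / len(top_k))
    return [concept for concept, _ in counts.most_common(r)]


# Toy data (hypothetical): three training images with 2-dimensional embeddings.
train_embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
train_concepts = [["C1", "C2"], ["C1", "C3"], ["C4"]]
print(knn_concepts([1.0, 0.05], train_embeddings, train_concepts, k=2))
```
        </preformat>
        <p>In the toy example, the two retrieved images carry two concepts each, so r = 2 and the concept shared by both neighbours is returned first.</p>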
        <p>
          We re-implemented CheXNet in Keras and extended it for the many more labels
(5,528 vs. 14) of ImageCLEFmed Caption 2019. The images of ImageCLEFmed
Caption 2019 were again rescaled to 224 × 224 and normalized using the mean and standard
deviation values of ImageNet. In addition, the training images of ImageCLEFmed Caption
2019 were augmented by applying random horizontal flips. Image embeddings are again
extracted from the last average pooling layer of DenseNet-121. In this system, however,
the image embeddings are then passed through a dense layer with 5,528 outputs and
sigmoid activations to produce a probability per label. We trained the model by minimizing
binary cross entropy loss. We used Adam [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ] with its default hyper-parameters, early
stopping on the validation set, and patience of 3 epochs. We also decayed the learning
rate by a factor of 10 when the validation loss stopped improving.
        </p>
        <p>At test time, we predict the concepts for each test image using their probabilities, as
estimated by the trained model. For each concept (label), we assign it to the test image if
the corresponding predicted probability exceeds a threshold t. We use the same t value
for all 5,528 concepts. We tuned t on the validation set, which led to t = 0.16.</p>
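        <p>The test-time decision rule of System 2 can be sketched as follows, again in plain Python. The text above only states that a single threshold t was tuned on the validation set; selecting t by the mean per-image F1 over a list of candidate values is an assumption of this sketch, as are all function names and the toy probabilities.</p>
        <preformat>
```python
def predict_concepts(probabilities, concept_ids, t=0.16):
    """Assign every concept whose predicted probability exceeds the threshold t."""
    return [c for c, p in zip(concept_ids, probabilities) if p > t]


def f1_score(predicted, gold):
    """F1 between the predicted and the gold concept sets of one image."""
    predicted, gold = set(predicted), set(gold)
    if not predicted and not gold:
        return 1.0  # assumption: two empty sets count as a perfect match
    true_positives = len(predicted.intersection(gold))
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


def tune_threshold(all_probabilities, all_gold, concept_ids, candidates):
    """Return the candidate threshold with the highest mean per-image F1."""
    def mean_f1(t):
        scores = [f1_score(predict_concepts(p, concept_ids, t), g)
                  for p, g in zip(all_probabilities, all_gold)]
        return sum(scores) / len(scores)
    return max(candidates, key=mean_f1)


# Toy validation data (hypothetical): two images, three concepts.
concept_ids = ["C1", "C2", "C3"]
val_probabilities = [[0.9, 0.2, 0.05], [0.1, 0.6, 0.3]]
val_gold = [["C1"], ["C2", "C3"]]
print(tune_threshold(val_probabilities, val_gold, concept_ids, [0.1, 0.25, 0.5]))
```
        </preformat>
        <p>Using one threshold for all concepts keeps the search one-dimensional; per-concept thresholds would require far more validation data for the many rare concepts.</p>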
      </sec>
      <sec id="sec-3-2">
        <title>3.3 System 3: Based on Jing et al., VGG-19 Encoder + FFNN (Ranked 5th)</title>
        <p>
          This system is based on the work of Jing et al. [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ], who presented an encoder-decoder
model to generate tags and medical reports from medical images. Roughly speaking,
the full model of Jing et al. uses a VGG-19 [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ] image encoder, a multi-label classifier
to produce tags (describing concepts) from the images, and a hierarchical LSTM that
generates texts by attending on both image and tag embeddings; the top level of the
LSTM generates sentence embeddings, and the bottom level generates the words of
each sentence. We implemented in Keras a simplified version of the first part of Jing et
al.’s model, the part that performs multi-label image classification.
        </p>
        <p>
          Again, we rescale the ImageCLEFmed Caption 2019 images to 224 × 224 and
normalize them using the mean and standard deviation of ImageNet. We feed the resulting
images to the VGG-19 CNN, which has 19 layers and uses small kernels of size 3 × 3.
We used VGG-19 pre-trained on ImageNet (https://keras.io/applications/#vgg19). We feed whole images to VGG-19, unlike
Jing et al. [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ], who divide each image into regions and encode each region separately.
The output of the last fully connected layer of VGG-19 is then given as input to a dense
layer with a softmax activation to obtain a probability distribution over the concepts.
The model is trained using categorical cross entropy, which is calculated as:
          <disp-formula id="eq-1">
            <label>(1)</label>
            <tex-math><![CDATA[E = -\sum_{i=1}^{|C|} y_{\mathrm{true},i} \, \log_2\left(y_{\mathrm{pred},i}\right)]]></tex-math>
          </disp-formula>
where C is the set of |C| = 5,528 concepts, y<sub>true</sub> is the ground truth binary vector
of a training image, and y<sub>pred</sub> is the predicted softmax probability distribution over the
concepts C for the training image. Categorical cross entropy sums loss terms only for
the gold concepts of the image, which have a value of 1 in y<sub>true</sub>. When using
softmax and categorical cross entropy, y<sub>true</sub> is usually a one-hot vector and the classes are
mutually exclusive (single-label classification). To use softmax with categorical cross
entropy for multi-label classification, where y<sub>true</sub> is binary but not necessarily one-hot,
the loss is divided by the number of gold labels (true concepts) [
          <xref ref-type="bibr" rid="ref11 ref3">3, 11</xref>
          ]. Jing et al. [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ]
achieve this by dividing the ground truth binary vector y<sub>true</sub> by its L1 norm, which
equals the number of gold labels. Hence, the categorical cross-entropy loss is computed
as follows:
          <disp-formula id="eq-2">
            <label>(2)</label>
            <tex-math><![CDATA[E = -\sum_{i=1}^{|C|} \frac{y_{\mathrm{true},i}}{\lVert y_{\mathrm{true}} \rVert_1} \, \log_2\left(y_{\mathrm{pred},i}\right) = -\frac{1}{M} \sum_{j=1}^{M} \log_2\left(y_{\mathrm{pred},j}\right)]]></tex-math>
          </disp-formula>
where M is the number of gold labels (true concepts) of the training image, which is
different per training image, and j ranges over these gold labels. In this model, the loss of Eq. 2 achieved better results on
the development set, compared to binary cross entropy with a sigmoid activation per
concept. We used the Adam optimizer with an initial learning rate of 1e-5 and early stopping
on the validation set with a patience of 3 epochs. Given a test image, we return the six
concepts with the highest probability scores, since the average number of gold concepts
per training image is 6.</p>
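        <p>As a worked illustration of Eq. 2, the following plain-Python sketch computes the normalized categorical cross entropy for one training image. It is only a numerical illustration with hypothetical function names and toy values, not the Keras loss used in the submitted system.</p>
        <preformat>
```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of scores."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def normalized_cross_entropy(y_true, y_pred):
    """Categorical cross entropy with y_true divided by its L1 norm (Eq. 2).

    y_true is a binary vector with at least one gold label; y_pred is a
    softmax probability distribution over the same concepts.
    """
    num_gold = sum(y_true)  # L1 norm of a binary vector = number of gold labels M
    gold_terms = [math.log2(p) for t, p in zip(y_true, y_pred) if t == 1]
    return -sum(gold_terms) / num_gold


# Toy example: four concepts, two of which are gold labels (M = 2).
y_true = [1, 1, 0, 0]
y_pred = [0.5, 0.25, 0.125, 0.125]
print(normalized_cross_entropy(y_true, y_pred))  # 1.5
```
        </preformat>
        <p>Here the two gold concepts contribute log<sub>2</sub>(0.5) = -1 and log<sub>2</sub>(0.25) = -2, so the loss is 3 / 2 = 1.5; without the division by M, images with many gold concepts would dominate the loss.</p>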
      </sec>
      <sec id="sec-3-3">
        <title>3.4 System 4: Ensemble, k-NN Image Retrieval + CheXNet (Ranked 2nd)</title>
        <p>This method is an ensemble of System 1 (DenseNet-121 + k-NN Image Retrieval) and
System 2 (CheXNet-based), where System 1 is modified to produce a score for each
returned concept.</p>
        <p>Given a test image g, we use System 1 (Fig. 2) to retrieve the k most similar training
images g<sub>1</sub>, ..., g<sub>k</sub>, their gold concepts, and the cosine similarities s(g, g<sub>1</sub>), ..., s(g, g<sub>k</sub>)
between the test image g and each one of the k retrieved images. Let C be again the set
of |C| = 5,528 concepts. For each concept c<sub>j</sub> ∈ C, the modified System 1 assigns to
c<sub>j</sub> the following score:
          <disp-formula id="eq-3">
            <label>(3)</label>
            <tex-math><![CDATA[v_1(c_j, g) = \sum_{i=1}^{k} s(g, g_i) \, \delta(c_j, g_i)]]></tex-math>
          </disp-formula>
where δ(c<sub>j</sub>, g<sub>i</sub>) = 1 if c<sub>j</sub> is a gold concept of the retrieved training image g<sub>i</sub>, and
δ(c<sub>j</sub>, g<sub>i</sub>) = 0 otherwise. In other words, the score of each concept c<sub>j</sub> is the sum of the
cosine similarities of the retrieved images where c<sub>j</sub> is a gold concept.</p>
        <p>For the same test image g, we also obtain concept probabilities from System 2,
i.e., a vector of 5,528 probabilities. Let v<sub>2</sub>(c<sub>j</sub>, g) be the probability of concept c<sub>j</sub> being
correct for test image g according to System 2. For each c<sub>j</sub> ∈ C, the ensemble’s score
v(c<sub>j</sub>, g) of c<sub>j</sub> is simply the average of v<sub>1</sub>(c<sub>j</sub>, g) and v<sub>2</sub>(c<sub>j</sub>, g). The ensemble returns the
six concepts with the highest v(c<sub>j</sub>, g) scores, as in System 3, on the grounds that the
average number of gold concepts per training image is 6.</p>
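        <p>The ensemble scoring can be sketched in a few lines of plain Python. The sketch assumes that the similarities and gold concepts of the k retrieved images, as well as the System 2 probabilities, are already available; the function names, the alphabetical tie-breaking, and the toy values are illustrative assumptions rather than details of the submitted system.</p>
        <preformat>
```python
def knn_scores(similarities, retrieved_concepts):
    """Eq. 3: per concept, sum the similarities of the retrieved images tagged with it."""
    scores = {}
    for sim, concepts in zip(similarities, retrieved_concepts):
        for concept in concepts:
            scores[concept] = scores.get(concept, 0.0) + sim
    return scores


def ensemble_top_concepts(v1, v2, n=6):
    """Average the two scores per concept and return the n highest-scoring concepts.

    Concepts missing from one of the two score dictionaries get a score of 0
    there; ties are broken alphabetically (an assumption of this sketch).
    """
    all_concepts = set(v1).union(v2)
    averaged = {c: (v1.get(c, 0.0) + v2.get(c, 0.0)) / 2 for c in all_concepts}
    return sorted(averaged, key=lambda c: (-averaged[c], c))[:n]


# Toy example: k = 2 retrieved images and three concepts scored by System 2.
v1 = knn_scores([0.9, 0.8], [["C1", "C2"], ["C1"]])
v2 = {"C1": 0.3, "C2": 0.5, "C3": 0.7}
print(ensemble_top_concepts(v1, v2, n=2))
```
        </preformat>
        <p>Note that the retrieval score of Eq. 3 is a sum of up to k similarities and is therefore not bounded by 1, unlike the probabilities of System 2; the sketch averages the two scores as they are, following the description above.</p>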
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4 Results</title>
      <p>
        Systems were evaluated in ImageCLEFmed Caption 2019 by computing their F1 scores
on each test image (in effect comparing the binary ground truth vector ytrue to the
predicted concept probabilities ypred) and then averaging over all test images [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. Table 2
reports the evaluation results of our four systems on the development and test data, as
well as their ranking among the approximately 60 systems that participated in the task.
The ensemble (System 4) had the best results on development data, but the
CheXNet-based system (System 2) had the best results on the test set.
      </p>
      <p>
We described the four systems that AUEB’s NLP Group used to participate in
ImageCLEFmed 2019 Caption. The four systems were ranked 1st, 2nd, 3rd, and 5th. Our
top system was a re-implementation of CheXNet [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ], with modifications to handle
the much larger label set of ImageCLEFmed 2019 Caption and data augmentation. The
system that was ranked 3rd used DenseNet [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] to encode images and k-NN retrieval to
return the concepts of the most similar training images. Our second-best system was an
ensemble of the previous two (CheXNet-based and k-NN based), indicating that the two
approaches are complementary. Our weakest system, which nevertheless was ranked
5th, was based on the multi-label classification part of the system of Jing et al. [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ],
which aims to generate draft medical reports using an encoder-decoder approach.
      </p>
      <p>
        In future work, we aim to experiment with, combine, and improve upon additional
methods and datasets for medical image captioning. Towards that direction, we recently
published a survey on medical image to text methods [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], which we also plan to extend.
      </p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Deng</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dong</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Socher</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>L.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fei-Fei</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>ImageNet: A Large-Scale Hierarchical Image Database</article-title>
          .
          <source>In: IEEE Conference on Computer Vision and Pattern Recognition</source>
          . pp.
          <fpage>248</fpage>
          -
          <lpage>255</lpage>
          . Miami Beach, FL, USA (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Eickhoff</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
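          <string-name>
            <surname>Schwall</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>de Herrera</surname>
            ,
            <given-names>A.G.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Müller</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          :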
          <article-title>Overview of ImageCLEFcaption 2017 - the Image Caption Prediction and Concept Extraction Tasks to Understand Biomedical Images</article-title>
          .
          <source>In: CLEF2017 Working Notes. CEUR Workshop Proceedings</source>
          , CEUR-WS.org &lt;http://ceur-ws.org&gt;, Dublin, Ireland (September 11-14,
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Gong</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jia</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Leung</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Toshev</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ioffe</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Deep Convolutional Ranking for Multilabel Image Annotation</article-title>
          . In: International Conference on Learning Representations (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>de Herrera</surname>
            ,
            <given-names>A.G.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Eickhoff</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Andrearczyk</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Müller</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          :
          <article-title>Overview of the ImageCLEF 2018 Caption Prediction Tasks</article-title>
          .
          <source>In: CLEF2018 Working Notes. CEUR Workshop Proceedings</source>
          , CEUR-WS.org &lt;http://ceur-ws.org&gt;, Avignon, France (September 10-14,
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Huang</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>van der Maaten</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Weinberger</surname>
            ,
            <given-names>K.Q.</given-names>
          </string-name>
          :
          <article-title>Densely Connected Convolutional Networks</article-title>
          .
          <source>In: Proceedings of the IEEE conference on Computer Vision and Pattern Recognition</source>
          . pp.
          <fpage>4700</fpage>
          -
          <lpage>4708</lpage>
          . Honolulu, HI, USA (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Ionescu</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
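          ,
          <string-name>
            <surname>Müller</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Péteri</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,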
          <string-name>
            <surname>Cid</surname>
            ,
            <given-names>Y.D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liauchuk</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kovalev</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Klimuk</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tarasau</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Abacha</surname>
            ,
            <given-names>A.B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hasan</surname>
            ,
            <given-names>S.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Datla</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Demner-Fushman</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dang-Nguyen</surname>
            ,
            <given-names>D.T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Piras</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Riegler</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tran</surname>
            ,
            <given-names>M.T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lux</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gurrin</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pelka</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Friedrich</surname>
            ,
            <given-names>C.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>de Herrera</surname>
            ,
            <given-names>A.G.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Garcia</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kavallieratou</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>del Blanco</surname>
            ,
            <given-names>C.R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rodríguez</surname>
            ,
            <given-names>C.C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vasillopoulos</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Karampidis</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chamberlain</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Clark</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Campello</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>ImageCLEF 2019: Multimedia Retrieval in Medicine, Lifelogging, Security and Nature</article-title>
          . In:
          <source>Experimental IR Meets Multilinguality, Multimodality, and Interaction. Proceedings of the 10th International Conference of the CLEF Association (CLEF 2019)</source>
          , LNCS Lecture Notes in Computer Science, Springer, Lugano, Switzerland (September 9-12,
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Jing</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Xie</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Xing</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          :
          <article-title>On the Automatic Generation of Medical Imaging Reports</article-title>
          . In:
          <source>Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Long Papers)</source>
          . pp.
          <fpage>2577</fpage>
          -
          <lpage>2586</lpage>
          . Melbourne, Australia (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Kingma</surname>
            ,
            <given-names>D.P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ba</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Adam: A Method for Stochastic Optimization</article-title>
          .
          <source>arXiv:1412.6980</source>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Kougia</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pavlopoulos</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Androutsopoulos</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          :
          <article-title>A Survey on Biomedical Image Captioning</article-title>
          . In:
          <source>Workshop on Shortcomings in Vision and Language of the Annual Conference of the North American Chapter of the Association for Computational Linguistics</source>
          . pp.
          <fpage>26</fpage>
          -
          <lpage>36</lpage>
          . Minneapolis, MN, USA (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Litjens</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kooi</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bejnordi</surname>
            ,
            <given-names>B.E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Setio</surname>
            ,
            <given-names>A.A.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ciompi</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ghafoorian</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Laak</surname>
            ,
            <given-names>J.A.V.D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ginneken</surname>
            ,
            <given-names>B.V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sánchez</surname>
            ,
            <given-names>C.I.</given-names>
          </string-name>
          :
          <article-title>A Survey on Deep Learning in Medical Image Analysis</article-title>
          .
          <source>Medical Image Analysis</source>
          <volume>42</volume>
          ,
          <fpage>60</fpage>
          -
          <lpage>88</lpage>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Mahajan</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Girshick</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ramanathan</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>He</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Paluri</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bharambe</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>van der Maaten</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>Exploring the Limits of Weakly Supervised Pretraining</article-title>
          . In:
          <source>European Conference on Computer Vision</source>
          . pp.
          <fpage>181</fpage>
          -
          <lpage>196</lpage>
          . Munich, Germany
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Pelka</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Friedrich</surname>
            ,
            <given-names>C.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>de Herrera</surname>
            ,
            <given-names>A.G.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Müller</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          :
          <article-title>Overview of the ImageCLEFmed 2019 Concept Prediction Task</article-title>
          . In:
          <source>CLEF2019 Working Notes. CEUR Workshop Proceedings</source>
          , vol.
          <volume>2380</volume>
          . ISSN 1613-0073. CEUR-WS.org &lt;http://ceur-ws.org/Vol-2380/&gt;, Lugano, Switzerland (September 09-12
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Pelka</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Koitka</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rückert</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nensa</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Friedrich</surname>
            ,
            <given-names>C.M.</given-names>
          </string-name>
          :
          <article-title>Radiology Objects in COntext (ROCO): A Multimodal Image Dataset</article-title>
          . In:
          <source>MICCAI Workshop on Large-scale Annotation of Biomedical data and Expert Label Synthesis</source>
          . pp.
          <fpage>180</fpage>
          -
          <lpage>189</lpage>
          . Granada, Spain
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Rajpurkar</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Irvin</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhu</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yang</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mehta</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          , et al.:
          <article-title>CheXNet: Radiologist-Level Pneumonia Detection on Chest X-rays with Deep Learning</article-title>
          .
          <source>arXiv:1711.05225</source>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Simonyan</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zisserman</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Very Deep Convolutional Networks for Large-Scale Image Recognition</article-title>
          .
          <source>arXiv:1409.1556</source>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Soldaini</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Goharian</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          :
          <article-title>QuickUMLS: A Fast, Unsupervised Approach for Medical Concept Extraction</article-title>
          . In:
          <source>MedIR workshop</source>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Peng</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lu</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lu</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bagheri</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Summers</surname>
            ,
            <given-names>R.M.</given-names>
          </string-name>
          :
          <article-title>ChestX-ray8: Hospital-scale Chest X-ray Database and Benchmarks on Weakly-Supervised Classification and Localization of Common Thorax Diseases</article-title>
          . In:
          <source>Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition</source>
          . pp.
          <fpage>2097</fpage>
          -
          <lpage>2106</lpage>
          . Honolulu, HI, USA (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Guo</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>ImageSem at ImageCLEF 2018 Caption Task: Image Retrieval and Transfer Learning</article-title>
          . In:
          <source>CLEF2018 Working Notes. CEUR Workshop Proceedings</source>
          . Avignon, France (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>