=Paper=
{{Paper
|id=Vol-2696/paper_109
|storemode=property
|title=Concept Detection in Medical Images using Xception Models - TUCMC at ImageCLEFmed 2020
|pdfUrl=https://ceur-ws.org/Vol-2696/paper_109.pdf
|volume=Vol-2696
|authors=Nisnab Udas,Frederik Beuth,Danny Kowerko
|dblpUrl=https://dblp.org/rec/conf/clef/UdasBK20
}}
==Concept Detection in Medical Images using Xception Models - TUCMC at ImageCLEFmed 2020==
Concept Detection in Medical Images using Xception Models - TUC MC at ImageCLEFmed 2020

Nisnab Udas, Frederik Beuth, and Danny Kowerko
Chemnitz University of Technology, Germany
nisnab.udas@gmail.com, beuth@cs.tu-chemnitz.de, danny.kowerko@cs.tu-chemnitz.de

Abstract. This paper summarizes the approach and the results of the submission of the Media Computing group of Chemnitz University of Technology (TUC MC) to the ImageCLEFmed Caption task, launched by ImageCLEF 2020. In this task, the contents of medical images have to be detected, with the goal of supporting the diagnosis of medical diseases and conditions. In the context of a master thesis by Nisnab Udas, the Xception model, which slightly outperformed InceptionV3 on the ImageNet dataset in 2017, was adapted to this caption task. Out of the box, this approach achieved an F1 score of 35.1% compared to the best contribution with 39.4%, which places our team in the top 5. Part of the strategy was to optimize the confidence threshold and to add a max pooling before the last layer, which reduced the number of parameters and made the model less prone to overfitting.

Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CLEF 2020, 22-25 September 2020, Thessaloniki, Greece.

1 Introduction

Computer science challenges have been established over the last decades to advance diverse problems in text, audio and video processing [4]. In this tradition, challenges have been organized within the established ImageCLEF and LifeCLEF labs since 2003 and 2014, respectively. Since 2003, medical (retrieval) tasks have been part of the challenge and have been continuously developed into three subtasks, one of which has been called medical concept detection since 2017 [2,13,10,6]. It comprises automatic image captioning and scene understanding to identify the presence and location of relevant concepts in a large corpus of medical images. The images stem from the PubMed Open Access subset containing 1,828,575 archives, from which a total of 6,031,814 image-caption pairs were extracted. A combination of automatic filtering with deep learning systems and manual revisions was applied to focus merely on radiology images and non-compound figures. The biomedical images distributed in this challenge originate from a subset of the extended ROCO (Radiology Objects in COntext) dataset [11]. In ImageCLEF 2020, additional information regarding the modalities of all 80,747 images was distributed [6].

Evaluation is conducted in terms of set coverage metrics such as precision, recall, and combinations thereof. The leaderboards use the F1 metric, summarized in Table 1. The results show that the task remains challenging, even though a continuous improvement from year to year can be noted. This year's results bring the top-3 groups closer together for the first time.

Table 1. F1 metric results in % for the best submission run of the top-3 teams at the medical caption subtask since 2017. Note that the number of registrations is usually considerably larger than the number of teams who submitted results (last column).

Year | 1st  | 2nd  | 3rd  | No. of teams
2017 | 12.1 |  9.6 |  5.0 |  4
2018 | 25.0 | 18.0 | 17.3 |  5
2019 | 28.2 | 26.6 | 22.4 | 11
2020 | 39.4 | 39.2 | 38.1 |  7
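To make the evaluation concrete, the sketch below shows how a set-based F1 score per image can be computed from predicted and ground-truth concept sets and averaged over images. This is a minimal illustration of the metric as we understand it, not the official evaluation script; the two concept identifiers ending in "...345" and "...485" are arbitrary example CUIs.

```python
# Minimal illustration (not the official evaluation code) of a set-based
# F1 score per image, averaged over all images.
def f1_per_image(predicted: set, truth: set) -> float:
    """F1 between the predicted and the ground-truth concept set of one image."""
    if not predicted and not truth:
        return 1.0          # convention: empty vs. empty counts as perfect
    tp = len(predicted & truth)
    if tp == 0:
        return 0.0
    precision = tp / len(predicted)
    recall = tp / len(truth)
    return 2 * precision * recall / (precision + recall)

def mean_f1(predictions: dict, ground_truth: dict) -> float:
    """Average the per-image F1 over all image identifiers."""
    scores = [f1_per_image(predictions.get(img, set()), concepts)
              for img, concepts in ground_truth.items()]
    return sum(scores) / len(scores)

# Example: two of three concepts found, plus one false positive -> F1 = 0.667
print(f1_per_image({"C0040405", "C0040398", "C0012345"},
                   {"C0040405", "C0040398", "C0024485"}))
```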
In the caption task of 2019 [10], Kougia et al. won the competition by combining their CNN (Convolutional Neural Network) image encoders with an image retrieval method or a feed-forward neural network and achieved an F1 score of 28.2% [7]. Xu et al. applied a multi-label classification model based on ResNet [5] and achieved 26.6% [14]. Guo et al. achieved an F1 score of 22.4% with a two-stage strategy consisting of a medical image pre-classification based on body parts with AlexNet [8] and a transfer-learning-based multi-label classification model based on Inception V3 and ResNet152 [3].

2 Data Analysis

The amount of images has increased from 2019 to 2020. The concept detection task of this year contains training and validation images in 7 separate folders. In total, there are 64,753 training images, 14,970 validation images and 3,534 test images. The number of concepts was reduced from 5,528 last year to 3,047 in 2020, as rarely occurring concepts were removed by the organizers. The Top-20 concepts in our training images are shown in Fig. 1. The concepts 'C0040405' and 'C0040398' both occur in 25,022 training images. The figure clearly shows how imbalanced the concepts are in the dataset.

In Fig. 2, we show the distribution of the concept length in the training dataset. The largest group of images, 5,248 to be precise, has only 2 concepts. The second and third largest groups of images have 3 and 4 concepts per image, respectively. The highest number of concepts occurring in an image is 140, which occurs once.

Fig. 1. Frequency distribution of the Top-20 concepts (occurrence count per concept name).

Fig. 2. Frequency distribution of the concept length (number of training images per number of concepts per image).

Fig. 3. Model architecture describing the mechanism of concept detection from the pre-trained Xception model.

3 Proposed System

Our deep learning architecture is based on the Xception architecture [1] and is shown in Fig. 3. The Xception model slightly outperformed Inception V3 on the ImageNet dataset in 2017 and was chosen due to this performance and because our preliminary tests found it to work well on our medical detection task. For fine-tuning the model, we utilize transfer learning and use weights pre-trained on the ImageNet dataset [12]. We then eliminated the top classifier layer, as is required in transfer learning, froze the entire Xception model and made only the last six layers trainable. Generally, in transfer learning, the feature maps are flattened before a classifier is added to a pre-trained model. Flattening transforms a 2D matrix of features into a vector, which can be fed into a fully-connected layer (FC layer). In our case, as a special trick, we used a max pooling layer of window size (2,2) followed by a dropout layer to reduce the number of free parameters and to facilitate object-size dependent pooling. Afterwards, the usual flattening layer is added, followed by a ReLU activation function and another dropout layer. Since the data contains 3,047 concepts in total, our final FC layer contains 3,047 units and uses a sigmoid activation function, because we are dealing with a multi-label problem. The white rings in the FC layer represent neurons (Fig. 3). The top lambda layer extracts the 100 highest probabilities. These probabilities are compared against a threshold value, e.g. t = 0.12, which generates boolean values for these 100 probabilities. The lower lambda layer gives the indices of these individual neurons/concepts. In the data processing part, these indices are used to keep only the neurons with a 'True' boolean value, and the results are reformatted into the competition format.
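The following is a minimal Keras sketch of the architecture described above (Xception backbone pre-trained on ImageNet with only the last six layers trainable, 2×2 max pooling, dropout, flatten, ReLU, dropout, and a 3,047-unit sigmoid output). It is not the authors' original code; in particular, the input size of 160 × 160 is our assumption chosen so that the backbone yields 5 × 5 feature maps as in Table 2.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

NUM_CONCEPTS = 3047          # number of concepts in the 2020 task
INPUT_SHAPE = (160, 160, 3)  # assumption: an input size that yields 5x5 feature maps

# Xception backbone pre-trained on ImageNet, without its top classifier.
base = tf.keras.applications.Xception(
    weights="imagenet", include_top=False, input_shape=INPUT_SHAPE)

# Freeze the backbone and keep only the last six layers trainable, as described above.
for layer in base.layers[:-6]:
    layer.trainable = False

model = models.Sequential([
    base,                                   # -> 5 x 5 x 2048 feature maps
    layers.MaxPooling2D(pool_size=(2, 2)),  # -> 2 x 2 x 2048
    layers.Dropout(0.2),                    # best value found in the cross-test
    layers.Flatten(),                       # -> 8192
    layers.Activation("relu"),
    layers.Dropout(0.5),                    # best value found in the cross-test
    layers.Dense(NUM_CONCEPTS, activation="sigmoid"),  # multi-label output
])
model.summary()                             # layer shapes correspond to Table 2
```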
Table 2 shows how the feature maps change shape as the data passes through each layer.

Table 2. Model summary with layers and feature-map shapes as the data passes through these layers (the leading dimension of 16 is the batch size).

Layer       | Output shape
Xception    | 16 × 5 × 5 × 2048
Max pooling | 16 × 2 × 2 × 2048
Dropout     | 16 × 2 × 2 × 2048
Flatten     | 16 × 8192
Activation  | 16 × 8192
Dropout     | 16 × 8192
Dense       | 16 × 3047

Table 3. Influence of the high-level max-pooling.

Method              | Mean F1 score | Free trainable parameters
Base model          | 0.349         | 29,712,871
Without max-pooling | 0.345         | 160,758,247

For training our network, a proper optimization is necessary, and we applied the following optimization methods. The largest contribution to a satisfying F1 score came from the optimization of the confidence threshold, along with the max-pooling, as shown in the next sections. Besides these two methods, we deployed other minor approaches to raise the performance, including the tuning of the drop-out level and of the data augmentation. Drop-out is a well-suited technique to avoid overfitting, and the values were optimized for both drop-out layers (Table 2) by conducting a cross-test of 25 different combinations. The best configuration has a drop-out value of 0.2 for the first layer and 0.5 for the second layer. Additionally, the data augmentation was tuned, which also increased the F1 score by 0.01-0.02 depending on the configuration. Each of these methods raises the F1 score by only 0.01-0.02, but in total these effects add up to an increase of the F1 score by 0.03-0.05.

4 Results

One of the ideas for improving the original Xception model [1] was the introduction of an additional max-pooling operation before the highest layer; it is shown in the second entry of Table 2. This max-pooling operation reduces the spatial resolution of the feature maps, which in turn reduces the number of free parameters in the next layer. In our setup, the layer before the max-pooling operation has a 5 × 5 resolution, which is reduced by a 2 × 2 pooling to a 2 × 2 resolution. This operation reduces the free trainable parameters from 160,758,247 to 29,712,871 in total, which yields a more robust and stable model.
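The parameter counts in Table 3 can be sanity-checked by hand: only the fully-connected output layer depends on the pooled feature-map size, and its weight count is the number of flattened features times 3,047. The short check below is our own arithmetic, not taken from the paper.

```python
# Difference in trainable parameters caused by the extra 2x2 max pooling:
# the dense output layer sees 2*2*2048 = 8,192 flattened features instead of
# 5*5*2048 = 51,200.
concepts = 3047
with_pool = 2 * 2 * 2048      # 8,192 features
without_pool = 5 * 5 * 2048   # 51,200 features
diff = (without_pool - with_pool) * concepts
print(diff)                          # 131,045,376
print(160_758_247 - 29_712_871)      # 131,045,376 -> matches Table 3 exactly
```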
As a second argument, the operation allows concepts in the image to be recognized more independently of their position. In the original ImageNet dataset, objects are on average larger than in our medical image dataset. To compensate for this difference in size, we increase the pooling, as the objects in the original dataset cover large portions of the image while our concepts typically appear in a smaller region. The pooling operation thus allows both a recognition that is independent of the concept's position and smaller, object-sensitive filters that facilitate the recognition of smaller objects. The difference in F1 score and in model size, i.e. the number of free trainable parameters, is shown in Table 3.

Fig. 4. Variation of the F1 score and the accuracy with respect to the threshold.

4.1 Confidence threshold optimization

Fig. 4 shows the variation of accuracy and F1 with the threshold. A classical accuracy metric is not optimal for training a model in this challenge because of the large class imbalance; therefore, the F1 score metric is used. Confidence threshold selection plays a crucial role in the multi-label problem. The threshold determines above which predicted probability a concept is assigned to an image: when a class is predicted, the network outputs a probability, and only probabilities exceeding the threshold count as that concept being present in the image. Even if our model is well trained, an unoptimized threshold may still have a substantial effect on the result, and determining the optimum threshold can be tricky. Therefore, we varied the threshold systematically and tuned its value on the validation set, as shown in Fig. 4. The maximum performance with respect to the confidence threshold is found in the range of about 0.1 to 0.25. Hence, we submitted several runs with different threshold values between θ = 0.12 and θ = 0.25 (see Table 4). As expected from Fig. 4, the F1 score varies only slightly within this threshold range.

4.2 Optimization techniques

There are plenty of ways of dealing with limited data volume and imbalanced datasets, such as eliminating outliers, expanding the dataset, augmentation, etc. In the medical image domain, some diseases or conditions occur only rarely, resulting in small sample numbers. To tackle these problems, we decided to use image data augmentation. The following methods are available, for example, in Keras, and we employ them with the parameters listed below (a configuration sketch follows at the end of this subsection):

– Rotation is performed by randomly rotating an image around its center by up to 5°.
– Vertical and horizontal flip: Flipping images is one of the most widely implemented techniques, popularized by [8].
– Height and width shift range: The images are randomly shifted horizontally or vertically by up to 5% of the total width and height, respectively.
– Zoom: Objects in images are randomly zoomed in a range of ±5%.
– Brightness shift: The image is randomly darkened or brightened in a range of 80-120% of the initial brightness.
– Samplewise centering: To mitigate the problem of vanishing gradients or saturating values, the data are normalized such that the mean value of each sample becomes 0.
– Samplewise standard normalization: This pre-processing method follows the same idea as sample-wise centering, but additionally fixes the standard deviation of each sample to 1.

Enabling data augmentation increases the F1 score and contributes to a more robust behaviour of the system. The competition only allows 100 concepts per image. To ensure this, the probabilities were sorted in descending order and the Top-100 probabilities were selected.
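The augmentation settings listed above map directly onto Keras' ImageDataGenerator. The following sketch uses the parameter values stated in the list; it is not necessarily the authors' exact configuration.

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Sketch of the augmentation described in Section 4.2 (values as stated there).
datagen = ImageDataGenerator(
    rotation_range=5,                   # random rotation of up to 5 degrees
    horizontal_flip=True,               # random horizontal flip
    vertical_flip=True,                 # random vertical flip
    width_shift_range=0.05,             # horizontal shift of up to 5%
    height_shift_range=0.05,            # vertical shift of up to 5%
    zoom_range=0.05,                    # zoom in a range of +/- 5%
    brightness_range=(0.8, 1.2),        # darken/brighten to 80-120%
    samplewise_center=True,             # per-sample mean of 0
    samplewise_std_normalization=True,  # per-sample standard deviation of 1
)
```

The post-processing after the sigmoid output (keep at most the 100 most probable concepts and apply the confidence threshold) can be sketched as below. `concept_ids` is an assumed list mapping output indices to UMLS concept identifiers, and the output separator is also an assumption.

```python
import numpy as np

def decode_prediction(probs, concept_ids, threshold=0.12, top_k=100):
    """Map one sigmoid output vector to the list of submitted concepts."""
    top = np.argsort(probs)[::-1][:top_k]             # indices of the top-100 probabilities
    kept = [concept_ids[i] for i in top if probs[i] > threshold]
    return ";".join(kept)                             # concepts joined for the submission file
```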
4.3 Run description

We submitted ten runs (Table 4). The runs often utilize the same base structure, an Xception model, and all use transfer learning from ImageNet. The runs vary in meta-parameters, as we tested different ones. We vary primarily (i) the threshold in the last layer, (ii) slightly different base models, and (iii) whether max-pooling is used in the highest layers. A schematic training setup is sketched at the end of this section.

Table 4. Test results of our 10 submitted runs. For details see text.

Run id | Method           | Name                     | Mean F1 score | Rank
68077  | Early stopping   | model_thr0_18.csv        | 0.351         | 20
68078  | CNN2, θ = 0.25   | streamlined1_thr0_25.csv | 0.349         | 21
68034  | CNN2, θ = 0.20   | streamlined1_thr0_20.csv | 0.349         | 22
68074  | CNN2, θ = 0.15   | streamlined1.csv         | 0.349         | 23
68029  | CNN1, θ = 0.20   | basemodel_thr0_20.csv    | 0.347         | 24
68045  | Slow learning    | model_low_lr_thr0_20.csv | 0.345         | 25
68067  | No max-pooling   | streamlined1_nomax.csv   | 0.345         | 27
68024  | CNN1, θ = 0.15   | basemodel.csv            | 0.343         | 28
68073  | CNN2, θ = 0.12   | streamlined1_thr0_12.csv | 0.342         | 29
68076  | Exp. Normalizing | model_weighting.csv      | 0.332         | 32

Run ID 1/68024: We deploy an Xception model and utilize transfer learning from ImageNet. We ran our model for N=100 epochs and set the learning rate to 1e-3. The model uses the confidence threshold θ in the last layer to map concept probabilities to true/false for the concepts. We tuned the threshold to 0.15 on the validation set, selected the top 100 concepts and submitted our results.

Run ID 2/68029: This run again uses the Xception model and generally the configuration of Run ID 1. It optimizes the threshold further, setting it to 0.20.

Run ID 3/68034: We again deploy an Xception model and utilize transfer learning from ImageNet. This submission has a more streamlined source code structure and explores different meta-parameters: we ran our model for N=30 epochs and set the learning rate to 1e-2. We tuned the threshold to 0.15 on the validation set.

Run ID 4/68045: We again deploy an Xception model and utilize the configuration of Run 1. This submission explores different meta-parameters: we ran our model for N=50 epochs and set the learning rate to 1e-4. We tuned the threshold to 0.20 on the validation set.

Run ID 5/68067: This run again uses the Xception model and generally the configuration of Run ID 3, while exploring the effect of the max-pooling layer before the highest layer. The max-pooling was removed here to show its effect.

Run ID 6/68073: This run uses the more streamlined source code structure and again explores different meta-parameters: we ran our model for N=30 epochs and set the learning rate to 1e-2. We tuned the threshold to 0.12 on the validation set.

Run ID 7/68074: This run again uses the Xception model and generally the configuration of Run ID 6, while tuning the threshold to 0.25.

Run ID 8/68076: We again deploy the standard configuration of Run ID 1. This submission focuses on an experimental normalization of the dataset, which was not very successful.

Run ID 9/68077: We again deploy an Xception model and utilize transfer learning from ImageNet. This submission explores an early stopping strategy: the best model, i.e. the one with the lowest loss, was used over a run period of N=30 epochs. The learning rate was 1e-3 and the threshold was tuned to 0.18.

Run ID 10/68078: This run deploys the more streamlined source code structure and explores a different threshold: 0.20.

In Table 5, we list the top teams with their best F1 scores in percent. Our team, TUC MC, occupied the 5th position in terms of team ranking with an F1 score of 0.3512.

Table 5. Top-7 team performance in the concept detection problem 2020. F1 metrics are given in percent. [9]

Group Name          | F1 score
AUEB NLP Group      | 39.4
PwC MedCaption 2020 | 39.2
essexgp2020         | 38.1
iml                 | 37.5
TUC MC              | 35.1
Morgan CS           | 16.7
saradadevi          | 13.5
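As referenced in Section 4.3, the individual runs differ mainly in the number of epochs, the learning rate and the use of early stopping. The following is a hedged sketch of such a training configuration, roughly in the spirit of Run ID 9/68077; the optimizer, loss function and patience value are our assumptions (they are not stated above), and `model`, `train_gen` and `val_gen` refer to the model and data generators from the earlier sketches.

```python
import tensorflow as tf
from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint

# Sketch of a training run (learning rate 1e-3, up to N=30 epochs, keep the
# model with the lowest validation loss). Optimizer and loss are assumptions.
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
              loss="binary_crossentropy")

callbacks = [
    # keep the weights of the epoch with the lowest validation loss
    ModelCheckpoint("best_model.h5", monitor="val_loss", save_best_only=True),
    EarlyStopping(monitor="val_loss", patience=5, restore_best_weights=True),
]
model.fit(train_gen, validation_data=val_gen, epochs=30, callbacks=callbacks)
```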
5 Conclusion and Outlook

Our approach of adapting an Xception model to the medical caption task 2020 achieves an F1 score of 35.1%, which is better than the 2019 results and close to the best contribution of 2020, which achieved 39.4%. Our strategy of relying on a modern Xception neural network proved successful. It also shows that transfer learning with weights pre-trained on ImageNet is well usable on rather different image material such as medical images. The introduction of a max pooling before the last layer and the optimization of the confidence threshold boosted the performance of our Xception model. Further investigations could go in the direction of optimizing the learning through entropy-based analysis concepts of neural networks. Moreover, a more in-depth analysis of certain concept classes might be carried out in order to better understand the errors in the present classification task.

References

1. Chollet, F.: Xception: Deep Learning with Depthwise Separable Convolutions. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). pp. 1800–1807. IEEE, Honolulu, HI (Jul 2017). https://doi.org/10.1109/CVPR.2017.195, http://ieeexplore.ieee.org/document/8099678/
2. Eickhoff, C., Schwall, I., Müller, H.: Overview of ImageCLEFcaption 2017 – Image Caption Prediction and Concept Detection for Biomedical Images. Working Notes of CLEF 2017 - Conference and Labs of the Evaluation Forum 1866, 1–10 (Sep 2017), http://ceur-ws.org/Vol-1866/
3. Guo, Z., Wang, X., Zhang, Y., Li, J.: ImageSem at ImageCLEFmed Caption 2019 Task: a Two-stage Medical Concept Detection Strategy. Working Notes of CLEF 2019 - Conference and Labs of the Evaluation Forum 2380, 1–8 (Sep 2019), http://ceur-ws.org/Vol-2380/paper_80.pdf
4. Hanbury, A., Müller, H., Balog, K., Brodt, T., Cormack, G.V., Eggel, I., Gollub, T., Hopfgartner, F., Kalpathy-Cramer, J., Kando, N., Krithara, A., Lin, J., Mercer, S., Potthast, M.: Evaluation-as-a-Service: Overview and Outlook. arXiv:1512.07454 [cs] pp. 1–28 (Dec 2015), http://arxiv.org/abs/1512.07454
5. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). pp. 770–778 (2016)
6. Ionescu, B., Müller, H., Péteri, R., Abacha, A.B., Datla, V., Hasan, S.A., Demner-Fushman, D., Kozlovski, S., Liauchuk, V., Cid, Y.D., Kovalev, V., Pelka, O., Friedrich, C.M., de Herrera, A.G.S., Ninh, V.T., Le, T.K., Zhou, L., Piras, L., Riegler, M., Halvorsen, P., Tran, M.T., Lux, M., Gurrin, C., Dang-Nguyen, D.T., Chamberlain, J., Clark, A., Campello, A., Fichou, D., Berari, R., Brie, P., Dogariu, M., Ştefan, L.D., Constantin, M.G.: ImageCLEF 2020: Multimedia retrieval in medical, lifelogging, nature, and internet applications. In: Experimental IR Meets Multilinguality, Multimodality, and Interaction. Proceedings of the 11th International Conference of the CLEF Association (CLEF 2020), vol. 12260. LNCS Lecture Notes in Computer Science, Springer, Thessaloniki, Greece (September 22-25 2020)
7. Kougia, V., Pavlopoulos, J., Androutsopoulos, I.: AUEB NLP Group at ImageCLEFmed Caption 2019. In: Working Notes of CLEF 2019 - Conference and Labs of the Evaluation Forum. vol. 2380, pp. 1–8. Lugano, Switzerland (Sep 2019), http://ceur-ws.org/Vol-2380/paper_136.pdf
8. Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet Classification with Deep Convolutional Neural Networks. In: Pereira, F., Burges, C.J.C., Bottou, L., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 25, pp. 1097–1105. Curran Associates, Inc. (2012), http://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf
9. Pelka, O., Friedrich, C.M., García Seco de Herrera, A., Müller, H.: Overview of the ImageCLEFmed 2020 concept prediction task: Medical image understanding. In: CLEF2020 Working Notes. CEUR Workshop Proceedings, CEUR-WS.org, Thessaloniki, Greece (September 22-25 2020)
10. Pelka, O., Friedrich, C.M., Seco De Herrera, A.G., Müller, H.: Overview of the ImageCLEFmed 2019 Concept Detection Task. In: Working Notes of CLEF 2019 - Conference and Labs of the Evaluation Forum. vol. 2380, pp. 1–13. Lugano, Switzerland (Sep 2019), http://ceur-ws.org/Vol-2380/paper_245.pdf
11. Pelka, O., Koitka, S., Rückert, J., Nensa, F., Friedrich, C.M.: Radiology Objects in COntext (ROCO): A Multimodal Image Dataset. In: Stoyanov, D., Taylor, Z., Balocco, S., Sznitman, R., Martel, A., Maier-Hein, L., Duong, L., Zahnd, G., Demirci, S., Albarqouni, S., Lee, S.L., Moriconi, S., Cheplygina, V., Mateus, D., Trucco, E., Granger, E., Jannin, P. (eds.) Intravascular Imaging and Computer Assisted Stenting and Large-Scale Annotation of Biomedical Data and Expert Label Synthesis, vol. 11043, pp. 180–189. Springer International Publishing, Cham (2018). https://doi.org/10.1007/978-3-030-01364-6_20, http://link.springer.com/10.1007/978-3-030-01364-6_20, series title: Lecture Notes in Computer Science
12. Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., Huang, Z., Karpathy, A., Khosla, A., Bernstein, M., Berg, A.C., Fei-Fei, L.: ImageNet Large Scale Visual Recognition Challenge. Int J Comput Vis 115(3), 211–252 (Dec 2015). https://doi.org/10.1007/s11263-015-0816-y, http://link.springer.com/10.1007/s11263-015-0816-y
13. Seco De Herrera, A.G., Eickhoff, C., Andrearczyk, V., Müller, H.: Overview of the ImageCLEF 2018 Caption Prediction Tasks. In: Working Notes of CLEF 2018 - Conference and Labs of the Evaluation Forum. vol. 2125, pp. 1–12. Avignon, France (Sep 2018), http://ceur-ws.org/Vol-2125/invited_paper_4.pdf
14. Xu, J., Liu, W., Liu, C., Wang, Y., Chi, Y., Xie, X., Hua, X.: Concept detection based on multi-label classification and image captioning approach - DAMO at ImageCLEF 2019. In: Working Notes of CLEF 2019 - Conference and Labs of the Evaluation Forum. vol. 2380, pp. 1–10. Lugano, Switzerland (Sep 2019), http://ceur-ws.org/Vol-2380/paper_141.pdf