<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Exploring the Clinical Significance of the Textual Descriptions Derived from Medical Images</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Xuwen Wang</string-name>
          <email>wang.xuwen@imicams.ac.cn</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Zhen Guo</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Chunyuan Xu</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Lianglong Sun</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jiao Li</string-name>
          <email>li.jiao@imicams.ac.cn</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>College</institution>
          ,
          <addr-line>Beijing, 100020</addr-line>
          ,
          <country country="CN">China</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Institute of Medical Information and Library, Chinese Academy of Medical Sciences and Peking Union Medical College</institution>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>School of Life Science, Beijing Institute of Technology</institution>
          ,
          <addr-line>Beijing, 100081</addr-line>
          ,
          <country country="CN">China</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>This paper presents the work of the ImageSem group in the ImageCLEFmed Caption 2021 task. In the concept detection subtask, we employed a transfer learning-based multi-label classification (MLC) model as our baseline. We also trained multiple fine-grained MLC models based on manually annotated semantic categories, such as Imaging Type, Anatomic Structure, and Findings, which may reveal clinical insights of radiology images. We submitted 9 runs to the concept detection subtask and achieved an F1 score of 0.419, which ranked 3rd on the leaderboard. In the caption prediction subtask, our first method simply combines detected concepts according to sentence patterns. The second method uses a dual-path CNN model for matching images and captions. We submitted 4 runs to the caption prediction subtask and achieved a BLEU score of 0.257, which ranked 6th among the participating teams.</p>
      </abstract>
      <kwd-group>
        <kwd>Concept detection</kwd>
        <kwd>caption prediction</kwd>
        <kwd>multi-label classification</kwd>
        <kwd>fine-grained semantic labelling</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        The medical track of ImageCLEF[
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] aims to promote research on computer-aided radiology image
analysis and interpretation. ImageCLEFmed Caption 2021[
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] is one of the ImageCLEFmedical tasks,
which focuses on mapping the visual information of radiology images to textual descriptions. It consists of two
subtasks, namely Concept Detection and Caption Prediction. On behalf of the Institute of Medical Information
and Library, Chinese Academy of Medical Sciences, our Image Semantics group (ImageSem) participated in
both subtasks.
      </p>
      <p>
        The concept detection subtask aims to identify the UMLS [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] Concept Unique Identifiers (CUIs) for a given
radiology image. Following our previous work on ImageCLEF 2019 [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], we employed transfer
learning-based multi-label classification (MLC) [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ],[
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] as our first method for modeling all the concepts in the
training set. In order to annotate each image with more meaningful concepts, we manually classified the
concepts into three categories according to their UMLS semantic types, namely Imaging Type, Anatomic
Structure, and Findings. Then we trained MLC sub-models separately for the different concept categories as
our second method.
      </p>
      <p>
        The caption prediction subtask asks participants to generate coherent captions for the entirety of an image,
which requires higher accuracy and semantic interpretability of expression. We also employed two
methods for caption prediction. The first method was the pattern-based combination of concepts identified
in the previous task. The second method was based on the dual path CNN model [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], which
is commonly used in the image-text retrieval field to match images and captions for instance-level retrieval.
      </p>
      <p>This paper is organized as follows. Section 2 describes the dataset of the ImageCLEFmed Caption 2021
task. Section 3 presents the methods for concept detection and caption prediction. Section 4 lists all of our
submitted runs. Section 5 gives a brief summary.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Dataset</title>
      <p>
        The ImageCLEFmed Caption 2021 task is in its 5th edition this year. Compared with previous years, the
released images were strictly limited to radiology, and the number of images and associated UMLS
concepts were reduced. There were 222,314 images with 111,156 concepts in 2018 [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], 70,786 radiology
images with 5,528 concepts in 2019 [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], 80,747 radiology images with 3,047 concepts in 2020 [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], and
3,256 radiology images with 1,586 concepts and 3,256 captions in 2021. Another improvement of the
dataset is that the validation set and test set include real radiology images annotated by medical doctors,
which increased the medical context relevance of the UMLS concepts. On the one hand, the reduced
concept scope and vocabulary size lowered the difficulty of concept identification; on the other hand, the
much smaller number of images is not conducive to training large-scale neural networks.
      </p>
      <p>The organizers provided UMLS concepts along with their imaging modality information for training
purposes. We observed that most images were assigned concepts indicating the diagnostic procedure
or medical device, and some images were accompanied by concepts indicating the body part, organ or
clinical findings. As shown in Table 1, the high-frequency concepts are concentrated in several specific
semantic types. For our experiments, we exploited this characteristic and manually defined three concept
categories for building fine-grained multi-label classification models.</p>
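      <p>
        As a minimal sketch of how such a per-semantic-type distribution can be tallied, the snippet below counts annotations per semantic type; the (image, CUI, semantic type) triples are invented stand-ins for the official annotation file.
      </p>
      <preformat>
```python
from collections import Counter

# Hypothetical (image_id, CUI, semantic_type) triples standing in for the
# official training annotations; the real file maps each image to a CUI list.
annotations = [
    ("img1", "C0040398", "Diagnostic Procedure"),
    ("img1", "C0817096", "Body Location or Region"),
    ("img2", "C0040398", "Diagnostic Procedure"),
    ("img2", "C0239946", "Finding"),
    ("img3", "C1306645", "Diagnostic Procedure"),
]

# Count how often each semantic type occurs across all annotations; on the
# real data, a few types such as Diagnostic Procedure dominate the tally.
type_counts = Counter(sem_type for _, _, sem_type in annotations)
```
      </preformat>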
    </sec>
    <sec id="sec-3">
      <title>3. Methods</title>
      <p>This section describes methods we used in two subtasks. Fig. 1 shows the workflow and submissions of
ImageSem in ImageCLEFmed Caption 2021.</p>
    </sec>
    <sec id="sec-4">
      <title>3.1. Concept detection</title>
      <p>In the concept detection subtask, on the one hand, we employed the transfer learning-based multi-label
classification model to identify the overall concepts; on the other hand, we paid more attention to distinguishing
labels with different semantic types, and focused on three major categories of concepts, which may reveal
clinical insights of radiology images.</p>
    </sec>
    <sec id="sec-5">
      <title>3.1.1. Transfer learning-based multi-label classification</title>
      <p>
        In our previous work, we employed a transfer learning-based multi-label classification model to assign
multiple CUIs to a specific medical image. This is a classic approach under the conditions of a limited label
set and high-frequency concepts. In our first method, for modeling the overall concepts, we applied
Inception-V3 [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] and DenseNet-201 [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], which were pre-trained on the ImageNet dataset [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. The fully
connected layer before the last softmax layer was replaced and the parameters of the pre-trained CNN
model were transferred as the initial parameters of our MLC model.
      </p>
      <p>During the training process, we collected 1,586 CUIs from both the training set and the validation set as our
labels. Then we fine-tuned the models on the validation set. For a given test image, concepts with
probabilities above the threshold were selected as the predicted labels. Empirically, we adjusted the
threshold gradually from 0.1 to 0.7 on the basis of the validation set.</p>
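      <p>
        The threshold selection described above can be sketched as a simple sweep; the toy probabilities and gold labels below are invented for illustration, while the selection rule (keep the threshold with the best micro-averaged F1 on validation data) follows the procedure in the text.
      </p>
      <preformat>
```python
def micro_f1(gold, pred):
    """Micro-averaged F1 over multi-hot label vectors."""
    tp = sum(g * p for gv, pv in zip(gold, pred) for g, p in zip(gv, pv))
    n_pred = sum(sum(pv) for pv in pred)
    n_gold = sum(sum(gv) for gv in gold)
    if n_pred == 0 or n_gold == 0:
        return 0.0
    precision, recall = tp / n_pred, tp / n_gold
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Toy validation data: predicted label probabilities and gold multi-hot labels.
probs = [[0.9, 0.4, 0.1], [0.8, 0.6, 0.2]]
gold = [[1, 0, 0], [1, 1, 0]]

# Sweep the threshold from 0.1 to 0.7, as in the text, keeping the best one.
best_t, best_f1 = 0.1, -1.0
for i in range(1, 8):
    t = i / 10
    pred = [[1 if p >= t else 0 for p in row] for row in probs]
    f1 = micro_f1(gold, pred)
    if f1 > best_f1:
        best_t, best_f1 = t, f1
```
      </preformat>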
    </sec>
    <sec id="sec-6">
      <title>3.1.2. Fine-grained multi-label classification</title>
      <p>In this method, according to the UMLS semantic types, we went further and divided the ImageCLEF concepts into
four semantic categories, namely Imaging Type (IT), Anatomic Structure (AS), Findings (FDs) and Others.
Based on the official training set and validation set, we reprocessed the images and associated concepts
via our medical image annotation platform.</p>
      <p>
        As shown in Figure 2, for a given radiology image, there are three sources of related concepts. The first
is the ImageCLEF concepts annotated by concept extraction tools and medical doctors. These concepts
are semantically related but often incomplete, since many images have only one concept. The second
source is the concepts automatically annotated from the given image captions, using the MetaMap tool
[
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] together with UMLS 2020ab. These concepts are more comprehensive, but they also introduce noise.
The third source is the expanded concepts that we summarized manually based on the high-frequency
ImageCLEF concepts, for labelling convenience.
      </p>
      <p>We invited graduate students majoring in medical imaging to label images with reference to the visual
information, the caption descriptions and the above three sources of concepts. The labeling protocol was that
each radiology image was assigned at least one IT label, zero or more AS labels, and zero or more
FDs labels. ImageCLEF concepts that are difficult to classify into the above categories
were assigned to ‘Others’.</p>
      <p>We then built three image-concept sub-collections for training the fine-grained MLC models. These
collections share the same training and validation images, but differ in their associated concepts. Table 2 shows
the distribution of the different concept categories.</p>
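      <p>
        A minimal sketch of how such sub-collections can be built from one annotation file follows; the CUI-to-category map and image annotations are hypothetical stand-ins (the real mapping came from the manual annotation described above).
      </p>
      <preformat>
```python
# Hypothetical CUI -> category map (IT: Imaging Type, AS: Anatomic Structure,
# FDs: Findings); invented for illustration.
cui_category = {
    "C0040398": "IT",
    "C0817096": "AS",
    "C0239946": "FDs",
}

# Invented image -> concept annotations in the style of the official files.
images = {
    "synpic100": ["C0040398", "C0817096"],
    "synpic101": ["C0040398", "C0239946"],
}

# Build three sub-collections that share the same images but keep only the
# concepts of one category each.
subcollections = {"IT": {}, "AS": {}, "FDs": {}}
for image_id, cuis in images.items():
    for category, collection in subcollections.items():
        collection[image_id] = [c for c in cuis if cui_category.get(c) == category]
```
      </preformat>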
      <p>We verified our MLC models on the re-annotated validation set. The experimental results showed
that our model performs well on the prediction of Imaging Type labels, with an F1 score of 0.9273. However,
the predictions for the other two kinds of labels are far from satisfactory. One possible reason is that there
are too few images but too many labels for training. It is intuitively understandable that images of the same or
similar cases would have similar Anatomic Structure or Findings labels, whereas the data of
this subtask are obviously not focused on specific diseases, which raises the difficulty of
predicting the correct body part, organ, or findings.</p>
    </sec>
    <sec id="sec-7">
      <title>3.2. Caption Prediction</title>
    </sec>
    <sec id="sec-8">
      <title>3.2.1. Pattern-based caption generation</title>
      <p>For generating reasonable image captions, the first method was the pattern-based combination of the
concepts identified in the previous task. We designed simple sentence patterns based on the
characteristics of the captions in the training and validation sets, see Table 3. Obviously, the accuracy of the
concept detection results directly determines the quality of the generated sentences.</p>
      <p>Pattern 1: &lt;image&gt; of &lt;body&gt; demonstrates / shows / suggests &lt;findings&gt;</p>
      <p>Pattern 2: &lt;image&gt; demonstrates / shows / suggests &lt;findings&gt; in / of / within &lt;body&gt;</p>
      <p>Sample captions:
synpic24243: Sagittal T1-weighted image of the cervical spine demonstrates cord expansion.
synpic19193: Lateral radiograph of the skull shows lytic lesions in the temporoparietal region.</p>
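      <p>
        The pattern-based combination can be sketched as simple string filling; the generate_caption helper and its arguments below are illustrative, not the actual implementation, and the real system would use the preferred names of the predicted CUIs.
      </p>
      <preformat>
```python
# Fill the first sentence pattern, "[image] of [body] demonstrates [findings]",
# with detected concepts; the concept strings are invented for illustration.
def generate_caption(image_type, body_part, findings, verb="demonstrates"):
    """Combine detected concepts according to the sentence pattern."""
    return f"{image_type} of the {body_part} {verb} {findings}."

caption = generate_caption(
    "Sagittal T1-weighted image", "cervical spine", "cord expansion")
```
      </preformat>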
    </sec>
    <sec id="sec-9">
      <title>3.2.2. Image matching for caption prediction</title>
      <p>In this method, we employed an algorithm commonly used in the image-text retrieval field to match
images and captions for instance-level retrieval. It is based on the unsupervised assumption that every
image/text group can be viewed as one class, so each class is equivalent to 1+m samples (1 image vs. m
descriptions).</p>
      <p>
        We use the model proposed by Zheng et al. [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], which contains two convolutional neural networks that learn visual
and textual representations simultaneously. At test time, we first extract the image feature with the image CNN
and the text feature with the text CNN, and then use the cosine distance to evaluate the similarity between the
image and the candidate sentences.
● Data Preparation
In this field, most existing works use two generic retrieval datasets (Flickr30k and MSCOCO), which have
more than 30,000 images each. Each image in these datasets is annotated with around five sentences, so we
expanded the captions from one to five sentences per image. Specifically, we first translate each caption into
Chinese, Japanese, German and French, and then translate it back into English. We use the GoogleNews-vectors
word2vec model trained by Google, which contains 2,000,000 words, to build our dictionary. Our dictionary
ultimately has 6,039 words, each with a corresponding 1×300 vector.
● Train
Given a sentence, we convert it into a code T of size n × d, where n is the length of the sentence and d
denotes the size of the dictionary; T is used as the input to the text CNN. Given an image, we resize it to
224 × 224 pixels and take a random crop.
      </p>
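      <p>
        The sentence encoding can be sketched as follows; the five-word dictionary is invented for illustration, and each row of T is the one-hot vector of one word over the dictionary.
      </p>
      <preformat>
```python
# Tiny invented dictionary standing in for the 6,039-word vocabulary.
dictionary = {"lateral": 0, "radiograph": 1, "of": 2, "the": 3, "skull": 4}
d = len(dictionary)

def encode(sentence):
    """Encode a sentence as an n x d matrix of one-hot word vectors."""
    rows = []
    for word in sentence.lower().split():
        row = [0] * d
        idx = dictionary.get(word)
        if idx is not None:
            row[idx] = 1  # one-hot position of the word, if it is in-vocabulary
        rows.append(row)
    return rows

T = encode("Lateral radiograph of the skull")
```
      </preformat>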
      <p>The training process includes two stages. In the first stage, we use the instance loss to learn fine-grained
differences between intra-modal samples with similar semantics. In the second stage, we use the ranking
loss to focus on the distance between the two modalities and build the relationship between image and
text.
● Test
In this experiment, we use 16,280 sentences from the training set and the validation set as candidate captions;
each sentence corresponds to its text feature extracted by the text CNN. For each test image, we first extract
the image feature with the image CNN, and then use the cosine distance to evaluate the similarity between the
image and the candidate sentences.</p>
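      <p>
        The cosine-distance ranking at test time can be sketched as follows; the 3-dimensional features and caption names are invented stand-ins for the outputs of the image CNN and text CNN.
      </p>
      <preformat>
```python
import math

def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# Invented features standing in for the CNN outputs.
image_feature = [0.9, 0.1, 0.2]
candidates = {
    "caption_a": [0.8, 0.2, 0.1],
    "caption_b": [0.1, 0.9, 0.3],
}

# Rank candidate captions by similarity to the query image and keep the best.
ranked = sorted(candidates,
                key=lambda c: cosine(image_feature, candidates[c]),
                reverse=True)
best = ranked[0]
```
      </preformat>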
      <p>When we use the model trained on the ImageCLEF dataset, we get almost the same top-10 sentences from the
16,280 candidate captions, because the features learned by the text CNN are not discriminative between
captions. However, when we test with the model trained on the MSCOCO dataset, each query image
retrieves different sentences, but they do not match either.</p>
    </sec>
    <sec id="sec-10">
      <title>4. Submitted runs</title>
      <p>Among our submitted runs, the one using concepts of Imaging Types achieved the best F1 score of 0.419, indicating the high precision and coverage
of this kind of concept in radiology images. As for the concepts of the other types and the baseline results, they
introduce more unmentioned words and reduce the overall score. However, in view of our experience with
manual labeling, we believe that some unmentioned words may also be helpful in interpreting the given
image. Figure 3 shows two examples of our method on the validation set. The submitted runs were:
03ImagingTypes
02Comb_ImagingTypes_Baseline
07Intersect_06_baseline
04Comb_ImagingTypes_AnatomicStructure
05Comb_ImagingTypes_MedicalFindings
06Comb_ImagingTypes_AnatomicStructure_Findings
08AnatomicStructure
09Findings
01baseline</p>
    </sec>
    <sec id="sec-11">
      <title>5. Conclusions</title>
      <p>This paper presents the participation of the ImageSem Group at the ImageCLEFmed Caption 2021 task.
We tried different strategies for both subtasks. In the concept detection subtask, we used the transfer
learning-based MLC model to detect overall 1,586 concepts. We also trained multiple fine-grained MLC
models based on manually annotated semantic categories. One of the lessons is that we have become much
clearer about which concepts are clinically relevant to radiology images, and in order to obtain better
predictions, the semantic labels of images should be more focused and specific. Furthermore, how to
generate a readable description based on clear and clinically meaningful concepts is still worth exploring.</p>
    </sec>
    <sec id="sec-12">
      <title>6. Acknowledgements</title>
      <p>This work has been supported by the National Natural Science Foundation of China (Grant No.
61906214) and the Beijing Natural Science Foundation (Grant No. Z200016).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>B.</given-names>
            <surname>Ionescu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Müller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Peteri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. Ben</given-names>
            <surname>Abacha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Demner-Fushman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Hasan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Sarrouti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Pelka</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Friedrich</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Herrera</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Jacutprakart</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Kovalev</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Kozlovski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Liauchuk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Dicente Cid</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Chamberlain</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Clark</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Campello</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Moustahfid</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Popescu</surname>
          </string-name>
          ,
          <article-title>The 2021 ImageCLEF Benchmark: Multimedia Retrieval in Medical, Nature, Internet and Social Media Applications</article-title>
          ,
          <year>2021</year>
          , pp.
          <fpage>616</fpage>
          -
          <lpage>623</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>O.</given-names>
            <surname>Pelka</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. M.</given-names>
            <surname>Friedrich</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Herrera</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Müller</surname>
          </string-name>
          ,
          <article-title>Overview of the ImageCLEFmed 2021 concept &amp; caption prediction task</article-title>
          , in: CLEF2021 Working Notes, 'CEUR' Workshop Proceedings, CEURWS.org, Bucharest, Romania,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>O.</given-names>
            <surname>Bodenreider</surname>
          </string-name>
          ,
          <article-title>The unified medical language system (umls): integrating biomedical terminology</article-title>
          ,
          <source>Nucleic Acids Research</source>
          <volume>32</volume>
          (
          <year>2004</year>
          )
          <fpage>267</fpage>
          -
          <lpage>270</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Guo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <article-title>Imagesem at imageclefmed caption 2019 task: a two-stage medical concept detection strategy</article-title>
          ,
          <source>Lugano</source>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>C.</given-names>
            <surname>Szegedy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Vanhoucke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ioffe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Shlens</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Wojna</surname>
          </string-name>
          ,
          <article-title>Rethinking the inception architecture for computer vision</article-title>
          , in: IEEE,
          <year>2016</year>
          , pp.
          <fpage>2818</fpage>
          -
          <lpage>2826</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>G.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Laurens</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. Q.</given-names>
            <surname>Weinberger</surname>
          </string-name>
          ,
          <article-title>Densely connected convolutional networks</article-title>
          ,
          <source>in: IEEE Computer Society</source>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Zheng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Zheng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Garrett</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.-D.</given-names>
            <surname>Shen</surname>
          </string-name>
          ,
          <article-title>Dual-path convolutional image-text embedding with instance loss</article-title>
          ,
          <source>ACM Transactions on Multimedia Computing, Communications, and Applications 2</source>
          (
          <year>2020</year>
          )
          <fpage>1</fpage>
          -
          <lpage>23</lpage>
          . doi: https://doi.org/10.1145/3383184 .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Guo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <article-title>Imagesem at imageclef 2018 caption task: Image retrieval and transfer learning</article-title>
          .
          <source>in: Clef2018 working notes</source>
          , Avignon, France,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>V.</given-names>
            <surname>Kougia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Pavlopoulos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Androutsopoulos</surname>
          </string-name>
          ,
          <article-title>AUEB NLP group at ImageCLEFmed caption 2019</article-title>
          , in: CLEF2019 Working Notes, CEUR-WS.org, Lugano, Switzerland,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <surname>I. B</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Müller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Péteri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Abacha</surname>
          </string-name>
          ,
          <string-name>
            <surname>C. B.</surname>
          </string-name>
          , M. G,
          <article-title>Overview of the imageclef 2020: Multimedia retrieval in medical, lifelogging, nature, and internet applications</article-title>
          , Lecture Notes in Computer Science (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>O.</given-names>
            <surname>Russakovsky</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Deng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Su</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Krause</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Satheesh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Karpathy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Khosla</surname>
          </string-name>
          , Bernstein,
          <article-title>Imagenet large scale visual recognition challenge</article-title>
          ,
          <source>International Journal of Computer Vision</source>
          (
          <year>2014</year>
          )
          <fpage>1</fpage>
          -
          <lpage>42</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>A. R.</given-names>
            <surname>Aronson</surname>
          </string-name>
          ,
          <article-title>Effective mapping of biomedical text to the umls metathesaurus: the metamap program</article-title>
          ,
          <year>2001</year>
          , pp.
          <fpage>17</fpage>
          -
          <lpage>21</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>