<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>VQA-Med: Overview of the Medical Visual Question Answering Task at ImageCLEF 2019</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Asma Ben Abacha</string-name>
          <email>asma.benabacha@nih.gov</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sadid A. Hasan</string-name>
          <email>sadid.hasan@philips.com</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Vivek V. Datla</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Joey Liu</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Dina Demner-Fushman</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Henning Müller</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Lister Hill Center, National Library of Medicine</institution>
          ,
          <country country="US">USA</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Philips Research Cambridge</institution>
          ,
          <country country="US">USA</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>University of Applied Sciences Western Switzerland</institution>
          ,
          <addr-line>Sierre</addr-line>
          ,
          <country country="CH">Switzerland</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>This paper presents an overview of the Medical Visual Question Answering task (VQA-Med) at ImageCLEF 2019. Participating systems were tasked with answering medical questions based on the visual content of radiology images. In this second edition of VQA-Med, we focused on four categories of clinical questions: Modality, Plane, Organ System, and Abnormality. These categories are designed with different degrees of difficulty, leveraging both classification and text generation approaches. We also ensured that all questions can be answered from the image content without requiring additional medical knowledge or domain-specific inference. We created a new dataset of 4,200 radiology images and 15,292 question-answer pairs following these guidelines. The challenge was well received, with 17 participating teams who applied a wide range of approaches such as transfer learning, multi-task learning, and ensemble methods. The best team achieved a BLEU score of 64.4% and an accuracy of 62.4%. In future editions, we will consider designing more goal-oriented datasets and tackling new aspects such as contextual information and domain-specific inference.</p>
      </abstract>
      <kwd-group>
        <kwd>Visual Question Answering</kwd>
        <kwd>Data Creation</kwd>
        <kwd>Deep Learning</kwd>
        <kwd>Radiology Images</kwd>
        <kwd>Medical Questions and Answers</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Recent advances in artificial intelligence have opened new opportunities in clinical
decision support. In particular, relevant solutions for the automatic interpretation
of medical images are attracting a growing interest due to their potential
applications in image retrieval and in assisted diagnosis. Moreover, systems capable of
understanding clinical images and answering questions related to their content
could support clinical education, clinical decision, and patient education. From a
computational perspective, this Visual Question Answering (VQA) task presents
an exciting problem that combines natural language processing and computer
vision techniques. In recent years, substantial progress has been made on VQA
with new open-domain datasets [
        <xref ref-type="bibr" rid="ref3 ref8">3, 8</xref>
        ] and approaches [
        <xref ref-type="bibr" rid="ref23 ref7">23, 7</xref>
        ].
      </p>
      <p>
        However, there are challenges that need to be addressed when tackling VQA
in a specialized domain such as Medicine. Ben Abacha et al. [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] analyzed some
of the issues facing medical visual question answering and described four key
challenges: (i) designing goal-oriented VQA systems and datasets, (ii)
categorizing the clinical questions, (iii) selecting (clinically) relevant images, and (iv)
capturing the context and the medical knowledge.
      </p>
      <p>
        Inspired by the success of visual question answering in the general domain,
we conducted a pilot task (VQA-Med 2018) in ImageCLEF 2018 to focus on
visual question answering in the medical domain [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. Based on the success of the
initial edition, we continued the task this year with an enhanced focus on a larger,
well-curated dataset.
      </p>
      <p>
        In VQA-Med 2019, we selected radiology images and medical questions that
(i) asked about only one element and (ii) could be answered from the image
content. We targeted four main categories of questions with different difficulty
levels: Modality, Plane, Organ System, and Abnormality. For instance, the first
three categories can be tackled as a classification task, while the fourth category
(Abnormality) presents an answer generation problem. We intentionally designed
the data in this manner to study the behavior and performance of different
approaches on both aspects. This design is more relevant to clinical decision
support than the common approach in open-domain VQA datasets [
        <xref ref-type="bibr" rid="ref3 ref8">3, 8</xref>
        ] where
the answers consist of one word or number (e.g. yes, no, 3, stop).
      </p>
      <p>In the following section, we present the task description with more details
and examples. We describe the data creation process and the VQA-Med-2019
dataset in section 3. We present the evaluation methodology in section 4 and discuss
the challenge results in section 5.</p>
    </sec>
    <sec id="sec-2">
      <title>Task Description</title>
      <p>In the same way as last year, given a medical image accompanied by a clinically
relevant question, participating systems in VQA-Med 2019 are tasked with
answering the question based on the visual image content. We specifically focused
on radiology images and four main categories of questions:
Modality, Plane, Organ System, and Abnormality. We mainly considered
medical questions asking about only one element, e.g., "what is the organ principally
shown in this MRI?", "in what plane is this mammograph taken?", "is this a
t1 weighted, t2 weighted, or air image?", "what is most alarming about this
ultrasound?".</p>
      <p>All selected questions can be answered from the image content without
requiring additional domain-specific inference or context. Questions involving these
aspects will be considered in future editions of the challenge, e.g.: "Is
this modality safe for pregnant women?", "What is located immediately inferior
to the right hemidiaphragm?", "What can typically be visualized in this plane?",
"How would you measure the length of the kidneys?"</p>
    </sec>
    <sec id="sec-3">
      <title>VQA-Med-2019 Dataset</title>
      <p>We automatically constructed the training, validation, and test sets by (i)
applying several filters to select relevant images and associated annotations, and (ii)
creating patterns to generate the questions and their answers. The test set was
manually validated by two medical doctors. The dataset is publicly available at
github.com/abachaa/VQA-Med-2019.
Figure 1 presents examples from the VQA-Med-2019 dataset.</p>
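      <p>As a rough illustration of the pattern-based generation step described above, the short Python sketch below pairs per-image metadata fields with fixed question templates. The field names, templates, and example record are hypothetical and only mirror the approach, not the exact scripts used to build the dataset.</p>
      <preformat>
# Sketch of pattern-based QA generation (hypothetical metadata schema and
# templates): one question-answer pair per annotated category of an image.
TEMPLATES = {
    "modality": "what modality was used to take this image?",
    "plane": "in what plane is this image taken?",
    "organ": "what organ system is shown in this image?",
    "abnormality": "what is the primary abnormality in this image?",
}

def generate_qa(image_record):
    """Yield (image_id, question, answer) triples for one annotated image."""
    for category, question in TEMPLATES.items():
        answer = image_record.get(category)
        if answer:  # skip categories without an annotation
            yield image_record["image_id"], question, answer.lower()

# Example usage with a hypothetical image record:
record = {"image_id": "synpic100", "modality": "us-d - doppler ultrasound",
          "plane": "longitudinal", "organ": "genitourinary"}
for image_id, q, a in generate_qa(record):
    print(image_id, "|", q, "|", a)
      </preformat>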
      <sec id="sec-3-1">
        <title>Medical Images</title>
        <p>We selected relevant medical images from the MedPix database
(https://medpix.nlm.nih.gov) with filters based on their captions, modalities, planes,
localities, categories, and diagnosis methods. We selected only the cases where the
diagnosis was made based on the image. Examples of the selected diagnosis methods
include: CT/MRI Imaging, Angiography, Characteristic imaging appearance,
Radiographs, Imaging features, Ultrasound, and Diagnostic Radiology.</p>
      </sec>
      <sec id="sec-3-2">
        <title>Question Categories and Patterns</title>
        <p>We targeted the most frequent question categories: Modality, Plane, Organ
System, and Abnormality (cf. the VQA-RAD dataset).</p>
        <p>1) Modality: Yes/No, WH, and closed questions. Examples:
- was gi contrast given to the patient?
- what is the mr weighting in this image?
- what modality was used to take this image?
- is this a t1 weighted, t2 weighted, or air image?</p>
        <p>2) Plane: WH questions. Examples:
- what is the plane of this mri?
- in what plane is this mammograph taken?</p>
        <p>3) Organ System: WH questions. Examples:
- what organ system is shown in this x-ray?
- what is the organ principally shown in this mri?</p>
        <p>4) Abnormality: Yes/No and WH questions. Examples:
- does this image look normal?
- are there abnormalities in this gastrointestinal image?
- what is the primary abnormality in the image?
- what is most alarming about this ultrasound?</p>
        <p>[Figure 1: example images with question-answer pairs from the VQA-Med-2019 dataset, e.g. (a) Q: what imaging method was used? A: us-d - doppler ultrasound; (b) Q: which plane is the image shown in? A: axial; (c) Q: is this a contrast or noncontrast ct? A: contrast; (d) Q: what plane is this? A: lateral; (e) Q: what abnormality is seen in the image? A: nodular opacity on the left#metastatic melanoma; (f) Q: what is the organ system in this image? A: skull and contents; (g) Q: which organ system is shown in the ct scan? A: lung, mediastinum, pleura; (h) Q: what is abnormal in the gastrointestinal image? A: gastric volvulus (organoaxial).]</p>
        <p>Planes (16): Axial; Sagittal; Coronal; AP; Lateral; Frontal; PA; Transverse;
Oblique; Longitudinal; Decubitus; 3D Reconstruction; Mammo-MLO;
Mammo-CC; Mammo-Mag CC; Mammo-XCC.</p>
        <p>Organ Systems (10): Breast; Skull and Contents; Face, sinuses, and neck; Spine
and contents; Musculoskeletal; Heart and great vessels; Lung, mediastinum,
pleura; Gastrointestinal; Genitourinary; Vascular and lymphatic.</p>
        <p>Modalities (36).</p>
        <p>
          Patterns: For each category, we selected question patterns from hundreds of
questions naturally asked and validated by medical students in the VQA-RAD
dataset [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ].
        </p>
      </sec>
      <sec id="sec-3-4">
        <title>Training and Validation Sets</title>
        <p>The training set includes 3,200 images and 12,792 question-answer (QA) pairs,
with 3 to 4 questions per image. Table 1 presents the most frequent answers per
category. The validation set includes 500 medical images with 2,000 QA pairs.
</p>
      </sec>
      <sec id="sec-3-5">
        <title>Test Set</title>
        <p>A medical doctor and a radiologist performed a manual double validation of the
test answers. A total of 33 answers were updated by (i) indicating an optional
part (8 answers), (ii) adding other possible answers (10), or (iii) correcting the
automatic answer. 15 answers were corrected, which corresponds to 3% of the test
answers. The corrected answers correspond to the following categories:
Abnormality (8/125), Organ (6/125), and Plane (1/125). For abnormality questions,
the correction mainly consisted of replacing the inferred diagnosis with the problem
seen in the image. We expect a similar error rate in the training and validation
sets, which were generated using the same automatic data creation method. The
test set consists of 500 medical images and 500 questions.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Evaluation Methodology</title>
      <p>
        The evaluation of the systems that participated in the VQA-Med 2019 task
was conducted based on two primary metrics: Accuracy and BLEU. We used an
adapted version of the accuracy metric from the general-domain VQA task
(https://visualqa.org/evaluation.html) that strictly considers exact matching of a
participant-provided answer and the ground-truth answer. We calculated the overall
accuracy scores as well as the scores for each question category. To compensate for
the strictness of the accuracy metric, BLEU [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] was used to capture the word overlap-based similarity
between a system-generated answer and the ground-truth answer. The overall
methodology and resources for the BLEU metric are essentially similar to last
year's task [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ].
      </p>
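      <p>As a rough illustration of these two metrics, the following Python sketch computes exact-match accuracy and an average sentence-level BLEU score. The lower-casing and whitespace tokenization shown here are assumptions; the official evaluation scripts may apply additional normalization, so scores can differ slightly.</p>
      <preformat>
# Sketch of the two evaluation metrics (assumed preprocessing: lower-casing
# and whitespace tokenization; the official scripts may normalize differently).
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

def exact_match_accuracy(predictions, references):
    """Fraction of predicted answers that exactly match the ground truth."""
    correct = sum(1 for p, r in zip(predictions, references)
                  if p.strip().lower() == r.strip().lower())
    return correct / len(references)

def mean_bleu(predictions, references):
    """Average sentence-level BLEU between predicted and reference answers."""
    smooth = SmoothingFunction().method1
    scores = [sentence_bleu([r.lower().split()], p.lower().split(),
                            smoothing_function=smooth)
              for p, r in zip(predictions, references)]
    return sum(scores) / len(scores)

# Hypothetical predictions and ground-truth answers:
preds = ["axial", "contrast ct"]
refs = ["axial", "contrast"]
print(exact_match_accuracy(preds, refs), mean_bleu(preds, refs))
      </preformat>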
    </sec>
    <sec id="sec-5">
      <title>Results and Discussion</title>
      <p>
        Out of 104 online registrations, 61 participants submitted signed end-user
agreement forms. Finally, 17 groups submitted a total of 90 runs, indicating a notable
interest in the VQA-Med 2019 task. Figure 2 presents the results of the 17
participating teams. The best overall result was obtained by the Hanlin team,
achieving 0.624 accuracy and a 0.644 BLEU score. Table 2 gives an overview of
all participants and the number of submitted runs (there was a limit of 10 run
submissions per team, and the table includes only the 80 valid runs, out of 90
submissions, that were graded). The overall results of the
participating systems are presented in Tables 3 and 4 for the two metrics, in
descending order of the scores (the higher the better). Detailed results of each
run are described in the ImageCLEF 2019 lab overview paper [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ].
      </p>
      <p>[Table 2: overview of the participating teams and their submitted runs; teams included Turner.JCE, JUST19, Team Pwc Med, Techno, deepak.gupta651, ChandanReddy, Dear stranger, abhishekthanki, and IITISM@CLEF, among others.]</p>
      <p>
        Similar to last year, participants mainly used deep learning techniques to build
their VQA-Med systems. In particular, the best-performing systems leveraged deep
convolutional neural networks (CNNs) like VGGNet [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ] or ResNet [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] with a
variety of pooling strategies (e.g., global average pooling) to encode image features, and
transformer-based architectures like BERT [
        <xref ref-type="bibr" rid="ref6">6</xref>
          ] or recurrent neural networks (RNNs) to
extract question features. Then, various types of attention mechanisms were used, coupled
with different pooling strategies such as multimodal factorized bilinear (MFB) pooling
or multimodal factorized high-order pooling (MFH), to combine the multimodal
features, followed by bilinear transformations to finally predict the possible answers.
      </p>
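      <p>A simplified sketch of this family of architectures is shown below. It assumes a ResNet image encoder, an LSTM question encoder, and simple element-wise fusion in place of the MFB/MFH pooling and attention used by several teams, with placeholder vocabulary and answer-set sizes, so it illustrates the overall pipeline rather than any participant's exact system.</p>
      <preformat>
# Simplified VQA model sketch (hypothetical dimensions; element-wise fusion
# stands in for the MFB/MFH pooling and attention used by participating teams).
import torch
import torch.nn as nn
import torchvision.models as models

class SimpleMedVQA(nn.Module):
    def __init__(self, vocab_size=5000, num_answers=1500, hidden=1024):
        super().__init__()
        resnet = models.resnet152()                              # pretrained weights in practice
        self.cnn = nn.Sequential(*list(resnet.children())[:-1])  # keep layers up to global pooling
        self.img_proj = nn.Linear(2048, hidden)
        self.embed = nn.Embedding(vocab_size, 300)
        self.lstm = nn.LSTM(300, hidden, batch_first=True)
        self.classifier = nn.Linear(hidden, num_answers)

    def forward(self, images, question_tokens):
        v = self.cnn(images).flatten(1)        # (batch, 2048) image features
        v = torch.tanh(self.img_proj(v))       # (batch, hidden)
        _, (h, _) = self.lstm(self.embed(question_tokens))
        q = torch.tanh(h[-1])                  # (batch, hidden) question features
        fused = v * q                          # element-wise fusion of the two modalities
        return self.classifier(fused)          # scores over the candidate answers
      </preformat>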
      <p>Analyses of the question category-wise accuracy in Table 3 suggest that, in general,
participating systems performed well on modality questions, followed by plane
and organ questions, because the possible types of answers for each of these question
categories were finite. (Note that the question category-wise accuracy scores are
normalized, each divided by a factor of 4, so that their sum equals the overall accuracy.)
However, for the abnormality-type questions, systems did not
perform well in terms of accuracy because of the underlying complexity of open-ended
questions and possibly due to the strictness of the accuracy metric. To compensate
for the strictness of the accuracy metric, we computed the BLEU scores to measure the
similarity between the system-generated answers and the ground-truth answers. The higher
BLEU scores of the systems this year (0.631 best BLEU vs. 0.162 in 2018) further
verify the effectiveness of the proposed deep learning-based models for the VQA task.
Overall, the results obtained this year clearly demonstrate the robustness of the provided
dataset compared to last year's task.</p>
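      <p>As a hypothetical illustration of this normalization (the numbers below are not actual challenge results): with 125 questions per category, raw per-category accuracies of 0.80 for modality, 0.76 for plane, 0.68 for organ system, and 0.20 for abnormality would be reported as 0.200, 0.190, 0.170, and 0.050, which sum to the overall accuracy of 0.61.</p>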
    </sec>
    <sec id="sec-6">
      <title>Conclusions</title>
      <p>We presented the VQA-Med 2019 task, the new dataset, the participating systems, and the
official results. To ensure that the questions are naturally phrased, we used patterns
from questions asked by medical students to build clinically relevant questions
belonging to our four target categories. We created a new dataset for the challenge
(available at github.com/abachaa/VQA-Med-2019 and via
www.crowdai.org/clef_tasks/13/task_dataset_files?challenge_id=53) following
goal-oriented guidelines and covering questions with varying degrees of difficulty. A
wide range of approaches has been applied, such as transfer learning, multi-task
learning, ensemble methods, and hybrid approaches combining classification models and
answer generation methods. The best team achieved a 0.644 BLEU score and 0.624
overall accuracy. In future editions, we are considering more complex questions that might
include contextual information or require domain-specific inference to reach the right
answer.</p>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgments</title>
      <p>This work was supported by the intramural research program at the U.S. National
Library of Medicine, National Institutes of Health.</p>
      <p>We thank Dr. James G. Smirniotopoulos and Soumya Gayen from the MedPix team
for their support.
</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Al-Sadi</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Talafha</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Al-Ayyoub</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jararweh</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Costen</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>Just at imageclef 2019 visual question answering in the medical domain</article-title>
          .
          <source>In: Working Notes of CLEF</source>
          <year>2019</year>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Allaouzi</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Benamrou</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ahmed</surname>
            ,
            <given-names>M.B.</given-names>
          </string-name>
          :
          <article-title>An encoder-decoder model for visual question answering in the medical domain</article-title>
          .
          <source>In: Working Notes of CLEF</source>
          <year>2019</year>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Antol</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Agrawal</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lu</surname>
            , J., Mitchell,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Batra</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zitnick</surname>
            ,
            <given-names>C.L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Parikh</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>VQA: visual question answering</article-title>
          .
          <source>In: 2015 IEEE International Conference on Computer Vision</source>
          , ICCV 2015, Santiago, Chile, December 7-
          <issue>13</issue>
          ,
          <year>2015</year>
          . pp. 2425-2433 (
          <year>2015</year>
          ), https://doi.org/10.1109/ICCV.2015.279
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>Ben</given-names>
            <surname>Abacha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            ,
            <surname>Gayen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            ,
            <surname>Lau</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.J.</given-names>
            ,
            <surname>Rajaraman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            ,
            <surname>Demner-Fushman</surname>
          </string-name>
          ,
          <string-name>
            <surname>D.</surname>
          </string-name>
          :
          <article-title>NLM at imageclef 2018 visual question answering in the medical domain</article-title>
          .
          <source>In: Working Notes of CLEF 2018 - Conference and Labs of the Evaluation Forum</source>
          , Avignon, France,
          <source>September 10-14</source>
          ,
          <year>2018</year>
          . (
          <year>2018</year>
          ), http://ceur-ws.org/Vol-2125/paper_165.pdf
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Bounaama</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Abderrahim</surname>
            ,
            <given-names>M.A.</given-names>
          </string-name>
          :
          <article-title>Tlemcen university at imageclef 2019 visual question answering task</article-title>
          .
          <source>In: Working Notes of CLEF</source>
          <year>2019</year>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Devlin</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chang</surname>
            ,
            <given-names>M.W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Toutanova</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>Bert: Pre-training of deep bidirectional transformers for language understanding</article-title>
          .
          <source>In: Proceedings of NAACL</source>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Fukui</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Park</surname>
            ,
            <given-names>D.H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yang</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rohrbach</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Darrell</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rohrbach</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Multimodal compact bilinear pooling for visual question answering and visual grounding</article-title>
          .
          <source>In: Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, EMNLP</source>
          <year>2016</year>
          , Austin, Texas, USA, November 1-
          <issue>4</issue>
          ,
          <year>2016</year>
          . pp. 457-468 (
          <year>2016</year>
          ), http://aclweb.org/anthology/D/D16/D16-1044.pdf
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Goyal</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Khot</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Summers-Stay</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Batra</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Parikh</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Making the V in VQA matter: Elevating the role of image understanding in visual question answering</article-title>
          .
          <source>In: 2017 IEEE Conference on Computer Vision</source>
          and Pattern Recognition,
          <string-name>
            <surname>CVPR</surname>
          </string-name>
          <year>2017</year>
          ,
          <article-title>Honolulu</article-title>
          ,
          <string-name>
            <surname>HI</surname>
          </string-name>
          , USA, July
          <volume>21</volume>
          -
          <issue>26</issue>
          ,
          <year>2017</year>
          . pp. 6325-6334 (
          <year>2017</year>
          ), https://doi.org/10.1109/CVPR.2017.670
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Hasan</surname>
            ,
            <given-names>S.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ling</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Farri</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          , Muller, H.,
          <string-name>
            <surname>Lungren</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Overview of imageclef 2018 medical domain visual question answering task</article-title>
          .
          <source>In: Working Notes of CLEF 2018 - Conference and Labs of the Evaluation Forum</source>
          , Avignon, France,
          <source>September 10-14</source>
          ,
          <year>2018</year>
          . (
          <year>2018</year>
          ), http://ceur-ws.org/Vol-2125/paper_212.pdf
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>He</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ren</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sun</surname>
          </string-name>
          , J.:
          <article-title>Deep residual learning for image recognition</article-title>
          .
          <source>In: 2016 IEEE Conference on Computer Vision</source>
          and Pattern Recognition,
          <string-name>
            <surname>CVPR</surname>
          </string-name>
          <year>2016</year>
          ,
          <string-name>
            <surname>Las</surname>
            <given-names>Vegas</given-names>
          </string-name>
          ,
          <string-name>
            <surname>NV</surname>
          </string-name>
          , USA, June 27-30,
          <year>2016</year>
          . pp. 770-778 (
          <year>2016</year>
          ), https://doi.org/10.1109/CVPR.2016.90
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Ionescu</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          , Muller, H.,
          <string-name>
            <surname>Peteri</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cid</surname>
            ,
            <given-names>Y.D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liauchuk</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kovalev</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Klimuk</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tarasau</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ben</surname>
            <given-names>Abacha</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            ,
            <surname>Hasan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.A.</given-names>
            ,
            <surname>Datla</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            ,
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            ,
            <surname>Demner-Fushman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            ,
            <surname>Dang-Nguyen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.T.</given-names>
            ,
            <surname>Piras</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            ,
            <surname>Riegler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            ,
            <surname>Tran</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.T.</given-names>
            ,
            <surname>Lux</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            ,
            <surname>Gurrin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            ,
            <surname>Pelka</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            ,
            <surname>Friedrich</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.M.</given-names>
            ,
            <surname>de Herrera</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.G.S.</given-names>
            ,
            <surname>Garcia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            ,
            <surname>Kavallieratou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            ,
            <surname>del Blanco</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.R.</given-names>
            , Rodríguez, C.C.,
            <surname>Vasillopoulos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            ,
            <surname>Karampidis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            ,
            <surname>Chamberlain</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            ,
            <surname>Clark</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            ,
            <surname>Campello</surname>
          </string-name>
          ,
          <string-name>
            <surname>A.</surname>
          </string-name>
          :
          <article-title>ImageCLEF 2019: Multimedia retrieval in medicine, lifelogging, security and nature</article-title>
          . In:
          <article-title>Experimental IR Meets Multilinguality, Multimodality, and Interaction</article-title>
          .
          <source>Proceedings of the 10th International Conference of the CLEF Association (CLEF</source>
          <year>2019</year>
          ),
          <source>LNCS Lecture Notes in Computer Science</source>
          , Springer, Lugano,
          <source>Switzerland (September 9-12</source>
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Kornuta</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rajan</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shivade</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Asseman</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ozcan</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Leveraging medical visual question answering with supporting facts</article-title>
          .
          <source>In: Working Notes of CLEF</source>
          <year>2019</year>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Lau</surname>
            ,
            <given-names>J.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gayen</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ben</surname>
            <given-names>Abacha</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            ,
            <surname>Demner-Fushman</surname>
          </string-name>
          ,
          <string-name>
            <surname>D.:</surname>
          </string-name>
          <article-title>A dataset of clinically generated visual questions and answers about radiology images</article-title>
          .
          <source>Scientific Data</source>
          <volume>5</volume>
          (
          <issue>180251</issue>
          ) (
          <year>2018</year>
          ), https://www.nature.com/articles/sdata2018251
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ou</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Che</surname>
          </string-name>
          , J.:
          <article-title>Vqa-med: An xception-gru model</article-title>
          .
          <source>In: Working Notes of CLEF</source>
          <year>2019</year>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Papineni</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Roukos</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ward</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhu</surname>
            ,
            <given-names>W.J.:</given-names>
          </string-name>
          <article-title>BLEU: a method for automatic evaluation of machine translation</article-title>
          .
          <source>In: Proceedings of the 40th annual meeting on association for computational linguistics</source>
          . pp. 311-318
          .
          <article-title>Association for Computational Linguistics (</article-title>
          <year>2002</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Shah</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gadgil</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bansal</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Verma</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Medical visual question answering at imageclef 2019- vqa med</article-title>
          .
          <source>In: Working Notes of CLEF</source>
          <year>2019</year>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Shi</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rosen</surname>
            ,
            <given-names>M.P.</given-names>
          </string-name>
          :
          <article-title>Deep multimodal learning for medical visual question answering</article-title>
          .
          <source>In: Working Notes of CLEF</source>
          <year>2019</year>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Simonyan</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zisserman</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Very deep convolutional networks for large-scale image recognition</article-title>
          .
          <source>In: 3rd International Conference on Learning Representations, ICLR</source>
          <year>2015</year>
          , San Diego, CA, USA, May 7-
          <issue>9</issue>
          ,
          <year>2015</year>
          , Conference Track Proceedings (
          <year>2015</year>
          ), http://arxiv.org/abs/1409.1556
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <surname>Spanier</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Turner</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Lstm in vqa-med, is it really needed? validation study on the imageclef 2019 dataset</article-title>
          .
          <source>In: Working Notes of CLEF</source>
          <year>2019</year>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <surname>Thanki</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Makkithaya</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>Mit manipal at imageclef 2019 visual question answering in medical domain</article-title>
          .
          <source>In: Working Notes of CLEF</source>
          <year>2019</year>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <surname>Vu</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sznitman</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nyholm</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Löfstedt</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Ensemble of streamlined bilinear visual question answering models for the imageclef 2019 challenge in the medical domain</article-title>
          .
          <source>In: Working Notes of CLEF</source>
          <year>2019</year>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          22.
          <string-name>
            <surname>Yan</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Xie</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Xiao</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gu</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>Zhejiang university at imageclef 2019 visual question answering in the medical domain</article-title>
          .
          <source>In: Working Notes of CLEF</source>
          <year>2019</year>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          23.
          <string-name>
            <surname>Yang</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>He</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gao</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Deng</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Smola</surname>
            ,
            <given-names>A.J.:</given-names>
          </string-name>
          <article-title>Stacked attention networks for image question answering</article-title>
          .
          <source>In: 2016 IEEE Conference on Computer Vision</source>
          and Pattern Recognition,
          <string-name>
            <surname>CVPR</surname>
          </string-name>
          <year>2016</year>
          ,
          <string-name>
            <surname>Las</surname>
            <given-names>Vegas</given-names>
          </string-name>
          ,
          <string-name>
            <surname>NV</surname>
          </string-name>
          , USA, June 27-30,
          <year>2016</year>
          . pp. 21-29 (
          <year>2016</year>
          ), https://doi.org/10.1109/CVPR.2016.10
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          24.
          <string-name>
            <surname>Zhou</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kang</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ren</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
            <article-title>Tua1 at imageclef 2019 vqa-med: A classification and generation model based on transfer learning</article-title>
          .
          <source>In: Working Notes of CLEF</source>
          <year>2019</year>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>