<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>NLM at VQA-Med 2020: Visual Question Answering and Generation in the Medical Domain</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Mourad Sarrouti</string-name>
          <email>mourad.sarrouti@nih.gov</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>U.S. National Library of Medicine, National Institutes of Health</institution>
          ,
          <addr-line>Bethesda, MD</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <abstract>
<p>This paper describes the participation of the U.S. National Library of Medicine (NLM) in the Visual Question Answering (VQA) and Visual Question Generation (VQG) tasks of the VQA-Med challenge at ImageCLEF 2020. In the VQA task, I proposed a variational autoencoder model that takes a medical question-image pair as input and generates a natural language answer as output. The encoder consists of a pre-trained CNN and an LSTM that encode the dense vectors of the images and the questions, respectively, into a latent space. The decoder network uses an LSTM to decode answers from the latent space. I also presented a multi-class image classification-based method for VQA that takes an image as input and returns an answer as output: I used the pre-trained ResNet-50 model with its last layer (the Softmax layer) removed, and added a Softmax layer with the different answers as classes. I used the VQA-Med 2019 and VQA-Med 2020 training datasets to train my models. In the VQG task, I presented a variational autoencoder model that takes an image as input and generates a question as output. I also generated new training data from the existing VQA-Med 2020 VQG dataset, based on contextual word embeddings and image augmentation techniques. My best VQA and VQG models achieve 44.1% and 11.6%, respectively, in terms of BLEU score.</p>
      </abstract>
      <kwd-group>
        <kwd>Visual Question Answering</kwd>
        <kwd>Visual Question Generation</kwd>
        <kwd>Deep Learning</kwd>
        <kwd>Variational Autoencoders</kwd>
        <kwd>Natural Language Processing</kwd>
        <kwd>Computer Vision</kwd>
        <kwd>ImageCLEF 2020</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
<title>Introduction</title>
      <p>
        Visual Question Answering (VQA) and Visual Question Generation (VQG) from
images are rising research topics in both the fields of natural language processing
[
        <xref ref-type="bibr" rid="ref12 ref13">12,14,13</xref>
        ] and computer vision [
        <xref ref-type="bibr" rid="ref11">11,17,15</xref>
        ]. A VQA system takes as input an image and a natural language question
about the image, and produces a natural language answer as output, whereas a
VQG system aims at generating natural language questions from images. Both
VQA and VQG combine natural language processing techniques, which provide an
understanding of the question and the ability to produce the answer, with
computer vision techniques, which provide an understanding of the content of
the image.
      </p>
      <p>
        In contrast to answering and generating visual questions from the content
of images in the open domain, answering and generating visual questions has
received little attention in the medical domain. Only a few recent works have
attempted to answer and generate questions about medical images [
        <xref ref-type="bibr" rid="ref5">5,15</xref>
        ].
      </p>
      <p>
        This paper presents the participation of the U.S. National Library of Medicine
(NLM) in the VQA and VQG tasks of the VQA-Med challenge [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], which is organized by ImageCLEF 2020 [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. The VQA-Med challenge aims at answering and generating questions about
medical images. For the VQA task, I proposed a variational autoencoder model
that is tasked with answering a natural language question when shown a medical
image. I also presented another VQA model based on a multi-class image
classification approach: since the question formats are repetitive, the
questions might not contribute to answer prediction, and the image alone can
determine the answer. For the VQG task, I introduced and used our recent VQG
system based on variational autoencoders, which takes a medical image as input
and generates a question as output [15]. All my models use the pre-trained
convolutional neural network (CNN) ResNet-50 [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] for visual feature extraction.
      </p>
      <p>The rest of the paper is organized as follows. Section 2 presents the most
relevant related work. Section 3 describes the datasets used in the 2020 VQA-Med
challenge. Section 4 presents my proposed models for answering and generating
visual questions from medical images. Official results for all models are
presented in Section 5. Finally, the paper is concluded in Section 6.</p>
    </sec>
    <sec id="sec-2">
      <title>Related Work</title>
      <p>
        The open-domain VQA and VQG research areas have received much attention from
the research community in recent years. They benefited from the open-domain
VQA challenge (https://visualqa.org/challenge_2016.html), which has taken place
every year since 2015. In the medical domain, by contrast, VQA and VQG have
emerged as challenge tasks only in the past few years. There has been no
significant progress toward them because of the lack of labelled data and the
difficulty of creating such data: medical images such as radiology images are
highly domain-specific and can only be interpreted by well-educated medical
professionals. Since the launch of the VQA-Med challenge at ImageCLEF [
        <xref ref-type="bibr" rid="ref6 ref7">7,6</xref>
        ], methods in medical VQA have continued to evolve to better meet the needs
of users' visual questions. Many participants' systems follow a traditional
supervised maximum likelihood estimation (MLE) paradigm that typically relies
on a convolutional neural network (CNN) + recurrent neural network (RNN)
encoder-decoder formulation (e.g., [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]). They leveraged CNNs such as VGGNet [16] or ResNet [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] with a variety of pooling strategies to encode image features, and RNNs
to extract question features. Some other participants formulated VQA as a
multi-class image classification task [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. In addition, some teams have improved VQA performance using advanced
techniques such as stacked attention networks and multimodal compact bilinear
(MCB) pooling [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
      </p>
      <p>In contrast to VQA, VQG has received little interest so far in the medical
domain. More recently, the task of VQG in the medical domain has been studied
and explored in [15]. The authors introduced VQGR, a VQG system that is able
to generate natural language questions when shown radiology images. They used
variational autoencoders as the neural network and introduced a text data
augmentation technique to create more training data.</p>
    </sec>
    <sec id="sec-3">
      <title>Data Description</title>
      <p>
        VQA: Given an image and a question expressed in natural language, the VQA
task consists in providing an answer based on the image content. The dataset
used in VQA-Med 2020 consists of 4,000 radiology images with 4,000
question-answer (QA) pairs as training data, 500 radiology images with 500 QA
pairs as validation data, and 500 radiology images with 500 questions as test
data. Figure 1 shows examples from the VQA-Med 2020 data.
      </p>
      <p>
        VQG: Given a radiology image, the VQG task consists in generating a natural
language question based on the content of the image. The dataset used in
VQA-Med 2020 consists of 780 radiology images with 2,156 associated questions
as training data, 141 radiology images with 164 questions as validation data,
and 80 radiology images as test data. Figure 2 shows examples from the VQA-Med
2020 VQG data.
      </p>
    </sec>
    <sec id="sec-3b">
      <title>Methods</title>
      <p>
        In this section, I present in detail the proposed methods for my
participation in the VQA and VQG tasks of the 2020 VQA-Med challenge.
      </p>
      <p>
        Variational autoencoders-based method for VQA: To address the VQA task of
the VQA-Med challenge at ImageCLEF 2020, I proposed a VQA model based on the
variational autoencoders approach [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] that takes as input a medical question-image pair and generates a natural
language answer as output.
      </p>
      <p>As shown in Figure 3, the proposed model consists of two neural network
modules, an encoder and a decoder, for learning the probability distribution of
the data p(x). First, the encoder creates a latent variable z from the image v
and the question q, encoding the dense vectors h_v and h_q into a latent space.
A CNN is used to obtain the image feature map v, and an LSTM is used to
generate the embedded question features q. Then, the model reconstructs the
input features from the z space using a simple multi-layer perceptron (MLP),
a neural network with fully connected layers. I optimize the model by
minimizing the following l2 losses:</p>
      <p>L_v = ||h_v - ĥ_v||², L_q = ||h_q - ĥ_q||² (1)</p>
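<p>The reconstruction losses of Eq. (1) can be sketched as follows. The MLP architecture and the feature dimensions are illustrative assumptions, not the paper's configuration.</p>

```python
import torch
import torch.nn as nn

# Hypothetical reconstruction branch: an MLP maps a latent sample z back to
# the feature dimensions, and the l2 losses compare the reconstructions
# with the encoder outputs h_v and h_q.
z_dim, feat_dim = 128, 512
recon_v = nn.Sequential(nn.Linear(z_dim, 256), nn.ReLU(), nn.Linear(256, feat_dim))
recon_q = nn.Sequential(nn.Linear(z_dim, 256), nn.ReLU(), nn.Linear(256, feat_dim))

z = torch.randn(32, z_dim)        # latent samples for a batch of 32
h_v = torch.randn(32, feat_dim)   # image features from the CNN encoder
h_q = torch.randn(32, feat_dim)   # question features from the LSTM encoder
L_v = ((h_v - recon_v(z)) ** 2).sum(dim=1).mean()   # ||h_v - h_v_hat||^2
L_q = ((h_q - recon_q(z)) ** 2).sum(dim=1).mean()   # ||h_q - h_q_hat||^2
```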
      <p>Finally, an LSTM decoder generates the answer â from the z space. The
decoder takes a sample from the latent z-space and uses it as input to output
the answer â. It receives a "start" symbol and proceeds to output the answer
word by word until it produces an "end" symbol. A cross-entropy loss function
has been used to evaluate the quality of the neural network and to minimize
the error L_g between the generated answer â and the reference answer a.</p>
      <p>The final loss of the proposed VQA model is as follows:</p>
      <p>Loss = λ₁L_g + λ₂KL + λ₃L_v + λ₄L_q (2)
where KL is the Kullback-Leibler divergence, and λ₁, λ₂, λ₃, λ₄ are
hyper-parameters that control the answer generation loss, the variational (KL)
loss, the image reconstruction loss, and the question reconstruction loss,
respectively.</p>
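<p>The combined loss of Eq. (2) can be sketched as follows. The λ values are illustrative, not the tuned hyper-parameters, and mu and logvar stand in for the encoder's Gaussian parameters.</p>

```python
import torch

# Hypothetical weighting of the four loss terms of Eq. (2); KL is the
# analytic divergence between N(mu, sigma^2) and the standard normal prior.
def total_loss(L_g, L_v, L_q, mu, logvar, lambdas=(1.0, 0.1, 0.5, 0.5)):
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp(), dim=1).mean()
    l1, l2, l3, l4 = lambdas
    return l1 * L_g + l2 * kl + l3 * L_v + l4 * L_q

mu, logvar = torch.zeros(32, 128), torch.zeros(32, 128)   # KL = 0 here
loss = total_loss(torch.tensor(2.3), torch.tensor(1.1), torch.tensor(0.9),
                  mu, logvar)
```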
      <p>Multi-class image classification-based method for VQA: I introduced
another VQA system based on a multi-class image classification approach to
solve the VQA task of the VQA-Med challenge at ImageCLEF 2020. The questions
are repetitive and have almost the same format and meaning, even when they use
different words. The questions would therefore not contribute to answer
prediction, and the image alone can determine the answer. Moreover, the
majority of questions have a fixed number of candidate answers (332 answers in
the whole dataset) and can therefore be answered by multi-way classification.
Consequently, the VQA task can be equivalently formulated as a multi-class
classification problem with 332 classes. To do so, I use the pre-trained
ResNet-50 model with its last layer (the Softmax layer) removed. The output
from this part is fed into a Softmax layer with 332 classes (the candidate
answers).</p>
      <p>Visual Question Generation
To address the VQG task of the VQA-Med challenge at ImageCLEF 2020, I used
our recent VQG model based on the variational autoencoders approach [15] that
takes as input a medical image and generates a natural language question as
output. This model rst uses a CNN for obtaining the image feature map v and
encoding the dense vectors hv into a latent (hidden) representation z space. It
then reconstructs the inputs from the z space using a simple MLP. Finally, it
uses a decoder LSTM to generate the question q^ from the z space. The decoder
takes a sample from the latent dimension z-space, and uses that as an input to
output the question q^. I trained this model on the augmented data obtained
using contextual word embeddings and image augmentation techniques. I rst
make use of VQA-Med image-question pairs to generate a heavily augmented
dataset for training the question generation model. Each question is tagged
with part-of-speech and each candidate word replaced by its most cosine-similar
neighbor in a word embedding space based on vocabulary from English Wikipedia,
PubMed and PubMedCentral to generate a new augmented question. Each image
is also augmented with shifts, ips, rotations, and blurs. More details of this
method appairs in [15].
5</p>
    </sec>
    <sec id="sec-4">
      <title>Results and Discussion</title>
      <p>In this section, I report my official results in the 2020 VQA-Med
challenge. The evaluation metrics are accuracy and BLEU for the VQA task, and
BLEU for the VQG task. For all models, all images are resized to 224×224, and
the Adam optimizer with a learning rate of 0.0001 and a batch size of 32 is
used. All models are trained for 20 epochs, and the best validation results
are used as the final results. I implemented these models using PyTorch. The
source code is available at https://github.com/sarrouti/vqa and
https://github.com/sarrouti/vqa-mcc.</p>
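<p>The training configuration reported above can be sketched as follows; the model and data are placeholders standing in for the actual VQA/VQG networks and datasets.</p>

```python
import torch

# Sketch of the shared training setup: Adam, lr = 0.0001, batch size 32,
# 20 epochs. `model` is a placeholder linear classifier, not one of the
# paper's networks, and the data here is synthetic.
model = torch.nn.Linear(2048, 332)
optimizer = torch.optim.Adam(model.parameters(), lr=0.0001)
criterion = torch.nn.CrossEntropyLoss()
BATCH_SIZE, EPOCHS = 32, 20

x = torch.randn(BATCH_SIZE, 2048)          # stand-in image features
y = torch.randint(0, 332, (BATCH_SIZE,))   # stand-in answer classes
for epoch in range(EPOCHS):
    optimizer.zero_grad()
    loss = criterion(model(x), y)
    loss.backward()
    optimizer.step()
```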
      <p>I submitted five automatic runs to the VQA task at ImageCLEF 2020
VQA-Med:
– Run 1: This run used variational autoencoders. The VQA model was trained
on the 2020 VQA-Med data, without input reconstruction. The output length is 3.
– Run 2: This run used variational autoencoders. The VQA model was trained
on the 2020 VQA-Med and 2019 VQA-Med training data, without input
reconstruction. The output length is 3.
– Run 3: This run used variational autoencoders. The VQA model was trained
on the 2020 VQA-Med and 2019 VQA-Med training data, with input reconstruction.
The output length is 4.
– Run 4: This run used variational autoencoders. The VQA model was trained
on the 2020 VQA-Med and 2019 VQA-Med training data, with input reconstruction.
The output length is 10.
– Run 5: This run used the multi-class image classification-based method for
VQA.</p>
      <p>Table 1 shows my official results in the VQA task of the VQA-Med
challenge. Run 5, which treats VQA as a multi-class image classification
problem, achieved the best accuracy and the best BLEU score among my
submissions. The multi-class image classification approach significantly
outperforms the variational autoencoders-based method for VQA. This is likely
because the questions were repetitive and almost all had the same format and
meaning, even when they used different words, so the questions were not
expected to contribute much to answer generation, and the image alone could
determine the answer. Moreover, in the VQA-Med 2020 data, the majority of
questions have a fixed number of candidate answers and hence can be answered
by multi-class classification.</p>
      <p>On the other hand, I submitted 3 automatic runs to the VQG task at
ImageCLEF 2020 VQA-Med:
– Run 1: This run used variational autoencoders. The VQG model was trained
on the augmented data, without input reconstruction. The output length is 10.
– Run 2: This run used variational autoencoders. The VQG model was trained
on the augmented data, with input reconstruction. The output length is 20.
– Run 3: This run used variational autoencoders. The VQG model was trained
on the augmented data, with input reconstruction. The output length is 30.</p>
      <p>Table 3 shows my official results in the VQG task of the VQA-Med
challenge at ImageCLEF 2020. Run 2 has the best BLEU score (0.116) among my
submissions. Table 4 presents the results of the participating teams. Although
many teams participated in the VQA task of the VQA-Med challenge, only 3 teams
submitted runs for the VQG task. The results obtained by my VQG system
compared with other systems are encouraging, and I hope to make improvements
in the future. VQG is a challenging task, especially in the medical domain, as
the available dataset is too small for training efficient VQG models. Small
data might require models with low complexity, whereas the variational
autoencoders model requires a large amount of training data, as it tries to
learn the underlying data distribution of the input deeply in order to output
new sequences.</p>
    </sec>
    <sec id="sec-4b">
      <title>Conclusion</title>
      <p>In this paper, I described my participation in the VQA and VQG tasks at
ImageCLEF VQA-Med 2020. I introduced a variational autoencoders-based method
and a multi-class image classification-based method for VQA. My best VQA
model achieves an accuracy of 0.40 with a BLEU score of 0.441. I also
presented a variational autoencoders-based method for VQG; the VQG model
achieved 0.116 in terms of BLEU score. In the future, I plan to use the
generated questions to advance VQA in the medical domain. I also plan to
improve the performance of both VQA and VQG by using an attention mechanism
that pays more attention to the specific regions that best represent the
question, instead of the whole image.</p>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgements</title>
      <p>This work was supported by the intramural research program at the U.S.
National Library of Medicine, National Institutes of Health.</p>
      <p>question answering. Journal of Biomedical Informatics 68, 96-103 (apr
2017). https://doi.org/10.1016/j.jbi.2017.03.001
14. Sarrouti, M., Alaoui, S.O.E.: SemBioNLQA: A semantic biomedical question
answering system for retrieving exact and ideal answers to natural language
questions. Artificial Intelligence in Medicine 102, 101767 (2020).
https://doi.org/10.1016/j.artmed.2019.101767
15. Sarrouti, M., Ben Abacha, A., Demner-Fushman, D.: Visual question
generation from radiology images. In: Proceedings of the First Workshop on
Advances in Language and Vision Research. pp. 12-18. Association for
Computational Linguistics, Online (Jul 2020),
https://www.aclweb.org/anthology/2020.alvr-1.3
16. Simonyan, K., Zisserman, A.: Very deep convolutional networks for
large-scale image recognition (2014)
17. Zhang, S., Qu, L., You, S., Yang, Z., Zhang, J.: Automatic generation of
grounded visual questions. In: Proceedings of the Twenty-Sixth International
Joint Conference on Artificial Intelligence (IJCAI-17). pp. 4235-4243 (2017)</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Abacha</surname>
            ,
            <given-names>A.B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gayen</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lau</surname>
            ,
            <given-names>J.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rajaraman</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Demner-Fushman</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Nlm at imageclef 2018 visual question answering in the medical domain</article-title>
          .
          <source>In: CLEF (Working Notes)</source>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Al-Sadi</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Talafha</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Al-Ayyoub</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jararweh</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Costen</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>Just at imageclef 2019 visual question answering in the medical domain</article-title>
          .
          <source>In: CLEF (Working Notes)</source>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Allaouzi</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ahmed</surname>
            ,
            <given-names>M.B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Benamrou</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>An encoder-decoder model for visual question answering in the medical domain</article-title>
          .
          <source>In: CLEF (Working Notes)</source>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>Ben</given-names>
            <surname>Abacha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            ,
            <surname>Datla</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.V.</given-names>
            ,
            <surname>Hasan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.A.</given-names>
            ,
            <surname>Demner-Fushman</surname>
          </string-name>
          ,
          <string-name>
            <surname>D.</surname>
          </string-name>
          , Muller, H.:
          <article-title>Overview of the vqa-med task at imageclef 2020: Visual question answering and generation in the medical domain</article-title>
          .
          <source>In: CLEF 2020 Working Notes. CEUR Workshop Proceedings</source>
          , CEUR-WS.org, Thessaloniki,
          <source>Greece (September</source>
          <volume>22</volume>
          -25
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>Ben</given-names>
            <surname>Abacha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            ,
            <surname>Hasan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.A.</given-names>
            ,
            <surname>Datla</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.V.</given-names>
            ,
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            ,
            <surname>Demner-Fushman</surname>
          </string-name>
          ,
          <string-name>
            <surname>D.</surname>
          </string-name>
          , Muller, H.:
          <article-title>Vqa-med: Overview of the medical visual question answering task at imageclef 2019</article-title>
          . In: Cappellato,
          <string-name>
            <given-names>L.</given-names>
            ,
            <surname>Ferro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            ,
            <surname>Losada</surname>
          </string-name>
          ,
          <string-name>
            <surname>D.E.</surname>
          </string-name>
          , Muller, H. (eds.) Working Notes of CLEF 2019 -
          <article-title>Conference and Labs of the Evaluation Forum</article-title>
          , Lugano, Switzerland, September 9-
          <issue>12</issue>
          ,
          <year>2019</year>
          .
          <source>CEUR Workshop Proceedings</source>
          , vol.
          <volume>2380</volume>
          .
          <string-name>
            <surname>CEUR-WS.org</surname>
          </string-name>
          (
          <year>2019</year>
          ), http://ceur-ws.
          <source>org/</source>
          Vol-
          <volume>2380</volume>
          /paper_272.pdf
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>Ben</given-names>
            <surname>Abacha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            ,
            <surname>Hasan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.A.</given-names>
            ,
            <surname>Datla</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.V.</given-names>
            ,
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            ,
            <surname>Demner-Fushman</surname>
          </string-name>
          ,
          <string-name>
            <surname>D.</surname>
          </string-name>
          , Muller, H.:
          <article-title>Vqa-med: Overview of the medical visual question answering task at imageclef 2019</article-title>
          .
          <source>In: CLEF 2019 Working Notes. CEUR Workshop Proceedings</source>
          , CEUR-WS.org &lt;http://ceur-ws.
          <source>org&gt;</source>
          , Lugano,
          <source>Switzerland (September 9-12</source>
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Hasan</surname>
            ,
            <given-names>S.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ling</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Farri</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          , Muller, H.,
          <string-name>
            <surname>Lungren</surname>
            ,
            <given-names>M.P.</given-names>
          </string-name>
          :
          <article-title>Overview of imageclef 2018 medical domain visual question answering task</article-title>
          .
          <source>In: CLEF (Working Notes)</source>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>He</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ren</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sun</surname>
          </string-name>
          , J.:
          <article-title>Deep residual learning for image recognition (</article-title>
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Ionescu</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          , Muller, H.,
          <string-name>
            <surname>Peteri</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>Ben</given-names>
            <surname>Abacha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            ,
            <surname>Datla</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            ,
            <surname>Hasan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.A.</given-names>
            ,
            <surname>Demner-Fushman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            ,
            <surname>Kozlovski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            ,
            <surname>Liauchuk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            ,
            <surname>Cid</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.D.</given-names>
            ,
            <surname>Kovalev</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            ,
            <surname>Pelka</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Friedrich</surname>
            ,
            <given-names>C.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>de Herrera</surname>
            ,
            <given-names>A.G.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ninh</surname>
            ,
            <given-names>V.T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Le</surname>
            ,
            <given-names>T.K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhou</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Piras</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Riegler</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Halvorsen</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tran</surname>
            ,
            <given-names>M.T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lux</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gurrin</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dang-Nguyen</surname>
            ,
            <given-names>D.T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chamberlain</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Clark</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Campello</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fichou</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Berari</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Brie</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dogariu</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stefan</surname>
            ,
            <given-names>L.D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Constantin</surname>
            ,
            <given-names>M.G.</given-names>
          </string-name>
          :
          <article-title>Overview of the ImageCLEF 2020: Multimedia retrieval in medical, lifelogging, nature, and internet applications</article-title>
          (
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Kingma</surname>
            ,
            <given-names>D.P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Welling</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Auto-encoding variational Bayes</article-title>
          .
          <source>arXiv preprint arXiv:1312.6114</source>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Mostafazadeh</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Misra</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Devlin</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mitchell</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>He</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vanderwende</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>Generating natural questions about an image</article-title>
          .
          <source>In: Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)</source>
          . pp.
          <fpage>1802</fpage>
          –
          <lpage>1813</lpage>
          .
          <publisher-name>Association for Computational Linguistics</publisher-name>
          , Berlin, Germany (Aug
          <year>2016</year>
          ). https://doi.org/10.18653/v1/P16-1170, https://www.aclweb.org/anthology/P16-1170
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Sarrouti</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Alaoui</surname>
            ,
            <given-names>S.O.E.</given-names>
          </string-name>
          :
          <article-title>A machine learning-based method for question type classification in biomedical question answering</article-title>
          .
          <source>Methods of Information in Medicine</source>
          <volume>56</volume>
          (
          <issue>03</issue>
          ),
          <fpage>209</fpage>
          –
          <lpage>216</lpage>
          (
          <year>2017</year>
          ). https://doi.org/10.3414/me16-01-0116
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Sarrouti</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Alaoui</surname>
            ,
            <given-names>S.O.E.</given-names>
          </string-name>
          :
          <article-title>A passage retrieval method based on probabilistic information retrieval model and UMLS concepts in biomedical question answering</article-title>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>