<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Overview of ImageCLEF 2018 Medical Domain Visual Question Answering Task</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Sadid A. Hasan</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Yuan Ling</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Oladimeji Farri</string-name>
          <email>dimeji.farri@philips.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Joey Liu</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Henning Muller</string-name>
          <email>henning.mueller@hevs.ch</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Matthew Lungren</string-name>
          <email>mlungren@stanford.edu</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Artificial Intelligence Lab, Philips Research North America</institution>
          ,
          <addr-line>Cambridge, MA</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Department of Radiology, Stanford University</institution>
          ,
          <addr-line>Stanford, CA</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>University of Applied Sciences Western Switzerland (HES-SO)</institution>
          ,
          <addr-line>Sierre</addr-line>
          ,
          <country country="CH">Switzerland</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>This paper presents an overview of the inaugural edition of the ImageCLEF 2018 Medical Domain Visual Question Answering (VQA-Med) task. Inspired by the recent success of visual question answering in the general domain, a pilot task was proposed this year to focus on visual question answering in the medical domain. Given medical images accompanied by clinically relevant questions, participating systems were tasked with answering the questions based on the visual image content. A dataset of 6,413 question-answer pairs accompanied by 2,866 medical images extracted from PubMed Central articles was provided, of which 5,413 question-answer pairs with 2,278 medical images were used for training, 500 question-answer pairs with 324 medical images for validation, and 500 questions with 264 medical images for testing. Among 28 registered participants, 5 groups submitted a total of 17 runs, indicating considerable interest in the VQA-Med task.</p>
      </abstract>
      <kwd-group>
        <kwd>ImageCLEF 2018</kwd>
        <kwd>Visual Question Answering</kwd>
        <kwd>Medical Image Interpretation</kwd>
        <kwd>Question Generation</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        With the increasing interest in artificial intelligence (AI) to support clinical
decision making and improve patient engagement, opportunities to generate and
leverage algorithms for automated medical image interpretation are currently
being explored [
        <xref ref-type="bibr" rid="ref5 ref6">6, 5</xref>
        ]. Since patients may now access structured and unstructured
data related to their health via patient portals, such access also motivates the
need to help them better understand their conditions regarding their available
data, including medical images.
      </p>
      <p>Clinicians' confidence in interpreting complex medical images can be
significantly enhanced by a "second opinion" provided by an automated system. In
addition, patients may be interested in the morphology/physiology and
disease status of anatomical structures around a lesion that has been well characterized
by their healthcare providers, and they may not necessarily be willing to pay
significant amounts for a separate office or hospital visit just to address such
questions. Although patients often turn to web search engines to disambiguate
complex terms or obtain answers to confusing aspects of a medical image, results
from search engines may be nonspecific, erroneous and misleading, or
overwhelming in terms of the volume of information.</p>
      <p>
        Visual Question Answering is a new and exciting problem that combines
natural language processing and computer vision techniques. Inspired by the
recent success of visual question answering in the general domain<sup>4</sup> [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], we propose
a pilot task as part of the ImageCLEF 2018 evaluation campaign<sup>5</sup> [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] to focus on
visual question answering in the medical domain (VQA-Med). Given a medical
image accompanied with a clinically relevant question, participating systems are
tasked with answering the question based on the visual image content.
      </p>
      <p>This paper presents an overview of the VQA-Med task at ImageCLEF 2018.
Section 2 introduces the task and Section 3 presents details of the provided
corpus. A description of the evaluation methodology is provided in Section 4.
We discuss the participant submissions with results in Section 5. Finally, we
conclude the paper in Section 6.
</p>
    </sec>
    <sec id="sec-2">
      <title>Task</title>
      <p>
        In the inaugural edition we propose a pilot task of visual question answering
in the medical domain (VQA-Med) as part of the ImageCLEF 2018 evaluation
campaign [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. Given medical images accompanied with clinically relevant
questions, participating systems are tasked with answering the questions based on
the visual image content. Figure 1 shows a few example images with associated
questions and ground truth answers.
      </p>
    </sec>
    <sec id="sec-3">
      <title>Corpus</title>
      <p>
        To create the datasets for the proposed VQA-Med task, we consider medical
images along with their captions extracted from PubMed Central articles<sup>6</sup>
(essentially a subset of the ImageCLEF 2017 caption prediction task [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]).
      </p>
      <p>
        We use a semi-automatic approach to generate question-answer pairs from
captions of the medical images. First, we automatically generate all possible
question-answer pairs from captions using a rule-based question generation (QG)
system<sup>7</sup> [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. The system consists of four modules to automate question
generation: 1) sentence simplification, which utilizes clauses, subjects, predicates, and
verbs to split a long, complex sentence (i.e., a caption associated with a
medical image) into multiple simple sentences via lexical alternation and appositive
identification, 2) answer phrase identification, which identifies relevant phrases
from the simple sentences such that corresponding questions can be generated,
3) question generation, where the answer phrases are used to generate
possible question phrases through decomposition of the main verb, inversion of the
subject and auxiliary verb, and insertion of one of the possible question phrases in
place of the answer phrases, and 4) candidate question ranking, where a ranking
model is trained to rank the generated candidate questions.
      </p>
      <p>Figure 1. Example questions with ground truth answers. Question: What does the ct scan of thorax show? Answer: bilateral multiple pulmonary nodules. Question: Is the lesion associated with a mass effect? Answer: no.</p>
      <p>4 http://www.visualqa.org/
5 http://www.imageclef.org/2018
6 https://www.ncbi.nlm.nih.gov/pmc/
7 http://www.cs.cmu.edu/~ark/mheilman/questions/</p>
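      <p>As an illustration only, the third module described above (main-verb decomposition, subject-auxiliary inversion, and substitution of a question phrase for the answer phrase) can be sketched on a single pre-split simple sentence. This is not the QG system itself: the verb lookup table, the function name, and the assumption that the sentence is already split into subject, verb, and answer phrase are all hypothetical simplifications.</p>
      <preformat><![CDATA[
```python
# Hypothetical lookup for main-verb decomposition (assumption): maps an
# inflected main verb to an auxiliary verb plus the base form.
VERB_DECOMPOSITION = {
    "shows": ("does", "show"),
    "show": ("do", "show"),
    "showed": ("did", "show"),
    "reveals": ("does", "reveal"),
}


def generate_question(subject, verb, answer_phrase, question_phrase="What"):
    """Turn the simple sentence 'subject verb answer_phrase' into a
    (question, answer) pair."""
    auxiliary, base_verb = VERB_DECOMPOSITION[verb]
    # Subject-auxiliary inversion: the auxiliary moves in front of the
    # subject, and the question phrase takes the place of the answer phrase.
    question = f"{question_phrase} {auxiliary} {subject} {base_verb}?"
    return question, answer_phrase


if __name__ == "__main__":
    question, answer = generate_question(
        "the ct scan of thorax",
        "shows",
        "bilateral multiple pulmonary nodules",
    )
    print("Question:", question)
    print("Answer:", answer)
```
]]></preformat>
      <p>With the toy inputs above, the sketch reproduces the first example pair of Figure 1. A rule table this small cannot cover the variety of caption language, which is consistent with the need for the manual curation passes.</p>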
      <p>The candidate questions generated via the automatic approach may be noisy,
as the defined rules may not adequately capture the complex characteristics of
medical domain terminologies (clinical concepts) and, in particular, the unique
writing style of the medical image captions in biomedical articles. Therefore, two
expert human annotators manually check all generated question-answer pairs
associated with the medical images in two passes. In the first pass, one annotator
proofreads all question-answer pairs and resolves noise introduced by the
aforementioned four modules of the automatic QG system to ensure syntactic
and semantic correctness. In the second pass, the other annotator, an expert in
clinical medicine, verifies all question-answer pairs to form well-curated
validation and test sets by ensuring their clinical relevance with respect to the associated
medical images.</p>
      <p>The final curated corpus comprises 6,413 question-answer pairs
associated with 2,866 medical images. The overall set is split into 5,413 question-answer
pairs (associated with 2,278 medical images) for training, 500 question-answer
pairs (associated with 324 medical images) for validation, and 500 questions
(associated with 264 medical images) for testing.</p>
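      <p>The reported split sizes can be checked against the reported totals with a few lines of arithmetic. The sketch below uses only the numbers stated in the text; the variable and function names are illustrative, and summing the image counts assumes that no image is shared between splits, which the text does not state explicitly.</p>
      <preformat><![CDATA[
```python
# Split sizes as reported in the text: (question-answer pairs, images).
SPLITS = {
    "training": (5413, 2278),
    "validation": (500, 324),
    "test": (500, 264),
}
REPORTED_TOTAL = (6413, 2866)


def split_totals(splits):
    """Sum question-answer pairs and images over all splits."""
    qa_total = sum(qa for qa, _ in splits.values())
    image_total = sum(images for _, images in splits.values())
    return qa_total, image_total


def split_fractions(splits):
    """Return each split's share of the question-answer pairs."""
    qa_total, _ = split_totals(splits)
    return {name: qa / qa_total for name, (qa, _) in splits.items()}


if __name__ == "__main__":
    print("Totals match:", split_totals(SPLITS) == REPORTED_TOTAL)
    for name, share in split_fractions(SPLITS).items():
        print(f"{name}: {share:.1%} of question-answer pairs")
```
]]></preformat>
      <p>Both sums agree with the reported totals, so the three splits account for the whole corpus.</p>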
    </sec>
    <sec id="sec-4">
      <title>Evaluation Methodology</title>
      <p>The evaluation of the participant systems of the VQA-Med task is conducted
based on three metrics: BLEU, WBSS (Word-based Semantic Similarity), and
CBSS (Concept-based Semantic Similarity).</p>
      <p>
        BLEU [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] is used to capture the similarity between a system-generated
answer and the ground truth answer. Each answer is converted to lower case, all
punctuation is removed, and the answer is tokenized<sup>8</sup> into individual words.
Stopwords are removed using NLTK's<sup>9</sup> English stopword list. Snowball
stemming<sup>10</sup> is applied to increase the coverage of overlaps. The overall methodology
and resources for the BLEU metric are essentially similar to those of the ImageCLEF
2017 caption prediction task<sup>11</sup>.
      </p>
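      <p>The preprocessing steps above can be sketched with the Python standard library alone. The following is a simplified illustration, not the official evaluation script: the stopword set is a small hypothetical stand-in for NLTK's English list, whitespace splitting stands in for the NLTK tokenizer, stemming is omitted, and only a clipped unigram precision (one ingredient of BLEU) is computed. All function names are illustrative.</p>
      <preformat><![CDATA[
```python
import string
from collections import Counter

# Hypothetical stand-in for NLTK's English stopword list (assumption).
STOPWORDS = {"a", "an", "the", "of", "in", "on", "is", "are", "with", "and"}


def preprocess(answer):
    """Lower-case, strip punctuation, split on whitespace, drop stopwords."""
    lowered = answer.lower()
    no_punct = lowered.translate(str.maketrans("", "", string.punctuation))
    return [tok for tok in no_punct.split() if tok not in STOPWORDS]


def clipped_unigram_precision(candidate, reference):
    """Fraction of candidate tokens that also occur in the reference, with
    each reference token matched at most as often as it appears there."""
    cand_tokens = preprocess(candidate)
    if not cand_tokens:
        return 0.0
    ref_counts = Counter(preprocess(reference))
    cand_counts = Counter(cand_tokens)
    overlap = sum(
        min(count, ref_counts[tok]) for tok, count in cand_counts.items()
    )
    return overlap / len(cand_tokens)


if __name__ == "__main__":
    reference = "bilateral multiple pulmonary nodules"
    candidate = "Multiple nodules in the lungs."
    print(preprocess(candidate))
    print(clipped_unigram_precision(candidate, reference))
```
]]></preformat>
      <p>Full BLEU additionally combines higher-order n-gram precisions with a brevity penalty; a stemmer would further allow inflected forms such as "nodule" and "nodules" to match.</p>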
      <p>
        Following a recent algorithm to calculate semantic similarity in the
biomedical domain [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], we create the WBSS metric based on Wu-Palmer Similarity
(WUPS<sup>12</sup>) [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] with the WordNet ontology in the backend. WBSS computes a
similarity score between a system-generated answer and the ground truth answer
based on word-level similarity.
      </p>
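      <p>For two concepts a and b in a taxonomy, Wu-Palmer similarity is 2 * depth(lcs) / (depth(a) + depth(b)), where lcs is their deepest common ancestor. The sketch below illustrates this word-level measure on a hypothetical six-node taxonomy rather than WordNet, counting the root as depth 1. How word-level scores are aggregated into a single WBSS score per answer is not detailed here, so that step is not shown.</p>
      <preformat><![CDATA[
```python
# Hypothetical toy taxonomy (child -> parent); the actual metric uses WordNet.
PARENT = {
    "entity": None,
    "organ": "entity",
    "lung": "organ",
    "heart": "organ",
    "finding": "entity",
    "nodule": "finding",
}


def path_to_root(node):
    """Return the list of nodes from `node` up to the root, inclusive."""
    path = []
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path


def depth(node):
    """Depth of a node, counting the root as depth 1."""
    return len(path_to_root(node))


def wu_palmer(a, b):
    """Wu-Palmer similarity: 2 * depth(lcs) / (depth(a) + depth(b))."""
    ancestors_of_a = set(path_to_root(a))
    # Walking upward from b, the first node that is also an ancestor of a
    # (or a itself) is the deepest common ancestor.
    lcs = next(node for node in path_to_root(b) if node in ancestors_of_a)
    return 2 * depth(lcs) / (depth(a) + depth(b))


if __name__ == "__main__":
    print(wu_palmer("lung", "heart"))   # siblings under "organ"
    print(wu_palmer("lung", "nodule"))  # related only through the root
    print(wu_palmer("lung", "lung"))    # identical concepts
```
]]></preformat>
      <p>In this toy taxonomy, sibling concepts score higher than concepts that meet only at the root, which is the behavior that lets a word-level metric give partial credit to near-miss answers.</p>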
      <p>CBSS is similar to WBSS, except that instead of tokenizing the
system-generated and ground truth answers into words, we use MetaMap<sup>13</sup> via the
pymetamap wrapper<sup>14</sup> to extract biomedical concepts from the answers, and
build a dictionary using these concepts. Then, we build one-hot vector
representations of the answers to calculate their semantic similarity using the cosine
similarity measure.</p>
      <p>8 http://www.nltk.org/_modules/nltk/tokenize/punkt.html#PunktLanguageVars.word_tokenize
9 http://nltk.org/
10 http://snowball.tartarus.org/texts/introduction.html
11 http://www.imageclef.org/2017/caption
12 https://datasets.d2.mpi-inf.mpg.de/mateusz14visualturing/calculate_wups.py
13 https://metamap.nlm.nih.gov/
14 https://github.com/AnthonyMRios/pymetamap</p>
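      <p>The vector construction and cosine step of CBSS can be sketched as follows. Concept extraction is not shown: the concept lists are hypothetical hand-written stand-ins for MetaMap output, and the function names are illustrative. Each answer becomes a binary vector over the dictionary of concepts, with a 1 for every concept it contains.</p>
      <preformat><![CDATA[
```python
import math


def build_dictionary(*concept_lists):
    """Map every distinct concept to a vector index (sorted for stability)."""
    concepts = sorted({c for group in concept_lists for c in group})
    return {concept: index for index, concept in enumerate(concepts)}


def to_binary_vector(concepts, dictionary):
    """Binary vector with a 1 at the index of each concept present."""
    vector = [0] * len(dictionary)
    for concept in concepts:
        index = dictionary[concept]
        vector[index] = 1
    return vector


def cosine_similarity(u, v):
    """Cosine of the angle between u and v; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def concept_similarity(system_concepts, truth_concepts):
    """Cosine similarity between the binary concept vectors of two answers."""
    dictionary = build_dictionary(system_concepts, truth_concepts)
    return cosine_similarity(
        to_binary_vector(system_concepts, dictionary),
        to_binary_vector(truth_concepts, dictionary),
    )


if __name__ == "__main__":
    # Hypothetical concept labels standing in for MetaMap output.
    system = ["pulmonary nodule", "lung"]
    truth = ["pulmonary nodule", "bilateral"]
    print(concept_similarity(system, truth))
```
]]></preformat>
      <p>For binary vectors the cosine reduces to the number of shared concepts divided by the geometric mean of the two concept counts, so an answer is rewarded for naming the right clinical concepts regardless of surface wording.</p>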
    </sec>
    <sec id="sec-5">
      <title>Results and Discussion</title>
      <p>We received a total of 17 result submissions from 5 different teams from across the
world. Table 1 gives an overview of all participants and the number of submitted
runs. Each team could submit at most 5 runs. All
submitted runs were automatic, meaning that all participating
systems generated answers to the provided questions in the test
set automatically.</p>
      <p>
        Overall, most participants used deep learning techniques to build their
VQA-Med systems. In particular, participant systems [14–16, 18] leveraged sequence-to-sequence
learning and encoder-decoder-based frameworks [9–11] utilizing deep
convolutional neural networks (CNN) to encode medical images (with or
without using pre-trained models such as VGG [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ], ResNet [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] etc.) and recurrent
neural networks (RNN) to generate question encodings (with or without using
pre-trained word embeddings). Some participants formulated the VQA-Med task
as a multi-label multi-class classification problem [
        <xref ref-type="bibr" rid="ref14 ref17">14, 17</xref>
        ] while others considered
it as a generation task [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ]. Participants also used attention-based mechanisms
[15–17] to identify relevant image features to answer the given questions. The
submitted runs also varied with the use of various VQA networks such as stacked
attention networks (SAN) [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ], the use of advanced techniques such as
multimodal compact bilinear (MCB) pooling [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] or multimodal factorized bilinear
(MFB) pooling [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ] to combine multimodal features, the use of embedding based
topic modeling (ETM) [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ], and the use of different hyperparameters etc.
Participants did not use any additional datasets except the official training and
validation sets to train their models.
      </p>
      <p>The overall results of the participating systems are presented in Tables 2 to
4 for the three different metrics in descending order of the scores (the
higher the better). The relatively low BLEU and WBSS scores of the
runs reflect the difficulty of generating answers that closely match
the ground truth, while the higher CBSS scores suggest that some participants were
able to generate relevant clinical concepts in their answers similar to the clinical
concepts present in the ground truth answers.</p>
      <table-wrap id="tab1">
        <label>Table 1</label>
        <table>
          <thead>
            <tr><th>Team</th><th>Institution</th><th>#Runs</th></tr>
          </thead>
          <tbody>
            <tr><td>FSTT [<xref ref-type="bibr" rid="ref14">14</xref>]</td><td>Abdelmalek Essaadi University, Faculty of Sciences and Techniques, Tangier, Morocco</td><td>2</td></tr>
            <tr><td>JUST [<xref ref-type="bibr" rid="ref18">18</xref>]</td><td>Jordan University of Science and Technology, Jordan</td><td>3</td></tr>
            <tr><td>NLM [<xref ref-type="bibr" rid="ref15">15</xref>]</td><td>Lister Hill National Center for Biomedical Communications, National Library of Medicine, Bethesda, MD, USA</td><td>5</td></tr>
            <tr><td>TU [<xref ref-type="bibr" rid="ref16">16</xref>]</td><td>Tokushima University, Japan</td><td>3</td></tr>
            <tr><td>UMMS [<xref ref-type="bibr" rid="ref17">17</xref>]</td><td>University of Massachusetts Medical School, Worcester, MA, USA</td><td>4</td></tr>
          </tbody>
        </table>
      </table-wrap>
      <p>
        This paper presented an overview of the inaugural Medical Domain Visual
Question Answering (VQA-Med) challenge conducted as a part of the ImageCLEF
2018 evaluation campaign. We discussed participant submissions and results,
which demonstrated the challenges and complexities of the VQA-Med task. In
the future, we will consider the interesting data analyses and improvement
suggestions presented in [15–17] and plan to increase the dataset size to leverage
the power of advanced deep learning algorithms towards improving the
state-of-the-art in visual question answering in the medical domain.
      </p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name><surname>Papineni</surname>, <given-names>K.</given-names></string-name>,
          <string-name><surname>Roukos</surname>, <given-names>S.</given-names></string-name>,
          <string-name><surname>Ward</surname>, <given-names>T.</given-names></string-name>, and
          <string-name><surname>Zhu</surname>, <given-names>W. J.</given-names></string-name>
          (
          <year>2002</year>
          ).
          <article-title>BLEU: a method for automatic evaluation of machine translation</article-title>
          .
          <source>ACL-2002: 40th Annual Meeting of the Association for Computational Linguistics</source>
          , pp.
          <fpage>311</fpage>
          -
          <lpage>318</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
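          <string-name><surname>Soğancıoğlu</surname>, <given-names>G.</given-names></string-name>,
          <string-name><surname>Öztürk</surname>, <given-names>H.</given-names></string-name>, &amp;
          <string-name><surname>Özgür</surname>, <given-names>A.</given-names></string-name>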
          (
          <year>2017</year>
          ).
          <article-title>BIOSSES: a semantic sentence similarity estimation system for the biomedical domain</article-title>
          .
          <source>Bioinformatics</source>
          ,
          <volume>33</volume>
          (
          <issue>14</issue>
          ),
          <fpage>i49</fpage>
          -
          <lpage>i58</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Wu</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Palmer</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          (
          <year>1994</year>
          , June).
          <article-title>Verbs semantics and lexical selection</article-title>
          .
          In
          <source>Proceedings of the 32nd Annual Meeting of the Association for Computational Linguistics</source>
          (pp.
          <fpage>133</fpage>
          -
          <lpage>138</lpage>
          ).
          <publisher-name>Association for Computational Linguistics</publisher-name>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Heilman</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Smith</surname>
            ,
            <given-names>N. A.</given-names>
          </string-name>
          (
          <year>2009</year>
          ).
          <article-title>Question Generation via Overgenerating Transformations and Ranking</article-title>
          . Language Technologies Institute, Carnegie Mellon University, Technical Report CMU-LTI-09-013.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name><surname>Ionescu</surname>, <given-names>B.</given-names></string-name>,
          <string-name><surname>Müller</surname>, <given-names>H.</given-names></string-name>,
          <string-name><surname>Villegas</surname>, <given-names>M.</given-names></string-name>,
          <string-name><surname>Arenas</surname>, <given-names>H.</given-names></string-name>,
          <string-name><surname>Boato</surname>, <given-names>G.</given-names></string-name>,
          <string-name><surname>Dang-Nguyen</surname>, <given-names>D.</given-names></string-name>,
          <string-name><surname>Dicente Cid</surname>, <given-names>Y.</given-names></string-name>,
          <string-name><surname>Eickhoff</surname>, <given-names>C.</given-names></string-name>,
          <string-name><surname>Garcia Seco de Herrera</surname>, <given-names>A.</given-names></string-name>,
          <string-name><surname>Gurrin</surname>, <given-names>C.</given-names></string-name>,
          <string-name><surname>Islam</surname>, <given-names>B.</given-names></string-name>,
          <string-name><surname>Kovalev</surname>, <given-names>V.</given-names></string-name>,
          <string-name><surname>Liauchuk</surname>, <given-names>V.</given-names></string-name>,
          <string-name><surname>Mothe</surname>, <given-names>J.</given-names></string-name>,
          <string-name><surname>Piras</surname>, <given-names>L.</given-names></string-name>,
          <string-name><surname>Riegler</surname>, <given-names>M.</given-names></string-name>, and
          <string-name><surname>Schwall</surname>, <given-names>I.</given-names></string-name>
          (
          <year>2017</year>
          ).
          <article-title>Overview of ImageCLEF 2017: Information extraction from images</article-title>
          .
          <source>Experimental IR Meets Multilinguality, Multimodality, and Interaction. 8th International Conference of the CLEF Association</source>
          , CLEF, Springer LNCS 10456.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name><surname>Eickhoff</surname>, <given-names>C.</given-names></string-name>,
          <string-name><surname>Schwall</surname>, <given-names>I.</given-names></string-name>,
          <string-name><surname>García Seco de Herrera</surname>, <given-names>A.</given-names></string-name>, and
          <string-name><surname>Müller</surname>, <given-names>H.</given-names></string-name>
          (
          <year>2017</year>
          ).
          <article-title>Overview of ImageCLEFcaption 2017 - Image Caption Prediction and Concept Detection for Biomedical Images</article-title>
          .
          <source>CLEF 2017 Labs Working Notes, CEUR Workshop Proceedings.</source>
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name><surname>Antol</surname>, <given-names>S.</given-names></string-name>,
          <string-name><surname>Agrawal</surname>, <given-names>A.</given-names></string-name>,
          <string-name><surname>Lu</surname>, <given-names>J.</given-names></string-name>,
          <string-name><surname>Mitchell</surname>, <given-names>M.</given-names></string-name>,
          <string-name><surname>Batra</surname>, <given-names>D.</given-names></string-name>,
          <string-name><surname>Lawrence Zitnick</surname>, <given-names>C.</given-names></string-name>, and
          <string-name><surname>Parikh</surname>, <given-names>D.</given-names></string-name>
          (
          <year>2015</year>
          ).
          <article-title>VQA: Visual Question Answering</article-title>
          . ICCV.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name><surname>Ionescu</surname>, <given-names>B.</given-names></string-name>,
          <string-name><surname>Müller</surname>, <given-names>H.</given-names></string-name>,
          <string-name><surname>Villegas</surname>, <given-names>M.</given-names></string-name>,
          <string-name><surname>García Seco de Herrera</surname>, <given-names>A.</given-names></string-name>,
          <string-name><surname>Eickhoff</surname>, <given-names>C.</given-names></string-name>,
          <string-name><surname>Andrearczyk</surname>, <given-names>V.</given-names></string-name>,
          <string-name><surname>Dicente Cid</surname>, <given-names>Y.</given-names></string-name>,
          <string-name><surname>Liauchuk</surname>, <given-names>V.</given-names></string-name>,
          <string-name><surname>Kovalev</surname>, <given-names>V.</given-names></string-name>,
          <string-name><surname>Hasan</surname>, <given-names>S. A.</given-names></string-name>,
          <string-name><surname>Ling</surname>, <given-names>Y.</given-names></string-name>,
          <string-name><surname>Farri</surname>, <given-names>O.</given-names></string-name>,
          <string-name><surname>Liu</surname>, <given-names>J.</given-names></string-name>,
          <string-name><surname>Lungren</surname>, <given-names>M.</given-names></string-name>,
          <string-name><surname>Dang-Nguyen</surname>, <given-names>D.</given-names></string-name>,
          <string-name><surname>Piras</surname>, <given-names>L.</given-names></string-name>,
          <string-name><surname>Riegler</surname>, <given-names>M.</given-names></string-name>,
          <string-name><surname>Zhou</surname>, <given-names>L.</given-names></string-name>,
          <string-name><surname>Lux</surname>, <given-names>M.</given-names></string-name>, and
          <string-name><surname>Gurrin</surname>, <given-names>C.</given-names></string-name>
          (
          <year>2018</year>
          ).
          <article-title>Overview of ImageCLEF 2018: Challenges, Datasets and Evaluation</article-title>
          .
          <source>Proceedings of the 9th International Conference of the CLEF Association</source>
          , CLEF.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Bahdanau</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cho</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Bengio</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          (
          <year>2015</year>
          ).
          <article-title>Neural machine translation by jointly learning to align and translate</article-title>
          . ICLR.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Sutskever</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vinyals</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Le</surname>
            ,
            <given-names>Q. V.</given-names>
          </string-name>
          (
          <year>2014</year>
          ).
          <article-title>Sequence to Sequence Learning with Neural Networks</article-title>
          . NIPS:
          <fpage>3104</fpage>
          -
          <lpage>3112</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name><surname>Cho</surname>, <given-names>K.</given-names></string-name>,
          <string-name><surname>van Merrienboer</surname>, <given-names>B.</given-names></string-name>,
          <string-name><surname>Gülçehre</surname>, <given-names>C.</given-names></string-name>
          ,
          <string-name>
            <surname>Bahdanau</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bougares</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schwenk</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Bengio</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          (
          <year>2014</year>
          ).
          <article-title>Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation</article-title>
          . EMNLP:
          <fpage>1724</fpage>
          -
          <lpage>1734</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Simonyan</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Zisserman</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          (
          <year>2014</year>
          ).
          <article-title>Very Deep Convolutional Networks for Large-Scale Image Recognition</article-title>
          .
          <source>arXiv:1409.1556</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>He</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ren</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Sun</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          (
          <year>2016</year>
          ).
          <article-title>Deep Residual Learning for Image Recognition</article-title>
          . CVPR:
          <fpage>770</fpage>
          -
          <lpage>778</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Allaouzi</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Benamrou</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Ben Ahmed</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          (
          <year>2018</year>
          ).
          <article-title>Deep Neural Networks and Decision Tree classifier for Visual Question Answering in the medical domain</article-title>
          .
          <source>CLEF 2018 Labs Working Notes, CEUR Workshop Proceedings.</source>
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Ben Abacha</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gayen</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lau</surname>
            ,
            <given-names>J. J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rajaraman</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Demner-Fushman</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          (
          <year>2018</year>
          ).
          <article-title>NLM at ImageCLEF 2018 Visual Question Answering in the Medical Domain</article-title>
          .
          <source>CLEF 2018 Labs Working Notes, CEUR Workshop Proceedings.</source>
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Zhou</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kang</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Ren</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          (
          <year>2018</year>
          ).
          <article-title>Employing Inception-Resnet-v2 and BiLSTM for Medical Domain Visual Question Answering</article-title>
          .
          <source>CLEF 2018 Labs Working Notes, CEUR Workshop Proceedings.</source>
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Peng</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Rosen</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          (
          <year>2018</year>
          ).
          <article-title>UMass at ImageCLEF Medical Visual Question Answering (Med-VQA) 2018 Task</article-title>
          .
          <source>CLEF 2018 Labs Working Notes, CEUR Workshop Proceedings.</source>
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Talafha</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Al-Ayyoub</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          (
          <year>2018</year>
          ).
          <article-title>JUST at VQA-Med: A VGG-Seq2Seq Model</article-title>
          .
          <source>CLEF 2018 Labs Working Notes, CEUR Workshop Proceedings.</source>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>