<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Deep Neural Networks and Decision Tree classifier for Visual Question Answering in the medical domain</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Imane Allaouzi</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Badr Benamrou</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mohamed Benamrou</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mohamed Ben Ahmed</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Abdelmalek Essaâdi University Faculty of Sciences and Techniques</institution>
          ,
          <addr-line>Tangier</addr-line>
          ,
          <country country="MA">Morocco</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>This paper presents our contribution to the problem of visual question answering in the medical domain using a combination of deep neural networks and a Decision Tree classifier. In our proposed approach, we treat the task of visual question answering as a multi-label classification problem, where each label corresponds to a unique word in the answer dictionary built from the training set.</p>
      </abstract>
      <kwd-group>
        <kwd>CNN</kwd>
        <kwd>Bidirectional LSTM</kwd>
        <kwd>Decision Tree classifier</kwd>
        <kwd>Language modeling</kwd>
        <kwd>medical imaging</kwd>
        <kwd>Visual Question Answering</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Visual question answering (VQA) is a new and challenging task that has witnessed a
surge of interest from the Artificial Intelligence (AI) community, since it combines the
fields of Computer Vision (CV) and Natural Language Processing (NLP). NLP and
CV are two branches of AI: the former enables computers to understand
and analyze human language, while the latter enables computers to understand and
process images in the same way that a human does. The main idea of VQA systems is
to predict the right answer given both an image and a question about this image in
natural language. The VQA task can be treated as a classification problem if the answer is
chosen from among different choices, or as a generation problem if the answer is a
comprehensive and well-formed textual description.</p>
      <p>
        In the last few years, deep neural networks have achieved state-of-the-art results in a
wide range of NLP and CV applications, including image recognition [
        <xref ref-type="bibr" rid="ref1 ref2">1,2</xref>
        ], machine
translation [
        <xref ref-type="bibr" rid="ref3 ref4">3,4</xref>
        ], image captioning [
        <xref ref-type="bibr" rid="ref5 ref6">5,6</xref>
        ] and visual question answering [
        <xref ref-type="bibr" rid="ref7 ref8 ref9">7,8,9</xref>
        ]. Following
this trend, this paper presents our contribution to the problem of visual question
answering in the medical domain [
        <xref ref-type="bibr" rid="ref10 ref11">10, 11</xref>
        ] using a combination of deep neural networks
(a Convolutional Neural Network (CNN) and a Bidirectional Long Short-Term Memory network) and
the Decision Tree classifier. In our proposed approach, we treat the VQA task as a
multi-label classification problem, where each label corresponds to a unique word in
the answer dictionary built from the training set.
      </p>
      <p>The paper is organized as follows: the dataset is described in Section 2, the
proposed model is described in Section 3, results are presented and discussed in Section
4, and finally Section 5 draws conclusions and outlines future work.</p>
    </sec>
    <sec id="sec-2">
      <title>Dataset:</title>
      <p>
        VQA-Med [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] is a dataset generated using images from PubMed Central articles
(essentially a subset of the ImageCLEF 2017 caption prediction task [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]). As shown
in the table 1 the VQA-Med dataset consists of 2278 training images and 324
validation images, accompanied respectively with 5413 and 500 of question-answer pairs,
and a test set of 264 medical images with 500 questions. The answer can be either “a
single word”, “a phrase containing around 2-28 words”, or “a yes/no”. The table 2
illustrates some examples of the training data with different types of questions and
answers.
The VQA in the medical domain involves providing a medical question-image pairs
to produce answers. In this work we assume that the answers are a concatenation of
one or more words, therefore we have treated the task as multi-label classification
problem.
      </p>
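      <p>To make this formulation concrete, the following toy sketch (hypothetical data, not the authors' code) builds an answer dictionary from training answers and encodes an answer as a multi-label target vector, one label per unique word:</p>

```python
# Toy sketch (hypothetical data): build the answer dictionary from training
# answers and encode an answer as a multi-label target vector, one label per
# unique word in the dictionary.
def build_answer_dictionary(answers):
    vocab = sorted({w for a in answers for w in a.lower().split()})
    return {w: i for i, w in enumerate(vocab)}

def to_multilabel(answer, word_index):
    y = [0] * len(word_index)
    for w in answer.lower().split():
        y[word_index[w]] = 1
    return y

train_answers = ["yes", "t2 weighted mri", "axial ct scan"]
index = build_answer_dictionary(train_answers)   # 7 unique words
target = to_multilabel("axial ct scan", index)   # three labels set to 1
```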
      <p>
        Our proposed model uses the pre-trained VGG-16 [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] model to extract image
features, and word embeddings [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] along with a Bidirectional Long Short-Term
Memory (LSTM) network [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] to embed the question and extract textual features. The image
and textual features are concatenated and passed through two fully connected layers of 512 neurons
to obtain a fixed-length feature vector. This vector is then used as the input to a Decision
Tree classifier in order to predict an answer.
      </p>
      <sec id="sec-2-1">
        <title>The model consists of three sub-models:</title>
        <p>
           Image Representation:
To extract prominent features from medical images, we use the pre-trained
VGG-16 network, which achieved top results in the ImageNet 2014 challenge [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ] with
a 7.4% error rate on object classification. We removed the last layer of this
network to obtain an output vector of 4096 elements, which is in turn passed through a
fully connected layer to obtain an image representation of size 512. The VGG-16
architecture is shown in Figure 1.</p>
        <p> Question Representation:
Recently, recurrent neural networks (RNNs) have shown great success in diverse NLP
tasks [18, 19]. Motivated by this success, we use a bidirectional RNN with
LSTM cells to process the medical questions. Bidirectional Long Short-Term
Memory (BDLSTM) is an extension of the traditional LSTM; its main idea is to
process sequence data in both the forward and backward directions, avoiding the
limited-context problem that affects any feed-forward model.
        </p>
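        <p>As a minimal stand-in for the image-representation step above (random weights in place of the pre-trained VGG-16, purely for illustration), the projection of the 4096-element output to a 512-dimensional representation can be sketched as:</p>

```python
import numpy as np

# Stand-in sketch of the image-representation step: the model removes the
# last layer of pre-trained VGG-16 to get a 4096-d feature, then projects it
# to 512 dimensions with a fully connected layer. Weights here are random
# placeholders, for illustration only.
rng = np.random.default_rng(0)
W = rng.normal(scale=0.01, size=(4096, 512))  # dense-layer weights (stand-in)
b = np.zeros(512)

def image_representation(vgg_fc_features):
    # ReLU dense layer: max(0, xW + b)
    return np.maximum(0.0, vgg_fc_features @ W + b)

feat = image_representation(rng.normal(size=(1, 4096)))  # 512-d image vector
```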
        <p>To this end, the question is first converted to a matrix of one-hot vectors and passed
through an embedding layer (with a vocabulary of 3312 words and a dense embedding of size
521) to obtain a dense representation that captures the relative meanings of words. The
embedded question is then fed to a BDLSTM with 512 units, followed by a fully
connected layer, to obtain a question representation of size 512.</p>
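        <p>A toy stand-in for the first stage of this question pipeline (a random embedding matrix, for illustration only; in the full model the embedded sequence then feeds the BDLSTM) looks like:</p>

```python
import numpy as np

# Stand-in sketch of the question pipeline: token ids index into an embedding
# matrix (vocabulary 3312, embedding size 521, as in the text). The embedding
# matrix here is random, for illustration only; in the full model the embedded
# sequence then feeds a BDLSTM with 512 units.
VOCAB_SIZE, EMB_DIM = 3312, 521
rng = np.random.default_rng(1)
embedding_matrix = rng.normal(size=(VOCAB_SIZE, EMB_DIM))

def embed(token_ids):
    # Equivalent to multiplying one-hot vectors by the embedding matrix
    return embedding_matrix[np.array(token_ids)]

q = embed([5, 42, 7])  # a 3-token question
```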
        <p> Answer prediction:
To predict an answer, we model the VQA-Med task as a multi-label
classification problem, since we assume that an answer is a concatenation of one or more
words. We therefore use a multi-label Decision Tree classifier that takes as
input the outputs of the image-representation and question-representation
sub-models and predicts one or more predefined labels. The total number of labels is
3109, where each label corresponds to a unique word in the answer dictionary
created from the training set.</p>
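        <p>This prediction step can be sketched with scikit-learn, whose decision trees accept a binary label matrix and perform multi-label classification directly; the data below is toy and random (the real model has 3109 labels), so this is a shape-level illustration rather than the authors' implementation:</p>

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Sketch of the answer-prediction step with toy random data: scikit-learn's
# DecisionTreeClassifier accepts a 2-D binary target matrix and performs
# multi-label classification directly (the paper uses 3109 labels; we use 5).
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 16))                      # stand-in fused features
Y = (rng.random(size=(40, 5)) > 0.7).astype(int)   # toy multi-label targets

clf = DecisionTreeClassifier(max_depth=5, random_state=0).fit(X, Y)
pred = clf.predict(X[:2])  # one binary vector of 5 labels per sample
```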
        <p>In the training phase, we keep the CNN parameters frozen and train
the rest of the deep neural network using a fully connected layer with the sigmoid
activation function, binary cross-entropy as the loss function, and Adam as the optimizer.
In addition, dropout with a probability of 0.5 is applied before the last fully connected layer and
after the BDLSTM layer.</p>
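        <p>The training objective named above (sigmoid outputs scored with binary cross-entropy) can be written out in plain numpy; the logits and targets below are toy values, for illustration only:</p>

```python
import numpy as np

# The training objective in plain numpy: sigmoid outputs scored with binary
# cross-entropy, averaged over labels (toy logits and targets, illustrative).
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def binary_cross_entropy(y_true, logits, eps=1e-12):
    p = np.clip(sigmoid(logits), eps, 1.0 - eps)
    return float(np.mean(-(y_true * np.log(p) + (1.0 - y_true) * np.log(1.0 - p))))

# Confident, correct logits give a small loss
loss = binary_cross_entropy(np.array([1.0, 0.0]), np.array([4.0, -4.0]))
```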
        <p>The best parameters were selected based on the validation loss, with a mini-batch
size of 20 and up to 10 epochs.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Results:</title>
      <p>Three metrics are used to evaluate our proposed VQA-Med model: the BLEU
score [20], WBSS (Word-based Semantic Similarity), and CBSS (Concept-based
Semantic Similarity). The first is one of the metrics most commonly used
to measure the similarity between two sentences. The second measures
semantic similarity in the biomedical domain [21]; it is built
on Wu-Palmer Similarity (WUPS) [22] with the WordNet ontology in the backend.
The third is similar to WBSS, except that instead of tokenizing
the predicted and ground-truth answers into words, it uses MetaMap via
the pymetamap wrapper to extract biomedical concepts from the answers.</p>
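      <p>As a simplified, BLEU-flavoured illustration of word-overlap scoring, the clipped unigram precision between a predicted and a reference answer can be computed as below; real BLEU also uses higher-order n-grams and a brevity penalty, so this is only a sketch with invented example answers:</p>

```python
from collections import Counter

# Simplified BLEU-style illustration: clipped unigram precision between a
# predicted answer and the ground truth. Real BLEU also uses higher-order
# n-grams and a brevity penalty; this is only a sketch.
def clipped_unigram_precision(pred, ref):
    p, r = Counter(pred.split()), Counter(ref.split())
    clipped = sum(min(c, r[w]) for w, c in p.items())
    return clipped / max(1, sum(p.values()))

score = clipped_unigram_precision("axial ct scan", "axial ct scan of the chest")
```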
      <p>Before applying the evaluation metrics, each answer undergoes the following
preprocessing steps:
 Lower-casing: converts each answer to lower-case.
 Tokenization: divides the answer into individual words.
 Stop-word removal: removes punctuation and commonly encountered English
words.</p>
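      <p>The three preprocessing steps above can be sketched in a few lines; the stop-word list here is a small illustrative subset, not the exact list used in our experiments:</p>

```python
import re

# Sketch of the three preprocessing steps: lower-casing, tokenization (which
# also drops punctuation), and stop-word removal. The stop-word list is a
# small illustrative subset, not the exact list used in the paper.
STOP_WORDS = {"the", "a", "an", "of", "in", "is", "and"}

def preprocess(answer):
    tokens = re.findall(r"[a-z0-9]+", answer.lower())  # lower-case + tokenize
    return [t for t in tokens if t not in STOP_WORDS]  # remove stop-words

tokens = preprocess("An axial CT scan of the chest.")
```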
      <sec id="sec-3-1">
        <title>The following table shows the results obtained on the test set:</title>
        <p>As shown in the table above, our proposed model gives its best results on the CBSS
metric (0.27), compared with the BLEU score (0.054) and the WBSS metric (0.10). This is
explained by the large number of labels that are not represented equally in the training set,
which is known as the label imbalance problem.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Conclusion:</title>
      <p>In this paper, we presented our contribution to the task of visual question answering in
the medical domain. We treated the task as multi-label classification using a
decision tree classifier. However, the results on the test set are unsatisfactory,
especially in terms of the BLEU metric, with a score of 0.054. We therefore plan to develop
an LSTM model to generate answers, since the adopted classification approach ignores
word order in the answer, which leads to a loss of information. We also plan to
improve our visual model by using an attention mechanism, which allows the model to pay
more attention to the specific regions that best relate to the question instead of the
whole image.
17. https://blog.heuritech.com/2016/02/29/a-brief-report-of-the-heuritech-deep-learningmeetup-5/, last accessed 2018/04/22
18. Graves, A., Mohamed, A., Hinton, G.: Speech recognition with deep recurrent neural
networks. In: Proceedings of the 2013 IEEE International Conference on Acoustics,
Speech and Signal Processing, pp. 6645-6649. (2013)
19. Mikolov, T., Kombrink, S., Deoras, A., Burget, L., Cernocky, J.: RNNLM - recurrent
neural network language modeling toolkit. In: Proceedings of the 2011 ASRU Workshop, pp.
196-201. (2011)
20. Papineni, K., Roukos, S., Ward, T., Zhu, W. J.: BLEU: a method for automatic evaluation
of machine translation. In: ACL-2002: 40th Annual Meeting of the Association for
Computational Linguistics, pp. 311-318. Association for Computational Linguistics,
Pennsylvania (2002)
21. Soğancıoğlu, G., Öztürk, H., &amp; Özgür, A.: BIOSSES: a semantic sentence similarity
estimation system for the biomedical domain. Bioinformatics, 33(14), i49-i58 (2017)
22. Wu, Z., Palmer, M.: Verb semantics and lexical selection. In: Proceedings of the 32nd
Annual Meeting of the Association for Computational Linguistics, pp. 133-138. (1994)</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Simonyan</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zisserman</surname>
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Very deep convolutional networks for large-scale image recognition</article-title>
          .
          <source>arXiv preprint arXiv:1409.1556</source>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Krizhevsky</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sutskever</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hinton</surname>
            ,
            <given-names>G. E.</given-names>
          </string-name>
          ,
          <article-title>ImageNet classification with deep convolutional neural networks</article-title>
          .
          <source>In NIPS</source>
          (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Cho</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>van Merrienboer</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gulcehre</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation</article-title>
          ,
          <source>in Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)</source>
          , pp.
          <fpage>1724</fpage>
          -
          <lpage>1734</lpage>
          . Association for Computational Linguistics,
          Doha
          (
          <year>2014</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Sutskever</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vinyals</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Le</surname>
            ,
            <given-names>Q. V.</given-names>
          </string-name>
          :
          <article-title>Sequence to Sequence Learning with Neural Networks</article-title>
          ,
          <source>In: the 27th International Conference on Neural Information Processing Systems</source>
          ,Vol.
          <volume>2</volume>
          , pp.
          <fpage>3104</fpage>
          -
          <lpage>3112</lpage>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Vinyals</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Toshev</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bengio</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Erhan</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Show and tell: A neural image caption generator</article-title>
          .
          <source>In CVPR</source>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Xu</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ba</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kiros</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cho</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Courville</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Salakhutdinov</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zemel</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bengio</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>Show, attend and tell: Neural image caption generation with visual attention</article-title>
          .
          <source>In ICML</source>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Yang</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>He</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gao</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Deng</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Smola</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,:
          <article-title>Stacked attention networks for image question answering</article-title>
          .
          <source>In CVPR</source>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Zhou</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tian</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sukhbaatar</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Szlam</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fergus</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,:
          <article-title>Simple baseline for visual question answering</article-title>
          .
          <source>arXiv preprint arXiv:1512.02167</source>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Malinowski</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rohrbach</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fritz</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Ask your neurons: A deep learning approach to visual question answering</article-title>
          .
          <source>arXiv preprint arXiv:1605.02697</source>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Hasan</surname>
            ,
            <given-names>SA</given-names>
          </string-name>
          .,
          <string-name>
            <surname>Ling</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Farri</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lungren</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Müller</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          :
          <article-title>Overview of ImageCLEF 2018 Medical Domain Visual Question Answering Task</article-title>
          , CLEF working notes, CEUR, (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Ionescu</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Müller</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Villegas</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>García Seco de Herrera</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Eickhoff</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Andrearczyk</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dicente Cid</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liauchuk</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kovalev</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hasan</surname>
            ,
            <given-names>S. A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ling</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Farri</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lungren</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dang-Nguyen</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Piras</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Riegler</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhou</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lux</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gurrin</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          : Overview of ImageCLEF 2018:
          <article-title>Challenges, Datasets and Evaluation</article-title>
          ,
          <source>Proceedings of the Ninth International Conference of the CLEF Association (CLEF 2018)</source>
          (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Eickhoff</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schwall</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>García Seco de Herrera</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Müller</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          :
          <article-title>Overview of ImageCLEFcaption 2017 - the image caption prediction and concept extraction tasks to understand biomedical images</article-title>
          .
          <source>CLEF working notes</source>
          ,
          <source>CEUR</source>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Simonyan</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zisserman</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Very Deep Convolutional Networks for Large-Scale Image Recognition</article-title>
          .
          <source>arXiv preprint arXiv:1409.1556</source>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Mikolov</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sutskever</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Corrado</surname>
            <given-names>G. S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dean</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Distributed Representations of Words and Phrases and their Compositionality</article-title>
          .
          <source>In NIPS</source>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Schuster</surname>
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Paliwal</surname>
            ,
            <given-names>K. K.</given-names>
          </string-name>
          :
          <article-title>Bidirectional recurrent neural networks</article-title>
          .
          <source>In: IEEE Transactions on Signal Processing</source>
          , vol.
          <volume>45</volume>
          , pp.
          <fpage>2673</fpage>
          -
          <lpage>2681</lpage>
          (
          <year>1997</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16. http://www.image-net.org/challenges/LSVRC/2014/ , last accessed
          <year>2018</year>
          /05/25
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>