<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>A Review of Neural Approaches to the Question Answering Task</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>William Needham</string-name>
          <email>william.needham@city.ac.uk</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>City, University of London</institution>
          ,
          <addr-line>London, EC1V 0HB</addr-line>
          ,
          <country country="UK">United Kingdom</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>The Question Answering task, whereby a system receives a plain language question from a user and returns a concise answer from a corpus of documents, has received considerable attention from academia and the commercial world since mid-way through the 20th century. This paper offers a concise overview of this literature, focussing on recent advancements of the state-of-the-art achieved by neural network-based approaches. These advancements have arrived at a considerable rate, leaving much analysis and research still to be conducted. My main contribution in this paper is to shine a light on these gaps in the literature, offering inspiration for future research.</p>
      </abstract>
      <kwd-group>
        <kwd>Information Retrieval</kwd>
        <kwd>Question Answering</kwd>
        <kwd>Neural networks</kwd>
        <kwd>Word embeddings</kwd>
        <kwd>Pre-trained language models</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        A Question Answering (QA) system receives a human-language question, seeks
to interpret large quantities of structured and unstructured data, and returns
a concise answer
        <xref ref-type="bibr" rid="ref22">(Hirschman and Gaizauskas, 2001)</xref>
        . QA systems have been a
vibrant field of research since the release of the BASEBALL system
        <xref ref-type="bibr" rid="ref21">(Green Jr. et al.,
1961)</xref>
        . QA is an important task as typically users do not want to comprehend
multiple, long documents to find an answer to their question
        <xref ref-type="bibr" rid="ref33">(Lin et al., 2003)</xref>
        .
Throughout this time, we have seen a myriad of approaches, from
knowledge-based approaches
        <xref ref-type="bibr" rid="ref21 ref5 ref6">(Berant et al., 2013; Bollacker et al., 2008; Green Jr. et al.,
1961)</xref>
        to information-retrieval based systems
        <xref ref-type="bibr" rid="ref23 ref31 ref7">(Brill et al., 2002; Hirschman et
al., 1999; Lin, 2007)</xref>
        , as well as hybrids of the two such as the DeepQA system
by IBM
        <xref ref-type="bibr" rid="ref19">(Ferucci et al., 2010)</xref>
        . Since the early 2000s, the introduction of
neural-network-based approaches has resulted in marked success within the domain.
      </p>
      <p>
        More recently, advances to the state-of-the-art have been attributed to the
application of pre-trained language models; specifically the BERT
        <xref ref-type="bibr" rid="ref16">(Devlin et
al., 2018)</xref>
        language model (Bidirectional Encoder Representations from
Transformers). On the SQuAD (Stanford Question Answering Dataset) benchmark
        <xref ref-type="bibr" rid="ref48">(Rajpurkar et al., 2018)</xref>
        , every one of the top 20 submissions claims to have used
a variation of BERT (as of April 2019). Just recently, human performance has
been surpassed for the first time on this dataset.
      </p>
      <p>It is clear the domain has progressed significantly within a short space of
time. The speed of advancement has resulted in a shortage of published research
explaining the architectures of such systems and perhaps more critically,
understanding where these models are under-performing. This is important as without
a clear understanding of the weaknesses of each implementation, it is difficult
to improve the model with a subsequent iteration. This short paper seeks to
identify the most prominent gaps in the literature, offering fruitful directions
for future research.</p>
      <p>The remainder of this paper is structured as follows: § 2 provides an overview
of the traditional (non-neural) methods for solving the QA task. Then, in § 3, the
core components of neural network architectures are described (specific to the
IR/QA task), followed by a comprehensive review of the neural architectures
which combine these components in § 4, including extensions to the previously
described traditional methods. § 5 offers an overview of the datasets and metrics
used in the Question Answering task. Finally, a summary of the future research
directions is presented in § 6.</p>
    </sec>
    <sec id="sec-2">
      <title>Traditional approaches</title>
      <p>
        "How many games did the Yankees play in July?" This question was asked of the
BASEBALL system
        <xref ref-type="bibr" rid="ref21">(Green Jr. et al., 1961)</xref>
        , one of the earliest QA systems in
the literature. Whilst ground-breaking in its approach, the paper was clear on
its limitations; limitations that set the path for decades of subsequent research
on QA systems.
      </p>
      <p>Before a detailed look at neural approaches to the QA task, a reflection on
the traditional approaches that laid the foundations for current research will
be presented.</p>
      <sec id="sec-2-1">
        <title>Knowledge-based approaches</title>
        <p>
          BASEBALL
          <xref ref-type="bibr" rid="ref21">(Green Jr. et al., 1961)</xref>
          can be described as a knowledge-based
question answering (KB-QA) system in that it seeks to build a structured semantic
representation of the question, upon which it can query a structured database to
return an answer. For example, a knowledge-based system would seek to parse
the input question "When did John F Kennedy die?" into a semantic query
representation such as Death-Year("John F Kennedy", x), or similar, which can
then be used to query a structured knowledge base. More recently, this general
principle has evolved by focussing on the extraction of Resource Description
Framework (RDF) triplets from large-scale internet corpora, and storing these
in a knowledge base for querying later
          <xref ref-type="bibr" rid="ref18 ref30 ref6">(for examples, see Bollacker et al., 2008;
Fader et al., 2011; Lehmann et al., 2012)</xref>
          . However, the task of encoding
knowledge is expensive and time-consuming
          <xref ref-type="bibr" rid="ref14">(Clark and Porter, 1999)</xref>
          and KB-QA
systems typically fail on questions from unseen domains
          <xref ref-type="bibr" rid="ref1">(Abujabal et al., 2018)</xref>
          .
        </p>
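        <p>To make the parsing step concrete, the following sketch (mine, not taken from any cited system) maps one narrow question pattern onto a query over a toy in-memory knowledge base; the predicate name death-year and the single regular-expression rule are illustrative assumptions.</p>
        <preformat>
# A minimal sketch of the KB-QA pipeline described above, using a toy
# in-memory knowledge base of (subject, predicate) facts. The predicate
# names and the single parsing rule are illustrative assumptions.
import re

KB = {
    ("John F Kennedy", "death-year"): "1963",
    ("John F Kennedy", "birth-year"): "1917",
}

def parse_question(question):
    """Map a narrow class of questions to a (subject, predicate) query."""
    m = re.match(r"When did (.+) die\?", question)
    if m:
        return (m.group(1), "death-year")
    return None

def answer(question):
    query = parse_question(question)
    return KB.get(query, "unknown") if query else "unparseable"

print(answer("When did John F Kennedy die?"))  # prints: 1963
        </preformat>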
      </sec>
      <sec id="sec-2-2">
        <title>Information retrieval-based approaches</title>
        <p>
          Information retrieval-based question answering (IR-QA) systems search large
corpora of textual documents (for example, the web) for documents or passages
relevant to the input question. Once a relevant document has been retrieved
by the IR-QA system, reading comprehension algorithms are applied to
understand the text and return the most relevant answer
          <xref ref-type="bibr" rid="ref26">(Jurafsky and Martin, 2008)</xref>
          .
Put simply, IR-QA systems search raw text (extracted keywords, for example),
whereas KB-QA systems search knowledge bases
          <xref ref-type="bibr" rid="ref42">(Park et al., 2014)</xref>
          . An
advantage of IR-QA systems over KB-QA systems is that knowledge does not have to
be encoded prior to search; however, their performance is somewhat dependent
on the competence of the IR algorithm.
        </p>
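        <p>As an illustration of the retrieval step, the sketch below scores a toy two-document corpus against a question using TF-IDF and cosine similarity via scikit-learn; the corpus and question are invented, and a real IR-QA system would pass the retrieved passage on to a reading comprehension component.</p>
        <preformat>
# A minimal sketch of keyword-style retrieval in an IR-QA system:
# rank a small document collection against the question by TF-IDF
# cosine similarity and return the best-matching passage.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

docs = [
    "John F Kennedy was assassinated in Dallas in 1963.",
    "The Yankees played 31 games in July.",
]
question = "When did John F Kennedy die?"

vec = TfidfVectorizer()
doc_matrix = vec.fit_transform(docs)
scores = cosine_similarity(vec.transform([question]), doc_matrix)[0]
print(docs[scores.argmax()])  # passage handed to reading comprehension
        </preformat>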
      </sec>
      <sec id="sec-2-3">
        <title>Statistical machine learning-based approaches</title>
        <p>
          Machine learning-based (ML) approaches have garnered more success than the
preceding rule-based methods and have been applied to both the question
classification
          <xref ref-type="bibr" rid="ref35 ref39">(Metzler and Croft, 2005; Nguyen et al., 2007)</xref>
          and answer selection
          <xref ref-type="bibr" rid="ref54">(Suzuki et al., 2002)</xref>
          sub-problems. One drawback, however, is that they
typically require hand-crafted feature engineering in collaboration with a domain
expert.
        </p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Building blocks of neural architectures</title>
      <p>Having covered traditional methods in the previous section, this section will
explore neural architectures which perform the QA task in new ways. The section
begins with an overview of the core components in a neural network system
designed for the question answering task.
</p>
      <sec id="sec-3-1">
        <title>Word embeddings</title>
        <p>
          In contemporary literature, neural networks have been successfully applied to
every part of the QA system. The introduction of word embeddings
          <xref ref-type="bibr" rid="ref37">(Mikolov et
al., 2013)</xref>
          brought the worlds of neural network research and natural language
processing closer together. This approach seeks to develop distributed
representations of words and phrases as numeric vectors, which allows them to be
learnt and processed by neural networks.
        </p>
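        <p>The following toy sketch illustrates the core idea: each word becomes a dense numeric vector, and semantic similarity is measured by cosine distance. The 4-dimensional vectors are invented for illustration; trained word2vec or GloVe embeddings typically have hundreds of dimensions.</p>
        <preformat>
# A toy illustration of distributed word representations: words are
# dense vectors, and related words have a higher cosine similarity.
# These 4-d vectors are invented, not trained embeddings.
import numpy as np

emb = {
    "king":  np.array([0.9, 0.8, 0.1, 0.2]),
    "queen": np.array([0.9, 0.7, 0.2, 0.9]),
    "stick": np.array([0.1, 0.2, 0.9, 0.1]),
}

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine(emb["king"], emb["queen"]))  # high: related words
print(cosine(emb["king"], emb["stick"]))  # low: unrelated words
        </preformat>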
        <p>
          Whilst word2vec
          <xref ref-type="bibr" rid="ref37">(Mikolov et al., 2013)</xref>
          and the subsequent GloVe
embeddings
          <xref ref-type="bibr" rid="ref44">(Pennington et al., 2014)</xref>
          were successful for a variety of natural language
tasks, researchers began to understand their limitations. Peters et al. (2018)
noticed that the meaning of a word very much depends on the context in which it
is written. For example, in the two sentences (1) 'Let's stick to improvisation in
this skit', and (2) 'The dog walker threw the stick far away', the word 'stick' has
different meanings. In response, Peters et al. proposed ELMo (Embeddings from
Language Models), deep contextualised word embeddings which, when applied
to existing NLP models, outperformed the state-of-the-art results for every task
it was tested on, including the Stanford Question Answering dataset
          <xref ref-type="bibr" rid="ref48">(Rajpurkar
et al., 2018)</xref>
          .
        </p>
      </sec>
      <sec id="sec-3-2">
        <title>Convolutions</title>
        <p>Given a row vector, or matrix, a convolution is a sliding window (or filter, or
kernel) applied across the input vector to produce an output. In a convolutional
neural network, the optimal values of the kernel are learnt from labelled input
data. Convolutional neural networks have proved extremely successful within
computer vision. In a natural language setting, a sliding window (kernel) is
passed over some predefined number of words. Perhaps the most common use
of convolutions for textual data is character-level convolutional neural networks
(Char-CNNs) introduced by Zhang et al. (2015).
</p>
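        <p>The sketch below, with invented shapes and random values, shows the sliding-window computation at the heart of a one-dimensional convolution over a sequence of word vectors; in a trained network the kernel weights would be learnt from labelled data rather than sampled at random.</p>
        <preformat>
# A sketch of a 1-d convolution over word vectors: a kernel spanning
# three positions slides along the sequence, producing one activation
# per window. Shapes and values are illustrative only.
import numpy as np

seq_len, dim, width = 7, 4, 3
words = np.random.randn(seq_len, dim)    # one 4-d vector per word
kernel = np.random.randn(width, dim)     # a single learnable filter

outputs = np.array([
    np.sum(words[i:i + width] * kernel)  # window of 3 words at a time
    for i in range(seq_len - width + 1)
])
print(outputs.shape)  # (5,): one activation per window position
        </preformat>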
      </sec>
      <sec id="sec-3-3">
        <title>Attention</title>
        <p>
          An attention mechanism in a standard encoder/decoder network allows the
decoder to look back at the hidden states from the input sequence, presented to the
decoder as a new input of weighted averages
          <xref ref-type="bibr" rid="ref3">(Bahdanau et al., 2015)</xref>
          . Since its
introduction, attention has received considerable interest from the field
          <xref ref-type="bibr" rid="ref34 ref56 ref8">(Britz et al., 2017; Luong et al., 2015; Vaswani et al., 2017)</xref>
          . Using the weighted
average of some hidden state is not only limited to the input sequence. In
self-attention
          <xref ref-type="bibr" rid="ref13">(Cheng et al., 2016)</xref>
          , relations between different positions of the same
input sequence are introduced. Vaswani et al. (2017) took this approach one
step further by introducing an architecture based solely on self-attention, with
no convolutions or recurrent properties. These foundations provide the building
blocks for general language models, such as BERT.
        </p>
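        <p>As a concrete illustration, the sketch below implements the scaled dot-product form of attention from Vaswani et al. (2017) with toy dimensions: each position's output is a weighted average of the values, with the weights derived from query-key similarity.</p>
        <preformat>
# A sketch of attention as a weighted average: queries score every
# position, scores are softmax-normalised, and the output is the
# weighted sum of the values. Dimensions are toy values.
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

n, d = 5, 8                        # sequence length, hidden size
Q = np.random.randn(n, d)          # queries
K = np.random.randn(n, d)          # keys (the hidden states looked back at)
V = np.random.randn(n, d)          # values

weights = softmax(Q @ K.T / np.sqrt(d))  # attention weights per position
output = weights @ V                     # new inputs: weighted averages
print(weights.sum(axis=-1))              # each row sums to 1
        </preformat>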
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Neural architectures for the QA task</title>
      <p>The following sub-sections will investigate specific neural architectures which
make use of the above components specifically for the QA task.</p>
      <sec id="sec-4-1">
        <title>Neural extensions to KB-QA</title>
        <p>
          The KB-QA concepts above have been extended to include aspects of neural
network architectures. Yin et al. (2016) propose a novel entity linking and
ranking method for relatively simple factoid question answering from Freebase
          <xref ref-type="bibr" rid="ref6">(Bollacker et al., 2008)</xref>
          . Then, neural networks are used to (1) match between
questions and fact candidates using a character-level Convolutional Neural Network
(char-CNN), and (2) match between the Freebase predicate and the question's pattern
using a word-level CNN (word-CNN).
        </p>
        <p>Also using Freebase, Dong et al. (2015) posit that to advance beyond simple
factoid QA, distributed representations of answer path, answer context, and
answer type must be learnt. To do this, they propose multi-column convolutional
neural networks (MCCNNs) which learn these representations from
question-answer pairs.</p>
        <p>Finally, instead of semantic parsing to a vector representation, Sorokin and
Gurevych (2018) proposed a graph representation; this then enables
the use of Gated Graph Neural Networks (GGNNs) for the QA task. This
approach resulted in a 27.4% improvement (F1 score) over the best
non-graph-based model. GGNNs were first proposed by Li et al. (2016) for sequential
modelling problems and extend the original GNN framework, whereby a
neural network receives a graph as input, performs a computation over the nodes
and edges, and returns a graph as output.</p>
      </sec>
      <sec id="sec-4-2">
        <title>Neural extensions to IR-QA</title>
        <p>Important work by Burges et al. (2005) introduced RankNet: a pairwise method
for optimising the ranking of a list according to a traditional IR metric (such
as Mean Reciprocal Rank) using gradient descent. Technically, 'RankNet' can
be any model for which the output is a differentiable function, such as neural
networks or even boosted trees. Since publishing RankNet, the authors have
developed the idea further with LambdaRank and then LambdaMART. A
summary of each is available in Burges (2010). Researchers from IBM combined this
approach with Supervised Kemeny aggregation in Agarwal et al. (2012).</p>
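        <p>A minimal sketch of the RankNet objective follows (assuming PyTorch; the two-layer scorer and feature sizes are illustrative): any differentiable scorer produces scores for a pair of documents, and a cross-entropy loss is applied to the sigmoid of the score difference, the modelled probability that the first document should outrank the second.</p>
        <preformat>
# A sketch of the RankNet pairwise objective with an illustrative
# two-layer scorer: the loss is cross-entropy on sigmoid(s_i - s_j),
# the modelled probability that document i outranks document j.
import torch
import torch.nn as nn

scorer = nn.Sequential(nn.Linear(10, 16), nn.ReLU(), nn.Linear(16, 1))

x_i = torch.randn(32, 10)   # features of the more relevant documents
x_j = torch.randn(32, 10)   # features of the less relevant documents

s_diff = scorer(x_i) - scorer(x_j)   # score differences for each pair
target = torch.ones_like(s_diff)     # label: i should outrank j
loss = nn.functional.binary_cross_entropy_with_logits(s_diff, target)
loss.backward()                      # trainable by gradient descent
        </preformat>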
        <p>
          Practical implementations of Learning-To-Rank are now widely available through
the TF-Ranking TensorFlow package
          <xref ref-type="bibr" rid="ref43">(Pasumarthi et al., 2018)</xref>
          .
        </p>
      </sec>
      <sec id="sec-4-3">
        <title>Language modelling</title>
        <p>Language modelling is the task of predicting the probability of the next word
in a sentence. Although the technique has developed away from the Question
Answering task specifically, it is now a fundamental concept in many
state-of-the-art approaches.</p>
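        <p>The following toy sketch makes the task concrete with a count-based bigram model, estimating the probability of the next word directly from corpus counts; real n-gram models add smoothing or back-off, as discussed below. The tiny corpus is invented.</p>
        <preformat>
# A count-based illustration of language modelling: estimate the
# probability of the next word from bigram counts (maximum likelihood,
# no smoothing). The corpus is a toy example.
from collections import Counter

corpus = "the dog ran . the dog sat . the cat sat .".split()
bigrams = Counter(zip(corpus, corpus[1:]))
unigrams = Counter(corpus)

def p_next(word, prev):
    """Maximum-likelihood estimate of P(word | prev)."""
    return bigrams[(prev, word)] / unigrams[prev]

print(p_next("dog", "the"))  # 2/3: 'dog' follows 'the' in 2 of 3 cases
        </preformat>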
        <p>
          Before Bengio et al. (2003), back-off tri-gram models
          <xref ref-type="bibr" rid="ref27">(Katz, 1987)</xref>
          and smoothed
tri-gram models
          <xref ref-type="bibr" rid="ref25 ref28">(Jelinek, and Mercer, 1980; Kneser and Ney, 1995)</xref>
          were favoured
among the research community. In his paper, 'A Neural Probabilistic Language
Model', Bengio described how neural networks can be applied to learn a
distributed representation of words, and kick-started a new direction for the field.
        </p>
      </sec>
      <sec id="sec-4-4">
        <title>Recurrent neural networks</title>
        <p>
          Since then, neural language modelling has developed from simple feed-forward
networks, to recurrent neural networks
          <xref ref-type="bibr" rid="ref36">(Mikolov et al., 2010)</xref>
          and Long
Short-Term Memory architectures
          <xref ref-type="bibr" rid="ref20">(Graves, 2013)</xref>
          . Sequence-to-sequence models (Seq2Seq)
were introduced by Sutskever et al. (2014) and featured an encoder/decoder
architecture which successfully mapped input sequences of words/tokens to an
output sequence. Seq2Seq models have been successfully applied to machine
translation, natural language generation and QA tasks. One drawback of the Seq2Seq
model, however, is that between the encoder and decoder layers, information
is compressed into a fixed-length 'thought' vector. For short phrases this is
acceptable, but as the length of the input sequence increases, errors in the decoding step
surface. To overcome this, the concept of attention was introduced by Bahdanau
et al. (2015).
        </p>
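        <p>The bottleneck is easy to see in code: in the sketch below (toy dimensions, assuming PyTorch), an LSTM encoder produces the same fixed-size final hidden state whether the input contains 5 tokens or 50.</p>
        <preformat>
# A sketch of the Seq2Seq bottleneck: whatever the input length, the
# encoder compresses the sequence into one fixed-length 'thought'
# vector (the final hidden state), which is all the decoder receives.
import torch
import torch.nn as nn

vocab, dim, hidden = 1000, 32, 64
embed = nn.Embedding(vocab, dim)
encoder = nn.LSTM(dim, hidden, batch_first=True)

for seq_len in (5, 50):                    # a short and a long input
    tokens = torch.randint(0, vocab, (1, seq_len))
    _, (h, c) = encoder(embed(tokens))
    print(seq_len, h.shape)                # always torch.Size([1, 1, 64])
        </preformat>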
      </sec>
      <sec id="sec-4-5">
        <title>Pre-trained language models</title>
        <p>
          The current state-of-the-art for QA systems is based on pre-trained models which
combine many of the concepts discussed previously and were first proposed by
Dai and Le (2015). Pre-trained models are trained over two distinct phases.
Firstly, an unsupervised model is trained on a very large open-domain corpus;
Wikipedia in the case of the Bidirectional Encoder Representations from
Transformers models, known as BERT
          <xref ref-type="bibr" rid="ref16">(Devlin et al., 2018)</xref>
          , and WebText corpus in
the case of Generative Pre-Training models, known as GPT
          <xref ref-type="bibr" rid="ref46 ref47">(Radford et al.,
2019, 2018)</xref>
          . Then this general-purpose language model can be fine-tuned to a
downstream task (like Question Answering) by means of a small-scale supervised
learning phase
          <xref ref-type="bibr" rid="ref50">(Ramachandran et al., 2017)</xref>
          .
        </p>
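        <p>In practice, applying such a pre-trained model to the QA task takes only a few lines. The sketch below uses the Hugging Face transformers library (assumed installed); its question-answering pipeline downloads a default extractive-QA checkpoint on first use, so the exact answer text may vary by model version.</p>
        <preformat>
# A sketch of pre-trained extractive QA via the Hugging Face
# transformers library (assumed installed; downloads a default
# fine-tuned checkpoint on first use).
from transformers import pipeline

qa = pipeline("question-answering")
result = qa(
    question="When did John F Kennedy die?",
    context="John F Kennedy was assassinated in Dallas in November 1963.",
)
print(result["answer"])  # a span from the context, e.g. 'November 1963'
        </preformat>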
        <p>
          This approach has been hugely successful, especially when applied to the QA
task. For example, for the widely used SQuAD 2.0 benchmark for QA systems
          <xref ref-type="bibr" rid="ref48">(Rajpurkar et al., 2018)</xref>
          , every one of the top 20 models on the leaderboard is
some variant on the BERT model
          <xref ref-type="bibr" rid="ref16">(Devlin et al., 2018)</xref>
          .
        </p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Datasets and evaluation metrics for the QA task</title>
      <sec id="sec-5-1">
        <title>Datasets</title>
        <p>This section will explore the published datasets and benchmarks used to compare
and evaluate QA models.</p>
        <p>
          As QA systems become more competent, the benchmarks used to assess them
also need to change. Modern QA systems have surpassed the level required by
some of the earlier benchmarks, including WikiQA
          <xref ref-type="bibr" rid="ref59">(Yang et al., 2015)</xref>
          , NewsQA
          <xref ref-type="bibr" rid="ref55">(Trischler et al., 2017)</xref>
          and SQuAD 1.0
          <xref ref-type="bibr" rid="ref49">(Rajpurkar et al., 2016)</xref>
          . In response, the
community has proposed new datasets which demand more capable systems.
The bAbI story dataset
          <xref ref-type="bibr" rid="ref58">(Weston et al., 2015)</xref>
          requires logical reasoning, the
SQuAD 2.0 dataset
          <xref ref-type="bibr" rid="ref48">(Rajpurkar et al., 2018)</xref>
          includes unanswerable questions,
and the CODAH dataset
          <xref ref-type="bibr" rid="ref12">(Chen et al., 2019)</xref>
          introduces adversarial
questioning. All of these datasets, however, were created through an artificial
crowdsourcing methodology, which some have criticised as not being representative of
the kind of questions humans would ask. The Natural Questions (NQ) dataset
was recently released by Google
          <xref ref-type="bibr" rid="ref29">(Kwiatkowski et al., 2019)</xref>
          in response to this
criticism. The Natural Questions dataset consists of 307,372 training examples,
7,830 development examples and 7,842 examples in a hidden test set. For each
question, both a long answer (span) and a short answer are expected.
        </p>
      </sec>
      <sec id="sec-5-2">
        <title>Evaluation metrics</title>
        <p>
          Evaluation metrics for the QA task vary based on the benchmark being used.
WikiQA reported results for both MAP (Mean Average Precision) and MRR
(Mean Reciprocal Rank). Alternatively, the SQuAD benchmark uses both Exact
Match (EM) and macro-averaged F1 to assess submissions. NewsQA likewise uses
EM and the F1 score, and additionally evaluates the BLEU score
          <xref ref-type="bibr" rid="ref41">(Papineni et al., 2002)</xref>
          and CIDEr score
          <xref ref-type="bibr" rid="ref57">(Vedantam et al., 2015)</xref>
          . The more recent Natural Questions
dataset reports Precision, Recall and the F1 score. This diversity of metrics
reflects the diversity of approaches, with researchers coming from both information
retrieval and machine learning backgrounds to tackle the QA task using metrics
they are familiar with from their respective fields.
        </p>
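        <p>For concreteness, the following sketch implements simplified versions of the two SQuAD metrics: Exact Match as a strict normalised string comparison, and F1 as token-level overlap between the predicted and gold answers (SQuAD's official script adds further normalisation, such as stripping articles and punctuation).</p>
        <preformat>
# Simplified versions of the SQuAD metrics: Exact Match is a strict
# string comparison; F1 gives partial credit for token overlap.
from collections import Counter

def exact_match(pred, gold):
    return float(pred.strip().lower() == gold.strip().lower())

def f1(pred, gold):
    p, g = pred.lower().split(), gold.lower().split()
    overlap = sum(min(p.count(tok), g.count(tok)) for tok in set(p))
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(p), overlap / len(g)
    return 2 * precision * recall / (precision + recall)

print(exact_match("November 1963", "1963"))  # 0.0
print(f1("November 1963", "1963"))           # 0.67: partial credit
        </preformat>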
      </sec>
    </sec>
    <sec id="sec-6">
      <title>Future research directions</title>
      <p>
        Whilst it is encouraging to see consistent advancement of the state-of-the-art,
the rate of advancement has left considerable gaps in the research landscape. A
summary of the exposed gaps in the existing literature is presented:
1. Is the model fully understanding the question? Mudrakarta et al. (2018) have
begun to explore this direction, but more research is definitely required.
2. Is SQuAD
        <xref ref-type="bibr" rid="ref48">(Rajpurkar et al., 2018)</xref>
        a suitable benchmark for the QA task?
Some criticism has been raised over the synthetic manner in which the
questions are produced
        <xref ref-type="bibr" rid="ref29">(Kwiatkowski et al., 2019)</xref>
        . Models need to be analysed
against new benchmarks, such as Kwiatkowski et al.'s Natural Questions
dataset and the CODAH benchmark
        <xref ref-type="bibr" rid="ref12">(Chen et al., 2019)</xref>
        .
3. The existing literature is lacking an analysis of ensemble methods applied
to the QA task. How can we quantitatively and qualitatively reason about
why an ensemble of two models may or may not be effective, based on their
individual strengths and weaknesses?
4. A degradation in performance is typical when assessing QA systems on
datasets that require an element of logical reasoning. Schlag and
Schmidhuber (2018) have begun research in this direction, but further work is
required.
5. Another powerful pre-trained language model has recently been released by
OpenAI: GPT-2
        <xref ref-type="bibr" rid="ref47">(Radford et al., 2019)</xref>
        . This model shows promise but it is
yet to be properly analysed by the research community. OpenAI highlighted
a lack of 'world knowledge' as a limitation of the GPT-2 model, and this is
another avenue to be explored further.
      </p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Abujabal</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Roy</surname>
            ,
            <given-names>R.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yahya</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Weikum</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <year>2018</year>
          .
          <article-title>Never-Ending Learning for Open-Domain Question Answering over Knowledge Bases</article-title>
          .
          <source>WWW 18 Proc. 2018 World Wide Web Conf</source>
          .
          <volume>1053</volume>
          –
          <fpage>1062</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Agarwal</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Raghavan</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Subbian</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Melville</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lawrence</surname>
          </string-name>
          , R.D.,
          <string-name>
            <surname>Gondek</surname>
            ,
            <given-names>D.C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fan</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <year>2012</year>
          .
          <article-title>Learning to Rank for Robust Question Answering</article-title>
          ,
          <source>in: Proceedings of the 21st ACM International Conference on Information and Knowledge Management</source>
          ,
          <source>CIKM '12. ACM</source>
          , New York, NY, USA, pp.
          <volume>833</volume>
          –
          <fpage>842</fpage>
          . https://doi.org/10.1145/2396761.2396867
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Bahdanau</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cho</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bengio</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <year>2015</year>
          .
          <article-title>Neural Machine Translation by Jointly Learning to Align and Translate</article-title>
          . ICLR.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Bengio</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ducharme</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vincent</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jauvin</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <year>2003</year>
          .
          <article-title>A Neural Probabilistic Language Model</article-title>
          .
          <source>J. Mach. Learn. Res</source>
          .
          <volume>3</volume>
          ,
          <issue>1137</issue>
          –
          <fpage>1155</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Berant</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chou</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Frostig</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liang</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <year>2013</year>
          .
          <article-title>Semantic Parsing on Freebase from Question-Answer Pairs</article-title>
          .
          <source>Proc. 2013 Conf. Empir. Methods Nat. Lang. Process</source>
          .
          <volume>1533</volume>
          –
          <fpage>1544</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Bollacker</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Evans</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Paritosh</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sturge</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Taylor</surname>
          </string-name>
          , J.,
          <year>2008</year>
          .
          <article-title>Freebase: A Collaboratively Created Graph Database For Structuring Human Knowledge</article-title>
          .
          <source>SIGMOD 08 Proc. 2008 ACM SIGMOD Int. Conf. Manag. Data</source>
          <volume>1247</volume>
          –
          <fpage>1250</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Brill</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dumais</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Banko</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <year>2002</year>
          .
          <article-title>An Analysis of the AskMSR QuestionAnswering System</article-title>
          .
          <source>Proc. Conf. Empir. Methods Nat. Lang. Process. EMNLP</source>
          <volume>257</volume>
          –
          <fpage>264</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Britz</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Goldie</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Luong</surname>
          </string-name>
          , M.-T.,
          <string-name>
            <surname>Le</surname>
            ,
            <given-names>Q.</given-names>
          </string-name>
          ,
          <year>2017</year>
          .
          <source>Massive Exploration of Neural Machine Translation Architectures, in: Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics</source>
          , Copenhagen, Denmark, pp.
          <volume>1442</volume>
          –
          <fpage>1451</fpage>
          . https://doi.org/10.18653/v1/D17-1151
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Burges</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shaked</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Renshaw</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lazier</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Deeds</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hamilton</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hullender</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <year>2005</year>
          .
          <article-title>Learning to rank using gradient descent</article-title>
          ,
          <source>in: Proceedings of the 22nd International Conference on Machine Learning - ICML '05</source>
          . pp.
          <volume>89</volume>
          –
          <fpage>96</fpage>
          . https://doi.org/10.1145/1102351.1102363
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Burges</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <year>2010</year>
          .
          <article-title>From RankNet to LambdaRank to LambdaMART: An Overview</article-title>
          (No. MSR-TR-2010-82).
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fisch</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Weston</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bordes</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <year>2017</year>
          .
          <article-title>Reading Wikipedia to Answer Open-Domain Questions</article-title>
          .
          <source>Presented at the Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)</source>
          , pp.
          <fpage>1870</fpage>
          –
          <lpage>1879</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>D'Arcy</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fernandez</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Downey</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <year>2019</year>
          .
          <article-title>CODAH: An Adversarially Authored Question-Answer Dataset for Common Sense</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13. Cheng, J.,
          <string-name>
            <surname>Dong</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lapata</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <year>2016</year>
          .
          <article-title>Long Short-Term Memory-Networks for Machine Reading</article-title>
          ,
          <source>in: Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics</source>
          , Austin, Texas, pp.
          <volume>551</volume>
          –
          <fpage>561</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Clark</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Porter</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <year>1999</year>
          .
          <article-title>A Knowledge-Based Approach to Question Answering</article-title>
          .
          <source>AAAI'99 Fall Symp. Quest. Answering Syst</source>
          .
          <volume>43</volume>
          –
          <fpage>51</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Dai</surname>
            ,
            <given-names>A.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Le</surname>
            ,
            <given-names>Q.V.</given-names>
          </string-name>
          ,
          <year>2015</year>
          .
          <article-title>Semi-supervised Sequence Learning</article-title>
          , in:
          <string-name>
            <surname>Cortes</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lawrence</surname>
            ,
            <given-names>N.D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>D.D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sugiyama</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Garnett</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          (Eds.),
          <source>Advances in Neural Information Processing Systems</source>
          <volume>28</volume>
          . Curran Associates, Inc., pp.
          <volume>3079</volume>
          –
          <fpage>3087</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Devlin</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chang</surname>
            ,
            <given-names>M.-W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Toutanova</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <year>2018</year>
          . BERT:
          <article-title>Pre-training of Deep Bidirectional Transformers for Language Understanding</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Dong</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wei</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhou</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Xu</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <year>2015</year>
          .
          <article-title>Question Answering over Freebase with Multi-Column Convolutional Neural Networks, in: Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics</article-title>
          . pp.
          <volume>260</volume>
          –
          <fpage>269</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Fader</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Soderland</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Etzioni</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <year>2011</year>
          .
          <article-title>Identifying Relations for Open Information Extraction</article-title>
          .
          <source>Proc. 2011 Conf. Empir. Methods Nat. Lang. Process</source>
          .
          <volume>1535</volume>
          –
          <fpage>1545</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <surname>Ferrucci</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Brown</surname>
          </string-name>
          , E.,
          <string-name>
            <surname>Chu-Carroll</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fan</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gondek</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kalyanpur</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lally</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Murdock</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nyberg</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Prager</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <year>2010</year>
          .
          <article-title>Building Watson: An Overview of the DeepQA Project</article-title>
          .
          <source>Assoc. Adv. Artif. Intell</source>
          .
          <volume>59</volume>
          –
          <fpage>79</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <surname>Graves</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <year>2013</year>
          .
          <article-title>Generating Sequences With Recurrent Neural Networks</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <surname>Green</surname>
            Jr.,
            <given-names>B.F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wolf</surname>
            ,
            <given-names>A.K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chomsky</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Laughery</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <year>1961</year>
          .
          <article-title>Baseball: an automatic question-answerer</article-title>
          .
          <source>Proceeding IRE-AIEE-ACM 61 West</source>
          .
          <volume>219</volume>
          –
          <fpage>224</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          22.
          <string-name>
            <surname>Hirschman</surname>
            ,
            <given-names>L.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gaizauskas</surname>
            ,
            <given-names>R.J.</given-names>
          </string-name>
          ,
          <year>2001</year>
          .
          <article-title>Natural language question answering: the view from here</article-title>
          .
          <source>Nat. Lang. Eng</source>
          .
          <volume>7</volume>
          ,
          <issue>275</issue>
          –
          <fpage>300</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          23.
          <string-name>
            <surname>Hirschman</surname>
            ,
            <given-names>L.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Light</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Breck</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Burger</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <year>1999</year>
          .
          <article-title>Deep Read: A Reading Comprehension System</article-title>
          .
          <source>Proc. 37th Annu. Meet. Assoc. Comput. Linguist. Comput. Linguist</source>
          .
          <volume>325</volume>
          –
          <fpage>332</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          24.
          <string-name>
            <surname>Huang</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Peng</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <year>2019</year>
          .
          <article-title>Knowledge Graph Embedding Based Question Answering</article-title>
          .
          <source>WSDM 19 Proc. Twelfth ACM Int. Conf. Web Search Data Min</source>
          .
          <volume>105</volume>
          –
          <fpage>113</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          25.
          <string-name>
            <surname>Jelinek</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mercer</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <year>1980</year>
          .
          <article-title>Interpolated estimation of Markov source parameters from sparse data</article-title>
          ,
          <source>in: Proceedings of the Workshop on Pattern Recognition in Practice.</source>
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          26.
          <string-name>
            <surname>Jurafsky</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Martin</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <year>2008</year>
          .
          <source>Speech and Language Processing</source>
          , 2nd ed. Pearson International.
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          27.
          <string-name>
            <surname>Katz</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <year>1987</year>
          .
          <article-title>Estimation of probabilities from sparse data for the language model component of a speech recognizer</article-title>
          .
          <source>IEEE Trans. Acoust. Speech Signal Process</source>
          .
          <volume>35</volume>
          ,
          <issue>400</issue>
          –
          <fpage>401</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          28.
          <string-name>
            <surname>Kneser</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ney</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <year>1995</year>
          .
          <article-title>Improved backing-o for M-gram language modeling</article-title>
          .
          <source>Presented at the International Conference on Acoustics, Speech, and Signal Processing</source>
          , IEEE.
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          29.
          <string-name>
            <surname>Kwiatkowski</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Palomaki</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          , Redfield, O.,
          <year>2019</year>
          .
          <article-title>Natural Questions: a Benchmark for Question Answering Research</article-title>
          .
          <source>Trans. Assoc. Comput. Linguist.</source>
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          30.
          <string-name>
            <surname>Lehmann</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Furche</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Grasso</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <year>2012</year>
          .
          <article-title>deqa: Deep Web Extraction for Question Answering</article-title>
          .
          <source>Int. Semantic Web Conf</source>
          .
          <year>2012</year>
          131–
          <fpage>147</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          31.
          <string-name>
            <surname>Lin</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <year>2007</year>
          .
          <article-title>Is Question Answering Better than Information Retrieval? Towards a Task-Based Evaluation Framework for Question Series</article-title>
          .
          <source>Hum. Lang. Technol. 2007 Conf. North Am. Chapter Assoc. Comput. Linguist. Proc. Main Conf</source>
          .
          <volume>212</volume>
          –
          <fpage>219</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          32.
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tarlow</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Brockschmidt</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Zemel</surname>
            ,
            <given-names>R..</given-names>
          </string-name>
          <article-title>Gated graph sequence neural networks</article-title>
          .
          <source>In ICLR</source>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          33.
          <string-name>
            <surname>Lin</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Quan</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sinha</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bakshi</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Huynh</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Katz</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Karger</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <year>2003</year>
          .
          <article-title>What Makes a Good Answer? The Role of Context in Question Answering</article-title>
          .
          <source>Proc. Ninth IFIP TC13 Int. Conf. Hum.-Comput. Interact. INTERACT</source>
          <year>2003</year>
          25–
          <fpage>32</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref34">
        <mixed-citation>
          34.
          <string-name>
            <surname>Luong</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pham</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Manning</surname>
            ,
            <given-names>C.D.</given-names>
          </string-name>
          ,
          <year>2015</year>
          .
          <article-title>Effective Approaches to Attention-based Neural Machine Translation</article-title>
          ,
          <source>in: Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics</source>
          , Lisbon, Portugal, pp.
          <fpage>1412</fpage>
          –
          <lpage>1421</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref35">
        <mixed-citation>
          35.
          <string-name>
            <surname>Metzler</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Croft</surname>
            ,
            <given-names>W.B.</given-names>
          </string-name>
          ,
          <year>2005</year>
          .
          <article-title>Analysis of Statistical Question Classification for Fact-Based Questions</article-title>
          .
          <source>Inf Retrieval</source>
          <volume>8</volume>
          ,
          <fpage>481</fpage>
          –
          <lpage>504</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref36">
        <mixed-citation>
          36.
          <string-name>
            <surname>Mikolov</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Karafiát</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Burget</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Černocký</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Khudanpur</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <year>2010</year>
          .
          <article-title>Recurrent Neural Network Based Language Model</article-title>
          .
          <source>Presented at the 11th Annual Conference of the International Speech Communication Association</source>
          , pp.
          <fpage>1045</fpage>
          –
          <lpage>1048</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref37">
        <mixed-citation>
          37.
          <string-name>
            <surname>Mikolov</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sutskever</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Corrado</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dean</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <year>2013</year>
          .
          <article-title>Distributed Representations of Words and Phrases and their Compositionality</article-title>
          .
          <source>arXiv:1310.4546 [cs, stat]</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref38">
        <mixed-citation>
          38.
          <string-name>
            <surname>Mudrakarta</surname>
            ,
            <given-names>P.K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Taly</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sundararajan</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dhamdhere</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <year>2018</year>
          .
          <article-title>Did the Model Understand the Question?</article-title>
          ,
          <source>in: Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Association for Computational Linguistics</source>
          , Melbourne, Australia, pp.
          <fpage>1896</fpage>
          –
          <lpage>1906</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref39">
        <mixed-citation>
          39.
          <string-name>
            <surname>Nguyen</surname>
            ,
            <given-names>M.L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nguyen</surname>
            ,
            <given-names>T.T.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Shimazu</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          .
          <year>2007</year>
          .
          <article-title>Subtree Mining for Question Classification Problem</article-title>
          .
          <source>In Proc. of IJCAI 2007</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref40">
        <mixed-citation>
          40.
          <string-name>
            <surname>Otsuka</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nishida</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bessho</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Asano</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tomita</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <year>2018</year>
          .
          <article-title>Query Expansion with Neural Question-to-Answer Translation for FAQ-based Question Answering</article-title>
          ,
          <source>in: Companion Proceedings of The Web Conference 2018, WWW '18</source>
          . International World Wide Web Conferences Steering Committee, Republic and Canton of Geneva, Switzerland, pp.
          <fpage>1063</fpage>
          –
          <lpage>1068</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref41">
        <mixed-citation>
          41.
          <string-name>
            <surname>Papineni</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Roukos</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ward</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhu</surname>
            ,
            <given-names>W.-J.</given-names>
          </string-name>
          ,
          <year>2002</year>
          .
          <article-title>Bleu: a Method for Automatic Evaluation of Machine Translation</article-title>
          ,
          <source>in: Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics</source>
          , Philadelphia, Pennsylvania, USA, pp.
          <fpage>311</fpage>
          –
          <lpage>318</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref42">
        <mixed-citation>
          42.
          <string-name>
            <surname>Park</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shim</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>G.G.</given-names>
          </string-name>
          ,
          <year>2014</year>
          .
          <article-title>ISOFT at QALD-4: Semantic Similarity-based Question Answering System over Linked Data</article-title>
          .
          <source>CLEF CEUR Workshop.</source>
        </mixed-citation>
      </ref>
      <ref id="ref43">
        <mixed-citation>
          43.
          <string-name>
            <surname>Pasumarthi</surname>
            ,
            <given-names>R.K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bruch</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bendersky</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Najork</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pfeifer</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Golbandi</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Anil</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wolf</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <year>2018</year>
          .
          <article-title>TF-Ranking: Scalable TensorFlow Library for Learning-to-Rank</article-title>
          .
          <source>Presented at the 25th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD '19), August 4–8, 2019</source>
          , Anchorage, AK, USA.
        </mixed-citation>
      </ref>
      <ref id="ref44">
        <mixed-citation>
          44.
          <string-name>
            <surname>Pennington</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Socher</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Manning</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <year>2014</year>
          .
          <article-title>GloVe: Global Vectors for Word Representation</article-title>
          .
          <source>Presented at the Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)</source>
          , pp.
          <fpage>1532</fpage>
          –
          <lpage>1543</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref45">
        <mixed-citation>
          45.
          <string-name>
            <surname>Peters</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Neumann</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Iyyer</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gardner</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Clark</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zettlemoyer</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <year>2018</year>
          .
          <article-title>Deep Contextualized Word Representations</article-title>
          ,
          <source>in: Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers). Association for Computational Linguistics</source>
          , New Orleans, Louisiana, pp.
          <fpage>2227</fpage>
          –
          <lpage>2237</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref46">
        <mixed-citation>
          46.
          <string-name>
            <surname>Radford</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Narasimhan</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Salimans</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sutskever</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <year>2018</year>
          .
          <article-title>Improving Language Understanding by Generative Pre-Training.</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref47">
        <mixed-citation>
          47.
          <string-name>
            <surname>Radford</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wu</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Child</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Luan</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Amodei</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sutskever</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <year>2019</year>
          .
          <article-title>Language Models are Unsupervised Multitask Learners</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref48">
        <mixed-citation>
          48.
          <string-name>
            <surname>Rajpurkar</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jia</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liang</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <year>2018</year>
          .
          <article-title>Know What You Don't Know: Unanswerable Questions for SQuAD</article-title>
          .
          <source>Proc. 56th Annu. Meet. Assoc. Comput. Linguist</source>
          .
          <volume>2</volume>
          ,
          <fpage>784</fpage>
          –
          <lpage>789</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref49">
        <mixed-citation>
          49.
          <string-name>
            <surname>Rajpurkar</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lopyrev</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liang</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <year>2016</year>
          .
          <article-title>SQuAD: 100,000+ Questions for Machine Comprehension of Text</article-title>
          ,
          <source>in: Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics</source>
          , Austin, Texas, pp.
          <fpage>2383</fpage>
          –
          <lpage>2392</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref50">
        <mixed-citation>
          50.
          <string-name>
            <surname>Ramachandran</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Le</surname>
            ,
            <given-names>Q.</given-names>
          </string-name>
          ,
          <year>2017</year>
          .
          <article-title>Unsupervised Pretraining for Sequence to Sequence Learning</article-title>
          ,
          <source>in: Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics</source>
          , Copenhagen, Denmark, pp.
          <fpage>383</fpage>
          –
          <lpage>391</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref51">
        <mixed-citation>
          51.
          <string-name>
            <surname>Schlag</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schmidhuber</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <year>2018</year>
          .
          <article-title>Learning to Reason with Third Order Tensor Products</article-title>
          , in:
          <string-name>
            <surname>Bengio</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wallach</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Larochelle</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Grauman</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cesa-Bianchi</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Garnett</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          (Eds.),
          <source>Advances in Neural Information Processing Systems</source>
          <volume>31</volume>
          . Curran Associates, Inc., pp.
          <fpage>9981</fpage>
          –
          <lpage>9993</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref52">
        <mixed-citation>
          52.
          <string-name>
            <surname>Sorokin</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gurevych</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <year>2018</year>
          .
          <article-title>Modeling Semantics with Gated Graph Neural Networks for Knowledge Base Question Answering</article-title>
          ,
          <source>in: Proceedings of the 27th International Conference on Computational Linguistics</source>
          , pp.
          <fpage>3306</fpage>
          –
          <lpage>3317</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref53">
        <mixed-citation>
          53.
          <string-name>
            <surname>Sutskever</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vinyals</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Le</surname>
            ,
            <given-names>Q.V.</given-names>
          </string-name>
          ,
          <year>2014</year>
          .
          <article-title>Sequence to Sequence Learning with Neural Networks</article-title>
          , in:
          <string-name>
            <surname>Ghahramani</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Welling</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cortes</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lawrence</surname>
            ,
            <given-names>N.D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Weinberger</surname>
            ,
            <given-names>K.Q.</given-names>
          </string-name>
          (Eds.),
          <source>Advances in Neural Information Processing Systems</source>
          <volume>27</volume>
          . Curran Associates, Inc., pp.
          <fpage>3104</fpage>
          –
          <lpage>3112</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref54">
        <mixed-citation>
          54.
          <string-name>
            <surname>Suzuki</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sasaki</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Maeda</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <year>2002</year>
          .
          <article-title>SVM Answer Selection for Open-Domain Question Answering</article-title>
          ,
          <source>in: COLING 2002: The 19th International Conference on Computational Linguistics.</source>
        </mixed-citation>
      </ref>
      <ref id="ref55">
        <mixed-citation>
          55.
          <string-name>
            <surname>Trischler</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yuan</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Harris</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sordoni</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bachman</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Suleman</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <year>2017</year>
          .
          <article-title>NewsQA: A Machine Comprehension Dataset</article-title>
          .
          <source>Presented at the Proceedings of the 2nd Workshop on Representation Learning for NLP</source>
          , pp.
          <fpage>191</fpage>
          –
          <lpage>200</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref56">
        <mixed-citation>
          56.
          <string-name>
            <surname>Vaswani</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shazeer</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Parmar</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Uszkoreit</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jones</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gomez</surname>
            ,
            <given-names>A.N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kaiser</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Polosukhin</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <year>2017</year>
          .
          <article-title>Attention is All you Need</article-title>
          , in:
          <string-name>
            <surname>Guyon</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Luxburg</surname>
            ,
            <given-names>U.V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bengio</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wallach</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fergus</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vishwanathan</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Garnett</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          (Eds.),
          <source>Advances in Neural Information Processing Systems</source>
          <volume>30</volume>
          . Curran Associates, Inc., pp.
          <fpage>5998</fpage>
          –
          <lpage>6008</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref57">
        <mixed-citation>
          57.
          <string-name>
            <surname>Vedantam</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zitnick</surname>
            ,
            <given-names>C.L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Parikh</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <year>2015</year>
          .
          <article-title>CIDEr: Consensus-based image description evaluation</article-title>
          .
          <source>Presented at the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)</source>
          , IEEE, Boston, MA, USA, pp.
          <fpage>4566</fpage>
          –
          <lpage>4575</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref58">
        <mixed-citation>
          58.
          <string-name>
            <surname>Weston</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bordes</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chopra</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rush</surname>
            ,
            <given-names>A.M.</given-names>
          </string-name>
          , van Merrienboer,
          <string-name>
            <given-names>B.</given-names>
            ,
            <surname>Joulin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            ,
            <surname>Mikolov</surname>
          </string-name>
          ,
          <string-name>
            <surname>T.</surname>
          </string-name>
          ,
          <year>2015</year>
          .
          <article-title>Towards AI-Complete Question Answering: A Set of Prerequisite Toy Tasks</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref59">
        <mixed-citation>
          59.
          <string-name>
            <surname>Yang</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yih</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Meek</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <year>2015</year>
          .
          <article-title>WikiQA: A Challenge Dataset for Open-Domain Question Answering</article-title>
          ,
          <source>in: Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics</source>
          , Lisbon, Portugal.
        </mixed-citation>
      </ref>
      <ref id="ref60">
        <mixed-citation>
          60.
          <string-name>
            <given-names>W.</given-names>
            <surname>Yin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Xiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Zhou</surname>
          </string-name>
          , and
          <string-name>
            <given-names>H.</given-names>
            <surname>Schütze</surname>
          </string-name>
          .
          <article-title>Simple question answering by attentive convolutional neural network</article-title>
          .
          <source>In COLING</source>
          <year>2016</year>
          , 26th International Conference on Computational Linguistics,
          <source>Proceedings of the Conference: Technical Papers.</source>
        </mixed-citation>
      </ref>
      <ref id="ref61">
        <mixed-citation>
          61.
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhao</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>LeCun</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <year>2015</year>
          .
          <article-title>Character-level Convolutional Networks for Text Classification</article-title>
          , in:
          <string-name>
            <surname>Cortes</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lawrence</surname>
            ,
            <given-names>N.D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>D.D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sugiyama</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Garnett</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          (Eds.),
          <source>Advances in Neural Information Processing Systems</source>
          <volume>28</volume>
          . Curran Associates, Inc., pp.
          <fpage>649</fpage>
          –
          <lpage>657</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>