<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Improved Question Answering using Domain Prediction</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Himani Srivastava</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Prerna Khurana</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Saurabh Srivastava</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Vaibhav Varshney</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Lovekesh Vig</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Puneet Agarwal</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Gautam Shroff</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>TCS Research</institution>
          ,
          <addr-line>New Delhi, India</addr-line>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2020</year>
      </pub-date>
      <abstract>
        <p>Question Answering over Knowledge Graphs has mainly utilised the mentioned entity and the relation to predict the answer. However, a key piece of contextual information that is missing in these approaches is the knowledge of the broad domain (such as sports or music) to which the answer belongs. The current paper proposes to infer the domain of the answer via a pre-trained BERT [10] classification model, and to utilize the inferred domain as an additional input to yield state-of-the-art performance on single-relation (SimpleQuestions) and multi-relation (WebQSP) Question Answering benchmarks. We employ a triple input Siamese network architecture that learns to predict the semantic similarity between the question, the inferred domain, and the relation.</p>
      </abstract>
      <kwd-group>
        <kwd>Question answering over Knowledge Graph</kwd>
        <kwd>Triple Input Siamese Network</kwd>
        <kwd>Domain Prediction</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>INTRODUCTION</title>
      <p>
        Question answering (QA) over large-scale knowledge graphs has
been the focus of much NLP research. In this paper, we concentrate
on natural language questions taken from the
SimpleQuestions [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] and WebQSP [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] datasets, which contain tuples of the form
(subject, relation, object, question). We tackle the problem of QA
in 3 steps: 1) extraction of mentioned entities from the question
and linking them to entities in the Knowledge Graph; 2) detection of the
domain of the object (answer); and 3) prediction of the most relevant
relation for answering the question.
      </p>
      <p>Prior deep learning approaches use the relation as a class label only
and hence do not capture the semantic-level correlation between the
question and the relation. To overcome this limitation, we propose
a Triple Input Siamese Metric Learning Model (TISML) that scores
the similarity between questions and candidate relations, and thereby
indirectly predicts the relation most relevant to a given question.
However, this approach was observed to fail at times when the words
of candidate relations are highly similar to the words present in the
question (discussed in the Results section, Type-A), which tends to
mislead the model into predicting the relation incorrectly. We therefore
propose that if the broad domain of the expected answer is also
input to the model, the model selects more relevant relations,
improving relation prediction and yielding state-of-the-art
performance. Consider this question from the SimpleQuestions
dataset: "who is a production company that performed Othello".
Here we first extract the mentioned entity "Othello" using a model
(referred to as the Entity Tagging Model), and identify all the relations
of this entity in the knowledge graph as candidate relations.
Consider two of the candidate relations, "theater/ theater_production/
producing_company" and "film/ film/ production_companies". A model
that takes only the question and a candidate relation as input
predicts "film/ film/ production_companies" as the correct relation,
which is actually wrong. However, if we also input the domain
of the answer, "theater", it helps the model score the candidate
relations appropriately and predict "theater/ theater_production/
producing_company" as the correct relation. The main
contributions of this paper are: 1) we demonstrate that a metric learning
similarity scoring network, along with the injected domain
knowledge, enhances Question Answering over the Knowledge Graph; 2)
we release the SimpleQuestions and WebQSP datasets created for
our experiments (available at https://drive.google.com/drive/folders/1vkyeg9JEIZBCkQrezguMwwgJDmje6Lq_?usp=sharing) to enable further research.</p>
      <p>The terms mentioned entity and subject name mean the same thing
and may be used interchangeably.</p>
    </sec>
    <sec id="sec-2">
      <title>PROBLEM DESCRIPTION</title>
      <p>We assume that a background Knowledge Graph comprising
a set of triples T = {t_1, ..., t_N} is available, where each triple t_i
is represented as a set of three terms {Subject, Relation, Object},
also referred to as {s, r, o}. We are concerned with natural language
questions (q ∈ Q) that mention an entity (s) of the knowledge graph.
We also assume that such questions can be answered using a
single triple (for single-relation questions) or multiple triples (for
multi-relation questions) of the knowledge graph. For the example
question from the Introduction, the ground truth triple comprises subject s = "Othello", relation
r = "theater/ theater_production/ producing_company", and object
o = "National Theatre of Great Britain". In this context, the objective
of the Question Answering task is to retrieve the appropriate answer
("National Theatre of Great Britain") from the knowledge graph.</p>
      <p>
        We formulate this problem as a supervised learning task. We
assume that a set of questions Q = {q_1, ..., q_n} and corresponding
ground truth triples T = {t_1, ..., t_n} (with t_i = (s_i, r_i, o_i)) are
available as training data. The underlying knowledge graph for our
work is Freebase [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. For the SimpleQuestions dataset we use a
smaller version of Freebase, i.e., FB2M [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], and for the WebQSP dataset
we use the full Freebase. A small worked example of this setup follows.
      </p>
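      <p>To ground the formulation, the following is a small illustrative sketch (our own, not from the paper) of how a knowledge graph triple set and the answer-retrieval objective can be represented; the triple shown is the Othello example above.</p>
      <preformat>
# Each KG triple is (subject s, relation r, object o); answering a question
# reduces to reading off the object once the subject and relation are predicted.
kg = [
    ("Othello", "theater/theater_production/producing_company",
     "National Theatre of Great Britain"),
]

def answer(subject, relation, kg):
    """Return the objects o of all triples (s, r, o) matching subject and relation."""
    return [o for (s, r, o) in kg if s == subject and r == relation]

print(answer("Othello", "theater/theater_production/producing_company", kg))
# -> ['National Theatre of Great Britain']
      </preformat>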
    </sec>
    <sec id="sec-3">
      <title>RELATED WORK</title>
      <p>
        Mapping a natural language question to a knowledge graph is a well
studied task and a significant amount of work has been done on this
topic over the last two decades [[
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ], [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ]]. As per recent
trends, answering natural language queries via knowledge graphs
follows two broad approaches, namely "Semantic Parsing based" and
"Information Extraction based", which are further explained below.
• Semantic Parsing based: These approaches [[
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]]
involve translating natural language queries into SPARQL
queries (logical forms) and then projecting these queries onto
a knowledge base to extract relevant facts. The advent of
deep learning approaches, which capture the semantics of
a natural language query, helped further improve the
performance of these systems. The semantics captured through
these deep learning approaches are encoded in a fixed-length
vector and are projected onto a knowledge graph
representation to extract relevant facts.
• Information Extraction based: Work by [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ] claimed
that by using a simple RNN they were able to obtain
better results for both entity tagging and relation detection on the
SimpleQuestions dataset (they reported 86.8% accuracy, but we, [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ], and [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ] have not been able to replicate their results). Another work by [
        <xref ref-type="bibr" rid="ref24">24</xref>
        ] used a
Hierarchical BiLSTM-based Siamese network for relation prediction
and claimed that the relation detection task has a direct impact
on the Question Answering task on both datasets. An
attention-based RNN combined with a similarity-matrix-based CNN
achieved superior results in [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ] used a
BiLSTM-CRF tagger followed by a BiLSTM for
mention detection and relation classification, respectively. [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ]
were among the first ones to apply BERT [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] for this task but
did not get any improvement over the previous state-of-the-art.
[
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] proposed an approach similar to ours, using a
similarity-based network for relation detection; however, they
removed about 2% of the data from the test set. To the best of
our knowledge, none of the cited approaches utilize domain
information to predict relations.
      </p>
    </sec>
    <sec id="sec-4">
      <title>DATASET DESCRIPTION</title>
      <p>
        SimpleQuestions dataset [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] is split into 75,910 train, 10,845 dev,
and 21,686 test questions; the WebQSP dataset, taken from [
        <xref ref-type="bibr" rid="ref24">24</xref>
        ], has 3,116 questions in the
train set, 623 in the dev set, and 1,649 in the test set. Below we explain our method
for extracting domain information from the Knowledge Graph and
creating an input dataset of (Question, Relation, Domain) triples
for the TISML model.
      </p>
      <p>
        Domain Data Creation: To extract domain information from
the Freebase Knowledge Graph, we observe that the relation of a
question carries three pieces of information. E.g., given a
relation people/person/place_of_birth in a triple (S, R, O) of Freebase,
people represents the domain of the subject, person represents
the sub-type of the subject, and place_of_birth is the property or
attribute of that person. So to extract the domain of the
subject for a triple (S, R, O), we look at the first component
of the relation (R). Since we are tagging the domain of the
"answer" to every question in the dataset, we search for the domains
of the "object" in a (subject, relation, object, question) tuple by
finding the reverse relation between the subject and the object. The process
of domain data creation is depicted in figure 1. Questions tagged
with a single domain are referred to as unambiguous
questions, while questions tagged with None (i.e., no domain could be tagged) or with multiple domains are
referred to as ambiguous questions. In the SimpleQuestions dataset there
were 57,421 unambiguous questions, and 16,432 (None) and 2,057
(multiple domain) ambiguous questions.
      </p>
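      <p>A minimal sketch of this extraction logic, under our reading of the procedure above (the exact reverse-relation lookup in the paper's pipeline may differ):</p>
      <preformat>
def subject_domain(relation):
    # The subject's domain is the first relation component:
    # "people/person/place_of_birth" -> "people"
    return relation.split("/")[0]

def answer_domains(subject, obj, kg):
    """Domains of the answer: take every reverse relation that points from
    the object back to the subject, and read off its first component."""
    return {subject_domain(r) for (s, r, o) in kg if s == obj and o == subject}
      </preformat>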
      <p>Domain Tagging for ambiguous questions: In order to tag
such questions with the appropriate domain, we referred to the
tagged domains of the unambiguous questions, in the following steps
(a sketch of this procedure follows the list):
(1) Create a one-to-one mapping between relations and domains:
• Create a mapping table between a relation and a domain
for every unambiguous question.
• Select the most frequently occurring domain for a relation
among all the tagged domains.
• Update the mapping table with a unique domain for the relation.
(2) Tag ambiguous questions with the domain from the mapping table.</p>
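      <p>A hedged sketch of this mapping step (function and variable names are ours): for each relation, the unique domain is chosen by majority vote over the domains tagged on unambiguous questions.</p>
      <preformat>
from collections import Counter, defaultdict

def build_relation_domain_table(pairs):
    """pairs: (relation, domain) tuples collected from unambiguous questions."""
    votes = defaultdict(Counter)
    for relation, domain in pairs:
        votes[relation][domain] += 1
    # keep the most frequently occurring domain per relation
    return {rel: counts.most_common(1)[0][0] for rel, counts in votes.items()}

table = build_relation_domain_table([
    ("people/person/place_of_birth", "people"),
    ("people/person/place_of_birth", "people"),
    ("people/person/place_of_birth", "location"),
])
print(table["people/person/place_of_birth"])  # "people" wins the vote
      </preformat>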
      <p>Siamese Data Creation: To create (question, relation,
domain) triplets for input to the TISML model, for every question
q we extract all the candidate relations for the mentioned entity
from the Knowledge Graph, along with the corresponding inferred domain.
We then label the triplet containing the actual relation as 1 and the
remaining triplets as 0 (a sketch is given below). This results in 586,953 train,
82,864 dev, and 388,695 test triplets for the SimpleQuestions dataset, and 77,792
train, 19,449 dev, and 49,912 test triplets for the WebQSP dataset.</p>
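      <p>An illustrative sketch of this labelling scheme (the placeholder token "&lt;e&gt;" standing in for the masked entity is our own notation; see the Proposed Approach section):</p>
      <preformat>
def siamese_examples(question, domain, candidate_relations, gold_relation):
    # one (question, relation, domain, label) row per candidate relation;
    # only the gold relation is labelled 1
    return [(question, rel, domain, int(rel == gold_relation))
            for rel in candidate_relations]

rows = siamese_examples(
    "who is a production company that performed &lt;e&gt;",
    "theater",
    ["theater/theater_production/producing_company",
     "film/film/production_companies"],
    "theater/theater_production/producing_company",
)
# rows[0] has label 1 (gold relation); rows[1] has label 0
      </preformat>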
    </sec>
    <sec id="sec-5">
      <title>PROPOSED APPROACH</title>
      <p>We present a schematic diagram of the proposed approach in figure 2.
Here, given a question q with ground truth triple (s, r, o),
we first find the mentioned entity or the subject of the question via
an Entity Tagging Model. (Figure 2 illustrates the pipeline on the example
question "Where was Sasha Vujacic born?": Entity Detection (BiLSTM-CRF)
over the question, lookup in the FreeBase Knowledge Graph, and the
formatted question input to the model.) From the identified entity, we obtain all
the candidate subjects S = {s_1, ..., s_k} (in figure 2, S = {s1, s2}) and
also extract all the candidate relations R = {r_1, ..., r_m} connected
to s from the Knowledge Graph (in figure 2, R = {r1, r2}). We also
input the question q to another model which predicts the domain
of the expected answer; this model is hereafter
referred to as the Domain Prediction Model. Further, the question that
is input to the TISML Model is modified by inserting a placeholder string &lt; &gt;
in place of the mentioned entity, yielding a formatted question
q' (a sketch of this formatting step is given below). This is done to ensure that the Siamese Model is agnostic to
the specific mentioned entity in the question while predicting the
triplet score, and it also gives positional information to the
neural networks [
          <xref ref-type="bibr" rid="ref24">24</xref>
          ].
        </p>
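        <p>A minimal sketch of this formatting step, assuming a generic placeholder token "&lt;e&gt;" (the paper's exact placeholder string is not reproduced here):</p>
        <preformat>
def format_question(question, mention, placeholder="&lt;e&gt;"):
    """Mask the detected entity mention so the Siamese model is entity-agnostic,
    while the placeholder's position still carries positional information."""
    return question.replace(mention, placeholder)

print(format_question("Where was Sasha Vujacic born?", "Sasha Vujacic"))
# "Where was &lt;e&gt; born?"
        </preformat>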
    </sec>
    <sec id="sec-6">
      <title>MODEL DESCRIPTION</title>
      <p>In this section, we discuss all three individual models in detail.</p>
      <p>
        (1) Entity Tagging Model: It is a sequence labelling task (IO
tagging) which uses a BiLSTM and a Conditional Random
Field layer [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ] for detecting the mentioned entity in the
question. K entity candidates are predicted using the top-K
Viterbi algorithm. Further, candidate aliases are extracted
from the Freebase SQL table by querying it with the predicted
K candidates. (While creating SQL tables for Freebase, several
string aliases are mapped to every machine id (MId), for example
(MId | alias | alias-normalized-punctuation | alias-normalized-punctuation-stem | alias-preprocessed) :: (0c1n99q | gulliver | gulliver | gulliv | gulliver).)
Candidates having the minimum Levenshtein distance between their
aliases and the detected mentioned entity become the predicted
subject names, and their corresponding machine ids are retrieved
as candidate machine ids. This
model is used by [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ] (implementation: https://github.com/PetrochukM/Simple-QA-EMNLP-2018), which is the state-of-the-art
algorithm for the Question Answering task over Knowledge Graphs
and is used as the baseline for comparing our
results; hence, we used the same model for our task.
(2) Domain Prediction Model: It is a supervised classification
task, where the input is a question q and the output is the
predicted domain of the answer type of the question. For
this task, we use a pre-trained BERT Large [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] classification
model and fine-tune it on the SimpleQuestions dataset by adding
an additional fully connected layer on top of BERT, learning
the weights of this layer to predict the correct domain for
the question. We fine-tune the model for 5 epochs, with
sequence length 40 and batch size 64 (a fine-tuning sketch is given after this list). This model
outperforms other classification models, namely, LSTM [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ],
CNN [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ], BiLSTM with attention [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], Capsule Network
[
        <xref ref-type="bibr" rid="ref15">15</xref>
        ]; results for domain prediction are presented in table 1.
This is because BERT is pre-trained on a huge
corpus (English Wikipedia, 2.5 billion words, and BookCorpus, 800 million
words) and can thus leverage the knowledge it has learned,
which results in better prediction of the domains.
(3) Triple Input Siamese Metric Learning Model: In order
to select the correct relation for the question q, we use a
TISML Model (refer to figure 3), which captures the
semantics between all the inputs (question, relation, domain).
Our network is inspired by a dual-input duplicate-question model
(https://www.linkedin.com/pulse/duplicate-quoraquestion-abhishek-thakur/),
to which we add an extra input, i.e., the inferred domain.
The network consists of 3 different embedding generator
networks: a GloVe Embedding Layer, a 1D-CNN Layer, and an
LSTM Layer. Each input is passed through these networks,
which generate their respective embeddings. These
embeddings are then concatenated through a Merge layer
followed by multiple dense layers. The final embedding is
used to compute a score between 0 and 1 that indicates
whether the triple contains the correct relation (a sketch of the
network is given with the experimental setup below).</p>
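      <p>The following is a hedged sketch of the Domain Prediction Model's fine-tuning loop using the Hugging Face transformers library; the toy questions, domain labels, and learning rate are our assumptions, while the BERT-Large backbone, 5 epochs, sequence length 40, and the classification layer on top of BERT follow the description above.</p>
      <preformat>
import torch
from transformers import BertTokenizer, BertForSequenceClassification

# Toy (question, domain) data; domain ids index Freebase domains (illustrative).
questions = ["where was sasha vujacic born?", "whats a track from dawn escapes"]
domain_ids = torch.tensor([0, 1])  # 0 = "people", 1 = "music"

tokenizer = BertTokenizer.from_pretrained("bert-large-uncased")
model = BertForSequenceClassification.from_pretrained(
    "bert-large-uncased", num_labels=2)  # adds a fully connected layer on top

enc = tokenizer(questions, padding="max_length", truncation=True,
                max_length=40, return_tensors="pt")  # sequence length 40
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)  # assumed lr

model.train()
for epoch in range(5):  # 5 epochs, as reported
    out = model(**enc, labels=domain_ids)  # cross-entropy over domain labels
    out.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
      </preformat>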
    </sec>
    <sec id="sec-7">
      <title>EXPERIMENTAL SETUP</title>
      <p>Hyperparameters for the TISML Model include sequence length 40,
batch size 384, and all dropouts (variational, recurrent) set to 0.2.
The CNN block uses 64 filters, each of length 5; dense layers have 300
units along with PReLU and batch normalization layers. The model has 11M
parameters in total, of which 6.4M are trainable. Word
embeddings are initialized with 300-dimensional GloVe vectors,
and we use Adam as the optimizer. Parameters were selected on the
basis of validation accuracy. We run our experiments on a 12 GB
Nvidia GPU. The average runtime for 1 epoch is 300 s on SimpleQuestions
and 130 s on WebQSP. A minimal sketch of the network under these
hyperparameters follows.</p>
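      <p>Below is a minimal Keras sketch of the TISML network under these hyperparameters. The vocabulary size, LSTM width, number of dense layers, and the sharing of one encoder across the three inputs are our assumptions; the GloVe embedding, 1D-CNN with 64 filters of length 5, LSTM, merge-and-dense structure with PReLU and batch normalization, and the sigmoid score follow the model description.</p>
      <preformat>
from tensorflow.keras import layers, models

SEQ_LEN, VOCAB, EMB_DIM = 40, 50000, 300  # VOCAB is illustrative

def make_encoder():
    inp = layers.Input(shape=(SEQ_LEN,), dtype="int32")
    emb = layers.Embedding(VOCAB, EMB_DIM)(inp)  # initialise with GloVe weights
    cnn = layers.GlobalMaxPooling1D()(layers.Conv1D(64, 5, activation="relu")(emb))
    rnn = layers.LSTM(300, dropout=0.2, recurrent_dropout=0.2)(emb)
    return models.Model(inp, layers.concatenate([cnn, rnn]))

encoder = make_encoder()  # Siamese: one shared encoder for all three inputs
q_in = layers.Input(shape=(SEQ_LEN,), dtype="int32", name="question")
r_in = layers.Input(shape=(SEQ_LEN,), dtype="int32", name="relation")
d_in = layers.Input(shape=(SEQ_LEN,), dtype="int32", name="domain")

x = layers.concatenate([encoder(q_in), encoder(r_in), encoder(d_in)])  # Merge
for _ in range(2):  # "multiple dense layers" with PReLU + batch normalization
    x = layers.BatchNormalization()(layers.PReLU()(layers.Dense(300)(x)))
    x = layers.Dropout(0.2)(x)
score = layers.Dense(1, activation="sigmoid")(x)  # 1 = triple has gold relation

model = models.Model([q_in, r_in, d_in], score)
model.compile(optimizer="adam", loss="binary_crossentropy")
      </preformat>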
    </sec>
    <sec id="sec-8">
      <title>RESULTS</title>
      <p>
        We compare our approach with the previous deep learning
approaches mentioned in Section 3. For the SimpleQuestions dataset,
the evaluation metric is the same as in the baseline approach [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ] (i.e., accuracy). For the WebQSP dataset, evaluation is done similarly to [
        <xref ref-type="bibr" rid="ref24">24</xref>
        ],
where Top-1 accuracy is reported for answer prediction, i.e., among
multiple predicted relations we pick the top-scored relation and
use it for answer prediction (for the WebQSP dataset, only 64
questions in the test data have multiple answers; the rest have
multiple relations with single tagged answers). We also compare
against our own approach without domain information. As Table 2
shows, augmenting the relation and question with domain knowledge
provides better accuracy than the other baseline approaches. In
Table 3 we show a few examples from the SimpleQuestions dataset
for which the relations were predicted wrongly by the baseline
approach but are answered correctly by our approach. In our analysis,
we found that such errors were fixed by our approach for 2 main reasons:
(1) Type-A (Improvement due to Domain Prediction Model):
In the question "What high school is located in Hugo", the
baseline model predicts the relation as location/ location/
containedby, which is not correct; this could be because of the
word "located" in the query, or due to similar-pattern questions
belonging to this relation, and hence their model, which
is a relation classification model, predicts a relation
containing "location". However, our model predicts the domain of
this question as "education", since the question is essentially
asking about the "high school", which belongs to the
education domain. This information pushes the Triple Input
Siamese Metric Learning Model to select a relation matching
the education domain, namely education/ school_category/
schools_of_this_kind.
(2) Type-B (Improvement due to Similarity Model): Another
type of error corrected by our approach arises when the relations
predicted by the two approaches are from the same domain but
differ in sub-domain. For instance, for the question "what is a chinese
album", the baseline model detects music/ release_track/
release, whereas our model predicts music/ album_release_type/
albums as the correct relation. This is because
our model exploits the semantic-level correlation between
the question and the relation and is able to match the two at
a literal level, as seen from the presence of the word "album"
in both the question and the relation.
      </p>
    </sec>
    <sec id="sec-9">
      <title>ERROR ANALYSIS</title>
      <p>While analysing the errors on the test set, we observed that most
errors can be broadly classified into 4 categories, discussed below
and reported in Table 4; examples are taken from the SimpleQuestions dataset:
• Category-1 Error (Error due to Triple Input Siamese Metric Learning Model)
• Category-2 Error (Error due to Domain Prediction Model)
• Category-3 Error (Unanswerable Questions)
• Category-4 Error (Error due to Entity Tagging Model)
(1) Category-1 Error: There are plenty of erroneous questions
that fall under this category. Even though the Domain
Prediction Model predicts the domain correctly, these errors
occur due to the highly ambiguous structure of the relations
and their tagged questions. To illustrate, the query "Whats
a track from dawn escapes" has music/ release/ track_list as
the actual relation while the predicted relation is music/
release/ track, whereas another question, "What's a track
from the release 9 seconds", has music/ release/ track as the
actual relation while the predicted relation is music/ release/
track_list. This clearly confuses the Triple Input Siamese
Metric Learning Model, as the patterns of the questions are
identical in nature and the relations are also very similar.
(2) Category-2 Error: This type of error occurs because the
domain of the question given in the Knowledge Graph is vague.
Certain domains in Freebase do not have a
clear definition; for instance, domains such as Base,
Common, User, and Type consist of questions that are similar
to questions from other domains, and such questions
comprise about 4% of the test data. If we observe the questions
from these domains in Table 4, they do not
have a common pattern. This misleads the Domain
Prediction Model, which results in incorrect downstream relation
detection and thus the wrong answer. For example, given the
question "What is Andrew Deemer's profession", the Domain
Prediction Model predicts "people" as the domain, and
thus the Triple Input Siamese Metric Learning Model
predicts people/ person/ profession, whereas the ground truth relation
of this question is common/ topic/ subjects and the ground truth
domain is "common".
(3) Category-3 Error: There are 386 questions in the test set
that do not contain a head_entity. Previous work done by [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] removed such questions from the evaluation of their
model; we, however, did not remove these questions from
the dataset. For example, the question "Who is an alumni
involved in IT" does not contain a mentioned entity; this
is a data creation error that cannot be solved, and such
questions are predicted as None.
(4) Category-4 Error: These errors occur because the Entity
Tagging Model is not able to identify the subject present in the
question correctly, which results in the selection of a wrong
candidate relation set from the knowledge graph. For
example, the question "what's the name of a popular Japanese
to Portuguese dictionary" has the ground truth mentioned
entity "dictionary"; however, the Entity Tagging Model
predicts "Portuguese" as the subject, which leads to a wrong
set of candidate relations and hence a wrong answer
prediction.
      </p>
    </sec>
    <sec id="sec-10">
      <title>CONCLUSION</title>
      <p>
In this paper, we propose the use of domain information as an
additional input for predicting the correct relation for both
single-relation and multi-relation datasets. This information is
predicted from the question using a Domain Prediction Model and
helps strengthen the TISML Model's selection of the
most appropriate relation for the question. Our proposed approach
outperforms previous approaches on Question Answering over
Knowledge Graphs and achieves new state-of-the-art results on the
SimpleQuestions and WebQSP datasets. For future work, we will also
explore datasets like GraphQuestions [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ] and ComplexQuestions
[
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] to deal with more aspects of general Question Answering.
      </p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>[1] 2018. Question Answering over Freebase via Attentive RNN with Similarity Matrix based CNN. CoRR abs/1804.03317 (2018). arXiv:1804.03317 http://arxiv.org/abs/1804.03317 Withdrawn.</mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>[2] Junwei Bao, Nan Duan, Zhao Yan, Ming Zhou, and Tiejun Zhao. 2016. Constraint-Based Question Answering with Knowledge Graph. In Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers. The COLING 2016 Organizing Committee, Osaka, Japan, 2503-2514. https://www.aclweb.org/anthology/C16-1236</mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>[3] Jonathan Berant, Andrew Chou, Roy Frostig, and Percy Liang. 2013. Semantic Parsing on Freebase from Question-Answer Pairs. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, Seattle, Washington, USA, 1533-1544. https://www.aclweb.org/anthology/D13-1160</mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>[4] Jonathan Berant, Andrew Chou, Roy Frostig, and Percy Liang. 2013. Semantic Parsing on Freebase from Question-Answer Pairs. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, Seattle, Washington, USA, 1533-1544. https://www.aclweb.org/anthology/D13-1160</mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>[5] Jonathan Berant and Percy Liang. 2014. Semantic Parsing via Paraphrasing. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Association for Computational Linguistics, Baltimore, Maryland, 1415-1425. https://doi.org/10.3115/v1/P14-1133</mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>[6] Kurt Bollacker, Colin Evans, Praveen Paritosh, Tim Sturge, and Jamie Taylor. 2008. Freebase: a collaboratively created graph database for structuring human knowledge. In SIGMOD Conference. 1247-1250.</mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>[7] Antoine Bordes, Sumit Chopra, and Jason Weston. 2014. Question Answering with Subgraph Embeddings. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). Association for Computational Linguistics, Doha, Qatar, 615-620. https://doi.org/10.3115/v1/D14-1067</mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>[8] Antoine Bordes, Sumit Chopra, and Jason Weston. 2014. Question Answering with Subgraph Embeddings. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). Association for Computational Linguistics, Doha, Qatar, 615-620. https://doi.org/10.3115/v1/D14-1067</mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>[9] Antoine Bordes, Nicolas Usunier, Sumit Chopra, and Jason Weston. 2015. Large-scale Simple Question Answering with Memory Networks. CoRR abs/1506.02075 (2015). arXiv:1506.02075 http://arxiv.org/abs/1506.02075</mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>[10] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. CoRR abs/1810.04805 (2018). arXiv:1810.04805 http://arxiv.org/abs/1810.04805</mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>[11] Changshun Du and Lei Huang. 2018. Text Classification Research with Attention-based Recurrent Neural Networks. International Journal of Computers Communications Control 13 (02 2018), 50. https://doi.org/10.15837/ijccc.2018.1.3142</mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>[12] Vishal Gupta, Manoj Chinnakotla, and Manish Shrivastava. 2018. Retrieve and Re-rank: A Simple and Effective IR Approach to Simple Question Answering over Knowledge Graphs. In Proceedings of the First Workshop on Fact Extraction and VERification (FEVER). Association for Computational Linguistics, Brussels, Belgium, 22-27. https://doi.org/10.18653/v1/W18-5504</mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>[13] Yanchao Hao, Yuanzhe Zhang, Kang Liu, Shizhu He, Zhanyi Liu, Hua Wu, and Jun Zhao. 2017. An End-to-End Model for Question Answering over Knowledge Base with Cross-Attention Combining Global Knowledge. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Association for Computational Linguistics, Vancouver, Canada, 221-231. https://doi.org/10.18653/v1/P17-1021</mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>[14] Sepp Hochreiter and Jürgen Schmidhuber. 1997. Long Short-term Memory. Neural Computation 9 (12 1997), 1735-80. https://doi.org/10.1162/neco.1997.9.8.1735</mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>[15] Jaeyoung Kim, Sion Jang, Sungchul Choi, and Eunjeong Lucy Park. 2018. Text Classification using Capsules. Neurocomputing 376 (2018), 214-221.</mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>[16] Yann Lecun, Patrick Haffner, and Y. Bengio. 2000. Object Recognition with Gradient-Based Learning. (08 2000).</mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>[17] Denis Lukovnikov, Asja Fischer, and Jens Lehmann. 2019. Pretrained Transformers for Simple Question Answering over Knowledge Graphs. In The Semantic Web - ISWC 2019, Chiara Ghidini, Olaf Hartig, Maria Maleshkova, Vojtěch Svátek, Isabel Cruz, Aidan Hogan, Jie Song, Maxime Lefrançois, and Fabien Gandon (Eds.). Springer International Publishing, Cham, 470-486.</mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>[18] Denis Lukovnikov, Asja Fischer, Jens Lehmann, and Sören Auer. 2017. Neural Network-based Question Answering over Knowledge Graphs on Word and Character Level. https://doi.org/10.1145/3038912.3052675</mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>[19] Salman Mohammed, Peng Shi, and Jimmy Lin. 2018. Strong Baselines for Simple Question Answering over Knowledge Graphs with and without Neural Networks. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers). Association for Computational Linguistics, New Orleans, Louisiana, 291-296. https://doi.org/10.18653/v1/N18-2047</mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>[20] Michael Petrochuk and Luke Zettlemoyer. 2018. SimpleQuestions Nearly Solved: A New Upperbound and Baseline Approach. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, Brussels, Belgium, 554-558. https://doi.org/10.18653/v1/D18-1051</mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>[21] Yu Su, Huan Sun, Brian Sadler, Mudhakar Srivatsa, Izzeddin Gür, Zenghui Yan, and Xifeng Yan. 2016. On Generating Characteristic-rich Question Sets for QA Evaluation. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, Austin, Texas, 562-572. https://doi.org/10.18653/v1/D16-1054</mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>[22] Ferhan Türe and Oliver Jojic. 2016. Simple and Effective Question Answering with Recurrent Neural Networks. CoRR abs/1606.05029 (2016). arXiv:1606.05029 http://arxiv.org/abs/1606.05029</mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>[23] Wen-tau Yih, Xiaodong He, and Christopher Meek. 2014. Semantic Parsing for Single-Relation Question Answering. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers). Association for Computational Linguistics, Baltimore, Maryland, 643-648. https://doi.org/10.3115/v1/P14-2105</mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>[24] Mo Yu, Wenpeng Yin, Kazi Saidul Hasan, Cicero dos Santos, Bing Xiang, and Bowen Zhou. 2017. Improved Neural Relation Detection for Knowledge Base Question Answering. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Association for Computational Linguistics, Vancouver, Canada, 571-581. https://doi.org/10.18653/v1/P17-1053</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>