=Paper=
{{Paper
|id=Vol-2666/KDD_Converse20_paper_6
|storemode=property
|title=Improved Question Answering using Domain Prediction
|pdfUrl=https://ceur-ws.org/Vol-2666/KDD_Converse20_paper_6.pdf
|volume=Vol-2666
|authors=Himani Srivastava,Prerna Khurana,Saurabh Srivastava,Vaibhav Varshney,Lovekesh Vig,Puneet Agarwal,Gautam Shroff
|dblpUrl=https://dblp.org/rec/conf/kdd/SrivastavaKSVVA20
}}
==Improved Question Answering using Domain Prediction==
Himani Srivastava, Prerna Khurana, Saurabh Srivastava, Vaibhav Varshney, Lovekesh Vig, Puneet Agarwal, Gautam Shroff
{srivastava.himani,prerna.khurana2,sriv.saurabh,varshney.v,lovekesh.vig,puneet.a,gautam.shroff}@tcs.com
TCS Research
New Delhi, India
ABSTRACT
Question Answering over Knowledge Graphs has mainly utilised the mentioned entity and the relation to predict the answer. However, a key piece of contextual information that is missing in these approaches is the knowledge of the broad domain (such as sports or music) to which the answer belongs. The current paper proposes to infer the domain of the answer via a pre-trained BERT [10] Classification Model, and to utilise the inferred domain as an additional input to yield state-of-the-art performance on single-relation (SimpleQuestions) and multi-relation (WebQSP) Question Answering benchmarks. We employ a triple-input Siamese network architecture that learns to predict the semantic similarity between the question, the inferred domain, and the relation.

KEYWORDS
Question answering over Knowledge Graph, Triple Input Siamese Network, Domain Prediction

ACM Reference Format:
Himani Srivastava, Prerna Khurana, Saurabh Srivastava, Vaibhav Varshney, Lovekesh Vig, Puneet Agarwal, Gautam Shroff. 2020. Improved Question Answering using Domain Prediction. In Proceedings of KDD Workshop on Conversational Systems Towards Mainstream Adoption (KDD Converse'20). ACM, New York, NY, USA, 6 pages.

KDD Converse'20, August 2020. © 2020 Copyright held by the owner/author(s). Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

1 INTRODUCTION
Question answering (QA) over large-scale knowledge graphs has been the focus of much NLP research. In this paper, we focus on natural language questions taken from the SimpleQuestions [7] and WebQSP [3] datasets, which contain tuples of the form (subject, relation, object, question). We tackle the problem of QA in three steps: 1) extraction of mentioned entities from the question and linking to entities in the Knowledge Graph; 2) detection of the domain of the object (answer); 3) prediction of the most relevant relation for answering the question.

Prior deep learning approaches use the relation as a class label only and hence do not capture the semantic-level correlation between the question and the relation. To overcome this limitation, we propose a Triple Input Siamese Metric Learning Model (TISML) that scores the similarity between questions and candidate relations, and thereby indirectly predicts the relation most relevant to a given question. However, this approach was observed to fail at times when the words of candidate relations are highly similar to the words present in the question (discussed in Section 8, Type-A); such overlap tends to mislead the model into predicting the relation incorrectly. We therefore propose that if the broad domain of the expected answer is also input to the model, the model tends to select relevant relations, improving relation prediction and yielding state-of-the-art performance. Consider this question from the SimpleQuestions dataset: "who is a production company that performed Othello". Here we first extract the mentioned entity "Othello" using a model (referred to as the Entity Tagging Model), and identify all the relations of this entity in the knowledge graph as candidate relations. Consider two of the candidate relations, "theater/theater_production/producing_company" and "film/film/production_companies". A model that takes only the question and the candidate relation as input predicts "film/film/production_companies" as the correct relation, which is actually wrong. However, if we also input the domain of the answer, "theater", it helps the model to score the candidate relations appropriately and predict "theater/theater_production/producing_company" as the correct relation.

The main contributions of this paper are: 1) We demonstrate that a metric learning similarity scoring network, along with the injected domain knowledge, enhances Question Answering over the Knowledge Graph. 2) We release the SimpleQuestions and WebQSP datasets created for our experiments (https://drive.google.com/drive/folders/1vkyeg9JEIZBCkQrezguMwwgJDmje6Lq_?usp=sharing) to enable further research.

The terms "mentioned entity" and "subject name" mean the same thing and may be used interchangeably.

2 PROBLEM DESCRIPTION
We assume that a background Knowledge Graph comprising a set of triples T = {t_1, ..., t_n} is available, where each triple t_i is represented as a set of three terms {Subject, Relation, Object}, also referred to as {S, R, O}. We are concerned with natural language questions (q_i ∈ Q) which mention an entity of the knowledge graph (S). We also assume that such questions can be answered using a single triple (for single-relation questions) or multiple triples (for multi-relation questions) of the knowledge graph. For the example above, the ground truth triple comprises subject S_i = "Othello", relation R_i = "theater/theater_production/producing_company", and object O_i = "National Theatre of Great Britain". In this context, the objective of the Question Answering task is to retrieve the appropriate answer ("National Theatre of Great Britain") from the knowledge graph.

We formulate this problem as a supervised learning task. We assume that a set of questions Q_S = {q_1, ..., q_m} and corresponding
ground truth triples T_S = {t_1, ..., t_m} (with t_i = (s_i, r_i, o_i)) are available as training data. The underlying knowledge graph for our work is Freebase [6]. For the SimpleQuestions dataset we have used a smaller version of Freebase, FB2M [8], and for the WebQSP dataset we have used the full Freebase.

3 RELATED WORK
Mapping a natural language question to a knowledge graph is a well-studied task, and a significant amount of work has been done on this topic over the last two decades [4, 23, 7, 19]. As per recent trends, answering natural language queries via knowledge graphs follows two broad approaches, namely "Semantic Parsing based" and "Information Extraction based", which are further explained below.

• Semantic Parsing based: These approaches [4, 5, 13] involve expressing natural language queries as SPARQL queries (logical forms) and then projecting these queries onto a knowledge base to extract relevant facts. The advent of deep learning approaches, which capture the semantics of a natural language query, helped to further improve the performance of these systems. The semantics captured by these deep learning approaches are encoded in a fixed-length vector and projected onto a knowledge graph representation to extract relevant facts.

• Information Extraction based: Work by [22] claimed that by using a simple RNN they were able to obtain better results for both entity tagging and relation detection on the SimpleQuestions dataset (they report 86.8% accuracy, but we, [19], and [20] have not been able to replicate their results). Another work, [24], used a hierarchical BiLSTM-based Siamese network for relation prediction and claimed that the relation detection task has a direct impact on the Question Answering task on both datasets. Attention with an RNN along with a similarity-matrix-based CNN achieved superior results in [1]. [20] used a BiLSTM-CRF tagger followed by a BiLSTM to perform mention detection and relation classification respectively. [17] were among the first to apply BERT [10] to this task but did not get any improvement over the previous state of the art. [12] proposed an approach similar to ours, using a similarity-based network for relation detection; however, they removed about 2% of the data from the test set. To the best of our knowledge, none of the cited approaches utilise domain information to predict relations.

4 DATASET DESCRIPTION
The SimpleQuestions dataset [9] is split into 75,910 train, 10,845 dev, and 21,686 test questions; the WebQSP dataset, taken from [24], has 3,116 train, 623 dev, and 1,649 test questions. Below we explain our method for extracting domain information from the Knowledge Graph and creating an input dataset of (Question, Relation, Domain) triples for the TISML model.

Domain Data Creation: To extract domain information from the Freebase Knowledge Graph, we observe that the relation of a question represents three pieces of information. E.g., given a relation people/person/place_of_birth in a triple (S, R, O) of Freebase, people represents the domain of the subject, person represents the sub-type of the subject, and place_of_birth is the property or attribute of that person. So, to extract the domain of the subject for a triple (S, R, O), we look at the first component of the relation (R). Since we are tagging the domain of the "answer" to every question in the dataset, we search for the domains of the "object" in a (subject, relation, object, question) tuple by finding the reverse relation between the subject and object. The process of domain data creation is depicted in Figure 1. Questions tagged with a single domain are referred to as unambiguous questions; questions tagged with None (i.e., no domain could be tagged) or with multiple domains are referred to as ambiguous questions. In the SimpleQuestions dataset there were 57,421 unambiguous questions and 16,432 (None) and 2,057 (multiple-domain) ambiguous questions.

Figure 1: Domain Data Creation

Domain Tagging for ambiguous questions: In order to tag such questions with the appropriate domain, we referred to the tagged domains of unambiguous questions, in the following steps:

(1) Create a one-to-one mapping between relations and domains:
  • Create a mapping table between a relation and a domain for every unambiguous question.
  • Select the most frequently occurring domain for a relation among all the tagged domains.
  • Update the mapping table with a unique domain for the relation.
(2) Tag ambiguous questions with the domain from the mapping table.

Siamese Data Creation: To create the (question, relation, domain) triplets that are input to the TISML model, for every question q we extract all the candidate relations for the mentioned entity from the Knowledge Graph, along with the corresponding inferred domain. We then label the triplet containing the actual relation as 1 and the remaining triplets as 0. As a result we have 586,953 train, 82,864 dev, and 388,695 test triplets for the SimpleQuestions dataset, and 77,792 train, 19,449 dev, and 49,912 test triplets for the WebQSP dataset.

5 PROPOSED APPROACH
We present a schematic diagram of the proposed approach in Figure 2.

Figure 2: This figure depicts the overall approach for the Question Answering task over the Knowledge Graph. Given a question, we first detect its domain using the Domain Prediction Model, and extract the mentioned entity using the Entity Tagging Model. The extracted entity is then searched in the knowledge graph and all the relevant subjects (S1, S2) are considered as candidate subjects, and all of the relations connected to these candidate subjects are candidate relations (R1, R2). The original question is converted into a formatted question by replacing the entity mention with a token <e>. Our TISML Model (bottom) calculates the similarity score between the inputs: the question, the inferred domain, and the relation candidates.

Here, given a question q with ground truth triple (s, r, o), we first find the mentioned entity or the subject of the question via
an Entity Tagging Model. From the identified entity, we obtain all the candidate subjects S = {S_1, ..., S_p} (in Figure 2, S = {S_1, S_2}) and also extract all the candidate relations R = {R_1, ..., R_q} connected to S from the Knowledge Graph (in Figure 2, R = {R_1, R_2}). We also input the question q to another model which predicts the domain of the expected answer for that question; this model is hereafter referred to as the Domain Prediction Model. Further, the question that is input to the TISML Model is modified by inserting a string <e> in place of the mentioned entity, yielding a formatted question q'. This is done to ensure that the Siamese Model is agnostic to the specific mentioned entity in the question while predicting the triplet score, and it also gives positional information to the neural networks [24].

6 MODEL DESCRIPTION
In this section, we discuss all three individual models in detail.

(1) Entity Tagging Model: This is a sequence labelling task (IO tagging) which uses a BiLSTM and a Conditional Random Field layer [20] for detecting the mentioned entity in the question. K entity candidates are predicted using the top-K Viterbi algorithm. Further, candidate aliases are extracted from the Freebase SQL table by querying it with the predicted K candidates. (While creating SQL tables for Freebase, different string aliases are mapped to every machine id (MId), for example: (MId | alias | alias-normalized-punctuation | alias-normalized-punctuation-stem | alias-preprocessed) ::: (0c1n99q | gulliver | gulliver | gulliv | gulliver).) The candidates having the minimum Levenshtein distance between their aliases and the detected mentioned entity become the predicted subject names, and their corresponding machine ids are retrieved as candidate machine ids. This model is used by [20] (https://github.com/PetrochukM/Simple-QA-EMNLP-2018), which is the state-of-the-art algorithm for the Question Answering task over the Knowledge Graph and serves as the baseline for comparing our results; hence, we also used the same model for our task.

(2) Domain Prediction Model: This is a supervised classification task, where the input is a question q and the output is the predicted domain of the answer type of the question. For this task, we use a pre-trained BERT Large [10] classification model and fine-tune it on the SimpleQuestions dataset by adding an additional fully connected layer on top of BERT and learning the weights of this layer to predict the correct domain for the question. We fine-tune the model for 5 epochs with sequence length 40 and batch size 64. This model outperforms other classification models, namely LSTM [14], CNN [16], BiLSTM with attention [11], and Capsule Network [15]; results for domain prediction are presented in Table 1. This is because BERT is trained on a huge corpus (Wikipedia (2.5 billion words) and BookCorpus (800 million words)) and can thus leverage the knowledge it has learned, which results in better prediction of the domains.

(3) Triple Input Siamese Metric Learning Model: In order to select the correct relation for the question q, we use the TISML Model (refer to Figure 3; our network is inspired by https://www.linkedin.com/pulse/duplicate-quora-question-abhishek-thakur/ — that model uses dual input, and we add an extra input, i.e., the inferred domain), which captures the semantics between all the inputs (question, relation, domain). The network consists of 3 different embedding generator networks: a GloVe Embedding Layer, a 1D-CNN Layer, and an
LSTM Layer. Each input is passed through these networks, which generate their respective embeddings. These embeddings are then concatenated through a merge layer followed by multiple dense layers. The final embedding is used to calculate a score between 0 and 1 that indicates whether the triple contains the correct relation or not.

Table 1: Domain Prediction Result

Approach | Accuracy
LSTM | 86
CNN | 89
Bi-LSTM+Attention | 91
Capsule | 91
Fine-Tuned BERT | 93.16

Figure 3: Network Architecture of TISML

Table 2: Accuracy on the SimpleQuestions (SQ) and WebQSP (WQ) datasets (*Average accuracy over 5 runs)

Approach | SQ | WQ
HR-BiLSTM & CNN & BiLSTM-CRF [24] | 77 | 63.9
GRU [18] | 71.2 | -
BiGRU-CRF & BiGRU [19] | 73.7 | -
BiLSTM & BiGRU [19] | 74.9 | -
Attentive RNN & Similarity Matrix based CNN [1] | 76.8 | -
BiLSTM-CRF & BiLSTM [20] | 78.1 | -
Our Approach (Without Domain)* | 76 | 63
Our Approach (TISML)* | 79.16 | 65.3

7 EXPERIMENTAL SETUP
Hyperparameters for the TISML Model include sequence length 40, batch size 384, and all dropouts (variational, recurrent) set to 0.2. The CNN block uses 64 filters, each of length 5; the dense layers have 300 units along with PReLU and batch normalization layers. The model has 11M total parameters, of which 6.4M are trainable. The word embeddings are initialized with 300-dimensional GloVe vectors, and we use Adam as the optimizer. Parameters were selected on the basis of validation accuracy. We run our experiments on a 12 GB Nvidia GPU. The average runtime for 1 epoch is 300 s on SimpleQuestions and 130 s on WebQSP.

8 RESULTS
We have compared our approach with the previous deep learning approaches mentioned in Section 3. The evaluation metric for the SimpleQuestions dataset is the same as in the baseline approach [20] (i.e., accuracy). For the WebQSP dataset, evaluation is done similarly to [24], where Top-1 accuracy is reported for answer prediction, i.e., among multiple predicted relations we pick the top-scored relation and use it for answer prediction (in the WebQSP test data only 64 questions have multiple answers; the rest have multiple relations with a single tagged answer). We have also compared with our approach without using domain information. From Table 2, it can be seen that augmenting domain knowledge along with the relation and question provides better accuracy than the other baseline approaches. In Table 3 we show a few examples from the SimpleQuestions dataset for which the relations were predicted wrongly by the baseline approach but could be answered correctly with our approach. In our analysis, we found that such errors were fixed by our approach for two main reasons:

(1) Type-A (Improvement due to Domain Prediction Model): For the question "What high school is located in Hugo", the baseline model predicts the relation location/location/containedby, which is not correct. This could be because of the word "located" in the query, or because questions with a similar pattern belong to this relation; hence their model, which is a relation classification model, predicts a relation containing "location". However, our model predicts the domain of this question as "education", since the question is essentially asking about the "high school", which belongs to the education domain. This information pushes the Triple Input Siamese Metric Learning Model to select the relation most similar to the education domain, which is education/school_category/schools_of_this_kind.

(2) Type-B (Improvement due to Similarity Model): Another type of error fixed by our approach arises when the relations predicted by the two approaches are from the same domain but differ in sub-domain. For instance, for the question "what is a chinese album", the baseline model detects music/release_track/release, whereas our model predicts music/album_release_type/albums as the correct relation. This is because our model exploits the semantic-level correlation between the question and the relation and is able to match the two at a literal level, which can be seen from the presence of the word "album" in both the question and the relation.
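The Top-1 evaluation used above (pick the highest-scoring candidate relation per question, count a hit when it matches the gold relation) can be sketched in a few lines. The helper name and the toy data are ours, not the released evaluation code:

```python
# Top-1 relation accuracy, as described in Section 8: among the scored
# candidate relations for each question, keep only the top-scored one.

def top1_accuracy(predictions, gold):
    """predictions: {qid: [(relation, score), ...]}; gold: {qid: relation}."""
    correct = 0
    for qid, cands in predictions.items():
        best_rel = max(cands, key=lambda rs: rs[1])[0]  # top-scored relation
        correct += (best_rel == gold[qid])
    return correct / len(predictions)

# Made-up illustration using relations that appear in the paper.
preds = {
    "q1": [("theater/theater_production/producing_company", 0.91),
           ("film/film/production_companies", 0.77)],
    "q2": [("music/release_track/release", 0.64),
           ("music/album_release_type/albums", 0.58)],
}
gold = {"q1": "theater/theater_production/producing_company",
        "q2": "music/album_release_type/albums"}
print(top1_accuracy(preds, gold))  # 0.5
```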
Table 3: Analysis of improvement due to domain prediction and similarity model over the baseline model

Type | Question | Actual Relation | Baseline Model Prediction | TISML Prediction | Predicted Domain
Type-A | What high school is located in Hugo | education/school_category/schools_of_this_kind | location/location/containedby | education/school_category/schools_of_this_kind | education
Type-A | whats a musical film from pakistan | film/film_genre/films_in_this_genre | music/genre/albums | film/film_genre/films_in_this_genre | film
Type-B | what is a chinese album | music/album_release_type/albums | music/release_track/release | music/album_release_type/albums | music
Type-B | what is the lyrics written by? | music/lyricist/lyrics_written | book/book_subject/works | music/lyricist/lyrics_written | music
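The questions above reach TISML in formatted form (Section 5): the detected entity mention is replaced by the `<e>` token so that the scorer is agnostic to the specific entity. A minimal sketch; the helper name is ours:

```python
# Question formatting from Section 5: replace the entity mention with a
# placeholder token ("<e>" follows the paper) before scoring triplets.

def format_question(question, entity_mention, token="<e>"):
    """Return the formatted question q' with the mention abstracted away."""
    return question.replace(entity_mention, token)

# The example question from Figure 2.
print(format_question("Where was Sasha Vujacic born?", "Sasha Vujacic"))
# Where was <e> born?
```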
Table 4: Error Analysis

Error Type | Question | Actual Relation | Predicted Relation | Actual Domain | Predicted Domain | Misclassification
Category-1 | Whats a track from "dawn escapes" | music/release/track_list | music/release/track | music | music | 104
Category-1 | Whats a track from the release 9 seconds | music/release/track | music/release/track_list | music | music | 14
Category-1 | Which album was "brothers sisters" listed on | music/release_track/release | music/recording/release | music | music | 112
Category-1 | which albums contain the track "song: starcandy"? | music/recording/release | music/release_track/release | music | music | 10
Category-2 | What is Andrew deemers profession? | common/topic/notable_type | people/person/profession | common | people | 52
Category-2 | what is the occupation of hans krása? | people/person/profession | common/topic/notable_type | people | common | 14
Category-2 | is deena a male or female | base/givennames/given_name/gender | people/person/gender | base | people | 106
Category-3 | who is an alumni involved in IT (HEAD=nan) | common/topic/subjects | people/profession/people_with_this_profession | common | people | 386
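The misclassification counts in Table 4 are simple tallies over (actual relation, predicted relation) pairs on the mispredicted test questions; a stdlib sketch with made-up pairs, not the real test data:

```python
# Tally how often each (actual, predicted) relation confusion occurs,
# as in the "Misclassification" column of Table 4.
from collections import Counter

errors = [  # (actual_relation, predicted_relation) per mispredicted question
    ("music/release/track_list", "music/release/track"),
    ("music/release/track_list", "music/release/track"),
    ("music/release/track", "music/release/track_list"),
]

tally = Counter(errors)
print(tally[("music/release/track_list", "music/release/track")])  # 2
```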
9 ERROR ANALYSIS
While analysing the errors on the test set, we observed that most errors can be broadly classified into 4 categories. These errors are discussed below and reported in Table 4; the examples are taken from the SimpleQuestions dataset:

• Category-1 Error (Error due to Triple Input Siamese Metric Learning Model)
• Category-2 Error (Error due to Domain Prediction Model)
• Category-3 Error (Unanswerable Questions)
• Category-4 Error (Error due to Entity Tagging Model)

(1) Category-1 Error: Plenty of erroneous questions fall under this category. Even though the Domain Prediction Model predicts the domain correctly, these errors occur due to the highly ambiguous structure of the relations and their tagged questions. To illustrate, the query "Whats a track from dawn escapes" has music/release/track_list as the actual relation while the predicted relation is music/release/track, whereas another question, "What's a track from the release 9 seconds", has music/release/track as the actual relation while the predicted relation is music/release/track_list. This clearly confuses the Triple Input Siamese Metric Learning Model, as the patterns of the questions are identical in nature and the relations are also very similar.

(2) Category-2 Error: These errors occur because the domain of the question given in the Knowledge Graph is vague. Certain domains in Freebase do not have a clear definition; for instance, domains such as Base, Common, User, and Type consist of questions that are similar to questions from other domains, and such questions comprise about 4% of the test data. If we observe the questions from these domains in Table 4, they do not have a common pattern. This misleads the Domain Prediction Model, which results in incorrect downstream relation detection and thus the wrong answer. For example, given the question "What is Andrew Deemer's profession", the Domain Prediction Model predicts "people" as the domain, and thus the Triple Input Siamese Metric Learning Model predicts people/person/profession, whereas the ground-truth relation of this question is common/topic/notable_type and the ground-truth domain is "common".

(3) Category-3 Error: There are 386 questions in the test set that do not contain a head entity. Previous work [12] removed such questions from the evaluation of their model; we, however, did not remove these questions from the dataset. For example, the question "Who is an alumni involved in IT" does not contain a mentioned entity; this is a data-creation error that cannot be solved, and the prediction is None.

(4) Category-4 Error: These errors occur because the Entity Tagging Model is not able to correctly identify the subject present in the question, which results in the selection of a wrong set of candidate relations from the knowledge graph. For example, the question "what's the name of a popular Japanese to Portuguese dictionary" has the ground-truth mentioned entity "dictionary"; however, the Entity Tagging Model predicts "Portuguese" as the subject, which leads to a wrong set of candidate relations and hence results in a wrong answer prediction.

10 CONCLUSION
In this paper, we propose the use of domain information as an additional input for predicting the correct relation on both single-relation and multi-relation datasets. This information is predicted from the question using a Domain Prediction Model and strengthens the TISML Model's ability to select the most appropriate relation for the question. Our proposed approach outperforms previous approaches on Question Answering over Knowledge Graphs and achieves new state-of-the-art results on the SimpleQuestions and WebQSP datasets. For future work, we will also explore datasets like GraphQuestions [21] and ComplexQuestions [2] to cover more aspects of general Question Answering.
REFERENCES
[1] 2018. Question Answering over Freebase via Attentive RNN with Similarity Matrix based CNN. CoRR abs/1804.03317 (2018). arXiv:1804.03317 http://arxiv.org/abs/1804.03317 Withdrawn.
[2] Junwei Bao, Nan Duan, Zhao Yan, Ming Zhou, and Tiejun Zhao. 2016. Constraint-Based Question Answering with Knowledge Graph. In Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers. The COLING 2016 Organizing Committee, Osaka, Japan, 2503–2514. https://www.aclweb.org/anthology/C16-1236
[3] Jonathan Berant, Andrew Chou, Roy Frostig, and Percy Liang. 2013. Semantic Parsing on Freebase from Question-Answer Pairs. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, Seattle, Washington, USA, 1533–1544. https://www.aclweb.org/anthology/D13-1160
[4] Jonathan Berant, Andrew Chou, Roy Frostig, and Percy Liang. 2013. Semantic Parsing on Freebase from Question-Answer Pairs. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, Seattle, Washington, USA, 1533–1544. https://www.aclweb.org/anthology/D13-1160
[5] Jonathan Berant and Percy Liang. 2014. Semantic Parsing via Paraphrasing. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Association for Computational Linguistics, Baltimore, Maryland, 1415–1425. https://doi.org/10.3115/v1/P14-1133
[6] Kurt Bollacker, Colin Evans, Praveen Paritosh, Tim Sturge, and Jamie Taylor. 2008. Freebase: a collaboratively created graph database for structuring human knowledge. In SIGMOD Conference. 1247–1250.
[7] Antoine Bordes, Sumit Chopra, and Jason Weston. 2014. Question Answering with Subgraph Embeddings. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). Association for Computational Linguistics, Doha, Qatar, 615–620. https://doi.org/10.3115/v1/D14-1067
[8] Antoine Bordes, Sumit Chopra, and Jason Weston. 2014. Question Answering with Subgraph Embeddings. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). Association for Computational Linguistics, Doha, Qatar, 615–620. https://doi.org/10.3115/v1/D14-1067
[9] Antoine Bordes, Nicolas Usunier, Sumit Chopra, and Jason Weston. 2015. Large-scale Simple Question Answering with Memory Networks. CoRR abs/1506.02075 (2015). arXiv:1506.02075 http://arxiv.org/abs/1506.02075
[10] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. CoRR abs/1810.04805 (2018). arXiv:1810.04805 http://arxiv.org/abs/1810.04805
[11] Changshun Du and Lei Huang. 2018. Text Classification Research with Attention-based Recurrent Neural Networks. International Journal of Computers Communications & Control 13 (2018), 50. https://doi.org/10.15837/ijccc.2018.1.3142
[12] Vishal Gupta, Manoj Chinnakotla, and Manish Shrivastava. 2018. Retrieve and Re-rank: A Simple and Effective IR Approach to Simple Question Answering over Knowledge Graphs. In Proceedings of the First Workshop on Fact Extraction and VERification (FEVER). Association for Computational Linguistics, Brussels, Belgium, 22–27. https://doi.org/10.18653/v1/W18-5504
[13] Yanchao Hao, Yuanzhe Zhang, Kang Liu, Shizhu He, Zhanyi Liu, Hua Wu, and Jun Zhao. 2017. An End-to-End Model for Question Answering over Knowledge Base with Cross-Attention Combining Global Knowledge. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Association for Computational Linguistics, Vancouver, Canada, 221–231. https://doi.org/10.18653/v1/P17-1021
[14] Sepp Hochreiter and Jürgen Schmidhuber. 1997. Long Short-term Memory. Neural Computation 9 (1997), 1735–1780. https://doi.org/10.1162/neco.1997.9.8.1735
[15] Jaeyoung Kim, Sion Jang, Sungchul Choi, and Eunjeong Lucy Park. 2018. Text Classification using Capsules. Neurocomputing 376 (2018), 214–221.
[16] Yann Lecun, Patrick Haffner, and Y. Bengio. 2000. Object Recognition with Gradient-Based Learning.
[17] Denis Lukovnikov, Asja Fischer, and Jens Lehmann. 2019. Pretrained Transformers for Simple Question Answering over Knowledge Graphs. In The Semantic Web – ISWC 2019, Chiara Ghidini, Olaf Hartig, Maria Maleshkova, Vojtěch Svátek, Isabel Cruz, Aidan Hogan, Jie Song, Maxime Lefrançois, and Fabien Gandon (Eds.). Springer International Publishing, Cham, 470–486.
[18] Denis Lukovnikov, Asja Fischer, Jens Lehmann, and Sören Auer. 2017. Neural Network-based Question Answering over Knowledge Graphs on Word and Character Level. https://doi.org/10.1145/3038912.3052675
[19] Salman Mohammed, Peng Shi, and Jimmy Lin. 2018. Strong Baselines for Simple Question Answering over Knowledge Graphs with and without Neural Networks. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers). Association for Computational Linguistics, New Orleans, Louisiana, 291–296. https://doi.org/10.18653/v1/N18-2047
[20] Michael Petrochuk and Luke Zettlemoyer. 2018. SimpleQuestions Nearly Solved: A New Upperbound and Baseline Approach. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, Brussels, Belgium, 554–558. https://doi.org/10.18653/v1/D18-1051
[21] Yu Su, Huan Sun, Brian Sadler, Mudhakar Srivatsa, Izzeddin Gür, Zenghui Yan, and Xifeng Yan. 2016. On Generating Characteristic-rich Question Sets for QA Evaluation. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, Austin, Texas, 562–572. https://doi.org/10.18653/v1/D16-1054
[22] Ferhan Türe and Oliver Jojic. 2016. Simple and Effective Question Answering with Recurrent Neural Networks. CoRR abs/1606.05029 (2016). arXiv:1606.05029 http://arxiv.org/abs/1606.05029
[23] Wen-tau Yih, Xiaodong He, and Christopher Meek. 2014. Semantic Parsing for Single-Relation Question Answering. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers). Association for Computational Linguistics, Baltimore, Maryland, 643–648. https://doi.org/10.3115/v1/P14-2105
[24] Mo Yu, Wenpeng Yin, Kazi Saidul Hasan, Cicero dos Santos, Bing Xiang, and Bowen Zhou. 2017. Improved Neural Relation Detection for Knowledge Base Question Answering. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Association for Computational Linguistics, Vancouver, Canada, 571–581. https://doi.org/10.18653/v1/P17-1053