=Paper=
{{Paper
|id=Vol-2666/KDD_Converse20_paper_6
|storemode=property
|title=Improved Question Answering using Domain Prediction
|pdfUrl=https://ceur-ws.org/Vol-2666/KDD_Converse20_paper_6.pdf
|volume=Vol-2666
|authors=Himani Srivastava,Prerna Khurana,Saurabh Srivastava,Vaibhav Varshney,Lovekesh Vig,Puneet Agarwal,Gautam Shroff
|dblpUrl=https://dblp.org/rec/conf/kdd/SrivastavaKSVVA20
}}
==Improved Question Answering using Domain Prediction==
Himani Srivastava, Prerna Khurana, Saurabh Srivastava, Vaibhav Varshney, Lovekesh Vig, Puneet Agarwal, Gautam Shroff
{srivastava.himani,prerna.khurana2,sriv.saurabh,varshney.v,lovekesh.vig,puneet.a,gautam.shroff}@tcs.com
TCS Research
New Delhi, India
ABSTRACT
Question Answering over Knowledge Graphs has mainly utilised the mentioned entity and the relation to predict the answer. However, a key piece of contextual information that is missing in these approaches is the knowledge of the broad domain (such as sports or music) to which the answer belongs. The current paper proposes to infer the domain of the answer via a pre-trained BERT [10] Classification Model, and to utilise the inferred domain as an additional input to yield state-of-the-art performance on single-relation (SimpleQuestions) and multi-relation (WebQSP) Question Answering benchmarks. We employ a triple-input Siamese network architecture that learns to predict the semantic similarity between the question, the inferred domain, and the relation.

KEYWORDS
Question answering over Knowledge Graph, Triple Input Siamese Network, Domain Prediction

ACM Reference Format:
Himani Srivastava, Prerna Khurana, Saurabh Srivastava, Vaibhav Varshney, Lovekesh Vig, Puneet Agarwal, Gautam Shroff. 2020. Improved Question Answering using Domain Prediction. In Proceedings of KDD Workshop on Conversational Systems Towards Mainstream Adoption (KDD Converse'20). ACM, New York, NY, USA, 6 pages.

KDD Converse'20, August 2020. © 2020 Copyright held by the owner/author(s). Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

1 INTRODUCTION
Question answering (QA) over large-scale knowledge graphs has been the focus of much NLP research. In this paper, we focus on natural language questions taken from the SimpleQuestions [7] and WebQSP [3] datasets, which contain tuples of the form (subject, relation, object, question). We tackle the problem of QA in three steps: 1) extraction of mentioned entities from the question and linking to entities in the Knowledge Graph; 2) detection of the domain of the object (answer); 3) prediction of the most relevant relation for answering the question.

Prior deep learning approaches use the relation as a class label only and hence do not capture the semantic-level correlation between the question and the relation. To overcome this limitation, we propose a Triple Input Siamese Metric Learning Model (TISML) that scores the similarity between questions and candidate relations, and thereby indirectly predicts the relation most relevant to a given question. However, this approach was observed to fail at times when the words of candidate relations are highly similar to the words present in the question (discussed in Section 8, Type-A); such overlap tends to mislead the model into predicting the relation incorrectly. We therefore propose that if the broad domain of the expected answer is also input to the model, the model tends to select relevant relations, improving relation prediction and yielding state-of-the-art performance. Consider this question from the SimpleQuestions dataset: "who is a production company that performed Othello". Here we first extract the mentioned entity "Othello" using a model (referred to as the Entity Tagging Model), and identify all the relations of this entity in the knowledge graph as candidate relations. Consider two of the candidate relations, "theater/theater_production/producing_company" and "film/film/production_companies". A model that takes only the question and the candidate relation as input predicts "film/film/production_companies" as the correct relation, which is actually wrong. However, if we also input the domain of the answer, "theater", it helps the model to score the candidate relations appropriately and predict "theater/theater_production/producing_company" as the correct relation.

The main contributions of this paper are: 1) We demonstrate that a metric learning similarity scoring network, along with the injected domain knowledge, enhances Question Answering over the Knowledge Graph. 2) We release the SimpleQuestions and WebQSP datasets created for our experiments (https://drive.google.com/drive/folders/1vkyeg9JEIZBCkQrezguMwwgJDmje6Lq_?usp=sharing) to enable further research.

The terms "mentioned entity" and "subject name" mean the same thing and may be used interchangeably.

2 PROBLEM DESCRIPTION
We assume that a background Knowledge Graph comprising a set of triples T = {t_1, ..., t_n} is available, where each triple t_i is represented as a set of three terms {Subject, Relation, Object}, also referred to as {S, R, O}. We are concerned with natural language questions (q_i ∈ Q) which mention an entity of the knowledge graph (S). We also assume that such questions can be answered using a single triple (for single-relation questions) or multiple triples (for multi-relation questions) of the knowledge graph. For the example above, the ground truth triple comprises subject S_i = "Othello", relation R_i = "theater/theater_production/producing_company", and object O_i = "National Theatre of Great Britain". In this context, the objective of the Question Answering task is to retrieve the appropriate answer ("National Theatre of Great Britain") from the knowledge graph.

We formulate this problem as a supervised learning task. We assume that a set of questions Q_S = {q_1, ..., q_m} and corresponding
ground truth triples T_S = {t_1, ..., t_m} (with t_i = (s_i, r_i, o_i)) are available as training data. The underlying knowledge graph for our work is Freebase [6]. For the SimpleQuestions dataset we have used a smaller version of Freebase, FB2M [8], and for the WebQSP dataset we have used the full Freebase.

3 RELATED WORK
Mapping a natural language question to a knowledge graph is a well-studied task, and a significant amount of work has been done on this topic over the last two decades [4, 23, 7, 19]. As per recent trends, answering natural language queries via knowledge graphs follows two broad approaches, namely "Semantic Parsing based" and "Information Extraction based", which are further explained below.

• Semantic Parsing based: These approaches [4, 5, 13] involve expressing natural language queries as SPARQL queries (logical forms) and then projecting these queries onto a knowledge base to extract relevant facts. The advent of deep learning approaches, which capture the semantics of a natural language query, helped to further improve the performance of these systems. The semantics captured by these deep learning approaches are encoded in a fixed-length vector and projected onto a knowledge graph representation to extract relevant facts.

• Information Extraction based: Work by [22] claimed that by using a simple RNN they were able to obtain better results for both entity tagging and relation detection on the SimpleQuestions dataset (they report 86.8% accuracy, but we, [19], and [20] have not been able to replicate their results). Another work, [24], used a hierarchical BiLSTM-based Siamese network for relation prediction and claimed that the relation detection task has a direct impact on the Question Answering task on both datasets. Attention with an RNN along with a similarity-matrix-based CNN achieved superior results in [1]. [20] used a BiLSTM-CRF tagger followed by a BiLSTM to perform mention detection and relation classification respectively. [17] were among the first to apply BERT [10] to this task but did not get any improvement over the previous state of the art. [12] proposed an approach similar to ours, using a similarity-based network for relation detection; however, they removed about 2% of the data from the test set. To the best of our knowledge, none of the cited approaches utilise domain information to predict relations.

4 DATASET DESCRIPTION
The SimpleQuestions dataset [9] is split into 75,910 train, 10,845 dev, and 21,686 test questions; the WebQSP dataset, taken from [24], has 3,116 train, 623 dev, and 1,649 test questions. Below we explain our method for extracting domain information from the Knowledge Graph and creating an input dataset of (Question, Relation, Domain) triples for the TISML model.

Domain Data Creation: To extract domain information from the Freebase Knowledge Graph, we observe that the relation of a question represents three pieces of information. E.g., given a relation people/person/place_of_birth in a triple (S, R, O) of Freebase, people represents the domain of the subject, person represents the sub-type of the subject, and place_of_birth is the property or attribute of that person. So, to extract the domain of the subject for a triple (S, R, O), we look at the first component of the relation (R). Since we are tagging the domain of the "answer" to every question in the dataset, we search for the domains of the "object" in a (subject, relation, object, question) tuple by finding the reverse relation between the subject and object. The process of domain data creation is depicted in Figure 1. Questions tagged with a single domain are referred to as unambiguous questions; questions tagged with None (i.e., no domain could be tagged) or with multiple domains are referred to as ambiguous questions. In the SimpleQuestions dataset there were 57,421 unambiguous questions and 16,432 (None) and 2,057 (multiple-domain) ambiguous questions.

Figure 1: Domain Data Creation

Domain Tagging for ambiguous questions: In order to tag such questions with the appropriate domain, we referred to the tagged domains of unambiguous questions, in the following steps:

(1) Create a one-to-one mapping between relations and domains:
  • Create a mapping table between a relation and a domain for every unambiguous question.
  • Select the most frequently occurring domain for a relation among all the tagged domains.
  • Update the mapping table with a unique domain for the relation.
(2) Tag ambiguous questions with the domain from the mapping table.

Siamese Data Creation: To create the (question, relation, domain) triplets that are input to the TISML model, for every question q we extract all the candidate relations for the mentioned entity from the Knowledge Graph, along with the corresponding inferred domain. We then label the triplet containing the actual relation as 1 and the remaining triplets as 0. As a result we have 586,953 train, 82,864 dev, and 388,695 test triplets for the SimpleQuestions dataset, and 77,792 train, 19,449 dev, and 49,912 test triplets for the WebQSP dataset.

5 PROPOSED APPROACH
We present a schematic diagram of the proposed approach in Figure 2.

Figure 2: This figure depicts the overall approach for the Question Answering task over the Knowledge Graph. Given a question, we first detect its domain using the Domain Prediction Model, and extract the mentioned entity using the Entity Tagging Model. The extracted entity is then searched in the knowledge graph and all the relevant subjects (S1, S2) are considered as candidate subjects, and all of the relations connected to these candidate subjects are candidate relations (R1, R2). The original question is converted into a formatted question by replacing the entity mention with a token <e>. Our TISML Model (bottom) calculates the similarity score between the inputs: the question, the inferred domain, and the relation candidates.

Here, given a question q with ground truth triple (s, r, o), we first find the mentioned entity or the subject of the question via
an Entity Tagging Model. From the identified entity, we obtain all the candidate subjects S = {S_1, ..., S_p} (in Figure 2, S = {S_1, S_2}) and also extract all the candidate relations R = {R_1, ..., R_q} connected to S from the Knowledge Graph (in Figure 2, R = {R_1, R_2}). We also input the question q to another model which predicts the domain of the expected answer for that question; this model is hereafter referred to as the Domain Prediction Model. Further, the question that is input to the TISML Model is modified by inserting a string <e> in place of the mentioned entity, yielding a formatted question q'. This is done to ensure that the Siamese Model is agnostic to the specific mentioned entity in the question while predicting the triplet score, and it also gives positional information to the neural networks [24].

6 MODEL DESCRIPTION
In this section, we discuss all three individual models in detail.

(1) Entity Tagging Model: This is a sequence labelling task (IO tagging) which uses a BiLSTM and a Conditional Random Field layer [20] for detecting the mentioned entity in the question. K entity candidates are predicted using the top-K Viterbi algorithm. Further, candidate aliases are extracted from the Freebase SQL table by querying it with the predicted K candidates. (While creating SQL tables for Freebase, different string aliases are mapped to every machine id (MId), for example: (MId | alias | alias-normalized-punctuation | alias-normalized-punctuation-stem | alias-preprocessed) ::: (0c1n99q | gulliver | gulliver | gulliv | gulliver).) The candidates having the minimum Levenshtein distance between their aliases and the detected mentioned entity become the predicted subject names, and their corresponding machine ids are retrieved as candidate machine ids. This model is used by [20] (https://github.com/PetrochukM/Simple-QA-EMNLP-2018), which is the state-of-the-art algorithm for the Question Answering task over the Knowledge Graph and serves as the baseline for comparing our results; hence, we also used the same model for our task.

(2) Domain Prediction Model: This is a supervised classification task, where the input is a question q and the output is the predicted domain of the answer type of the question. For this task, we use a pre-trained BERT Large [10] classification model and fine-tune it on the SimpleQuestions dataset by adding an additional fully connected layer on top of BERT and learning the weights of this layer to predict the correct domain for the question. We fine-tune the model for 5 epochs with sequence length 40 and batch size 64. This model outperforms other classification models, namely LSTM [14], CNN [16], BiLSTM with attention [11], and Capsule Network [15]; results for domain prediction are presented in Table 1. This is because BERT is trained on a huge corpus (Wikipedia (2.5 billion words) and BookCorpus (800 million words)) and can thus leverage the knowledge it has learned, which results in better prediction of the domains.

(3) Triple Input Siamese Metric Learning Model: In order to select the correct relation for the question q, we use the TISML Model (refer to Figure 3; our network is inspired by https://www.linkedin.com/pulse/duplicate-quora-question-abhishek-thakur/ — that model uses dual input, and we add an extra input, i.e., the inferred domain), which captures the semantics between all the inputs (question, relation, domain). The network consists of 3 different embedding generator networks: a GloVe Embedding Layer, a 1D-CNN Layer, and an
LSTM Layer. Each input is passed through these networks, which generate their respective embeddings. These embeddings are then concatenated through a merge layer followed by multiple dense layers. The final embedding is used to calculate a score between 0 and 1 that indicates whether the triple contains the correct relation or not.

Table 1: Domain Prediction Result

Approach | Accuracy
LSTM | 86
CNN | 89
Bi-LSTM+Attention | 91
Capsule | 91
Fine-Tuned BERT | 93.16

Figure 3: Network Architecture of TISML

Table 2: Accuracy on the SimpleQuestions (SQ) and WebQSP (WQ) datasets (*Average accuracy over 5 runs)

Approach | SQ | WQ
HR-BiLSTM & CNN & BiLSTM-CRF [24] | 77 | 63.9
GRU [18] | 71.2 | -
BiGRU-CRF & BiGRU [19] | 73.7 | -
BiLSTM & BiGRU [19] | 74.9 | -
Attentive RNN & Similarity Matrix based CNN [1] | 76.8 | -
BiLSTM-CRF & BiLSTM [20] | 78.1 | -
Our Approach (Without Domain)* | 76 | 63
Our Approach (TISML)* | 79.16 | 65.3

7 EXPERIMENTAL SETUP
Hyperparameters for the TISML Model include sequence length 40, batch size 384, and all dropouts (variational, recurrent) set to 0.2. The CNN block uses 64 filters, each of length 5; the dense layers have 300 units along with PReLU and batch normalization layers. The model has 11M total parameters, of which 6.4M are trainable. The word embeddings are initialized with 300-dimensional GloVe vectors, and we use Adam as the optimizer. Parameters were selected on the basis of validation accuracy. We run our experiments on a 12 GB Nvidia GPU. The average runtime for 1 epoch is 300 s on SimpleQuestions and 130 s on WebQSP.

8 RESULTS
We have compared our approach with the previous deep learning approaches mentioned in Section 3. The evaluation metric for the SimpleQuestions dataset is the same as in the baseline approach [20] (i.e., accuracy). For the WebQSP dataset, evaluation is done similarly to [24], where Top-1 accuracy is reported for answer prediction, i.e., among multiple predicted relations we pick the top-scored relation and use it for answer prediction (in the WebQSP test data only 64 questions have multiple answers; the rest have multiple relations with a single tagged answer). We have also compared with our approach without using domain information. From Table 2, it can be seen that augmenting domain knowledge along with the relation and question provides better accuracy than the other baseline approaches. In Table 3 we show a few examples from the SimpleQuestions dataset for which the relations were predicted wrongly by the baseline approach but could be answered correctly with our approach. In our analysis, we found that such errors were fixed by our approach for two main reasons:

(1) Type-A (Improvement due to Domain Prediction Model): For the question "What high school is located in Hugo", the baseline model predicts the relation location/location/containedby, which is not correct. This could be because of the word "located" in the query, or because questions with a similar pattern belong to this relation; hence their model, which is a relation classification model, predicts a relation containing "location". However, our model predicts the domain of this question as "education", since the question is essentially asking about the "high school", which belongs to the education domain. This information pushes the Triple Input Siamese Metric Learning Model to select the relation most similar to the education domain, which is education/school_category/schools_of_this_kind.

(2) Type-B (Improvement due to Similarity Model): Another type of error fixed by our approach arises when the relations predicted by the two approaches are from the same domain but differ in sub-domain. For instance, for the question "what is a chinese album", the baseline model detects music/release_track/release, whereas our model predicts music/album_release_type/albums as the correct relation. This is because our model exploits the semantic-level correlation between the question and the relation and is able to match the two at a literal level, which can be seen from the presence of the word "album" in both the question and the relation.
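The Top-1 evaluation used above (pick the highest-scoring candidate relation per question, count a hit when it matches the gold relation) can be sketched in a few lines. The helper name and the toy data are ours, not the released evaluation code:

```python
# Top-1 relation accuracy, as described in Section 8: among the scored
# candidate relations for each question, keep only the top-scored one.

def top1_accuracy(predictions, gold):
    """predictions: {qid: [(relation, score), ...]}; gold: {qid: relation}."""
    correct = 0
    for qid, cands in predictions.items():
        best_rel = max(cands, key=lambda rs: rs[1])[0]  # top-scored relation
        correct += (best_rel == gold[qid])
    return correct / len(predictions)

# Made-up illustration using relations that appear in the paper.
preds = {
    "q1": [("theater/theater_production/producing_company", 0.91),
           ("film/film/production_companies", 0.77)],
    "q2": [("music/release_track/release", 0.64),
           ("music/album_release_type/albums", 0.58)],
}
gold = {"q1": "theater/theater_production/producing_company",
        "q2": "music/album_release_type/albums"}
print(top1_accuracy(preds, gold))  # 0.5
```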
Table 3: Analysis of improvement due to domain prediction and similarity model over the baseline model

Type | Question | Actual Relation | Baseline Model Prediction | TISML Prediction | Predicted Domain
Type-A | What high school is located in Hugo | education/school_category/schools_of_this_kind | location/location/containedby | education/school_category/schools_of_this_kind | education
Type-A | whats a musical film from pakistan | film/film_genre/films_in_this_genre | music/genre/albums | film/film_genre/films_in_this_genre | film
Type-B | what is a chinese album | music/album_release_type/albums | music/release_track/release | music/album_release_type/albums | music
Type-B | what is the lyrics written by? | music/lyricist/lyrics_written | book/book_subject/works | music/lyricist/lyrics_written | music
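The questions above reach TISML in formatted form (Section 5): the detected entity mention is replaced by the `<e>` token so that the scorer is agnostic to the specific entity. A minimal sketch; the helper name is ours:

```python
# Question formatting from Section 5: replace the entity mention with a
# placeholder token ("<e>" follows the paper) before scoring triplets.

def format_question(question, entity_mention, token="<e>"):
    """Return the formatted question q' with the mention abstracted away."""
    return question.replace(entity_mention, token)

# The example question from Figure 2.
print(format_question("Where was Sasha Vujacic born?", "Sasha Vujacic"))
# Where was <e> born?
```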
Table 4: Error Analysis

Error Type | Question | Actual Relation | Predicted Relation | Actual Domain | Predicted Domain | Misclassification
Category-1 | Whats a track from "dawn escapes" | music/release/track_list | music/release/track | music | music | 104
Category-1 | Whats a track from the release 9 seconds | music/release/track | music/release/track_list | music | music | 14
Category-1 | Which album was "brothers sisters" listed on | music/release_track/release | music/recording/release | music | music | 112
Category-1 | which albums contain the track "song: starcandy"? | music/recording/release | music/release_track/release | music | music | 10
Category-2 | What is Andrew deemers profession? | common/topic/notable_type | people/person/profession | common | people | 52
Category-2 | what is the occupation of hans krása? | people/person/profession | common/topic/notable_type | people | common | 14
Category-2 | is deena a male or female | base/givennames/given_name/gender | people/person/gender | base | people | 106
Category-3 | who is an alumni involved in IT (HEAD=nan) | common/topic/subjects | people/profession/people_with_this_profession | common | people | 386
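The misclassification counts in Table 4 are simple tallies over (actual relation, predicted relation) pairs on the mispredicted test questions; a stdlib sketch with made-up pairs, not the real test data:

```python
# Tally how often each (actual, predicted) relation confusion occurs,
# as in the "Misclassification" column of Table 4.
from collections import Counter

errors = [  # (actual_relation, predicted_relation) per mispredicted question
    ("music/release/track_list", "music/release/track"),
    ("music/release/track_list", "music/release/track"),
    ("music/release/track", "music/release/track_list"),
]

tally = Counter(errors)
print(tally[("music/release/track_list", "music/release/track")])  # 2
```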
9 ERROR ANALYSIS
While analysing the errors on the test set, we observed that most errors can be broadly classified into 4 categories. These errors are discussed below and reported in Table 4; the examples are taken from the SimpleQuestions dataset:

• Category-1 Error (Error due to Triple Input Siamese Metric Learning Model)
• Category-2 Error (Error due to Domain Prediction Model)
• Category-3 Error (Unanswerable Questions)
• Category-4 Error (Error due to Entity Tagging Model)

(1) Category-1 Error: Plenty of erroneous questions fall under this category. Even though the Domain Prediction Model predicts the domain correctly, these errors occur due to the highly ambiguous structure of the relations and their tagged questions. To illustrate, the query "Whats a track from dawn escapes" has music/release/track_list as the actual relation while the predicted relation is music/release/track, whereas another question, "What's a track from the release 9 seconds", has music/release/track as the actual relation while the predicted relation is music/release/track_list. This clearly confuses the Triple Input Siamese Metric Learning Model, as the patterns of the questions are identical in nature and the relations are also very similar.

(2) Category-2 Error: These errors occur because the domain of the question given in the Knowledge Graph is vague. Certain domains in Freebase do not have a clear definition; for instance, domains such as Base, Common, User, and Type consist of questions that are similar to questions from other domains, and such questions comprise about 4% of the test data. If we observe the questions from these domains in Table 4, they do not have a common pattern. This misleads the Domain Prediction Model, which results in incorrect downstream relation detection and thus the wrong answer. For example, given the question "What is Andrew Deemer's profession", the Domain Prediction Model predicts "people" as the domain, and thus the Triple Input Siamese Metric Learning Model predicts people/person/profession, whereas the ground-truth relation of this question is common/topic/notable_type and the ground-truth domain is "common".

(3) Category-3 Error: There are 386 questions in the test set that do not contain a head entity. Previous work [12] removed such questions from the evaluation of their model; we, however, did not remove these questions from the dataset. For example, the question "Who is an alumni involved in IT" does not contain a mentioned entity; this is a data-creation error that cannot be solved, and the prediction is None.

(4) Category-4 Error: These errors occur because the Entity Tagging Model is not able to correctly identify the subject present in the question, which results in the selection of a wrong set of candidate relations from the knowledge graph. For example, the question "what's the name of a popular Japanese to Portuguese dictionary" has the ground-truth mentioned entity "dictionary"; however, the Entity Tagging Model predicts "Portuguese" as the subject, which leads to a wrong set of candidate relations and hence results in a wrong answer prediction.

10 CONCLUSION
In this paper, we propose the use of domain information as an additional input for predicting the correct relation on both single-relation and multi-relation datasets. This information is predicted from the question using a Domain Prediction Model and strengthens the TISML Model's ability to select the most appropriate relation for the question. Our proposed approach outperforms previous approaches on Question Answering over Knowledge Graphs and achieves new state-of-the-art results on the SimpleQuestions and WebQSP datasets. For future work, we will also explore datasets like GraphQuestions [21] and ComplexQuestions [2] to cover more aspects of general Question Answering.
REFERENCES
[1] 2018. Question Answering over Freebase via Attentive RNN with Similarity Matrix based CNN. CoRR abs/1804.03317 (2018). arXiv:1804.03317 http://arxiv.org/abs/1804.03317 Withdrawn.
[2] Junwei Bao, Nan Duan, Zhao Yan, Ming Zhou, and Tiejun Zhao. 2016. Constraint-Based Question Answering with Knowledge Graph. In Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers. The COLING 2016 Organizing Committee, Osaka, Japan, 2503–2514. https://www.aclweb.org/anthology/C16-1236
[3] Jonathan Berant, Andrew Chou, Roy Frostig, and Percy Liang. 2013. Semantic Parsing on Freebase from Question-Answer Pairs. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, Seattle, Washington, USA, 1533–1544. https://www.aclweb.org/anthology/D13-1160
[4] Jonathan Berant, Andrew Chou, Roy Frostig, and Percy Liang. 2013. Semantic Parsing on Freebase from Question-Answer Pairs. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, Seattle, Washington, USA, 1533–1544. https://www.aclweb.org/anthology/D13-1160
[5] Jonathan Berant and Percy Liang. 2014. Semantic Parsing via Paraphrasing. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Association for Computational Linguistics, Baltimore, Maryland, 1415–1425. https://doi.org/10.3115/v1/P14-1133
[6] Kurt Bollacker, Colin Evans, Praveen Paritosh, Tim Sturge, and Jamie Taylor. 2008. Freebase: a collaboratively created graph database for structuring human knowledge. In SIGMOD Conference. 1247–1250.
[7] Antoine Bordes, Sumit Chopra, and Jason Weston. 2014. Question Answering with Subgraph Embeddings. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). Association for Computational Linguistics, Doha, Qatar, 615–620. https://doi.org/10.3115/v1/D14-1067
[8] Antoine Bordes, Sumit Chopra, and Jason Weston. 2014. Question Answering with Subgraph Embeddings. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). Association for Computational Linguistics, Doha, Qatar, 615–620. https://doi.org/10.3115/v1/D14-1067
[9] Antoine Bordes, Nicolas Usunier, Sumit Chopra, and Jason Weston. 2015. Large-scale Simple Question Answering with Memory Networks. CoRR abs/1506.02075 (2015). arXiv:1506.02075 http://arxiv.org/abs/1506.02075
[10] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. CoRR abs/1810.04805 (2018). arXiv:1810.04805 http://arxiv.org/abs/1810.04805
[11] Changshun Du and Lei Huang. 2018. Text Classification Research with Attention-based Recurrent Neural Networks. International Journal of Computers Communications & Control 13 (2018), 50. https://doi.org/10.15837/ijccc.2018.1.3142
[12] Vishal Gupta, Manoj Chinnakotla, and Manish Shrivastava. 2018. Retrieve and Re-rank: A Simple and Effective IR Approach to Simple Question Answering over Knowledge Graphs. In Proceedings of the First Workshop on Fact Extraction and VERification (FEVER). Association for Computational Linguistics, Brussels, Belgium, 22–27. https://doi.org/10.18653/v1/W18-5504
[13] Yanchao Hao, Yuanzhe Zhang, Kang Liu, Shizhu He, Zhanyi Liu, Hua Wu, and Jun Zhao. 2017. An End-to-End Model for Question Answering over Knowledge Base with Cross-Attention Combining Global Knowledge. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Association for Computational Linguistics, Vancouver, Canada, 221–231. https://doi.org/10.18653/v1/P17-1021
[14] Sepp Hochreiter and Jürgen Schmidhuber. 1997. Long Short-term Memory. Neural Computation 9 (1997), 1735–1780. https://doi.org/10.1162/neco.1997.9.8.1735
[15] Jaeyoung Kim, Sion Jang, Sungchul Choi, and Eunjeong Lucy Park. 2018. Text Classification using Capsules. Neurocomputing 376 (2018), 214–221.
[16] Yann Lecun, Patrick Haffner, and Y. Bengio. 2000. Object Recognition with Gradient-Based Learning.
[17] Denis Lukovnikov, Asja Fischer, and Jens Lehmann. 2019. Pretrained Transformers for Simple Question Answering over Knowledge Graphs. In The Semantic Web – ISWC 2019, Chiara Ghidini, Olaf Hartig, Maria Maleshkova, Vojtěch Svátek, Isabel Cruz, Aidan Hogan, Jie Song, Maxime Lefrançois, and Fabien Gandon (Eds.). Springer International Publishing, Cham, 470–486.
[18] Denis Lukovnikov, Asja Fischer, Jens Lehmann, and Sören Auer. 2017. Neural Network-based Question Answering over Knowledge Graphs on Word and Character Level. https://doi.org/10.1145/3038912.3052675
[19] Salman Mohammed, Peng Shi, and Jimmy Lin. 2018. Strong Baselines for Simple Question Answering over Knowledge Graphs with and without Neural Networks. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers). Association for Computational Linguistics, New Orleans, Louisiana, 291–296. https://doi.org/10.18653/v1/N18-2047
[20] Michael Petrochuk and Luke Zettlemoyer. 2018. SimpleQuestions Nearly Solved: A New Upperbound and Baseline Approach. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, Brussels, Belgium, 554–558. https://doi.org/10.18653/v1/D18-1051
[21] Yu Su, Huan Sun, Brian Sadler, Mudhakar Srivatsa, Izzeddin Gür, Zenghui Yan, and Xifeng Yan. 2016. On Generating Characteristic-rich Question Sets for QA Evaluation. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, Austin, Texas, 562–572. https://doi.org/10.18653/v1/D16-1054
[22] Ferhan Türe and Oliver Jojic. 2016. Simple and Effective Question Answering with Recurrent Neural Networks. CoRR abs/1606.05029 (2016). arXiv:1606.05029 http://arxiv.org/abs/1606.05029
[23] Wen-tau Yih, Xiaodong He, and Christopher Meek. 2014. Semantic Parsing for Single-Relation Question Answering. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers). Association for Computational Linguistics, Baltimore, Maryland, 643–648. https://doi.org/10.3115/v1/P14-2105
[24] Mo Yu, Wenpeng Yin, Kazi Saidul Hasan, Cicero dos Santos, Bing Xiang, and Bowen Zhou. 2017. Improved Neural Relation Detection for Knowledge Base Question Answering. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Association for Computational Linguistics, Vancouver, Canada, 571–581. https://doi.org/10.18653/v1/P17-1053