Improving Knowledge Base Question Answering with Question Understanding Augment

Peiyun Wu and Xiaowang Zhang
College of Intelligence and Computing, Tianjin University, Tianjin 300350, China
{wupeiyun,xiaowangzhang}@tju.edu.cn

Abstract. The foundation of knowledge base question answering (KBQA) is to understand the given question and extract its meaning. Existing works largely focus on generating query graphs to represent the semantics of a question while ignoring its real meaning. To augment question understanding, in this paper, we leverage rich external linguistic knowledge to enhance question semantics. First, we integrate sememe and gloss information into word representations, where sememes (the minimum semantic units of word meanings) and glosses (sense definitions) are used to disambiguate word senses and enrich question information. Moreover, we present a co-attention network to build co-dependent representations of the sememe and gloss. Experiments on two data sets show that our model outperforms existing approaches.

1 Introduction

Semantic parsing is an important approach to KBQA: it constructs a query structure (called a query graph) that represents the semantics of a question. Semantic parsing based approaches effectively transform questions into logical forms, where the reliability of the logical forms ensures the correctness of the answers. The success of semantic parsing lies in representing the semantics of questions so as to better capture users' intentions.

However, in recent years, many semantic parsing approaches have focused on complex query graph generation and re-ranking [1, 2, 5] while paying little attention to understanding the meaning of questions accurately. They aim to leverage a ranking model to score candidate query graphs and find the best one. As a result, existing works that do not handle ambiguous questions cannot always rank query graphs well.
In this paper, we propose an augmented question representation method that leverages sememe and gloss information. Specifically, we integrate the gloss information from WordNet and the sememe [3] information from HowNet into the word embeddings of the given question. A word may have multiple senses, and each sense consists of several sememes and a gloss. To highlight the important information in the sememe and gloss, we present a co-attention network to generate better representations.

Copyright 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

Fig. 1: Diagram for Our Model.

2 Our Approach

Given a question $Q = \{w_1, \dots, w_n\}$, we generate its candidate query graph set by the method in [1]. We then measure the semantic similarity between the question and each candidate query graph to find the optimal one. For a word $w_i$ in $Q$, we denote the set of its senses by $S^{w_i}$, and $E^{w_i} = \{e_{i1}, \dots, e_{ik}\}$ denotes the unordered set of all sememes contained in $w_i$. We further assume that each word $w_i$ has a gloss set $G^{w_i}$. Our model is shown in Fig. 1.

Sense Selector: This selector selects, for each word, the sense that is most relevant to the context.
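The sense selector's core idea — build a context vector from the question's hidden states, then score each candidate sense by how well its sememes match that context — can be sketched as follows. This is a minimal NumPy sketch under our reading of the paper; the function names, array shapes, and helpers are ours, not part of the original model.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

def select_sense(hidden, sense_sememes):
    """Pick the sense whose sememes best match the question context.

    hidden: (n, d) Bi-LSTM hidden states of the question words.
    sense_sememes: list of (k_s, d) arrays, one per candidate sense,
        holding the embeddings of that sense's sememes.
    Returns the index of the highest-scoring sense.
    """
    # Context vector: attention-weighted combination of the hidden states,
    # with scores from each state's affinity to the mean hidden state.
    mean_h = hidden.mean(axis=0)                    # (d,)
    weights = softmax(np.tanh(hidden @ mean_h))     # (n,)
    context = weights @ hidden                      # (d,)

    # Score each sense by the mean sigmoid similarity of its sememes
    # to the context, then keep the best-scoring sense.
    sense_scores = [sigmoid(sememes @ context).mean()
                    for sememes in sense_sememes]
    return int(np.argmax(sense_scores))
```

With two toy senses whose sememe vectors point toward and away from the context, the selector prefers the aligned one; the gloss selector works analogously, scoring glosses against the averaged embedding of the chosen sense.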
We first feed $Q$ into a bi-directional long short-term memory network (Bi-LSTM) to obtain the hidden representations $\{h_1, \dots, h_n\}$. Then we generate the context representation of $Q$ as:

$$\mathrm{context} = \sum_{i=1}^{n} \mathrm{softmax}\Big(\tanh\Big(h_i^{\top} \cdot \frac{1}{n}\sum_{j=1}^{n} h_j\Big)\Big) \cdot h_i \qquad (1)$$

To calculate the correlation between each sememe and the context, we use the Sigmoid function to obtain a probability value:

$$p(e_{ij} \mid \mathrm{context}) = \mathrm{Sigmoid}\big(\mathrm{context} \cdot e_{ij}^{\top}\big), \quad \forall\, j \in \{1, \dots, k\} \qquad (2)$$

For each sense in $S^{w_i}$, its probability is the average of the probabilities of all the sememes it contains. In this way, we select the sense with the highest probability under the current context and denote it as $S^{w_i}_{\max} = \{e_1, \dots, e_k\}$, where $\{e_1, \dots, e_k\}$ are the sememe vectors it contains.

Gloss Selector: This selector selects the gloss that is most relevant to the selected word sense. Analogously, we use the Sigmoid function to select the gloss with the highest probability of being relevant to the average embedding of $S^{w_i}_{\max}$. We denote the selected gloss as $G^{w_i}_{\max} = \{o_1, \dots, o_m\}$, where $\{o_1, \dots, o_m\}$ are the word embeddings it contains.

Co-Attention: To model the mutual influence and highlight the important information in the sememe and gloss, we introduce a co-attention network that dynamically combines the sememe and gloss representations:

$$U^s = \tanh\big((S^{w_i}_{\max})^{\top} \cdot G^{w_i}_{\max}\big), \qquad U^g = \tanh\big((G^{w_i}_{\max})^{\top} \cdot S^{w_i}_{\max}\big) \qquad (3)$$

$$SG^{w_i} = \lambda \sum_{j=1}^{k} \big[U^s \cdot \mathrm{softmax}(U^s)\big]_{:j} + (1-\lambda) \sum_{j=1}^{m} \big[U^g \cdot \mathrm{softmax}(U^g)\big]_{:j} \qquad (4)$$

where $SG^{w_i}$ is the combined representation of the selected sememe and gloss of word $w_i$, and $\lambda \in [0, 1]$ is a parameter. $\mathrm{softmax}(U^s)$ and $\mathrm{softmax}(U^g)$ are attention weight matrices obtained by applying the softmax function across each column of $U^s$ and $U^g$, respectively, and $[\,\cdot\,]_{:j}$ denotes the $j$-th column. Finally, we concatenate $SG^{w_i}$ to the word embedding of $w_i$ to enrich the semantics and reduce ambiguity.

Question Representation: We treat $\{x_1, \dots, x_n\}$ as the initial representations of the given question, which have been integrated with sememe and gloss information. To further augment the contextual embeddings of the question, we parse the question into its syntactic dependency graph $D_G$ and adopt a relational graph convolutional network (RGCN) to digest this structural information:

$$x_i^{(l+1)} = \mathrm{ReLU}\Big(\sum_{r \in R}\sum_{j \in N_i^r} \frac{1}{|N_i^r|} W_r^{(l)} x_j^{(l)} + W_0^{(l)} x_i^{(l)}\Big) \qquad (5)$$

Here $R$ is the set of dependency relations, $l$ indexes the layer, and $N_i^r$ is the set of all $r$-neighbors of the $i$-th node in $D_G$; $W_0$ and $W_r$ are weight matrices. Finally, we apply a pooling operation after the last RGCN layer to obtain the representation of the question.

Relation Representation: We represent the relations in a query graph at different granularities. For each relation, we consider both its relation-level and word-level representations. The word-level representation is the average of the relation's word embeddings; the relation-level representation is the vector of the unique token of the relation name. Each relation is then represented by the sum of the two, and we perform max pooling over all relations to obtain the final relation representation.

3 Experiments and Evaluations

We use Wikidata as our KB and conduct experiments on two data sets, WebQSP-WD (WSPWD) [1] and QALD-7 (Task 4, English), both of which support Wikidata. We use the F1-score as our metric, where all results are macro-averaged. Table 1 shows that our model outperforms all baselines on both data sets. Our model achieves 54.2%, 23.3%, 8.9%, and 11.9% higher F1-scores than STAGG, HR-BiLSTM, GGNN, and Slot-Matching on WSPWD, respectively. Analogously, we achieve 59.3%, 45.7%, 39.1%, and 21.7% higher F1-scores on QALD-7. We observe that if we integrate only sememe information ("+sememe") or only gloss information ("+gloss"), our model performs worse but remains competitive. We conclude that our augmented question representation with sememe and gloss integration is effective.
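The RGCN update of Eq. (5) — per-relation messages from dependency neighbors, normalized by neighbor count, plus a self-connection — can be sketched directly in NumPy. This is a minimal sketch; the edge-list format and function name are our own illustration, not the paper's implementation.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def rgcn_layer(x, neighbors, W_rel, W_self):
    """One relational graph-convolution step over the dependency graph.

    x: (n, d) current node representations.
    neighbors: dict mapping relation name r -> list of (i, j) edges,
        meaning node j is an r-neighbor of node i.
    W_rel: dict mapping relation name r -> (d, d) weight matrix W_r.
    W_self: (d, d) self-connection weight matrix W_0.
    """
    n, d = x.shape
    out = x @ W_self.T                      # self-connection term W_0 x_i
    for r, edges in neighbors.items():
        # Count r-neighbors of each node for the 1/|N_i^r| normalization.
        counts = np.zeros(n)
        for i, _ in edges:
            counts[i] += 1
        # Accumulate normalized messages W_r x_j from each r-neighbor j.
        for i, j in edges:
            out[i] += (x[j] @ W_rel[r].T) / counts[i]
    return relu(out)
```

Stacking a few such layers and pooling over the nodes yields the question representation; in practice a graph library with an RGCN module would replace this loop-based sketch.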
Table 1: Overall Average Results over Wikidata

Model                       WSPWD     QALD-7
STAGG (2015) [2]            0.1828    0.1861
HR-BiLSTM (2017) [5]        0.2287    0.2035
GGNN (2018) [1]             0.2588    0.2131
Slot-Matching (2019) [4]    0.2519    0.2436
+sememe                     0.2459    0.2546
+gloss                      0.2597    0.2743
Ours                        0.2819    0.2965

Fig. 2: The number of relations needed to find correct answers.

To measure performance across questions of different complexity, we break down the results on WSPWD by the number of relations needed to find the correct answer. The results, shown in Fig. 2, indicate that our model is effective on questions of varying complexity.

4 Conclusion

In this paper, we augment question understanding to improve KBQA. The sememe and gloss information benefit from each other and together enhance question semantics. In this way, our approach provides a new way of using external knowledge in question representation. In future work, we are interested in extending our model to more complex practical questions.

5 Acknowledgments

This work is supported by the National Key Research and Development Program of China (2017YFC0908401) and the National Natural Science Foundation of China (61972455). Xiaowang Zhang is supported by the Peiyang Young Scholars program of Tianjin University (2019XRX-0032).

References

1. Sorokin, D., Gurevych, I.: Modeling semantics with gated graph neural networks for knowledge base question answering. In: COLING'2018, pp. 3306–3317.
2. Yih, W., Chang, M., He, X., Gao, J.: Semantic parsing via staged query graph generation: question answering with knowledge base. In: ACL'2015, pp. 1321–1331.
3. Bloomfield, L.: A set of postulates for the science of language. Language 2(3), 153–164 (1926).
4. Maheshwari, G., Trivedi, P., Lukovnikov, D., Chakraborty, N., Fischer, A., Lehmann, J.: Learning to rank query graphs for complex question answering over knowledge graphs. In: ISWC'2019, pp. 487–504.
5. Yu, M., Yin, W., Hasan, K.S., Santos, C.N., Xiang, B., Zhou, B.: Improved neural relation detection for knowledge base question answering. In: ACL'2017, pp. 571–581.