Improving Knowledge Base Question Answering with Question Understanding Augment

Peiyun Wu and Xiaowang Zhang
College of Intelligence and Computing, Tianjin University, Tianjin 300350, China
{wupeiyun,xiaowangzhang}@tju.edu.cn

Abstract. The foundation of knowledge base question answering (KBQA) is to understand the given question and extract its meaning. Existing works largely focus on generating query graphs to represent the semantics of a question while ignoring its real meaning. To augment question understanding, in this paper, we leverage rich external linguistic knowledge to enhance question semantics. First, we integrate sememe and gloss information into word representations, where sememes (the minimum semantic units of word meanings) and glosses (sense definitions) are used to disambiguate word senses and enrich question information. Moreover, we present a co-attention network to build co-dependent representations of the sememe and gloss. Experiments on two data sets show that our model outperforms existing approaches.

1 Introduction

Semantic parsing is an important approach to KBQA: it constructs a query structure (called a query graph) that represents the semantics of a question. Semantic parsing based approaches effectively transform questions into logical forms, where the reliability of the logical forms ensures the correctness of the answers. The success of semantic parsing lies in representing the semantics of questions so as to better capture users' intentions.

However, in recent years, many semantic parsing approaches have focused on complex query graph generation and re-ranking [1, 2, 5] while paying little attention to understanding the meaning of questions accurately. They aim to leverage a ranking model to score candidate query graphs and find the best one. As a result, existing works that do not handle ambiguous questions cannot always rank query graphs well.
In this paper, we propose an augmented question representation method that leverages sememe and gloss information. Specifically, we integrate the gloss information from WordNet and the sememe [3] information from HowNet into the word embeddings of the given question. A word may have multiple senses, and each sense consists of several sememes and a gloss. To highlight the important information in the sememe and gloss, we present a co-attention network to generate better representations.

Copyright 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

Fig. 1: Diagram for Our Model.

2 Our Approach

Given a question $Q = \{w_1, \dots, w_n\}$, we generate its candidate query graph set by the method in [1]. We then measure the semantic similarity between the question and each candidate query graph to find the optimal one. For a word $w_i$ in $Q$, we denote the set of its senses by $S^{w_i}$, and $E^{w_i} = \{e_{i1}, \dots, e_{ik}\}$ denotes the unordered set of all sememes contained in $w_i$. We further assume that each word $w_i$ has a gloss set $G^{w_i}$. Our model is shown in Fig. 1.

Sense Selector: This selector selects, for each word, the sense that is most relevant to the context.
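The sense selector's core idea — build a context vector from the question's hidden states, then score each candidate sense by how well its sememes match that context — can be sketched as follows. This is a minimal NumPy sketch under our reading of the paper; the function names, array shapes, and helpers are ours, not part of the original model.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

def select_sense(hidden, sense_sememes):
    """Pick the sense whose sememes best match the question context.

    hidden: (n, d) Bi-LSTM hidden states of the question words.
    sense_sememes: list of (k_s, d) arrays, one per candidate sense,
        holding the embeddings of that sense's sememes.
    Returns the index of the highest-scoring sense.
    """
    # Context vector: attention-weighted combination of the hidden states,
    # with scores from each state's affinity to the mean hidden state.
    mean_h = hidden.mean(axis=0)                    # (d,)
    weights = softmax(np.tanh(hidden @ mean_h))     # (n,)
    context = weights @ hidden                      # (d,)

    # Score each sense by the mean sigmoid similarity of its sememes
    # to the context, then keep the best-scoring sense.
    sense_scores = [sigmoid(sememes @ context).mean()
                    for sememes in sense_sememes]
    return int(np.argmax(sense_scores))
```

With two toy senses whose sememe vectors point toward and away from the context, the selector prefers the aligned one; the gloss selector works analogously, scoring glosses against the averaged embedding of the chosen sense.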
We first feed $Q$ into a bi-directional long short-term memory network (Bi-LSTM) to obtain the hidden representations $\{h_1, \dots, h_n\}$. Then we generate the context representation of $Q$ as:

$$\mathrm{context} = \sum_{i=1}^{n} \mathrm{softmax}\Big(\tanh\Big(h_i^{\top} \cdot \frac{1}{n}\sum_{j=1}^{n} h_j\Big)\Big) \cdot h_i \qquad (1)$$

To calculate the correlation between each sememe and the context, we use the Sigmoid function to obtain a probability value:

$$p(e_{ij} \mid \mathrm{context}) = \mathrm{Sigmoid}\big(\mathrm{context} \cdot e_{ij}^{\top}\big), \quad \forall\, j \in \{1, \dots, k\} \qquad (2)$$

For each sense in $S^{w_i}$, its probability is the average of the probabilities of all the sememes it contains. In this way, we select the sense with the highest probability under the current context and denote it as $S^{w_i}_{\max} = \{e_1, \dots, e_k\}$, where $\{e_1, \dots, e_k\}$ are the sememe vectors it contains.

Gloss Selector: This selector selects the gloss that is most relevant to the selected word sense. Analogously, we use the Sigmoid function to select the gloss with the highest probability of being relevant to the average embedding of $S^{w_i}_{\max}$. We denote the selected gloss as $G^{w_i}_{\max} = \{o_1, \dots, o_m\}$, where $\{o_1, \dots, o_m\}$ are the word embeddings it contains.

Co-Attention: To model the mutual influence and highlight the important information in the sememe and gloss, we introduce a co-attention network that dynamically combines the sememe and gloss representations:

$$U^s = \tanh\big((S^{w_i}_{\max})^{\top} \cdot G^{w_i}_{\max}\big), \qquad U^g = \tanh\big((G^{w_i}_{\max})^{\top} \cdot S^{w_i}_{\max}\big) \qquad (3)$$

$$SG^{w_i} = \lambda \sum_{j=1}^{k} \big[U^s \cdot \mathrm{softmax}(U^s)\big]_{:j} + (1-\lambda) \sum_{j=1}^{m} \big[U^g \cdot \mathrm{softmax}(U^g)\big]_{:j} \qquad (4)$$

where $SG^{w_i}$ is the combined representation of the selected sememe and gloss of word $w_i$, and $\lambda \in [0, 1]$ is a parameter. $\mathrm{softmax}(U^s)$ and $\mathrm{softmax}(U^g)$ are attention weight matrices obtained by applying the softmax function across each column of $U^s$ and $U^g$, respectively, and $[\,\cdot\,]_{:j}$ denotes the $j$-th column. Finally, we concatenate $SG^{w_i}$ to the word embedding of $w_i$ to enrich the semantics and reduce ambiguity.

Question Representation: We treat $\{x_1, \dots, x_n\}$ as the initial representations of the given question, which have been integrated with sememe and gloss information. To further augment the contextual embeddings of the question, we parse the question into its syntactic dependency graph $D_G$ and adopt a relational graph convolutional network (RGCN) to digest this structural information:

$$x_i^{(l+1)} = \mathrm{ReLU}\Big(\sum_{r \in R}\sum_{j \in N_i^r} \frac{1}{|N_i^r|} W_r^{(l)} x_j^{(l)} + W_0^{(l)} x_i^{(l)}\Big) \qquad (5)$$

Here $R$ is the set of dependency relations, $l$ indexes the layer, and $N_i^r$ is the set of all $r$-neighbors of the $i$-th node in $D_G$; $W_0$ and $W_r$ are weight matrices. Finally, we apply a pooling operation after the last RGCN layer to obtain the representation of the question.

Relation Representation: We represent the relations in a query graph at different granularities. For each relation, we consider both its relation-level and word-level representations. The word-level representation is the average of the relation's word embeddings; the relation-level representation is the vector of the unique token of the relation name. Each relation is then represented by the sum of the two, and we perform max pooling over all relations to obtain the final relation representation.

3 Experiments and Evaluations

We use Wikidata as our KB and conduct experiments on two data sets, WebQSP-WD (WSPWD) [1] and QALD-7 (Task 4, English), both of which support Wikidata. We use the F1-score as our metric, where all results are macro-averaged. Table 1 shows that our model outperforms all baselines on both data sets. Our model achieves 54.2%, 23.3%, 8.9%, and 11.9% higher F1-scores than STAGG, HR-BiLSTM, GGNN, and Slot-Matching on WSPWD, respectively. Analogously, we achieve 59.3%, 45.7%, 39.1%, and 21.7% higher F1-scores on QALD-7. We observe that if we integrate only sememe information ("+sememe") or only gloss information ("+gloss"), our model performs worse but remains competitive. We conclude that our augmented question representation with sememe and gloss integration is effective.
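The RGCN update of Eq. (5) — per-relation messages from dependency neighbors, normalized by neighbor count, plus a self-connection — can be sketched directly in NumPy. This is a minimal sketch; the edge-list format and function name are our own illustration, not the paper's implementation.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def rgcn_layer(x, neighbors, W_rel, W_self):
    """One relational graph-convolution step over the dependency graph.

    x: (n, d) current node representations.
    neighbors: dict mapping relation name r -> list of (i, j) edges,
        meaning node j is an r-neighbor of node i.
    W_rel: dict mapping relation name r -> (d, d) weight matrix W_r.
    W_self: (d, d) self-connection weight matrix W_0.
    """
    n, d = x.shape
    out = x @ W_self.T                      # self-connection term W_0 x_i
    for r, edges in neighbors.items():
        # Count r-neighbors of each node for the 1/|N_i^r| normalization.
        counts = np.zeros(n)
        for i, _ in edges:
            counts[i] += 1
        # Accumulate normalized messages W_r x_j from each r-neighbor j.
        for i, j in edges:
            out[i] += (x[j] @ W_rel[r].T) / counts[i]
    return relu(out)
```

Stacking a few such layers and pooling over the nodes yields the question representation; in practice a graph library with an RGCN module would replace this loop-based sketch.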
Table 1: Overall Average Results over Wikidata

Model                       WSPWD     QALD-7
STAGG (2015) [2]            0.1828    0.1861
HR-BiLSTM (2017) [5]        0.2287    0.2035
GGNN (2018) [1]             0.2588    0.2131
Slot-Matching (2019) [4]    0.2519    0.2436
+sememe                     0.2459    0.2546
+gloss                      0.2597    0.2743
Ours                        0.2819    0.2965

Fig. 2: The number of relations needed to find correct answers.

To measure performance across questions of different complexity, we break down the results on WSPWD by the number of relations needed to find the correct answer. The results, shown in Fig. 2, indicate that our model is effective on questions of varying complexity.

4 Conclusion

In this paper, we augment question understanding to improve KBQA. The sememe and gloss information benefit from each other and together enhance question semantics. In this way, our approach provides a new way of using external knowledge in question representation. In future work, we are interested in extending our model to more complex practical questions.

5 Acknowledgments

This work is supported by the National Key Research and Development Program of China (2017YFC0908401) and the National Natural Science Foundation of China (61972455). Xiaowang Zhang is supported by the Peiyang Young Scholars program of Tianjin University (2019XRX-0032).

References

1. Sorokin, D., Gurevych, I.: Modeling semantics with gated graph neural networks for knowledge base question answering. In: COLING'2018, pp. 3306–3317.
2. Yih, W., Chang, M., He, X., Gao, J.: Semantic parsing via staged query graph generation: question answering with knowledge base. In: ACL'2015, pp. 1321–1331.
3. Bloomfield, L.: A set of postulates for the science of language. Language 2(3), 153–164 (1926).
4. Maheshwari, G., Trivedi, P., Lukovnikov, D., Chakraborty, N., Fischer, A., Lehmann, J.: Learning to rank query graphs for complex question answering over knowledge graphs. In: ISWC'2019, pp. 487–504.
5. Yu, M., Yin, W., Hasan, K.S., Santos, C.N., Xiang, B., Zhou, B.: Improved neural relation detection for knowledge base question answering. In: ACL'2017, pp. 571–581.