=Paper=
{{Paper
|id=Vol-2980/paper379
|storemode=property
|title=BERT-based Semantic Query Graph Extraction for Knowledge Graph Question Answering
|pdfUrl=https://ceur-ws.org/Vol-2980/paper379.pdf
|volume=Vol-2980
|authors=Zhicheng Liang, Zixuan Peng, Xuefeng Yang, Fubang Zhao, Yunfeng Liu, Deborah McGuinness
|dblpUrl=https://dblp.org/rec/conf/semweb/LiangPYZLM21
}}
==BERT-based Semantic Query Graph Extraction for Knowledge Graph Question Answering==
Zhicheng Liang*,†,1, Zixuan Peng†,2, Xuefeng Yang2, Fubang Zhao2, Yunfeng Liu2, and Deborah L. McGuinness1

1 Department of Computer Science, Rensselaer Polytechnic Institute, USA
2 Zhuiyi Technology, China

Abstract. Answering complex questions involving multiple entities and relations remains a challenging Knowledge Graph Question Answering (KGQA) task. To extract a Semantic Query Graph (SQG), we propose a BERT-based decoder capable of jointly performing multiple tasks for SQG construction: entity detection, relation prediction, output variable selection, query type classification and ordinal constraint detection. The outputs of our model can be seamlessly integrated with downstream components (e.g. entity linking) of a KGQA pipeline to construct a formal query. Our experiments show that the proposed BERT-based semantic query graph extractor achieves better performance than traditional recurrent neural network based extractors. Moreover, the KGQA pipeline based on our model outperforms baseline approaches on two benchmark datasets (LC-QuAD, WebQSP) containing complex questions.

1 Introduction

Semantic parsing (SP) based approaches to knowledge graph question answering (KGQA) aim to build a semantic parser that first converts natural language questions into logical forms, and then into formal queries like SPARQL that can be executed on the underlying KG to retrieve answers. For these approaches, constructing the semantic query graph (SQG) plays a vital role. For example, the SQG of the natural language query (NLQ) "What awards have been won by the executive producer of Fraggle Rock?" involves three nodes and two labeled edges, i.e. {(Fraggle Rock, dbo:executiveProducer, ?x), (?x, dbo:award, ?uri)} if represented using triples, where ?x and ?uri are free variables.
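An SQG represented as triple patterns maps directly onto an executable query. A minimal sketch of this grounding step, using the running example (the helper `build_select_query` is illustrative, not part of the paper's pipeline):

```python
def build_select_query(triples, out_var, distinct=True):
    """Assemble a SPARQL SELECT query from (subject, predicate, object)
    triple patterns. Illustrative helper; real pipelines also handle
    COUNT/ASK queries and ordinal constraints."""
    body = " ".join(f"{s} {p} {o} ." for s, p, o in triples)
    head = f"SELECT {'DISTINCT ' if distinct else ''}{out_var}"
    return f"{head} WHERE {{ {body} }}"

# The running example: two triple patterns sharing the free variable ?x.
query = build_select_query(
    [("dbr:Fraggle_Rock", "dbo:executiveProducer", "?x"),
     ("?x", "dbo:award", "?uri")],
    out_var="?uri",
)
print(query)
# SELECT DISTINCT ?uri WHERE { dbr:Fraggle_Rock dbo:executiveProducer ?x . ?x dbo:award ?uri . }
```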
The answers to this query should be the grounded KG nodes for the output variable ?uri. Despite some work on abstract query graph prediction [1, 8], there is as yet no end-to-end model that jointly performs query graph identification along with entity mention detection and relation prediction. To this end, we propose a novel BERT-based neural network that extracts SQGs in an end-to-end manner for answering complex questions with multiple triple patterns. We evaluate our approach on two KGQA benchmark datasets containing complex questions. The experimental results demonstrate that a simple pipeline built on top of our proposed SQG extractor improves overall KGQA performance, outperforming the baseline approaches.

* Work partially done during an internship at Zhuiyi Technology.
† Equal contribution.
§ Our code and data are available at: https://github.com/gychant/BERT-NL2SPARQL

Copyright © 2021 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

Fig. 1: An overview of the KGQA pipeline with our Semantic Query Graph Decoder (SQG-Decoder) illustrated by a running example. (The figure depicts the two-stage decoder: Stage 1 performs subject head/tail tagging together with the multi-task heads for query type classification, output variable selection and ordinal constraint detection; Stage 2 performs relation and object tagging. The extracted triple patterns then pass through entity linking, SPARQL query construction and query execution against the KG.)

2 Our approach

An overview of our approach for KGQA is given in Fig. 1. We propose a semantic query graph decoder (SQG-Decoder) that, given an NLQ, predicts a set of triple patterns, each in the form (Subject, Relation, Object), where Subject and Object are either text spans for entities or introduced free variables, while Relation is a grounded relation/predicate defined by the KG schema. As opposed to single-relation questions, complex questions involve multiple triple patterns that form a directed acyclic graph (DAG).

SQG-Decoder uses BERT [2] as an encoder to obtain both sentence-level and token-level contextual embeddings of the input NLQ. Specifically, we construct the input of the BERT encoder in the format "[CLS] NLQ [SEP] Free variables [SEP]". This enables the free variables to interact with the entire NLQ sentence via the attention mechanisms used in BERT. With BERT as an encoder, we propose a novel decoding scheme to extract triple patterns given an NLQ. It predicts each triple pattern (s_j, p_j, o_j) in two stages, as shown in Fig. 1. The first stage is to determine the span of a subject, which denotes an entity mention in the NLQ or a free variable. Given the token-level vector representations obtained from BERT, i.e.
H = {h_0, h_1, ..., h_{l-1}}, the vector h_i is used to predict whether the i-th token is the head/tail token of the subject span s_j, by applying a linear feed-forward layer (or a stack of such layers with non-linearity) with sigmoid activation to perform binary classification with logistic regression. The second stage is to determine the span of the object and the type of relation for the triple pattern, conditioned on the subject predicted in the first stage. To achieve this, the predicted s_j and the NLQ are concatenated (with {0, 1} masks to distinguish the two parts) and fed into the encoder again to obtain the subject-aware token-level representations H' = {h'_0, h'_1, ..., h'_{l-1}}. Specifically, the relation p_j is jointly predicted with the head token of the object mention o_j, since the two depend on each other. The tail token of the object mention o_j is identified in the same way as that of the subject span s_j. In this way, the encoder layer is shared across the two stages, with task-specific output layers for subject and object-relation extraction, respectively.

Note that our SQG-Decoder can predict multiple triple patterns for a query. In doing so, we need to handle overlapping text spans, since an entity mention may appear in more than one triple pattern of an NLQ; e.g. in Fig. 1 the variable ?x is involved in two triple patterns. Our solution is, instead of labeling the text span of an entity with 1 for all involved tokens and 0 otherwise, to label each token in the NLQ as follows: (1, 0) for the head of an entity mention, (0, 1) for the tail of an entity mention, (1, 1) if a token is both the start and the end of an entity mention, and (0, 0) otherwise.

To train our SQG-Decoder, we need annotations of text spans in an NLQ that correspond to the canonical entity names of the subject/object of a triple pattern.
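The head/tail tagging scheme above can be decoded by pairing each predicted head with the nearest predicted tail at or after it. A minimal sketch (the pairing heuristic is one common choice, not necessarily the paper's exact decoder):

```python
def decode_spans(head_tags, tail_tags):
    """Recover (start, end) token spans from binary head/tail tag
    vectors. A token tagged (1, 1) in both vectors forms a
    single-token span. Illustrative decoding; probability thresholds
    over the sigmoid outputs are omitted."""
    heads = [i for i, h in enumerate(head_tags) if h == 1]
    tails = [i for i, t in enumerate(tail_tags) if t == 1]
    spans = []
    for h in heads:
        candidates = [t for t in tails if t >= h]  # nearest tail not before the head
        if candidates:
            spans.append((h, min(candidates)))
    return spans

# A two-token span at positions 1-2 plus a single-token span at position 5.
print(decode_spans([0, 1, 0, 0, 0, 1], [0, 0, 1, 0, 0, 1]))  # [(1, 2), (5, 5)]
```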
To automate this process, we adopt a reverse linking algorithm based mainly on n-gram fuzzy matching to annotate the spans of entities mentioned in the corresponding SPARQL query. To further improve linking performance, we also leverage pre-trained word embeddings to measure the similarity between entity names from the KG and their surface forms in text. Given an NLQ, the model parameters are learned by maximizing the likelihood of a set of ground-truth triple patterns. In practice, our decoding scheme can be adapted to any neural encoder (e.g. BiLSTMs, RNNs, Transformers) from which token-level representations of the encoder input can be obtained.

To ground the extracted text spans to the KG, we employ the same entity linking tools used by the baselines in our experiments for a fair comparison. Specifically, for LC-QuAD we built an Apache Lucene index to retrieve entity candidates via string similarity between their names and entity mention queries, following the baselines [3, 5]; for WebQSP we used the entity linking results of [6].

In addition to triple pattern extraction, we leverage BERT to perform three auxiliary classification tasks that are important for constructing the query graph:
1) Output Variable Selection. Since a query graph may have multiple variables, we need to determine which one represents the returned answer.
2) Query Type Classification. The query type refers to the type of information requested for the output variable. Here we focus on three types of queries: SELECT, COUNT, and ASK (boolean).
3) Ordinal Constraint Detection. In some cases, ordinal constraints are imposed on the output variables. For instance, to answer the question "What is the first book Sherlock Holmes appeared in?", we need to sort the books in which the detective appeared by publication date and then return the first as the answer.
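The reverse-linking annotation step described above can be sketched with stdlib string matching: score every question n-gram against the entity's canonical name and keep the best span. This is an illustrative stand-in for the paper's algorithm; the word-embedding similarity component and the 0.7 threshold are assumptions.

```python
from difflib import SequenceMatcher

def reverse_link(question_tokens, entity_name, max_n=4, threshold=0.7):
    """Return the (start, end) token span whose surface form best
    fuzzy-matches the entity's canonical name, or None if no n-gram
    scores above the threshold."""
    target = entity_name.lower().replace("_", " ")
    best, best_score = None, threshold
    for n in range(1, max_n + 1):
        for i in range(len(question_tokens) - n + 1):
            span = " ".join(question_tokens[i:i + n]).lower()
            score = SequenceMatcher(None, span, target).ratio()
            if score > best_score:
                best, best_score = (i, i + n - 1), score
    return best

tokens = "What awards have been won by the executive producer of Fraggle Rock ?".split()
print(reverse_link(tokens, "Fraggle_Rock"))  # (10, 11)
```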
To perform the above three tasks, we feed the transformer output for the [CLS] token into separate classification layers. Since these tasks are related to each other, we adopt a multi-task learning strategy and train the SQG-Decoder jointly, combining the cross-entropy losses of these tasks with the binary cross-entropy losses of the principal task, i.e. triple pattern extraction.

3 Experiments & Results

Datasets. We evaluate our approach on two datasets, LC-QuAD [4] and WebQSP [6], both targeting complex questions and having ground-truth SPARQL annotations.

Evaluation Metrics. We use answer-set accuracy as the main evaluation metric, i.e. the percentage of test questions for which the predicted answer set exactly matches the ground-truth answer set. Precision, recall and F1 scores are also reported. For WebQSP we use the official evaluation script, and the average F1 is reported as in [6].

Experimental Setup. Our SQG-Decoder is fine-tuned from the pre-trained language model BERT [2]. We compare BERT_BASE (12 layers, 768 hidden, 12 heads) with BERT_LARGE (24 layers, 1024 hidden, 16 heads) to examine performance at different model sizes. We use the Adam optimizer with a learning rate of 2 × 10^-5 and a batch size of 32. The best model is selected using the development set.

End-to-end KGQA Performance. We only compare with previous work that reported end-to-end KGQA performance, i.e. approaches starting from the raw input question without gold linked entities or relations. Tables 1 and 2 show that we achieve the best F1 score of 54.9 on LC-QuAD and the best accuracy of 66.0 on WebQSP, both using BERT_LARGE.

Table 1: LC-QuAD results
Methods               Pre.   Rec.   F1     Acc.
QAmp [5]              25.0   50.0   33.0   -
WDAqua-core1 [3]      59.0   38.0   46.0   -
Ours w/ BERT_LARGE    51.1   59.3   54.9   46.6
Ours w/ BERT_BASE     49.9   57.3   53.3   45.6

Table 2: WebQSP results
Methods               Pre.   Rec.   Avg. F1   Acc.
STAGG [6]             70.9   80.3   71.7      63.9
HR-BiLSTM [7]         -      -      -         63.9
Ours w/ BERT_LARGE    85.9   75.5   70.3      66.0
Ours w/ BERT_BASE     85.1   73.1   69.2      65.5

Evaluation of Triple Pattern Extraction. This evaluates the main contribution of our work. We compare the precision, recall and F1 scores of extracted triple patterns when the SQG-Decoder is built on top of two types of encoders: a Bi-LSTM encoder with GloVe embeddings, and BERT_BASE, as summarized in Table 3. The results demonstrate that the latter always outperforms the former, mainly due to higher recall, while staying close in precision. This also shows that our proposed SQG-Decoder can be adapted to encoder types other than transformers.

Table 3: Triple pattern extraction performance
                          LC-QuAD                WebQSP
Methods                   Pre.   Rec.   F1      Pre.   Rec.   F1
BiLSTM Encoder + GloVe    58.7   33.5   42.6    74.6   58.8   65.8
BERT_BASE Encoder         62.8   56.1   59.2    75.3   70.9   73.0

Error Analysis. We randomly select 100 incorrectly answered questions from the LC-QuAD test set. The major errors observed are:
1) Missing triple patterns (54%). Errors of this type are attributed to predicates that appear in the test set but never in the training set, as well as data sparsity.
2) Mispredicting semantically close predicates (32%). This typically happens because some predicates have very similar meanings while the NLQ is not informative enough to distinguish between them.
3) Entity span misidentification and redundant predicates (14%). Errors of this type include incorrect entity span detection, which may cause entity linking failures, and prediction of redundant predicates.
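The triple-pattern precision/recall/F1 used above can be computed by micro-averaging over predicted and gold triple sets. A minimal sketch, assuming exact triple matching (the paper's matching criteria may differ):

```python
def triple_prf(pred, gold):
    """Micro-averaged precision, recall and F1 over per-question sets
    of predicted and gold (subject, relation, object) triples."""
    tp = fp = fn = 0
    for p, g in zip(pred, gold):
        p, g = set(p), set(g)
        tp += len(p & g)   # correctly extracted triples
        fp += len(p - g)   # spurious predictions
        fn += len(g - p)   # missed gold triples
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return prec, rec, f1

pred = [[("dbr:Fraggle_Rock", "dbo:executiveProducer", "?x"),
         ("?x", "dbo:award", "?uri")]]
gold = [[("dbr:Fraggle_Rock", "dbo:executiveProducer", "?x"),
         ("?x", "dbo:award", "?uri")]]
print(triple_prf(pred, gold))  # (1.0, 1.0, 1.0)
```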
4 Conclusions and Future Work

We present a BERT-based Semantic Query Graph Decoder (SQG-Decoder) for answering complex natural language queries over knowledge graphs, which jointly learns multiple subtasks in an end-to-end trainable manner. Our approach substantially outperforms the baselines on two KGQA benchmarks that contain complex questions. As future work, we would like to explore answering questions with more complex temporal and spatial constraints using neural networks.

Acknowledgments

This work was partially supported by the DARPA MCS program, award number N660011924033, to RPI under USC-ISI West.

References

1. Chen, Y., Li, H., Hua, Y., Qi, G.: Formal query building with query structure prediction for complex question answering over knowledge base. In: IJCAI. pp. 3751–3758 (2020)
2. Devlin, J., Chang, M.W., Lee, K., Toutanova, K.: BERT: Pre-training of deep bidirectional transformers for language understanding. In: NAACL. pp. 4171–4186 (2019)
3. Diefenbach, D., Both, A., Singh, K., Maret, P.: Towards a question answering system over the semantic web. Semantic Web (Preprint), 1–19
4. Trivedi, P., Maheshwari, G., Dubey, M., Lehmann, J.: LC-QuAD: A corpus for complex question answering over knowledge graphs. In: ISWC. pp. 210–218. Springer (2017)
5. Vakulenko, S., Fernandez Garcia, J.D., Polleres, A., de Rijke, M., Cochez, M.: Message passing for complex question answering over knowledge graphs. In: CIKM. pp. 1431–1440. ACM (2019)
6. Yih, W.t., Richardson, M., Meek, C., Chang, M.W., Suh, J.: The value of semantic parse labeling for knowledge base question answering. In: ACL. pp. 201–206 (2016)
7. Yu, M., Yin, W., Hasan, K.S., Santos, C.d., Xiang, B., Zhou, B.: Improved neural relation detection for knowledge base question answering. In: ACL. pp. 571–581 (2017)
8. Zheng, W., Zhang, M.: Question answering over knowledge graphs via structural query patterns. arXiv preprint arXiv:1910.09760 (2019)