=Paper=
{{Paper
|id=Vol-3196/paper5
|storemode=property
|title=From Graph to Graph: AMR to SPARQL
|pdfUrl=https://ceur-ws.org/Vol-3196/paper5.pdf
|volume=Vol-3196
|authors=Kanchan Shivashankar,Khaoula Benmaarouf,Nadine Steinmetz
|dblpUrl=https://dblp.org/rec/conf/esws/SantanaCRB22a
}}
==From Graph to Graph: AMR to SPARQL==
From Graph to Graph: AMR to SPARQL Kanchan Shivashankar, Khaoula Benmaarouf, and Nadine Steinmetz Technische Universität Ilmenau, Germany firstname.lastname@tu-ilmenau.de Abstract. We propose a graph to graph based transformation for KBQA systems. AMR graphs have proven promising for Question Answering (QA) systems and generating SPARQL queries. In this paper, we discuss using AMR graph for multilingual QA systems to generate SPARQL queries for Wikidata. The approach shows promising results and has scope for further improvement. Keywords: Question Answering · MultilingualQA · Semantic Web · AMR · SPARQL · Wikidata 1 Introduction The field of Knowledge Base Question Answering (KBQA) has seen a huge influx in research with many benchmark datasets being introduced. Question Answering over Linked Data (QALD) has been one such prominent dataset. The QALD 10 challenge deals with multilingual QA for Wikidata. With this paper, we propose a generalized solution for generating SPARQL queries from natural language questions using the Abstract Meaning Representation (AMR) combined with rules to generate the SPARQL query. Question answering system poses challenges such as handling complex ques- tions and multilingual questions. QA systems built on knowledge bases add the complexities of natural language to query mapping, entity and relation mapping and knowledge base generalizations. A solution that is robust and flexible to handle these complexities is the current challenge. AMR is a graph-based representation of the semantic information of a lan- guage. Its ability to abstract makes it a simple yet powerful form of natural language representation. It is widely used in simplifying NLP tasks and for mul- tilingual data. It captures the semantic representation of a language ignoring the syntactic information. Thus, a sentence with the same meaning which can be worded in multiple ways in natural language, will have a single AMR represen- tation. AMR has been used previously in KBQA tasks for DBpedia knowledge base [3]. Although AMRs favor and are designed for the English language, it has shown promising results for processing multilingual data [1]. Our approach takes into account the AMR graphs of the English and German language version of a question and extracts the SPARQL graph for the query. Copyright © 2022 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). 2 K. Shivashankar et al. Graph based transformation methods and rule based approaches are widely and successfully used in query generation and question answering systems. With our limited understanding of AMR syntax and predicted answer types, we attempt to transform the AMR graph into query paths and define rules to generate SPARQL queries. 2 Data Analysis Data analysis is performed to extract and visualize different features of the data. It provides insights into the data and design an approach that can easily be in- terfaced with the data. Our dataset consists of multilingual question and answer (QA) pairs. The data analysis step was performed on the training (QALD9 Plus) and test (QALD10) data. The training data consists of 412 question-answer pairs across 9 languages (English, German, Russian, French, Armenian, Belarusian, Lithuanian, Bashkir, and Ukrainian). Test data contains 394 QA pairs across 4 different languages (English, German, Russian and Chinese). The datasets are provided in json format and contains language - question pairs, SPARQL query for Wikidata and Answers. 3 From Text to Graph to Graph Our approach consists of two main steps: (1)generation of the AMR graph and (2)deduction of the SPARQL query from the graph. Our pipeline for dataflow is depicted in Figure 1. Fig. 1: Dataflow using our approach 3.1 AMR Graph Generation We use a pre-trained multilingual AMR parser[1], trained on 4 different lan- guages (English, German, Italian, Spanish and Chinese), to generate AMRs for English and German sentences from the QALD10 test dataset. This step is followed by generating alignments to the AMR using JAMR alignment[2]. The JAMR model annotates alignment information to the output of the AMR parser from the previous step. Alignment provides edge and node information, which can be used to plot AMR graphs. The AMR graphs act as the first step in generating SPARQL queries. From Graph to Graph: AMR to SPARQL 3 3.2 SPARQL Query Generation In this step, we generate the SPARQL query graph from the AMR graph. In theory, the AMR graphs for the same input question in different languages should look the same. But, in fact, they often differ. In many cases, these differences stem from erroneous entity detection. Thereby, parts of an actual entity surface form are detected as part of the question, which results in incorrect dependencies and nodes. Therefore, we decided to take into account the AMR graphs for the English and German language version of the input question. The subsequent pre- processing steps are performed on both AMR graphs. The procedure is described in detail in the following sections. (a) Initial graph. (b) Simplified graph. Fig. 2: AMR graph for the question Give me all actors starring in movies directed by William Shatner (a) initial state and (b) after simplification AMR Graph Simplification Figure 2a shows the initial AMR graph for the question Give me all actors starring in movies directed by William Shatner. Firstly, we simplify the graph by removing unnecessary nodes and merging nodes that belong together. We remove nodes that are empty or contain stop words as well as accompanying :name edges, if a :wiki mapping is given. References of relations are often split into several nodes in the AMR graph. For instance, for the question How many grand-children did Jacques Cousteau have?, the relation reference have grand-children is split into two nodes. We merge those nodes to get a complete label for the property mapping. And we identify the amr-unknown node in the graph. Figure 2b shows the simplified AMR graph. Path Extraction We extract all paths from the AMR graph starting at the amr-unknown node to all ending nodes. We split a path at an unknown entity node, such as movie in our sample graph. At the position of the split, we intro- duce a new variable for the SPARQL query. For each path, the beginning node 4 K. Shivashankar et al. (amr-unknown) constitutes the subject and the ending node constitutes the ob- ject of a triple. All edge and node labels on the path between start and end are concatenated as property label. Query Generation Each path is transferred to two RDF triples: both options of using the first node as subject or object and the last node as object and subject respectively. For n triples, we generate 2n different queries per question. For entity and property identification, we utilize the linkings from the AMR graph generation, fuzzy search on properties of the train dataset and the Falcon 2.0 API1 . In addition, we utilize the predicted answer category [4] to add a COUNT operator if required, and identify ASK questions. Finally, we use :quant edges from the AMR graph to add quantification restrictions in a FILTER clause to the query. Query Execution All queries per question are executed on a local instance of the Wikidata SPARQL endpoint2 . If a query produces results, the categories of these results are compared with the predicted answer type category and accepted only if they match. 4 Evaluation We evaluate our approach on the QALD 10 test dataset using the GERBIL framework3 as shown in Table 1. The test dataset contains 394 questions. Our algorithm provides queries and answers for 146 questions. The remaining ques- tions are cases that we do not take into account resp. cannot handle at this stage of our approach, such as comparative questions, questions containing a superla- tive, boolean questions with either one or more than two entities, and questions that require a property path in the SPARQL query. Micro Micro Micro Macro Macro Macro Macro F1 F1 Precision Recall F1 Precision Recall QALD Baseline 0.0385 0.0204 0.3293 0.507 0.5068 0.5238 0.5776 Our Approach 0.3839 0.7153 0.2624 0.3215 0.3206 0.3312 0.4909 C2KB 0.3706 0.7594 0.2451 0.3252 0.3411 0.3362 P2KB 0.3568 0.75 0.2341 0.4006 0.417 0.4006 RE2KB 0.2739 0.5756 0.1797 0.3478 0.3594 0.3498 Table 1: Evaluation results of our approach on the QALD 10 test dataset. 1 https://labs.tib.eu/falcon/falcon2/api-use 2 as provided by: https://hub.docker.com/r/qacompany/hdt-query-service 3 https://gerbil-qa.aksw.org/gerbil/ From Graph to Graph: AMR to SPARQL 5 5 Conclusion Using AMR graphs for query generation in QA systems has provided promising results with our approach. We have achieved good results on the questions our system is able to handle (approx. 40 % of the questions). AMR graphs use nu- merous edges and node labels to represent different aspects of natural language. Understanding these keywords can help prepare and create rules to generate more complex queries. Especially additional operators, such as LIMIT, ORDER, GROUP BY, or FILTER are required in many complex questions. Future work in- cludes the comprehension of the AMR graphs and transformation to SPARQL query (operators). Another area of focus would be the entity and relation linking processes for the Wikidata knowledge base. Some of the parts of our approach can be performed agnostic from the knowledge base. But, as the representation of facts might be quite different in the various knowledge bases, this informa- tion must be involved in the query generation process. This includes information about domain and range of properties and types of entities, among others. References 1. Cai, D., Li, X., Ho, J.C.S., Bing, L., Lam, W.: Multilingual amr parsing with noisy knowledge distillation (2021), https://arxiv.org/abs/2109.15196 2. Flanigan, J., Thomson, S., Carbonell, J., Dyer, C., Smith, N.A.: A discriminative graph-based parser for the Abstract Meaning Representation. In: Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). pp. 1426–1436. Association for Computational Linguistics, Baltimore, Maryland (Jun 2014), https://aclanthology.org/P14-1134 3. Kapanipathi, P., Abdelaziz, I., Ravishankar, S., Roukos, S., Gray, A., Astudillo, R., Chang, M., Cornelio, C., Dana, S., Fokoue, A., Garg, D., Gliozzo, A., Gu- rajada, S., Karanam, H., Khan, N., Khandelwal, D., Lee, Y.S., Li, Y., Luus, F., Makondo, N., Mihindukulasooriya, N., Naseem, T., Neelam, S., Popa, L., Reddy, R., Riegel, R., Rossiello, G., Sharma, U., Bhargav, G.P.S., Yu, M.: Leveraging abstract meaning representation for knowledge base question answering (2020), https://arxiv.org/abs/2012.01707 4. Shivashankar, K., Benmaarouf, K., Steinmetz, N.: Reaching out for the answer: Answer type prediction. In: Proceedings of the SeMantic AnsweR Type prediction task (SMART) co-located with the 20th International Semantic Web Conference (ISWC 2021). CEUR-WS (2021)