<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>From Graph to Graph: AMR to SPARQL</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Kanchan Shivashankar</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Khaoula Benmaarouf</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Nadine Steinmetz</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Technische Universität Ilmenau</institution>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>We propose a graph-to-graph transformation for knowledge base question answering (KBQA) systems. Abstract Meaning Representation (AMR) graphs have proven promising for question answering (QA) systems and for generating SPARQL queries. In this paper, we discuss using AMR graphs in multilingual QA systems to generate SPARQL queries for Wikidata. The approach shows promising results and leaves room for further improvement.</p>
      </abstract>
      <kwd-group>
        <kwd>Question Answering</kwd>
        <kwd>Multilingual QA</kwd>
        <kwd>Semantic Web</kwd>
        <kwd>AMR</kwd>
        <kwd>SPARQL</kwd>
        <kwd>Wikidata</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Graph-based transformation methods and rule-based approaches are widely and
successfully used in query generation and question answering systems. Working from a
limited understanding of AMR syntax and from predicted answer types, we
transform the AMR graph into query paths and define rules to generate
SPARQL queries.</p>
    </sec>
    <sec id="sec-2">
      <title>Data Analysis</title>
      <p>Data analysis is performed to extract and visualize different features of the data.
It provides insights into the data and helps to design an approach that can easily be
interfaced with the data. Our dataset consists of multilingual question-answer
(QA) pairs. The data analysis step was performed on the training (QALD9 Plus)
and test (QALD10) data. The training data consists of 412 question-answer pairs
across 9 languages (English, German, Russian, French, Armenian, Belarusian,
Lithuanian, Bashkir, and Ukrainian). The test data contains 394 QA pairs across 4
languages (English, German, Russian, and Chinese). The datasets are
provided in JSON format and contain language-question pairs, the SPARQL query
for Wikidata, and the answers.</p>
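      <p>As an illustration of this analysis step, the following minimal sketch counts the questions available per language in a QALD-style JSON file. The field names (questions, question, language, string, query, sparql, answers), the German wording, and the placeholder property in the sample are assumptions made for the example, not details given in this paper.</p>
      <preformat>
```python
import json
from collections import Counter

# Hypothetical one-entry sample; the field names are assumed, not taken from the paper.
SAMPLE = """
{"questions": [
  {"id": "1",
   "question": [
     {"language": "en", "string": "Give me all actors starring in movies directed by William Shatner."},
     {"language": "de", "string": "Gib mir alle Schauspieler aus Filmen von William Shatner."}
   ],
   "query": {"sparql": "SELECT DISTINCT ?uri WHERE { ?uri wdt:P_STARRING ?movie }"},
   "answers": []}
]}
"""


def questions_per_language(dataset):
    """Count how many questions are available in each language."""
    counts = Counter()
    for entry in dataset["questions"]:
        for variant in entry["question"]:
            counts[variant["language"]] += 1
    return counts


dataset = json.loads(SAMPLE)
counts = questions_per_language(dataset)
print(dict(counts))
```
      </preformat>
      <p>Run on the full training and test files, the same counter would show which languages cover all questions and which cover only a subset.</p>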
    </sec>
    <sec id="sec-3">
      <title>From Text to Graph to Graph</title>
      <p>
        Our approach consists of two main steps: (1) generation of the AMR graph and
(2) deduction of the SPARQL query from the graph. Our dataflow pipeline is
depicted in Figure 1.
We use a pre-trained multilingual AMR parser [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], trained on five
languages (English, German, Italian, Spanish, and Chinese), to generate AMRs
for the English and German sentences from the QALD10 test dataset. This step
is followed by generating alignments for the AMR using the JAMR aligner [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
JAMR adds alignment information to the output of the AMR
parser from the previous step. The alignment provides edge and node information,
which can be used to plot AMR graphs. The AMR graphs are the starting point
for generating SPARQL queries.
      </p>
      <p>SPARQL Query Generation
In this step, we generate the SPARQL query graph from the AMR graph. In
theory, the AMR graphs for the same input question in different languages should
look the same. In practice, they often differ. In many cases, these differences
stem from erroneous entity detection: parts of an actual entity surface
form are treated as part of the question, which results in incorrect dependencies
and nodes. Therefore, we take into account the AMR graphs for both the
English and the German version of the input question. The subsequent
preprocessing steps are performed on both AMR graphs. The procedure is described
in detail below.</p>
      <p>Figure 2: (a) Initial graph. (b) Simplified graph.</p>
      <p>AMR Graph Simplification Figure 2a shows the initial AMR graph for the
question "Give me all actors starring in movies directed by William Shatner."
First, we simplify the graph by removing unnecessary nodes and merging nodes
that belong together. We remove nodes that are empty or contain stop words, as
well as accompanying :name edges if a :wiki mapping is given. References to
relations are often split into several nodes in the AMR graph. For instance, for
the question "How many grand-children did Jacques Cousteau have?", the relation
reference "have grand-children" is split into two nodes. We merge those nodes to
get a complete label for the property mapping. Finally, we identify the amr-unknown
node in the graph. Figure 2b shows the simplified AMR graph.</p>
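      <p>The removal rules of the simplification step can be sketched as follows. This is a minimal illustration under stated assumptions: the graph is held as a node dictionary plus a list of (source, role, target) edges, the stop-word list and the sample fragment are invented for the example, and the merging of split relation nodes is left out because its criterion is not spelled out here.</p>
      <preformat>
```python
STOP_WORDS = {"", "the", "a", "an", "of"}  # illustrative stop list (assumption)


def descendants(start, edges):
    """Return start plus every node reachable from it along directed edges."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        stack.extend(tgt for src, _, tgt in edges if src == node)
    return seen


def simplify(nodes, edges):
    """nodes: {id: label}; edges: [(source, role, target)]."""
    has_wiki = {src for src, role, _ in edges if role == ":wiki"}
    # Rule 1: drop empty nodes and stop-word nodes.
    drop = {n for n, label in nodes.items() if label.lower() in STOP_WORDS}
    # Rule 2: drop the :name subtree of a node that already has a :wiki mapping.
    for src, role, tgt in edges:
        if role == ":name" and src in has_wiki:
            drop |= descendants(tgt, edges)
    kept_nodes = {n: label for n, label in nodes.items() if n not in drop}
    kept_edges = [(s, r, t) for s, r, t in edges if s not in drop and t not in drop]
    # Locate the amr-unknown node, which marks what the question asks for.
    unknown = next((n for n, lab in kept_nodes.items() if lab == "amr-unknown"), None)
    return kept_nodes, kept_edges, unknown


# Hypothetical fragment for "... directed by William Shatner".
nodes = {"d": "direct-01", "p": "person", "w": "William_Shatner", "n": "name",
         "o1": "William", "o2": "Shatner", "u": "amr-unknown"}
edges = [("d", ":ARG0", "p"), ("p", ":wiki", "w"), ("p", ":name", "n"),
         ("n", ":op1", "o1"), ("n", ":op2", "o2"), ("d", ":ARG1", "u")]

kept_nodes, kept_edges, unknown = simplify(nodes, edges)
print(sorted(kept_nodes), unknown)
```
      </preformat>
      <p>In this fragment the name node and its two :op children disappear, while the :wiki mapping and the amr-unknown node are kept.</p>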
      <p>Path Extraction We extract all paths from the AMR graph that start at the
amr-unknown node and lead to an ending node. We split a path at an unknown entity
node, such as movie in our sample graph. At the position of the split, we
introduce a new variable for the SPARQL query. For each path, the beginning node
(amr-unknown) constitutes the subject and the ending node constitutes the
object of a triple. All edge and node labels on the path between start and end are
concatenated to form the property label.</p>
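      <p>A possible reading of the path extraction step is sketched below. It assumes that edges are traversed regardless of direction, that ending nodes are nodes with no unvisited neighbour, and that the variable at a split is named after the node label. The sample edges and roles are hypothetical and only mimic the simplified graph for the William Shatner question.</p>
      <preformat>
```python
def build_adjacency(edges):
    """Undirected adjacency list: node -> [(role, neighbour)]."""
    adj = {}
    for src, role, tgt in edges:
        adj.setdefault(src, []).append((role, tgt))
        adj.setdefault(tgt, []).append((role, src))
    return adj


def extract_paths(adj, start):
    """Enumerate simple paths from start to every ending node (depth-first)."""
    paths = []

    def walk(node, path, visited):
        nxt = [(r, t) for r, t in adj.get(node, []) if t not in visited]
        if not nxt and node != start:
            paths.append(path)
            return
        for role, tgt in nxt:
            walk(tgt, path + [role, tgt], visited | {tgt})

    walk(start, [start], {start})
    return paths


def path_to_triples(path, labels, unknown_entities):
    """Split a path [node, role, node, ...] into (subject, property, object) triples."""
    triples, subject, prop = [], "?uri", []
    last = len(path) - 1
    for i in range(1, len(path)):
        item = path[i]
        if i % 2 == 1:                   # odd positions hold edge roles
            prop.append(item)
        elif item in unknown_entities:   # split here and introduce a variable
            obj = "?" + labels[item]
            triples.append((subject, " ".join(prop), obj))
            subject, prop = obj, []
        elif i == last:                  # the ending node becomes the object
            triples.append((subject, " ".join(prop), labels[item]))
        else:                            # inner node labels join the property label
            prop.append(labels[item])
    return triples


labels = {"u": "amr-unknown", "s": "star-01", "m": "movie",
          "d": "direct-01", "w": "William_Shatner"}
edges = [("s", ":ARG1", "u"), ("s", ":ARG2", "m"),
         ("d", ":ARG1", "m"), ("d", ":ARG0", "w")]

paths = extract_paths(build_adjacency(edges), "u")
triples = path_to_triples(paths[0], labels, unknown_entities={"m"})
print(triples)
```
      </preformat>
      <p>For the sample, the single path is split at movie, which yields one triple from the amr-unknown variable to the movie variable and one from the movie variable to William_Shatner.</p>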
      <p>
        Query Generation Each path is translated into two RDF triples, covering both options:
the first node as subject and the last node as object, and vice versa.
For n triples, we generate 2<sup>n</sup> different queries per question.
For entity and property identification, we utilize the linkings from the AMR
graph generation, fuzzy search on the properties of the training dataset, and the Falcon
2.0 API<sup>1</sup>. In addition, we utilize the predicted answer category [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] to add a COUNT
operator if required, and identify ASK questions. Finally, we use :quant edges
from the AMR graph to add quantification restrictions in a FILTER clause to the
query.
      </p>
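      <p>The enumeration of candidate queries can be sketched as follows, assuming that one orientation is chosen per triple and all choices are combined, which gives two to the power of n candidates for n triples. The identifiers wdt:P_STARRING, wdt:P_DIRECTOR, and wd:Q_SHATNER are placeholders rather than real Wikidata IDs, and the count flag stands in for the predicted answer category. ASK questions and FILTER restrictions are not covered.</p>
      <preformat>
```python
from itertools import product


def orientations(triple):
    """Both readings of a triple: as extracted, and with subject and object swapped."""
    s, p, o = triple
    return [(s, p, o), (o, p, s)]


def generate_queries(triples, count=False):
    """Build one SELECT query per combination of triple orientations."""
    if count:
        head = "SELECT (COUNT(DISTINCT ?uri) AS ?c)"
    else:
        head = "SELECT DISTINCT ?uri"
    queries = []
    for combo in product(*(orientations(t) for t in triples)):
        body = " . ".join(" ".join(t) for t in combo)
        queries.append(head + " WHERE { " + body + " }")
    return queries


# Placeholder identifiers, not real Wikidata IDs.
triples = [("?uri", "wdt:P_STARRING", "?movie"),
           ("?movie", "wdt:P_DIRECTOR", "wd:Q_SHATNER")]

queries = generate_queries(triples)
for q in queries:
    print(q)
```
      </preformat>
      <p>With two triples the sketch prints four candidates. Only some of them match the direction in which the knowledge base stores the facts, which is why every candidate is executed afterwards.</p>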
      <p>Query Execution All queries per question are executed on a local instance of
the Wikidata SPARQL endpoint<sup>2</sup>. If a query produces results, the categories of
these results are compared with the predicted answer type category, and the results are accepted
only if they match.</p>
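      <p>The acceptance check can be illustrated with a small sketch. The category labels (resource, literal, boolean), the way a category is derived from a result value, and the run_query callable are assumptions for the example. A dictionary stands in for the endpoint so that the sketch runs without network access.</p>
      <preformat>
```python
ENTITY_PREFIX = "http://www.wikidata.org/entity/"


def result_category(value):
    """Coarse category of one result value (labels are assumptions)."""
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, str) and value.startswith(ENTITY_PREFIX):
        return "resource"
    return "literal"


def accept_results(queries, run_query, predicted_category):
    """Keep only queries with non-empty results of the predicted category."""
    accepted = {}
    for query in queries:
        results = run_query(query)
        if results and all(result_category(r) == predicted_category for r in results):
            accepted[query] = results
    return accepted


# Stand-in for the endpoint: query text -> list of result values.
fake_endpoint = {
    "q1": [ENTITY_PREFIX + "Q_PLACEHOLDER"],
    "q2": ["42"],
    "q3": [],
}

accepted = accept_results(["q1", "q2", "q3"], lambda q: fake_endpoint[q], "resource")
print(list(accepted))
```
      </preformat>
      <p>Here only q1 is accepted: q2 returns a literal although a resource was predicted, and q3 returns nothing.</p>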
    </sec>
    <sec id="sec-4">
      <title>Evaluation</title>
      <p>We evaluate our approach on the QALD10 test dataset using the GERBIL
framework<sup>3</sup>, as shown in Table 1. The test dataset contains 394 questions. Our
algorithm provides queries and answers for 146 questions. The remaining
questions are cases that we do not take into account or cannot handle at this stage
of our approach, such as comparative questions, questions containing a
superlative, boolean questions with either one or more than two entities, and questions
that require a property path in the SPARQL query.</p>
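      <p>For orientation, the difference between micro- and macro-averaged scores can be sketched with the textbook definitions below. This is not GERBIL's implementation: its conventions for empty answer sets and the separate Macro F1 QALD measure are not reproduced, and the answer sets are invented.</p>
      <preformat>
```python
def prf(tp, fp, fn):
    """Precision, recall and F1 from true positives, false positives, false negatives."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


def micro_macro(gold, predicted):
    """gold, predicted: lists of answer sets, one pair per question."""
    counts = [(len(g.intersection(s)), len(s.difference(g)), len(g.difference(s)))
              for g, s in zip(gold, predicted)]
    # Micro: pool the counts over all questions, then compute the scores once.
    micro = prf(*(sum(c[i] for c in counts) for i in range(3)))
    # Macro: compute the scores per question, then average them.
    per_question = [prf(*c) for c in counts]
    macro = tuple(sum(x[i] for x in per_question) / len(per_question) for i in range(3))
    return micro, macro


gold = [{"a", "b"}, {"c"}]
predicted = [{"a"}, {"c", "d"}]
micro, macro = micro_macro(gold, predicted)
print(micro, macro)
```
      </preformat>
      <p>In this toy case micro precision and recall are both 2/3, while macro precision and recall are both 0.75, because the macro average weights each question equally.</p>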
      <p>Table 1: Evaluation on the QALD10 test dataset. Columns: Micro F1, Micro Precision, Micro Recall, Macro F1, Macro Precision, Macro Recall, Macro F1 QALD.</p>
      <p>1 https://labs.tib.eu/falcon/falcon2/api-use
2 as provided by: https://hub.docker.com/r/qacompany/hdt-query-service
3 https://gerbil-qa.aksw.org/gerbil/</p>
      <p>
Using AMR graphs for query generation in QA systems has provided promising
results with our approach. We have achieved good results on the questions our
system is able to handle (approx. 40 % of the questions). AMR graphs use
numerous edge and node labels to represent different aspects of natural language.
Understanding these labels can help to create rules for generating
more complex queries. In particular, additional operators, such as LIMIT, ORDER BY,
GROUP BY, or FILTER, are required in many complex questions. Future work
includes a deeper comprehension of the AMR graphs and their transformation to SPARQL
query operators. Another area of focus is the entity and relation linking
process for the Wikidata knowledge base. Some parts of our approach
can be performed independently of the knowledge base. However, as the representation
of facts might differ considerably between knowledge bases, this
information must be taken into account in the query generation process. This includes information
about the domain and range of properties and the types of entities, among others.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Cai</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ho</surname>
            ,
            <given-names>J.C.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bing</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lam</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          :
          <article-title>Multilingual AMR parsing with noisy knowledge distillation</article-title>
          (
          <year>2021</year>
          ), https://arxiv.org/abs/2109.15196
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Flanigan</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Thomson</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Carbonell</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dyer</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Smith</surname>
            ,
            <given-names>N.A.</given-names>
          </string-name>
          :
          <article-title>A discriminative graph-based parser for the Abstract Meaning Representation</article-title>
          . In:
          <source>Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)</source>
          . pp.
          <fpage>1426</fpage>
          -
          <lpage>1436</lpage>
          . Association for Computational Linguistics, Baltimore, Maryland (Jun
          <year>2014</year>
          ), https://aclanthology.org/P14-1134
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Kapanipathi</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Abdelaziz</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ravishankar</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Roukos</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gray</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Astudillo</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chang</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cornelio</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dana</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fokoue</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Garg</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gliozzo</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gurajada</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Karanam</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Khan</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Khandelwal</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>Y.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Luus</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Makondo</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mihindukulasooriya</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Naseem</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Neelam</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Popa</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Reddy</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Riegel</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rossiello</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sharma</surname>
            ,
            <given-names>U.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bhargav</surname>
            ,
            <given-names>G.P.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yu</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Leveraging abstract meaning representation for knowledge base question answering</article-title>
          (
          <year>2020</year>
          ), https://arxiv.org/abs/2012.01707
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Shivashankar</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Benmaarouf</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Steinmetz</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          :
          <article-title>Reaching out for the answer: Answer type prediction</article-title>
          . In:
          <source>Proceedings of the SeMantic AnsweR Type prediction task (SMART) co-located with the 20th International Semantic Web Conference (ISWC 2021)</source>
          . CEUR-WS (
          <year>2021</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>