<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Generating Grammars from lemon lexica for Questions Answering over Linked Data: a Preliminary Analysis</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Viktoria Benz</string-name>
          <email>vbenz@techfak.uni-bielefeld.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Philipp Cimiano</string-name>
          <email>cimiano@techfak.uni-bielefeld.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mohammad Fazleh Elahi</string-name>
          <email>melahi@techfak.uni-bielefeld.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Basil Ell</string-name>
          <email>bell@techfak.uni-bielefeld.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Semantic Computing Group, CITEC, Universität Bielefeld</institution>
          ,
          <addr-line>Inspiration 1, 33619 Bielefeld</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Most approaches to question answering over linked data (QALD) frame the task as a machine learning problem: a model is parametrized from training data, given as pairs of natural language (NL) questions and SPARQL queries, so that it learns a mapping from questions to queries. In this preliminary work we present an alternative to machine learning for developing a QA system, one that relies on the automatic generation of a QA grammar from a lemon lexicon. This model-based approach has a number of advantages over a machine learning approach. First, it gives the developer of the system maximum control over the QA interface, as every entry added to the lexicon increases the coverage of the grammar, and thus of the QA system, in a predictable way. This is in contrast to machine learning approaches, where the impact of adding a single training example is difficult to predict. A further advantage is that the QA system operates on the basis of a symbolic grammar that can be used to provide guidance and auto-completion functionality to users. Our system is indeed intended to be used in the context of an auto-completion interface that allows users to ask only questions that the grammar can cover. We present very preliminary results showing that a large percentage of the questions in the QALD-7 training set can be rephrased as questions that our grammar can parse. With a hand-crafted lexicon, we can in principle reach a micro-F1 score of 62.5% on the QALD-7 training data when questions are manually rephrased to fit our grammar. Although these preliminary results do not constitute a proper evaluation of our approach, they suggest that the approach we propose is feasible.</p>
      </abstract>
      <kwd-group>
        <kwd>grammar generation</kwd>
        <kwd>question answering over linked data</kwd>
        <kwd>lemon</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 Introduction</title>
      <p>
        Most approaches to question answering over linked data (QALD) follow a
machine learning (ML) approach where a QA system is learned by
parametrizing a model on the basis of training data given in the form of pairs of NL
question and SPARQL query. In this preliminary work, we explore an
alternative paradigm that consists in generating a lexicalized grammar that can
be used to parse questions into SPARQL. The method relies on the
availability of a lemon lexicon [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ] for the given ontology / knowledge graph that is
queried. In our view this does not represent a limitation, as the lexicon can
also be used in other tasks, e.g. to verbalize the ontology. Further, the
development of such lexica can be scaled up by a collaborative approach [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ].
An advantage of our proposed paradigm is that it provides a level of control
over the QA interface that ML methods cannot provide. For ML systems in
general, it is unclear what impact adding one example has on the behaviour
of the system. Generally, several examples of the same or a similar type need
to be provided for the ML system to learn a pattern, leading to high
redundancy in the training data. In contrast, in our model-based approach,
several lexicalized grammar rules are generated automatically for each entry
in the lemon lexicon, and the impact of each lexicon entry on the behaviour of
the QA system can be clearly predicted and monitored. A further limitation of
machine learning approaches is that they cannot be directly used in a guided
interface that offers auto-completion features, as this requires access to a
grammar or a correspondingly powerful look-ahead. Finally, adapting an
ML-based QA system to a different domain requires the creation of training
datasets with hundreds if not thousands of questions to train a system from
scratch for the new domain or ontology. We shift this effort to the creation
of a lexicon, which can be a cost-effective alternative to creating large
numbers of redundant examples whose impact is not clear a priori.
      </p>
      <p>
        For these reasons, we have developed an approach that can automatically
generate a lexicalized grammar from a lemon lexicon. The approach is
inspired by earlier work on generating LTAG grammars [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ]. Here we revisit
this approach in the context of QALD interfaces and provide a preliminary
evaluation of our approach that shows that, if a corresponding lexicon is
available, our approach can reach a micro F1 score of 62.5% on the English
dataset of QALD-7 [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ], provided that queries are rephrased with our
grammar.
      </p>
      <p>
        This paper is structured as follows: in the next Section we describe our
approach and in particular how regular lexicalized grammars are generated
from lemon lexica. These grammars can be used to parse questions and map
them into corresponding SPARQL queries. We provide a preliminary feasibility
study for our approach by evaluating it on the QALD-7 (http://qald.aksw.org/) training
data. We manually rephrase questions in QALD-7 so that they can be
analysed with our grammar. While the rephrasing might be seen as a critical
limitation, we regard this rephrasing as legitimate given that we see the
application of our grammar-based approach in the context of a guided NL
interface with auto-completion functionality such as proposed by Rico et al.
[
        <xref ref-type="bibr" rid="ref19">19</xref>
        ]. So our main question is not whether people would be able to ask the
questions from QALD-7 as they are given, but whether they would be able
to ask a question that satisfies their information need using our grammar.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2 Generating QA grammars from lemon lexica</title>
      <p>
        Our approach automatically generates lexicalized regular grammars from
lexical entries in a lemon lexicon and is inspired by previous work that showed
how to generate LTAG grammars from a specification of the ontology-lexicon
interface [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ]. The grammar generation approach works for four basic types
of entries corresponding to:
– relational nouns subcategorizing a prepositional phrase (NounPPFrame
in Lexinfo [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]), e.g. ‘capital (of)’
– transitive verbs (TransitiveFrame), e.g. (to) ‘direct’
– intransitive verbs subcategorizing a prepositional phrase (IntransitivePPFrame),
e.g. ‘flow through’
– adjectives (AdjectiveAttributiveFrame), e.g. ‘Spanish’
We describe the grammar entries generated for each of these four types in
the following subsections.
      </p>
      <sec id="sec-2-1">
        <title>2.1 NounPPFrame</title>
        <p>Consider the lemon lexical entry in Figure 1 for the relational noun
‘capital’ (of). The entry states that the canonical written form of the entry is
“capital”. It states that the entry has a NounPPFrame as syntactic behaviour
that corresponds to a copulative construction ‘X is the capital of Y’ with two
arguments, where copulativeArg corresponds to the copula subject X and
the prepositionalAdjunct corresponds to the prepositional object Y. The
semantics of the relational noun is captured with respect to the property
http://dbpedia.org/ontology/capital, where the subject of the property
is realized by the prepositionalAdjunct and the object of the property is
realized by the copulativeArg. This essentially captures the fact that the
meaning of ‘Berlin is the capital of Germany’ is expressed by the triple
&lt;Germany&gt; &lt;http://dbpedia.org/ontology/capital&gt; &lt;Berlin&gt;.
The entry also captures that the default domain of the property is
http://dbpedia.org/ontology/Country in the context of the lexical entry (other
lexical entries might induce other defaults for the domain/range).
Correspondingly, the default range of the property is
http://dbpedia.org/ontology/City. From this lexical entry, two grammar rules
are automatically generated in our approach.
Figure 1 (lemon lexical entry for the relational noun ‘capital’ (of)):
:lexicon_en a lemon:Lexicon ;
    lemon:language "en" ;
    lemon:entry :capital_of ;
    lemon:entry :of .

:capital_of a lemon:LexicalEntry ;
    lexinfo:partOfSpeech lexinfo:noun ;
    lemon:canonicalForm :capital_form ;
    lemon:synBehavior :capital_of_nounpp ;
    lemon:sense :capital_sense_ontomap .

:capital_form a lemon:Form ;
    lemon:writtenRep "capital"@en .

:capital_of_nounpp a lexinfo:NounPPFrame ;
    lexinfo:copulativeArg :arg1 ;
    lexinfo:prepositionalAdjunct :arg2 .

:capital_sense_ontomap a lemon:OntoMap, lemon:LexicalSense ;
    lemon:ontoMapping :capital_sense_ontomap ;
    lemon:reference &lt;http://dbpedia.org/ontology/capital&gt; ;
    lemon:subjOfProp :arg2 ;
    lemon:objOfProp :arg1 ;
    lemon:condition :capital_condition .

:capital_condition a lemon:condition ;
    lemon:propertyDomain &lt;http://dbpedia.org/ontology/Country&gt; ;
    lemon:propertyRange &lt;http://dbpedia.org/ontology/City&gt; .

:arg2 lemon:marker :of .

:of a lemon:SynRoleMarker ;
    lemon:canonicalForm [ lemon:writtenRep "of"@en ] ;
    lexinfo:partOfSpeech lexinfo:preposition .</p>
        <p>The first grammar rule (Figure 2) is a rule for the English language that is
of type “SENTENCE”, that is, it ‘generates’ full sentences (in our case full
questions). The rule is generated from an entry following the NounPPFrame.
The grammar entry is lexicalized as it refers to specific lexical elements.
It ‘generates’ the following four proto-questions: 1) ‘What is the capital of
X?’, 2) ‘What was the capital of X?’, 3) ‘Which city is the capital of X?’, 4)
‘Which city was the capital of X?’. These are proto-questions in the sense
that some element needs to be inserted at the X position to obtain a complete
sentence or question. Two different types of elements can be inserted into
the X position. On the one hand, we can insert labels denoting a particular
country, e.g. ‘Germany’. This is referred to as $x in the grammar entry, and
any label of a country can be inserted into the position. On the other hand,
a complete noun phrase (NP) denoting a country can be inserted, such as
‘country where German is spoken’, ‘country where Einstein was born’,
‘country governed by Angela Merkel’, etc. $x can be regarded as a preterminal
symbol and COUNTRY_NP as a non-terminal symbol. The grammar entry
defines the semantics of the questions by way of basic graph patterns (bgps),
in this case the pattern:
(bgp (triple ?subjOfProp &lt;http://dbpedia.org/ontology/capital&gt; ?objOfProp))
Figure 2 (first grammar rule generated for ‘capital’ (of); binding list abbreviated):
"id": "107",
"language": "EN",
"type": "SENTENCE",
"bindingType": "COUNTRY",
"returnType": "CITY",
"frameType": "NPP",
"sentences": [
    "What is the capital of ($x | COUNTRY_NP)?",
    "What was the capital of ($x | COUNTRY_NP)?",
    "Which city is the capital of ($x | COUNTRY_NP)?",
    "Which city was the capital of ($x | COUNTRY_NP)?"
],
"queryType": "SELECT",
"sparqlQuery": "(bgp (triple ?subjOfProp &lt;http://dbpedia.org/ontology/capital&gt; ?objOfProp))",
"sentenceToSparqlParameterMapping": {
    "$x": "subjOfProp"
},
"returnVariable": "objOfProp",
"sentenceBindings": {
    "bindingVariableName": "$x",
    "bindingList": [{
        "label": "Democratic Republic of Afghanistan",
        "uri": "http://dbpedia.org/resource/Democratic_Republic_of_Afghanistan"
    },
    ...]
},
"combination": false</p>
        <p>The entry further specifies that the fillers of the placeholder $x
are to be substituted into the subjOfProp position, and that objOfProp
represents the return variable of the query. The “sentenceBindings” element
lists all the possible entities that $x can be bound to. The binding list is
abbreviated in the figure; it is obtained by querying the
corresponding knowledge graph. The flag “combination: false” indicates that this is a
grammar rule with the start symbol on the left-hand side that cannot be combined
with another rule.</p>
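        <p>To make the role of the parameter mapping and the return variable concrete, the following is a minimal Python sketch, not the authors' implementation, of how such a rule could be instantiated for one entity label. The function names instantiate and iri, the abbreviated RULE dictionary and the layout of the resulting SELECT query are assumptions made for illustration; only the field names are taken from the rule shown in Figure 2.

```python
LT, GT = chr(60), chr(62)  # angle brackets, built via chr() to keep this listing XML-safe


def iri(uri):
    """Wrap a URI in angle brackets, as SPARQL requires (hypothetical helper)."""
    return LT + uri + GT


# Abbreviated copy of the rule in Figure 2 (only the fields used here).
RULE = {
    "sparqlQuery": "(bgp (triple ?subjOfProp "
                   + iri("http://dbpedia.org/ontology/capital")
                   + " ?objOfProp))",
    "sentenceToSparqlParameterMapping": {"$x": "subjOfProp"},
    "returnVariable": "objOfProp",
    "sentenceBindings": {
        "bindingVariableName": "$x",
        "bindingList": [
            {"label": "Democratic Republic of Afghanistan",
             "uri": "http://dbpedia.org/resource/Democratic_Republic_of_Afghanistan"},
        ],
    },
}


def instantiate(rule, label):
    """Return a SELECT query for `rule` with the placeholder bound to `label`."""
    bindings = rule["sentenceBindings"]
    placeholder = bindings["bindingVariableName"]
    # Look the label up in the binding list to obtain its URI.
    uris = [b["uri"] for b in bindings["bindingList"] if b["label"] == label]
    if not uris:
        raise KeyError("label not in binding list: " + label)
    # Only a single triple pattern is handled in this sketch.
    prefix, suffix = "(bgp (triple ", "))"
    pattern = rule["sparqlQuery"].strip()
    if not (pattern.startswith(prefix) and pattern.endswith(suffix)):
        raise ValueError("unsupported pattern: " + pattern)
    terms = pattern[len(prefix):-len(suffix)].split()
    # Replace the variable that the placeholder maps to by the entity URI.
    bound = "?" + rule["sentenceToSparqlParameterMapping"][placeholder]
    terms = [iri(uris[0]) if t == bound else t for t in terms]
    return "SELECT ?%s WHERE { %s . }" % (rule["returnVariable"], " ".join(terms))


print(instantiate(RULE, "Democratic Republic of Afghanistan"))
```

In this sketch a label that is missing from the binding list raises an error rather than producing a query, which mirrors the idea that only entities listed in the bindings can fill the $x position.</p>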
        <p>The second grammar rule generated for the lemon entry in Figure 1 is given
in Figure 3. This rule generates the noun phrase ‘the capital of $x’, where
any country can be inserted into the $x position. The “returnType” is CITY,
meaning that this noun phrase can be inserted into any other rule requiring
a CITY_NP. The rest of the grammar rule is analogous to the first rule.
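The nesting of noun phrases by return type can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the authors' implementation: the function expand, the abbreviated rule dictionaries and the sentence template with a CITY_NP slot are hypothetical; only the slot notation and the noun phrase ‘the capital of $x’ come from the text.

```python
import re

# Hypothetical, abbreviated rules. The NP rule mirrors the noun phrase rule
# described above; the sentence template with a CITY_NP slot is an assumed
# example in the style of the 'flow through' questions.
NP_RULES = [
    {"type": "NP", "returnType": "CITY", "sentences": ["the capital of $x"]},
]
SENTENCE = "Which river flows through ($x | CITY_NP)?"

# Matches a slot such as "($x | CITY_NP)" and captures the type name.
SLOT = re.compile(r"\(\$x \| ([A-Z]+)_NP\)")


def expand(sentence, np_rules):
    """List the proto-questions obtained by filling the slot of `sentence`."""
    match = SLOT.search(sentence)
    if match is None:
        return [sentence]
    wanted = match.group(1)
    start, end = match.span()
    # Option 1: keep the plain placeholder, to be bound to an entity label.
    results = [sentence[:start] + "$x" + sentence[end:]]
    # Option 2: nest every noun phrase whose returnType matches the slot type.
    # The $x inside the nested phrase belongs to the NP rule, not to the sentence.
    for rule in np_rules:
        if rule["returnType"] == wanted:
            for phrase in rule["sentences"]:
                results.append(sentence[:start] + phrase + sentence[end:])
    return results


for question in expand(SENTENCE, NP_RULES):
    print(question)
```

The sketch performs a single level of nesting; whether and how deeper nesting is bounded is not specified here and is left open.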
</p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2 Transitive Verbs</title>
        <p>The second example entry we discuss is the entry for the verb (to)
‘direct’ given in Figure 4. The lexical entry states that the canonical form
has the written representation ‘direct’. The third person singular written
form is ‘directs’, and the (simple) past form is ‘directed’. The semantics of
the verb (to) ‘direct’ is expressed by the property
http://dbpedia.org/ontology/director.
The entry specifies that the subject of the property is realized by the direct
object of the verb ‘direct’ while the object of the property is realized by the
syntactic subject of the verb ‘direct’. The rule in Figure 5 is automatically
generated for the above lemon entry. This rule generates the following four
proto-questions: 1) ‘Which person directed X?’, 2) ‘Which person directs X?’,
3) ‘Who directed X?’, 4) ‘Who directs X?’. At the position X, either labels
of individuals of type Film or noun phrases denoting films (FILM_NP) can be
inserted, e.g. ‘films starring Bruce Willis’, ‘films produced before 1999’,
etc.</p>
      </sec>
      <sec id="sec-2-3">
        <title>2.3 Intransitive Verbs with a prepositional adjunct</title>
        <p>As an example of an intransitive verb with a prepositional adjunct we discuss
the verb (to) ‘flow through’. The corresponding lemon entry is given in
Figure 6. According to the lemon entry, the verb has a subject flow_subj and
a prepositional adjunct flow_pobj. The semantics of ‘X flows through Y’ is
captured by the property http://dbpedia.org/ontology/city, where the
subject of the property is realized by flow_subj and the object of the
property is realized by flow_pobj. The entry states that the default domain
of the property is http://dbpedia.org/ontology/River and the default
range of the property is http://dbpedia.org/ontology/City.
Figure 4 (lemon lexical entry for the verb (to) ‘direct’, discussed in Section 2.2):
:lexicon_en a lemon:Lexicon ;
    lemon:language "en" ;
    lemon:entry :to_direct .

:to_direct a lemon:LexicalEntry ;
    lexinfo:partOfSpeech lexinfo:verb ;
    lemon:canonicalForm :form_direct ;
    lemon:otherForm :form_directs ;
    lemon:otherForm :form_directed ;
    lemon:synBehavior :direct_frame_transitive ;
    lemon:sense :direct_ontomap .

:form_direct a lemon:Form ;
    lemon:writtenRep "direct"@en ;
    lexinfo:verbFormMood lexinfo:infinitive .

:form_directs a lemon:Form ;
    lemon:writtenRep "directs"@en ;
    lexinfo:person lexinfo:thirdPerson .

:form_directed a lemon:Form ;
    lemon:writtenRep "directed"@en ;
    lexinfo:tense lexinfo:past .

:direct_frame_transitive a lexinfo:TransitiveFrame ;
    lexinfo:subject :direct_subj ;
    lexinfo:directObject :direct_obj .

:direct_ontomap a lemon:OntoMap, lemon:LexicalSense ;
    lemon:ontoMapping :direct_ontomap ;
    lemon:reference &lt;http://dbpedia.org/ontology/director&gt; ;
    lemon:subjOfProp :direct_obj ;
    lemon:objOfProp :direct_subj ;
    lemon:condition :direct_condition .

:direct_condition a lemon:condition ;
    lemon:propertyDomain &lt;http://dbpedia.org/ontology/Film&gt; ;
    lemon:propertyRange &lt;http://dbpedia.org/ontology/Person&gt; .</p>
        <p>Two grammar rules are generated from the lemon entry given in Figure
6. We do not discuss the grammar rules in detail as the rules follow the
same principles as those described already. The first grammar rule is given
in Figure 7 and generates sentences such as 1) ‘What flows through X?’, 2)
‘What river flows through X?’, 3) ‘Which rivers flow through X?’. The second
rule is given in Figure 7 and generates the following sentences: 1) ‘What
does X flow through?’, 2) ‘Which cities does X flow through?’, 3) ‘Which city
does X flow through?’.</p>
      </sec>
      <sec id="sec-2-4">
        <title>2.4 Adjectives</title>
        <p>Adjectives behave slightly differently from the entries discussed before
in terms of the grammar rules generated. Consider the example
entry for the adjective ‘Spanish’ in Figure 8.</p>
        <p>Figure 5 (grammar rule generated for the verb (to) ‘direct’, discussed in Section 2.2):
"id": "141",
"language": "EN",
"type": "SENTENCE",
"bindingType": "FILM",
"returnType": "PERSON",
"frameType": "VP",
"sentences": [
    "Which person directed ($x | FILM_NP)?",
    "Which person directs ($x | FILM_NP)?",
    "Who directed ($x | FILM_NP)?",
    "Who directs ($x | FILM_NP)?"
],
"queryType": "SELECT",
"sparqlQuery": "(bgp (triple ?subjOfProp &lt;http://dbpedia.org/ontology/director&gt; ?objOfProp))\n",
"sentenceToSparqlParameterMapping": {
    "$x": "subjOfProp"
},
"returnVariable": "objOfProp",
"sentenceBindings": {
    "bindingVariableName": "$x",
    "bindingList": [{
        "label": "12 Monkeys",
        "uri": "http://dbpedia.org/resource/12_Monkeys"
    }]
},
"combination": false</p>
        <p>The entry for ‘Spanish’ states that the adjective can be used in an attributive frame
(e.g. ‘Spanish movie’) as well as in a predicative frame (e.g. ‘the movie is
Spanish’). The entry states that the semantics of the adjective ‘Spanish’ can
be expressed through a restriction on the property
http://dbpedia.org/ontology/country for the value
http://dbpedia.org/resource/Spain. From this entry, two grammar rules are
generated for each class in the ontology that has instances that are related
to http://dbpedia.org/resource/Spain via the property
http://dbpedia.org/ontology/country. In order to determine these classes, the
following query is executed over the knowledge graph:
SELECT ?y (count(?y) as ?f) ?label WHERE {
    ?x &lt;http://dbpedia.org/ontology/country&gt; &lt;http://dbpedia.org/resource/Spain&gt; ;
       &lt;http://www.w3.org/1999/02/22-rdf-syntax-ns#type&gt; ?y .
    ?y rdfs:label ?label .
    FILTER ( lang(?label)="en" ) .
}
GROUP BY ?y ?label having (count(?y) &gt; 9) order by desc(?f)</p>
        <p>We thus consider only classes that have more than nine instances related to
http://dbpedia.org/resource/Spain via the property
http://dbpedia.org/ontology/country. The class http://dbpedia.org/ontology/movie
is one of the classes returned by the above query. We generate two
rules from the adjective lexical entry for ‘Spanish’ for the class
http://dbpedia.org/ontology/movie. The first rule is given in Figure 9 and
generates the following full-fledged questions: 1) ‘Which are Spanish movies?’,
‘Which is a Spanish movie?’, ‘Which was a Spanish movie?’, ‘Which were
0/question-grammar-generator. A demo of the system can be found here:</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3 Preliminary Results</title>
      <p>We have applied our approach to the training dataset of QALD-7
(https://project-hobbit.eu/challenges/qald2017/) and
created a lexicon to cover the content words in these entries. This yielded a
lexicon with a distribution of frame types as given in Table 1. From this
lexicon, we automatically generated 5269 grammar rules using the approach
described in Section 2.</p>
      <p>Figure 8 (lemon lexical entry for the adjective ‘Spanish’, discussed in Section 2.4):
:lexicon_en a lemon:Lexicon ;
    lemon:language "en" ;
    lemon:entry :spanish ;
    lemon:entry :spanish_res .

:spanish a lemon:LexicalEntry ;
    lexinfo:partOfSpeech lexinfo:adjective ;
    lemon:canonicalForm :spanish_lemma ;
    lemon:synBehavior :spanish_attrFrame, :spanish_predFrame ;
    lemon:sense :spanish_sense .

:spanish_lemma lemon:writtenRep "Spanish"@en .

:spanish_predFrame a lexinfo:AdjectivePredicateFrame ;
    lexinfo:copulativeSubject :spanish_PredSynArg .

:spanish_attrFrame a lexinfo:AdjectiveAttributiveFrame ;
    lexinfo:attributiveArg :spanish_AttrSynArg .

:spanish_sense a lemon:LexicalSense ;
    lemon:reference :spanish_res ;
    lemon:isA :spanish_AttrSynArg, :spanish_PredSynArg .

:spanish_res a owl:Restriction ;
    owl:onProperty &lt;http://dbpedia.org/ontology/country&gt; ;
    owl:hasValue &lt;http://dbpedia.org/resource/Spain&gt; .</p>
      <p>
        For the questions in the training dataset of QALD-7, we manually rephrased
the questions that did not follow our grammar into the closest grammar
rule capturing the same meaning, if possible. In total, 96 questions were
reformulated, which corresponds to 44.65% of the questions.
While only 30.25% of the questions in their original form could be transformed
into a SPARQL query yielding at least one answer, with the reformulated
questions this number rose to 54.88%. In order to use the grammar to parse
NL queries, we automatically translated our grammar rules into regular
expressions that parse the question and substitute the variables in the basic
graph patterns by the corresponding elements matched by the regular
expression. This is accomplished by looking up the values matched in the NL
question among the labels in the binding lists and substituting the matching
URI for the corresponding variable of the SPARQL query composed out of the
basic graph patterns. Disambiguation happens implicitly in this case, as the
URIs selected are only those that ‘fit’ into the corresponding SPARQL
template and are mentioned in the binding list. When parsing the QALD-7
training data queries using our grammar and instantiating the corresponding
SPARQL query, we get the results in Table 2. The table shows the results
of our grammar-based parsing for the original questions in comparison to
the rephrased questions, both for the cases including and excluding the ASK
questions. The results are higher when the ASK questions are excluded, as
our approach does not support them so far. With the rephrased queries we
obtain an overall micro F-measure of 62.5%. Restricting the evaluation to the
SELECT queries, i.e. excluding the ASK queries that our grammar currently
cannot cover, we get a micro F-measure of 64.26%. The table with all
rephrasings for each question can be found here:
https://docs.google.com/spreadsheets/d/1Jjt_ZlDD1zbBXs3Mhdf8kIJCAuv2H9EMlJKSqyjKabU/edit#gid=2076204763
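The translation of grammar rules into regular expressions described above can be sketched as follows. This is a minimal Python sketch under stated assumptions rather than the actual implementation: the functions to_regex and parse, the flattened RULE dictionary and the case-insensitive label matching are hypothetical, and nested noun phrases are not handled.

```python
import re

# Hypothetical, abbreviated rule in the style of the rule for (to) 'direct';
# the binding list is flattened to the top level for brevity.
RULE = {
    "sentences": ["Who directed ($x | FILM_NP)?", "Who directs ($x | FILM_NP)?"],
    "bindingList": [
        {"label": "12 Monkeys", "uri": "http://dbpedia.org/resource/12_Monkeys"},
    ],
}

# Matches a slot such as "($x | FILM_NP)".
SLOT = re.compile(r"\(\$x \| [A-Z]+_NP\)")


def to_regex(sentence):
    """Compile a proto-question into a regex with one capture group for $x."""
    parts = SLOT.split(sentence)
    if len(parts) != 2:
        raise ValueError("expected exactly one slot: " + sentence)
    before, after = parts
    return re.compile(re.escape(before) + "(.+)" + re.escape(after), re.IGNORECASE)


def parse(question, rule):
    """Return the URI bound to $x, or None if the rule does not cover the question."""
    labels = {b["label"].lower(): b["uri"] for b in rule["bindingList"]}
    for sentence in rule["sentences"]:
        match = to_regex(sentence).fullmatch(question.strip())
        if match:
            # Only labels from the binding list are accepted, which is what
            # makes the disambiguation implicit.
            uri = labels.get(match.group(1).strip().lower())
            if uri is not None:
                return uri
    return None


print(parse("Who directed 12 Monkeys?", RULE))
```

A question whose wording matches no sentence template, or whose filler is not a label in the binding list, yields None in this sketch, which corresponds to a question the grammar cannot parse.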
      </p>
    </sec>
    <sec>
      <title>4 Related Work</title>
      <p>
        Most work on question answering over linked data currently builds on
systems that use machine learning to learn a model to map natural language
questions into SPARQL queries (see [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] for an overview of deep learning
methods applied to QALD and [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] for an overview of recent work on natural
language interfaces to databases). Examples of recent systems that use
machine learning techniques are systems using probabilistic graphical models
[
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], Bi-directional LSTMs [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], Tree-LSTMs [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] and more recently
transformers [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] as well as neural machine translation models [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ]. The
drawback of such systems is that they can only be used for knowledge graphs or
ontologies for which training data is available, limiting the applicability of
such systems in the long tail.
      </p>
      <p>
        Besides systems using machine learning techniques, there are rule-based
systems that rely on a set of rules to map NL questions into a representation
that can be either directly evaluated over the knowledge graph or rely on
graph exploration to map the representation of the NL question into a
full-fledged SPARQL query. Early systems relying on rules to map dependency
parse trees of questions into SPARQL queries are AquaLog [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] and its
successor PowerAqua [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. For this, AquaLog and PowerAqua rely on a number of
similarity metrics to match the elements in the question to elements in the
knowledge graph and/or RDF data. WDAqua [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] is a system that does not
rely on syntactic analysis, but leverages the graph to induce connections
between the entities and words mentioned in an NL question,
constructing different hypotheses for SPARQL queries and ranking them according
to a set of features. If training data is available, a learned linear
combination of the features can be used for ranking. Further, there have been a
number of systems that rely on predefined templates [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ] or patterns [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]
to map questions into SPARQL queries. The above-mentioned systems,
including WDAqua, AquaLog/PowerAqua and QAKiS, in principle
do not require training data to adapt the system to a different domain, as
the interpretation of the question is guided by the underlying knowledge
graph. In contrast to these systems, we have proposed a different paradigm
that relies on the generation of a lexicalized question grammar from a
lemon lexicon that can be used both to generate questions and to parse them
into SPARQL queries. Our approach has the benefit that the grammars can be
used in a guided QA interface, supporting users in selecting the question
that best approximates their information need. Further, our approach
supports portability across domains without
having to provide training data, albeit requiring the creation of a lemon
lexicon if none is available. The basic idea of generating lexicalized grammars from
ontology lexica goes back to our earlier work on generating LTAG
grammars from ontology lexica [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ]. While we sketched the approach in earlier
work, we only worked on a very restricted domain (GEOBASE) and did not
provide the empirical proof that the approach is feasible. In this paper we
have revisited the approach in the context of the lemon model and
empirically shown on the QALD-7 dataset that our grammar, given the appropriate
rephrasings, has a reasonable recall and precision. Our work is related to
approaches to question answering over linked data building on controlled
languages. The work of Ferré [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], for example, has proposed a controlled
language approach to translating questions into full SPARQL 1.1, supporting
all the relevant SPARQL features including joins, union, optionals, negation,
quantification, aggregation/grouping, etc. However, the language proposed
by Ferré, SQUALL, is not a natural language, but an artificial language
using RDF elements including URIs. In contrast, our goal has been to propose
an approach in which users can formulate their queries in natural language
without knowing anything about the underlying data model or graph. Other
natural language interfaces based on controlled natural language rely
on finite-state automata that guide a user in composing a question [
        <xref ref-type="bibr" rid="ref12 ref16">16,
12</xref>
        ]. In terms of guided interfaces to question answering over linked data,
different interfaces have been proposed. In our own work, we presented two
interactive guided interfaces that rely on the automatic generation of
questions [
        <xref ref-type="bibr" rid="ref19 ref4">4, 19</xref>
        ]. However, the mechanisms for generating the questions were
quite ad hoc. In this paper we have presented a grammar formalism that is
declarative and thus allows the grammar to be shared as an important asset.
Further, the grammar supports some level of compositionality in that more
complex NPs can be nested into questions. Other interactive interfaces to
RDF data have been proposed as well [
        <xref ref-type="bibr" rid="ref12 ref16 ref24 ref3">16, 12, 24, 3</xref>
        ].
      </p>
    </sec>
    <sec id="sec-4">
      <title>5 Conclusion and Future Work</title>
      <p>In this paper we have presented an approach by which question
answering grammars can be automatically generated from lemon lexica. We have
proposed a specific grammar formalism that has been developed to provide
a basic level of composition. We have presented preliminary results on the
QALD-7 English training dataset showing that our approach is feasible and
reaches a micro F-measure of 62.5%, provided that questions are rephrased with our
grammar and a corresponding lemon lexicon is available. While the need
to rephrase the questions can be seen as a limiting bottleneck, in the
context of a guided question answering interface it is possible to guide users
to formulate questions following the rules of the grammar. In this scenario
our results look promising. The advantage of our approach is that, being
model-based, the governance and control over the lifecycle of the QA
system is more transparent, as the developer can predict the impact of adding
a further lexical entry to the lemon lexicon. This is a significant advantage
over machine-learning-based systems, where the impact of adding a further
example cannot always be predicted. Further, many redundant examples
with the same content words might be needed for the machine learning
system to learn a pattern. Our approach can be straightforwardly adapted to
different ontologies and knowledge graphs for which no training datasets
are available, an important advantage compared to ML-based approaches.</p>
      <p>In the future, we will adapt our system to other languages and will
develop approaches that can scale up the acquisition of a lemon lexicon by
semi-automatic means. We will also evaluate our system on other datasets
including more recent versions of QALD as well as LC-QuAD 2.0.</p>
      <p>Acknowledgements: This work has been funded by the European
Commission under grant 825182 (Prêt-à-LLOD) as well as by the project ‘Unbiased
Bots That Build Bridges’ (U3B), funded by VolkswagenStiftung.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Affolter</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stockinger</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bernstein</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>A comparative survey of recent natural language interfaces for databases</article-title>
          .
          <source>VLDB Journal 28</source>
          ,
          <fpage>793</fpage>
          -
          <lpage>819</lpage>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Athreya</surname>
            ,
            <given-names>R.G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bansal</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ngomo</surname>
            ,
            <given-names>A.N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Usbeck</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>Template-based question answering using recursive neural networks</article-title>
          .
          <source>CoRR abs/2004.13843</source>
          (
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Bernstein</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kaufmann</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Göhring</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kiefer</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Querying ontologies: A controlled English interface for end-users</article-title>
          .
          <source>In: Proc. of the 4th International Semantic Web Conference (ISWC)</source>
          . pp.
          <fpage>112</fpage>
          -
          <lpage>126</lpage>
          (
          <year>2005</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Biermann</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Walter</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cimiano</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>A guided template-based question answering system over knowledge graphs</article-title>
          .
          <source>In: Proc. of the 21st International Conference on Knowledge Engineering and Knowledge Management (EKAW)</source>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Cabrio</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cojan</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gandon</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hallili</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Querying multilingual DBpedia with QAKiS</article-title>
          .
          <source>In: Proc. of the 10th Extended Semantic Web Conference (ESWC)</source>
          ,
          <source>Satellite Events, Revised Selected Papers</source>
          . vol.
          <volume>7955</volume>
          , pp.
          <fpage>194</fpage>
          -
          <lpage>198</lpage>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Chakraborty</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lukovnikov</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Maheshwari</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Trivedi</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lehmann</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fischer</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Introduction to neural network based approaches for question answering over knowledge graphs</article-title>
          .
          <source>CoRR abs/1907.09361</source>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Cimiano</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Buitelaar</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>McCrae</surname>
            ,
            <given-names>J.P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sintek</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>LexInfo: A declarative model for the lexicon-ontology interface</article-title>
          .
          <source>J. Web Semant</source>
          .
          <volume>9</volume>
          (
          <issue>1</issue>
          ),
          <fpage>29</fpage>
          -
          <lpage>51</lpage>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Diefenbach</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Both</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Singh</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Maret</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Towards a question answering system over the semantic web</article-title>
          .
          <source>Semantic Web</source>
          <volume>11</volume>
          (
          <issue>3</issue>
          ),
          <fpage>421</fpage>
          -
          <lpage>439</lpage>
          (
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Ferré</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>SQUALL: A controlled natural language as expressive as SPARQL 1.1</article-title>
          .
          <source>In: Proc. of 18th International Conference on Applications of Natural Language to Information Systems (NLDB)</source>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Hakimov</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jebbara</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cimiano</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>AMUSE: multilingual semantic parsing for question answering over linked data</article-title>
          .
          <source>In: Proc. of the 16th International Semantic Web Conference (ISWC)</source>
          . pp.
          <fpage>329</fpage>
          -
          <lpage>346</lpage>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Hakimov</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jebbara</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cimiano</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Evaluating architectural choices for deep learning approaches for question answering over knowledge bases</article-title>
          .
          <source>In: Proc. of the 13th IEEE International Conference on Semantic Computing (ICSC)</source>
          . pp.
          <fpage>110</fpage>
          -
          <lpage>113</lpage>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Karam</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Streibel</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Karjauv</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Coskun</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Paschke</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Answering controlled natural language questions over RDF clinical data</article-title>
          .
          <source>In: Proc. of the Demo/Poster Session of the European Semantic Web Conference (ESWC)</source>
          (
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>López</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fernández</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Motta</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stieler</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          :
          <article-title>PowerAqua: Supporting users in querying and exploring the semantic web</article-title>
          .
          <source>Semantic Web</source>
          <volume>3</volume>
          (
          <issue>3</issue>
          ),
          <fpage>249</fpage>
          -
          <lpage>265</lpage>
          (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>López</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Motta</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Uren</surname>
            ,
            <given-names>V.S.</given-names>
          </string-name>
          :
          <article-title>AquaLog: An ontology-driven question answering system to interface the semantic web</article-title>
          .
          <source>In: Proc. of the Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics (NAACL)</source>
          (
          <year>2006</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Lukovnikov</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fischer</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lehmann</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Pretrained transformers for simple question answering over knowledge graphs</article-title>
          .
          <source>In: Proc. of the 18th International Semantic Web Conference (ISWC)</source>
          . pp.
          <fpage>470</fpage>
          -
          <lpage>486</lpage>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Mazzeo</surname>
            ,
            <given-names>G.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zaniolo</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Answering controlled natural language questions on RDF knowledge bases</article-title>
          .
          <source>In: Proc. of the 19th International Conference on Extending Database Technology (EDBT)</source>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>McCrae</surname>
            ,
            <given-names>J.P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Montiel-Ponsoda</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cimiano</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Collaborative semantic editing of linked data lexica</article-title>
          .
          <source>In: Proc. of the Eighth International Conference on Language Resources and Evaluation (LREC)</source>
          . pp.
          <fpage>2619</fpage>
          -
          <lpage>2625</lpage>
          (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>McCrae</surname>
            ,
            <given-names>J.P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Spohr</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cimiano</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Linking lexical resources and ontologies on the semantic web with lemon</article-title>
          .
          <source>In: Proc. of the 8th Extended Semantic Web Conference (ESWC)</source>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <surname>Rico</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Unger</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cimiano</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Sorry, I only speak natural language: a pattern-based, data-driven and guided approach to mapping natural language queries to SPARQL</article-title>
          .
          <source>In: Proc. of the 4th International Workshop on Intelligent Exploration of Semantic Data (IESD) co-located with the 14th International Semantic Web Conference (ISWC)</source>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <surname>Unger</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bühmann</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lehmann</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ngomo</surname>
            ,
            <given-names>A.N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gerber</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cimiano</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Template-based question answering over RDF data</article-title>
          .
          <source>In: Proc. of the 21st World Wide Web Conference (WWW)</source>
          . pp.
          <fpage>639</fpage>
          -
          <lpage>648</lpage>
          . ACM (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <surname>Unger</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hieber</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cimiano</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Generating LTAG grammars from a lexicon/ontology interface</article-title>
          .
          <source>In: Proc. of the 10th International Workshop on Tree Adjoining Grammar and Related Frameworks (TAG)</source>
          . pp.
          <fpage>61</fpage>
          -
          <lpage>68</lpage>
          . Yale University (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          22.
          <string-name>
            <surname>Usbeck</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ngomo</surname>
            ,
            <given-names>A.C.N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Haarmann</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Krithara</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Röder</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Napolitano</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          :
          <article-title>7th open challenge on question answering over linked data (QALD-7)</article-title>
          .
          <source>In: Semantic Web Evaluation Challenge</source>
          . pp.
          <fpage>59</fpage>
          -
          <lpage>69</lpage>
          . Springer International Publishing (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          23.
          <string-name>
            <surname>Yin</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gromann</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rudolph</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Neural machine translating from natural language to SPARQL</article-title>
          .
          <source>CoRR abs/1906.09302</source>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          24.
          <string-name>
            <surname>Zafar</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dubey</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lehmann</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Demidova</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          :
          <article-title>IQA: Interactive query construction in semantic question answering systems</article-title>
          .
          <source>Journal of Web Semantics</source>
          <volume>64</volume>
          ,
          <issue>100586</issue>
          (
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>