<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Towards an Ontology-Driven Evolutionary Programming-Based Approach for Answering Natural Language Queries against RDF Data</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Sebastian Schrage</string-name>
          <email>schrage@cs.uni-goettingen.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Wolfgang May</string-name>
          <email>may@cs.uni-goettingen.de</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
<institution>Georg-August University of Göttingen, Institute of Computer Science</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
<institution>Georg-August University of Göttingen, Institute of Computer Science</institution>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2019</year>
      </pub-date>
      <abstract>
<p>In this paper, we present an ontology-driven evolutionary learning system for natural language querying of complex relational databases or RDF graphs, giving users who are not familiar with formal database query languages the opportunity to express complex queries against a database. The approach learns how to arrange, and when to use, given functions to process natural language queries (NLQs).</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. INTRODUCTION</title>
<p>Natural language interfaces for databases (NLIDB) are likely the easiest way for a user to access a database. They require the user to learn neither the specific query language nor the schema or ontology of the data set. This lack of knowledge must, however, be compensated by the interface: it not only has to understand the user input and extract the information from the natural language query (NLQ), but must also cope with the fact that the user may have a different concept in mind than the one implemented. The deviations range from small differences in vocabulary, the use of abbreviations or incomplete names, and ambiguous formulations, to relationships that are not in the model or the fusion of different entities into one. Therefore, an NLIDB should be flexible enough to allow the user to operate on her concepts, not on those of the implementer.</p>
      <p>
        This approach consists of two major parts: the evolutionary agent framework, loosely based on the work of Turk [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] and Hoverd et al. [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], and the NLQ-to-SPARQL application of this framework, which uses NLP data preprocessed by Stanford's NLP Core [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] and ontology-based methods to translate an NLQ into a SPARQL query against a given RDF database that is described by an ontology. According to the definition by Vikhar [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ], the framework can be categorized as evolutionary programming: unlike in genetic algorithms, the structure of the subprograms is fixed and only their execution order can differ, and unlike in evolution strategies, the data types of the solution are not limited to a numeric vector. Since an NLQ can easily be decomposed into several layers of sub-objectives, a multi-objective evolutionary algorithm (MOEA) approach, as investigated extensively by Li et al. [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ], was used. If the system is not able to produce the correct solution, it can be trained with further example queries of this kind together with their corresponding SPARQL queries, and the framework extends the model via the evolutionary learning algorithm to learn this new kind of queries.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. RELATED WORK</title>
      <p>
        Basically, there are two environments in which NLIDB systems are developed: for Knowledge Graphs (KGs) like DBpedia [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] and for smaller data sets like Mondial [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. For KGs with huge amounts of data and entities but no reliable or well-defined ontology, approaches that rely less on ontologies have shown the most success, such as predefined graph pattern matching as used by Steinmetz et al. [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] or pattern-learning approaches like STF by Hu et al. [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ].
      </p>
      <p>
        On the other hand, for smaller data sets with well-defined ontologies, which is also the scope of this work, approaches more focused on ontology usage, like Athena by Saha et al. [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], or on schema usage, like Precise by Popescu et al. [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], have shown better results. Both approaches first analyze the NLQ, then assign values to recognized parts representing how confident they are in those parts, and then try to connect them to a minimal graph spanning all parts considered evident, with edges weighted according to the confidence values, or with high penalties if no connection is found at all.
        Structure of the paper: Next, a short overview of the system architecture is given, followed by a description of the learning framework. Then, the NLQ-to-SPARQL application is discussed, with some example queries. If the approach could not answer a question correctly, a brief explanation of why is given. Finally, a brief conclusion is given.
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. SYSTEM OVERVIEW</title>
      <p>
        This approach is based on evolutionary programming. The
central component is an agent whose input is obtained by
preprocessing the NLQ with NLP Core [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] and which
outputs a SPARQL query (cf. Figure 1).
      </p>
      <p>
        The system is initialized with an ontology that covers the
application domain. The ontology, given as an OWL
ontology, is analyzed by RDF2SQL [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] and the results are stored in
the Semantical Data Dictionary which is a collection of
relational tables stored in an SQL database. When an NLQ is
asked, it is first processed with Core NLP [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] using its part of
speech module, the entity recognition module and the
grammatical dependencies module. Then the preprocessed NLQ
is given to an agent which returns a SPARQL query, which
can be stated against an RDF data store or further
processed by OBDA applications.
      </p>
      <p>As depicted in Figure 1, at runtime there is a single agent.
During the learning phase, there are multiple agents, and
the learning phase results in the “fittest” agent for a given
learning set, as described in the following section.</p>
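      <p>The runtime pipeline described above can be sketched as follows; this is a minimal illustration in which both functions and the returned query are invented stand-ins for the components in the text, not the actual implementation.</p>
      <preformat>
```python
# Bird's-eye sketch of the runtime pipeline: preprocess the NLQ, hand the
# result to the single (fittest) agent, obtain a SPARQL query. Both
# functions are invented stand-ins, not the actual components.

def preprocess(nlq):
    # stand-in for the NLP Core step (POS tags, NER, dependencies)
    return [{"string": w, "position": i} for i, w in enumerate(nlq.split())]

def agent(preprocessed):
    # stand-in for the learned agent, which assembles BGPs, conditions etc.
    return "SELECT ?x WHERE { ?x rdf:type :City }"

def answer(nlq):
    return agent(preprocess(nlq))

query = answer("Which cities are in Europe?")
```
      </preformat>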
    </sec>
    <sec id="sec-4">
      <title>4. LEARNING FRAMEWORK</title>
      <p>The structure of the agents that are subject to evolutionary programming has been developed accordingly: the inner structure of an agent consists of application-specific nodes. There are different node types, and of each type there may be multiple instances. The general idea of the node types is to provide a set of operations that might be useful for solving the task; it is not known, however, which of them are needed, in which order and in which cases they must be executed, and with which settings, to reach the objectives. Additionally, there are connections between nodes for the data flow inside the agent. The information flow is handled in so-called products, which are an application-specific predefined encapsulation of arbitrary data types. Which kinds of products, and how many at a time, a node accepts is type-specific. An agent is a network of such nodes (for an example see Figure 2), and the computed solution is returned as a set of products.</p>
    </sec>
    <sec id="sec-5">
      <title>4.1 General Notions</title>
      <sec id="sec-5-1">
        <title>4.1.1 Agent Configuration</title>
        <p>The configuration, i.e. the concrete internal structure of
an agent, implements its functionality as the cooperation of
the nodes. It is a directed graph (which may contain cycles)
consisting of a set of nodes and a set of connections. There
are input nodes, a single output node, and inner,
processing nodes. The graph must be connected, i.e. no isolated
fragments are allowed.</p>
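        <p>The connectedness requirement on a configuration can be illustrated as follows; this is a minimal sketch with invented names, checking connectivity while ignoring edge direction, so that no isolated fragments remain.</p>
        <preformat>
```python
# Sketch of an agent configuration: a directed graph of nodes and
# connections that must be connected, i.e. no isolated fragments.
# Node and edge names are illustrative only.

def is_connected(nodes, connections):
    """nodes: set of node ids; connections: set of (src, dst) pairs.
    Checks connectivity by ignoring edge direction."""
    if not nodes:
        return True
    neighbours = {n: set() for n in nodes}
    for src, dst in connections:
        neighbours[src].add(dst)
        neighbours[dst].add(src)
    seen, stack = set(), [next(iter(nodes))]
    while stack:
        n = stack.pop()
        if n not in seen:
            seen.add(n)
            stack.extend(neighbours[n] - seen)
    return seen == set(nodes)

config = {"in1", "pos", "trigen", "out"}
edges = {("in1", "pos"), ("pos", "trigen"), ("trigen", "out")}
```
        </preformat>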
      </sec>
      <sec id="sec-5-2">
        <title>4.1.2 Nodes</title>
        <p>All nodes have the same general structure: each node n of type t has at least one input or one output conduit, usually one or several of both. The conduits are typed according to the kind of data, called products (cf. Section 5.2), that they communicate. The product types are organized in a class hierarchy. The input conduits are enumerated as in1, in2, . . . with types type(ini); the output conduits are enumerated as out1, out2, . . . with types type(outj). There may be several input conduits with the same product type. Nodes have one or more output conduits for every product type that they can produce (which in turn can be connected to multiple inputs). A node can generate one or more results of one or more product types.</p>
        <p>Every node type t implements a certain functionality, which satisfies a certain signature wrt. its inputs in1, . . . , inc(t) and outputs out1, . . . , outd(t) (i.e., c(t) and d(t) are the indegree and outdegree of node type t, resp.):
ft(type(in1), . . ., type(inc(t))) → (type(out1)∗, . . ., type(outd(t))∗),
where the ∗ means that there might be zero, one, or more elements (e.g., if the node implements a conditional, one out of two outputs will be set; if a node cannot do something useful with the current inputs, no output might be generated; or if a list is split, the (only) output is fed with the sequence of all its elements). From a practical point of view, the output can also be seen as a set of elements of arbitrary product types.</p>
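        <p>As an illustration of this signature, the following sketch models a node type with typed conduits and a function ft that may emit zero, one, or more products per firing; all class and field names are our own, not those of the implementation.</p>
        <preformat>
```python
# Sketch of a node type: typed input/output conduits and a function f_t
# that may emit zero, one, or more products per firing. All class and
# field names are illustrative, not the paper's implementation.

class Node:
    def __init__(self, name, in_types, out_types, fn):
        self.name = name
        self.in_types = in_types    # product type expected on each input conduit
        self.out_types = out_types  # product type of each output conduit
        self.fn = fn                # f_t: inputs -> list of (conduit index, product)

    def fire(self, inputs):
        # check the inputs against the conduit signature, then apply f_t
        assert len(inputs) == len(self.in_types)
        for value, expected in zip(inputs, self.in_types):
            assert isinstance(value, expected)
        return self.fn(*inputs)

# Example: a "split" node whose single output conduit is fed with the
# sequence of all elements of an input list (zero or more outputs).
split = Node("split", in_types=[list], out_types=[str],
             fn=lambda xs: [(0, x) for x in xs])
products = split.fire([["city", "river"]])
```
        </preformat>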
        <p>Every output conduit can be connected to multiple input
conduits, and every input conduit can have multiple
incoming connections from output conduits.</p>
        <p>The used product types and the concrete functionality of the node types (ft, including the number of input and output conduits, and their product types) depend on the application. The structure of the agent as a graph of nodes of these node types, with conduits connecting compatible outputs and inputs, is subject to learning. Usually, learning starts with a concrete proposal of a standard agent, which is then improved during the learning process.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>4.2 The Evolutionary Process</title>
      <p>
        The evolutionary process controls the evolution of agents in order to improve their competences. It starts with a set of mutated standard agents. Then, in an iterative process, the agents have a chance to change their configuration every time they reproduce. The basic idea from [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] is that each solution to each problem is assigned an amount of energy. An agent gets energy for correct solutions of the problems. With a growing population of agents, the energy is divided among more agents, which pressures them to win more energy overall and suppresses unlimited growth in numbers.
      </p>
      <sec id="sec-6-1">
        <title>4.2.1 Stepwise Evolution</title>
        <p>The framework is organized as a sequence of runs. There is a fixed training set T provided by the user, consisting of test pairs t = (pt, solt) of a problem pt and a corresponding solution. The solutions, and often also their components, are assigned an initial energy (= value). Initial and new agents can be created from a problem-specific standard agent. Each run is done by an agent set whose population changes by evolution. All agents have to solve the problems, and the produced solutions are evaluated. Then, for each solution (resp. solution component), it is checked to which extent the solution of an agent matches that solution component. The energy assigned to the solution components is distributed among the agents that found them.</p>
        <p>Given a threshold esus that defines how much energy is required to sustain an agent, the next step is to check which agents collected at least esus energy. Those are added to the agent set of the next round. If an agent earned more than 2esus, it reproduces (i.e., mutates) itself, and the offspring is added to the agent set as well. After being unchanged for a certain number of runs, an agent can also mutate itself.</p>
      </sec>
      <sec id="sec-6-2">
        <title>4.2.2 Reproduction</title>
        <p>If e ≥ 2esus for an agent, it reproduces itself, and both the agent and its offspring are added to the agent set. During reproduction, one of two scenarios can happen.</p>
        <p>1.) The offspring is a perfect copy of its parent, without any changes. This means that for the next run there are more agents of this configuration, and the likelihood of successful mutations during reproduction is higher.</p>
        <p>2.) The offspring is a mutation of its parent, meaning it makes a random number of changes (based on a normal distribution centered around a value &gt; 0) to its nodes and connections. These changes can be adding a new node/connection, removing a node/connection, or changing the configuration of a node.</p>
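        <p>The energy-based survival and reproduction rules above can be sketched as follows; the agent representation, the energy values, and the mutation operator are deliberately simplistic stand-ins.</p>
        <preformat>
```python
import random

# Sketch of one evolutionary run: agents earn energy for matched solution
# components; agents below the sustain threshold e_sus die, and agents
# with at least 2*e_sus also reproduce, either as a perfect copy or as a
# mutated copy. All names and values are illustrative.

E_SUS = 10  # energy required to sustain an agent (illustrative value)

def run_generation(agents, earn, mutate, rng):
    """earn(agent) -> energy won this run; mutate(agent) -> offspring."""
    next_generation = []
    for agent in agents:
        energy = earn(agent)
        if energy >= E_SUS:          # enough energy: the agent survives
            next_generation.append(agent)
        if energy >= 2 * E_SUS:      # surplus energy: it reproduces
            # offspring: perfect copy or mutated copy, chosen at random
            child = agent if 0.5 > rng.random() else mutate(agent)
            next_generation.append(child)
    return next_generation

rng = random.Random(0)
earnings = {"weak": 4, "fit": 12, "strong": 25}
next_gen = run_generation(["weak", "fit", "strong"], earnings.get,
                          mutate=lambda a: a + "*", rng=rng)
```
        </preformat>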
      </sec>
    </sec>
    <sec id="sec-7">
      <title>5. NLQ-TO-SPARQL TRANSLATION</title>
      <p>For every application, the specific node types must be designed and implemented. This requires a profound idea of useful small steps of the process. The learning process then consists of combining such local behavior into a smooth global behavior.</p>
      <p>For the NLQ-to-SPARQL translation, the task of the agent is to translate the outcome of the Core NLP analysis into a SPARQL query. So, the solution components mentioned above are query fragments. There are different issues to be handled and combined by the agent: Named-Entity Recognition, translation of class names and property names into the notions of the database (represented by its ontology), the structural generation of a SPARQL query from basic graph patterns (BGPs), logical connectives and conditions, and the handling of variables.</p>
      <p>The training set consists of a set of pairs of NLQs and
the corresponding (usually handmade) annotated SPARQL
queries that adhere to conventions to give hints to the
translation of the sentence structure.</p>
    </sec>
    <sec id="sec-8">
      <title>5.1 Ontology Representation and Access</title>
      <p>
        The Semantic Data Dictionary (SDD) [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] gives comprehensive access to the metadata: basically, it contains the same knowledge as an OWL ontology (from which it can be extracted; originally it provides an OBDA RDF-to-SQL mapping), extended with knowledge about which concrete (sub)classes provide which properties, and about their ranges. It is used here instead of the OWL ontology because it is easier to access and does not require further reasoning.
      </p>
      <p>The SDD has no information about the instances in the data set. Since identifying instances in NLQs is one of the major tasks for answering them, a data structure for efficient searching is necessary. Therefore the SDD is extended with an identifier mapping IM: string → (class, property)∗, e.g. “Monaco” ↦ ((Country, name), (City, name)). To identify which properties are potential identifiers for the mapping, the training set is searched for cases where the SPARQL solution contains a variable whose name is not equal to its class – these denote the named entities (“Great Britain” in the example, whose class is Country). For instances of these classes, all string-valued properties are searched for whether their value equals the name of the training set variable (i.e., “Great Britain”). If so, the property is considered an identifying property and generates an entry in the identifier mapping for each instance of this class with this property.</p>
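      <p>The construction of the identifier mapping can be sketched as follows; the instance data below is invented for illustration.</p>
      <preformat>
```python
# Sketch of building the identifier mapping IM: string -> (class, property)*.
# Every value of an identifying property generates an entry for the
# corresponding (class, property) pair. Toy data, invented for illustration.

from collections import defaultdict

instances = [
    ("Country", "name", "Monaco"),
    ("City", "name", "Monaco"),
    ("Country", "name", "Great Britain"),
]

def build_identifier_mapping(rows):
    im = defaultdict(list)
    for cls, prop, value in rows:
        im[value].append((cls, prop))
    return dict(im)

im = build_identifier_mapping(instances)
# "Monaco" maps to both (Country, name) and (City, name)
```
      </preformat>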
    </sec>
    <sec id="sec-9">
      <title>5.2 Products</title>
      <p>
        The products for the NLQ application are divided into two major groups: primitive and compound products. Primitive products can contain complex information, but as products they are seen as a whole (see Table 1 for the exact data composition of each product; e.g., most products carry, as information, the position in the sentence from which they have been derived). At the primary stage of the processing, there are primitive products of type nlpdata, which is a reduced version of the output of NLP Core [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Nlpdata can be turned either into tripleparts or symbols, which are primitive products towards the SPARQL side. Triplepart is an abstract superclass of the product types variable, constant, and predicate, while symbol is the abstract superclass of the product types operator (e.g., +, ≤, ≥, =, ≠), aggregation, and except. Products of the type variable can be part of the solution set (i.e., of the fragments solt,j of the query expression to be generated) and can generate SPARQL statements of the type ?x rdf:type class, where the class information is contained in the information of the variable. Constants are fixed (literal) values from the NLQ, like names or numbers. Products of the type predicate carry a set of properties (i.e., the properties used in the ontology that may fit the verbal query). Products of type except correspond to negation in the NLQ.
      </p>
      <p>Compound products are either triple, condition, or graph products. Triples always consist of a subject, which must be an object-valued variable, a predicate, and an object, which is also a variable (object- or literal-valued). Note that IRI constants cannot yet exist, since they do not occur in NLQs; constant values occur only in comparisons in conditions. Triples can be translated directly to SPARQL. Conditions consist of a left product of type variable, a right product of type variable or constant, and one operator. Products of type graph are basically lists of triples and conditions, but can also contain primitive products that are not yet integrated with the rest of the graph.</p>
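      <p>The product hierarchy can be sketched as a small class hierarchy; the field names follow Table 1 where possible, but the classes themselves are our own illustration, not the implementation.</p>
      <preformat>
```python
# Sketch of the product hierarchy: the abstract triplepart with subtypes
# variable/constant/predicate, and the compound products triple,
# condition, and graph. Field names follow Table 1 where possible; the
# classes themselves are illustrative, not the paper's implementation.

from dataclasses import dataclass, field

@dataclass
class TriplePart:                 # abstract: name, position, confidence
    name: str
    position: int
    confidence: str

@dataclass
class Variable(TriplePart):
    domains: tuple = ()
    is_literal: bool = False

@dataclass
class Constant(TriplePart):
    value: object = None

@dataclass
class Predicate(TriplePart):
    properties: tuple = ()

@dataclass
class Triple:                     # subject and object are variables
    subject: Variable
    predicate: Predicate
    object: Variable

@dataclass
class Condition:                  # left variable, operator, right part
    left: Variable
    operator: str
    right: TriplePart

@dataclass
class Graph:                      # triples, conditions, and leftovers
    products: list = field(default_factory=list)

subj = Variable("country", 3, "evident", domains=("Country",))
pred = Predicate("name", 3, "evident", properties=("name",))
obj = Variable("name", 3, "evident", is_literal=True)
g = Graph([Triple(subj, pred, obj)])
```
      </preformat>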
      <p>For calculating to what extent an agent found a solution component (i.e., a fragment of the query), the partial tasks are valued as sketched in Table 1.</p>
    </sec>
    <sec id="sec-10">
      <title>5.3 Nodes and Operations</title>
      <p>Each node type implements an operation that corresponds to a single conceptual step. The node types are grouped into the categories reader, generator, relator, modifier, and reducer. Node types also have parameters to configure their concrete instances; parameter settings can be changed by mutations during evolution. Further, nodes can have a confidence value, which is one of evident, derived, or necessary. Each node gets a confidence value assigned when created or mutated and passes it on to all its products. Some nodes are sensitive to those values and base decisions on them.</p>
      <p>
        The nodes can access the SDD via a SQL database and WordNet via the API [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. In the following, some of the node types are described. For the generated output products, the components are indexed with their provenance; im denotes the identifier mapping from the SDD described in Section 5.1.
      </p>
      <sec id="sec-10-1">
        <title>5.3.1 Reader Node Types</title>
        <p>Reader nodes receive information from the NLP Core output. Some examples of reader nodes are:
Part of Speech: This is the most essential node type of all. Its only parameter is which part-of-speech tag it handles. A Part-of-Speech node gets the whole set of POS output from NLP Core, and if the POS tag of an incoming token matches the parameter of the node, it generates an nlpdata product with the content {stringpos, lemmapos, positionpos, POSpos, namedEntitypos}.</p>
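        <p>A Part-of-Speech node can be sketched as follows; the token format and field names are illustrative, not the actual NLP Core output format.</p>
        <preformat>
```python
# Sketch of a Part-of-Speech reader node: its only parameter is the POS
# tag it handles; matching tokens become nlpdata products. The token
# format and field names are illustrative, not the NLP Core format.

def pos_node(pos_parameter, tokens):
    return [
        {"string": t["string"], "lemma": t["lemma"],
         "position": t["position"], "pos": t["pos"],
         "named_entity": t.get("ner")}
        for t in tokens
        if t["pos"] == pos_parameter
    ]

tokens = [
    {"string": "cities", "lemma": "city", "position": 1, "pos": "NNS"},
    {"string": "in", "lemma": "in", "position": 2, "pos": "IN"},
    {"string": "Europe", "lemma": "Europe", "position": 3, "pos": "NNP",
     "ner": "LOCATION"},
]
nns_products = pos_node("NNS", tokens)
```
        </preformat>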
        <p>Synonym: Such nodes use WordNet to find the terms used in the ontology for a word. To this end, the nodes maintain a dictionary built by taking each known term term of the ontology and querying WordNet for synonyms syn of term. If an nlpdata{syn, . . . } is received, the node replaces it by nlpdata{term, . . . }.</p>
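        <p>A Synonym node can be sketched as follows; the synonym dictionary below is invented for illustration rather than taken from WordNet.</p>
        <preformat>
```python
# Sketch of a Synonym node: a dictionary from synonyms to the ontology's
# own terms; incoming nlpdata carrying a synonym is rewritten to the
# ontology term. The synonym data is invented for illustration.

synonyms = {"stream": "river", "nation": "country"}  # syn -> ontology term

def synonym_node(nlpdata):
    term = synonyms.get(nlpdata["lemma"])
    if term is None:
        return nlpdata          # no known synonym: pass through unchanged
    return {**nlpdata, "string": term, "lemma": term}

out = synonym_node({"string": "nations", "lemma": "nation", "position": 2})
```
        </preformat>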
        <p>Proper name: The idea of this node type is to find a sequence of words in the NLQ which together equal a known identifier in the database, e.g. “Great Britain”. For each longest exact match in the input, it combines the input nlpdata products into a single product of type nlpdata.</p>
      </sec>
      <sec id="sec-10-2">
        <title>5.3.2 Generator Node Types</title>
        <p>Generators turn one product into another type of product
using information from the SDD. Some of the more
fundamental ones are:</p>
        <p>Class Variable Generator - CVGen: Such nodes generate variables which range over a class. To this end, they check the string and the lemma of an nlpdata and try to find a match in the SDD. If a matching class is found, the node generates a var{namenlp, positionnlp, confidencenode, ClassNameSDD, false, POS tagnlp}.</p>
        <p>Identifier Node - IdGen: While the CVGen nodes are responsible for variables ranging over classes, the IdGen nodes generate products for identifying a specific instance of a class. Incoming nlpdata is checked for containing a string or lemma which also occurs in a property value of the identifier mapping for a property nameim. The node then generates subj := var{namenlp, positionnlp, confidencenode, domainSDD, false} describing the class, pred := pred{nameim, positionnlp, confidencenode, propertiesim} for the identifying property, the literal-valued obj := var{nameim, positionnlp, confidencenode, string, true} for the value, a triple{subj, pred, obj} containing these three triple parts, a val := const{nameim, positionnlp}, and a condition{obj, =, val}.</p>
      </sec>
      <sec id="sec-10-3">
        <title>5.3.3 Relator Node Types</title>
        <p>Relators take two or more products and relate them into a compound product, usually a triple or condition. Such products are possible fragments of the final query; the modifier and reducer nodes described below in Sections 5.3.4 and 5.3.5 will remove non-helpful fragments later. For example:
Triple Generator - TriGen: This node type generates every ontologically possible relationship in the form of corresponding triples. For a subject and either a predicate or an object, a filler for the missing position is generated: either a var{namepred, positionpred, confidencenode, ranges(pred)SDD, isLiteral?SDD} is created as object, or a pred{nameobject, positionobject, confidencenode, propertiesSDD} is generated, where those properties from the SDD are taken that are defined for the subject’s class and whose range contains the object’s class.</p>
        <p>For literal-valued properties this is often the only way to generate the object, since such values do not belong to a class of the ontology and cannot be found by a CVGen.</p>
      </sec>
      <sec id="sec-10-4">
        <title>5.3.4 Modifier Node Types</title>
        <p>Nodes of modifier types perform context-sensitive tasks and have only one input conduit, which accepts graph products. An example of this kind is the
Reificator: The goal of nodes of this type is to access the literal values of attributed relationships, which are usually modeled in RDF through reification. The terminology of the reified classes is normally not used in NLQs; instead, the direct relation between the entities is used, as in “percentage of Russia located in Asia”, where “percentage” seems to be a property of countries and not, as in a reified modeling, of an “EncompassedInfo” resource. The SDD has information about those classes, and if such a relation is encountered, the reificator breaks the direct relation down into the detour over the reified class and additionally generates the predicates for the properties of the reification. The output is reifvar := var{nameSDD, min(positionA, positionB), confidencenode, reified classSDD, false, -}, triple{VariableA, reifiedPropertyASDD, reifvar}, triple{reifvar, reifiedPropertyBinvSDD, VariableB}, and pred{nameSDD, positionview, confidencenode, propertySDD}, relating VariableA to a new variable reifvar ranging over the reified class, etc.</p>
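        <p>The reification detour can be sketched as follows; the class and property names mirror the EncompassedInfo example above, but the function and its signature are our own illustration.</p>
        <preformat>
```python
# Sketch of the Reificator step: a direct relation between two entities
# is broken down into a detour over the reified class, producing two
# triples through a new reification variable plus a predicate for the
# literal property of the reification. Names mirror the EncompassedInfo
# example in the text; the code itself is illustrative only.

def reify(var_a, var_b, reified_class, prop_a, prop_b_inv, literal_prop):
    reif_var = f"?{reified_class.lower()}"
    return [
        ("triple", var_a, prop_a, reif_var),
        ("triple", reif_var, prop_b_inv, var_b),
        ("pred", reif_var, literal_prop),
    ]

fragments = reify("?country", "?continent", "EncompassedInfo",
                  "encompassedInfo", "continent", "percentage")
```
        </preformat>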
        <table-wrap id="table1">
          <label>Table 1</label>
          <caption>
            <p>Products and their content; fields of subtypes are in addition to those of their abstract supertype.</p>
          </caption>
          <table>
            <thead>
              <tr><th>Product</th><th>Content</th></tr>
            </thead>
            <tbody>
              <tr><td>nlpdata</td><td>string, lemma, position, named entity tag, POS tag</td></tr>
              <tr><td>triplepart (abstract)</td><td>name, position, confidence</td></tr>
              <tr><td>variable</td><td>domain(s), isLiteralValued (t/f), POS tag</td></tr>
              <tr><td>constant</td><td>value, position</td></tr>
              <tr><td>predicate</td><td>properties</td></tr>
              <tr><td>symbol (abstract)</td><td>type, variable</td></tr>
              <tr><td>operator</td><td>position</td></tr>
              <tr><td>aggregation</td><td>–</td></tr>
              <tr><td>except</td><td>–</td></tr>
              <tr><td>triple</td><td>subject (variable), predicate, object (variable/constant)</td></tr>
              <tr><td>condition</td><td>left (triplepart), operator, right (triplepart)</td></tr>
              <tr><td>graph</td><td>list of products</td></tr>
            </tbody>
          </table>
        </table-wrap>
      </sec>
      <sec id="sec-10-5">
        <title>5.3.5 Reducer Node Types</title>
        <p>Reducer nodes reduce the number of products circulating in the agent. Such nodes use the SDD and the context of the products to remove products that are invalid or considered not to be helpful. The following nodes are a selection to demonstrate the general functions of this type.
Fusion Node - Fus: Such nodes reduce the domains or properties if more precise information is available. Relator nodes in particular often generate two triples describing the same fact, but since either the predicate or the object is inferred, the properties or the domains in the inferred triple part are often too general. A fusion node checks a graph product for whether it contains triples A and B such that subjectA = subjectB, predA ⊆ predB and objectClassA ⊇ objectClassB, and in this case replaces both triples by triple{subjectA, predA, objectB}.</p>
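        <p>The fusion rule can be sketched on triples modeled as (subject, predicate set, object class set); the data below is invented for illustration.</p>
        <preformat>
```python
# Sketch of the fusion rule: triples are modeled as
# (subject, predicate_set, object_class_set). If A and B share a subject,
# A's predicates are a subset of B's, and A's object classes are a
# superset of B's, both are replaced by triple{subjectA, predA, objectB}.
# The example data is invented for illustration.

def fuse(triples):
    out = list(triples)
    for a in triples:
        for b in triples:
            if a is b or a not in out or b not in out:
                continue
            if (a[0] == b[0]
                    and b[1] >= a[1]      # pred_A subset of pred_B
                    and a[2] >= b[2]):    # classes_A superset of classes_B
                out.remove(a)
                out.remove(b)
                out.append((a[0], a[1], b[2]))  # triple{subjA, predA, objB}
    return out

a = ("?c", frozenset({"locatedIn"}), frozenset({"Country", "Continent"}))
b = ("?c", frozenset({"locatedIn", "capitalOf"}), frozenset({"Country"}))
fused = fuse([a, b])
```
        </preformat>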
        <p>Conflicting literal solver - CSolv: Nodes of this type react to graph products with multiple object-valued variables that refer with the same property to a single literal-valued variable. While this is a valid construction in SPARQL, in an NLQ it would be expressed in a way that triggers the operator generator, e.g. “where the population is equal” or “with the same name”. In this case, the node removes all but one of the conflicting triples, based on the grammatical distance.</p>
      </sec>
    </sec>
    <sec id="sec-11">
      <title>5.4 Standard Agent</title>
      <p>A solid initial basis for the structure of the agents is constructed automatically from the information contained in the training set and from the application-specific nodes and products. First, for every primitive product type, the set of POS tags and keywords for node parameters to which it can correspond is computed. From this, typical agent substructures for each kind of primitive product, i.e., variables, properties, operators, and excepts, are constructed algorithmically.</p>
      <p>Next, substructures are generated that depend on how these primitive products are used by relators (for generating triples and conditions). Their output conduits are directly connected to the output node. This already yields a very basic agent: at this point, more than half of the possible reward energy on the training set used is achieved, but only very simple queries are answered sufficiently.</p>
      <p>To achieve better results, better agents must then evolve through the evolutionary process, where they “learn” to make use of the context-sensitive nodes.</p>
    </sec>
    <sec id="sec-12">
      <title>6. EVALUATION</title>
      <p>
        The approach has been tested on the Mondial RDF data set with a set of 51 questions. Only a few of them are simple selections which can be answered with a single SPARQL triple; instead, the focus is on more complex and ambiguous questions. The standard agent can answer 45% correctly, while the best learned agent was able to give the correct answer to 84% of the questions (examples are shown in Table 2). Since the approach is still under development and some key features are still missing – mainly the aggregation functions and the translation from the internal representation into syntactically correct SPARQL – there is no extensive comparison with [
        <xref ref-type="bibr" rid="ref4 ref6">6, 4</xref>
        ] yet. The main problem at the moment is the distinction between “and” meaning that both sides should have a certain property, like query 11 (Table 2), and “and” meaning a union of both sides, like query 12; agents so far were only able to answer one or the other correctly. Another big issue is the lexical gap, as already stated by Steinmetz et al. [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]: e.g., query 13 is answered wrongly because the approach is unable to map “inhabitants” to the property population and therefore uses a union over all numeric properties of cities. Further logical concepts are not covered at all, like the population density in query 15 (only the properties population and area exist in the ontology). Moreover, the approach is not aware of what it is describing as a whole; therefore, in query 16 it does not just list all seas, but tries to find “world” as an instance, does not succeed, completes it to a union over several instances with “world” in their name, like the “World Health Organization” and the “World Trading Union”, which are not directly relatable to seas, and thus drifts into complete nonsense. Both problems are mentioned by Saha et al. [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] as well, and to the best of our knowledge, these problems have not yet been solved exhaustively for generic cases.
      </p>
    </sec>
    <sec id="sec-13">
      <title>7. CONCLUSION</title>
      <p>In this paper, we developed an approach that enables agents used in artificial life to work as a functional NLIDB.</p>
      <p>To this end, we developed a framework that enables those agents to solve complex problems (other than surviving in their environment) which can be broken down into sub-objectives.</p>
      <table-wrap id="tab1">
        <caption>
          <p>Evaluation queries and whether the generated answer was correct.</p>
        </caption>
        <table>
          <thead>
            <tr><th>#</th><th>NLQ</th><th>Correct</th></tr>
          </thead>
          <tbody>
            <tr><td>1</td><td>Give me all rivers with a length shorter than 100 kilometers.</td><td>✓</td></tr>
            <tr><td>2</td><td>List all names except for Deserts.</td><td>✓</td></tr>
            <tr><td>3</td><td>Give me everything located in Asia.</td><td>✓</td></tr>
            <tr><td>4</td><td>Which cities are in Europe?</td><td>✓</td></tr>
            <tr><td>5</td><td>What is the depth of the Sea of Japan?</td><td>✓</td></tr>
            <tr><td>6</td><td>How many percent of India are Sikh?</td><td>✓</td></tr>
            <tr><td>7</td><td>Give me all cities where the population is greater than the population of the capital of their country.</td><td>✓</td></tr>
            <tr><td>8</td><td>Show me all waters with their name.</td><td>✓</td></tr>
            <tr><td>9</td><td>Is there a city where the latitude and longitude are equal?</td><td>✓</td></tr>
            <tr><td>10</td><td>Is the percentage of Turkish people greater than the percentage of Croat people in Austria?</td><td>✓</td></tr>
            <tr><td>11</td><td>Which rivers are located in Poland and Germany?</td><td>✓</td></tr>
            <tr><td>12</td><td>Give me the name of all mountains and islands.</td><td>✗</td></tr>
            <tr><td>13</td><td>Give me all cities that have more than 1000000 inhabitants, and are not located at any river that is more than 1000 km long.</td><td>✗</td></tr>
            <tr><td>14</td><td>Give me all cities that have a population higher than 1000000, and are not located at any river that is more than 1000 km long.</td><td>✓</td></tr>
            <tr><td>15</td><td>How high is the population density in Japan?</td><td>✗</td></tr>
            <tr><td>16</td><td>How many seas are there in the world?</td><td>✗</td></tr>
          </tbody>
        </table>
      </table-wrap>
      <p>The agents, which are based on evolutionary programming,
had to be extended and transferred from a linear to a
multidimensional evaluation system to cope with the complexity
of NLQ processing. For this purpose, an evaluation technique
was developed that takes into account not only the agents with
the highest score, but also those that have specialized in a new
direction and thus extend the functionality of the whole
approach. The agents have been equipped with operations
specialized for their architecture, as well as with many common
ontological or graph-pattern-based operations, and are able
to link them in a meaningful way, turning them into
an NLIDB. The intermediate results, although not yet
final, are comparable with existing approaches. Since some
features are still missing and the evaluation is, for the moment,
executed in the internal query language rather than in SPARQL,
the comparison is not precise. It nevertheless suggests
that this approach may be comparable to other state-of-the-art
approaches and might also provide additional flexibility
in some cases.</p>
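      <p>The selection technique sketched above, which retains specialized agents alongside the top scorers, can be illustrated by a small non-dominated (Pareto-front) selection sketch. This is a hypothetical Python illustration with invented agent names and score vectors; the actual system's operations and scoring differ.</p>

```python
# Minimal sketch, assuming score vectors per agent: instead of ranking by a
# single score, keep every agent whose score vector is not dominated by
# another agent's, so a specialist that excels in only one dimension survives.

def dominates(a, b):
    """a dominates b if a scores at least as high in every dimension and strictly higher in one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def select_front(agents):
    """agents: mapping from agent id to score vector; returns the non-dominated front."""
    return {
        name: scores
        for name, scores in agents.items()
        if not any(dominates(other, scores)
                   for other_name, other in agents.items() if other_name != name)
    }

population = {
    "generalist": (3, 3),
    "specialist": (1, 5),   # weak overall, but best in the second dimension
    "dominated":  (1, 3),   # worse than the generalist in every dimension
}

print(sorted(select_front(population)))  # → ['generalist', 'specialist']
```

A purely linear (single-score) selection would discard the specialist; keeping the whole front is what lets agents that "specialized in a new direction" extend the approach.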
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>S.</given-names>
            <surname>Auer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Bizer</surname>
          </string-name>
          , G. Kobilarov et al.
          <article-title>Dbpedia: A nucleus for a web of open data</article-title>
          .
          <source>In ISWC</source>
          , Springer LNCS 4825, pages
          <fpage>722</fpage>
          -
          <lpage>735</lpage>
          .
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>T.</given-names>
            <surname>Hoverd</surname>
          </string-name>
          and
          <string-name>
            <given-names>S.</given-names>
            <surname>Stepney</surname>
          </string-name>
          .
          <article-title>Energy as a driver of diversity in open-ended evolution</article-title>
          .
          <source>In ECAL 2011</source>
          , pp.
          <fpage>356</fpage>
          -
          <lpage>363</lpage>
          , ACM.
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>C. D.</given-names>
            <surname>Manning</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Surdeanu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Bauer</surname>
          </string-name>
          et al.
          <article-title>The Stanford CoreNLP natural language processing toolkit</article-title>
          .
          <source>In ACL</source>
          , pages
          <fpage>55</fpage>
          -
          <lpage>60</lpage>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>A.-M.</given-names>
            <surname>Popescu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Etzioni</surname>
          </string-name>
          , and
          <string-name>
            <given-names>H.</given-names>
            <surname>Kautz</surname>
          </string-name>
          .
          <article-title>Towards a theory of natural language interfaces to databases</article-title>
          .
          <source>In Intelligent User Interfaces</source>
          , pp.
          <fpage>149</fpage>
          -
          <lpage>157</lpage>
          . ACM,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>L.</given-names>
            <surname>Runge</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Schrage</surname>
          </string-name>
          , and
          <string-name>
            <given-names>W.</given-names>
            <surname>May</surname>
          </string-name>
          .
          <article-title>Systematical representation of RDF-to-relational mappings for ontology-based data access</article-title>
          .
          <source>Technical report</source>
          , available at https://www.dbis.informatik.uni-goettingen.de/Publics/17/odbase17.html,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>D.</given-names>
            <surname>Saha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Floratou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Sankaranarayanan</surname>
          </string-name>
          et al.
          <article-title>Athena: An ontology-driven system for natural language querying over relational data stores</article-title>
          .
          <source>VLDB</source>
          ,
          <volume>9</volume>
          :
          <fpage>1209</fpage>
          -
          <lpage>1220</lpage>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <source>The Mondial database</source>
          . http://dbis.informatik.uni-goettingen.de/Mondial.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>C.</given-names>
            <surname>Fellbaum</surname>
          </string-name>
          (ed.).
          <source>WordNet: An Electronic Lexical Database</source>
          . MIT Press,
          <year>1998</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>S.</given-names>
            <surname>Hu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Zou</surname>
          </string-name>
          , and
          <string-name>
            <given-names>X.</given-names>
            <surname>Zhang</surname>
          </string-name>
          .
          <article-title>A State-transition Framework to Answer Complex Questions over Knowledge Base</article-title>
          . <source>EMNLP</source>, pp.
          <fpage>2098</fpage>
          -
          <lpage>2108</lpage>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>N.</given-names>
            <surname>Steinmetz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Arning</surname>
          </string-name>
          , and
          <string-name>
            <given-names>K.-U.</given-names>
            <surname>Sattler</surname>
          </string-name>
          .
          <article-title>From Natural Language Questions to SPARQL Queries: A Pattern-based Approach</article-title>
          . <source>BTW</source>, pp.
          <fpage>289</fpage>
          -
          <lpage>308</lpage>
          . LNI,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>G.</given-names>
            <surname>Turk</surname>
          </string-name>
          .
          <article-title>Sticky feet: Evolution in a multi-creature physical simulation</article-title>
          .
          <source>In ALife XII</source>
          , pages
          <fpage>496</fpage>
          -
          <lpage>503</lpage>
          . MIT Press,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>P. A.</given-names>
            <surname>Vikhar</surname>
          </string-name>
          .
          <article-title>Evolutionary algorithms: A critical review and its future prospects</article-title>
          .
          <source>ICGTSPICC</source>
          .
          <year>2016</year>
          , pp.
          <fpage>261</fpage>
          -
          <lpage>265</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>Y.-L.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.-R.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.-H.</given-names>
            <surname>Zhan</surname>
          </string-name>
          , and J. Zhang.
          <article-title>A Primary Theoretical Study on Decomposition Based Multiobjective Evolutionary Algorithm</article-title>
          . IEEE, Volume:
          <volume>20</volume>
          , Issue: 4, pp.
          <fpage>563</fpage>
          -
          <lpage>576</lpage>
          ,
          <year>2015</year>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>