Towards an Ontology-Driven Evolutionary Programming-Based Approach for Answering Natural Language Queries against RDF Data

Sebastian Schrage, Wolfgang May
Georg-August University of Göttingen, Institute of Computer Science
schrage@cs.uni-goettingen.de, may@cs.uni-goettingen.de

31st GI-Workshop on Foundations of Databases (Grundlagen von Datenbanken), 11.06.2019 - 14.06.2019, Saarburg, Germany. Copyright is held by the author/owner(s).

ABSTRACT
In this paper, we present an ontology-driven evolutionary learning system for natural language querying of complex relational databases or RDF graphs, giving users who are not familiar with formal database query languages the opportunity to express complex queries against a database. The approach learns how to arrange, and when to use, given functions to process Natural Language Queries (NLQs).

1. INTRODUCTION
Natural language interfaces for databases (NLIDB) are likely the easiest way for a user to access a database. They require the user to learn neither the specific query language nor the schema or ontology of the data set. But this lack of knowledge must be compensated by the interface. It not only has to understand the user input and to extract the information from the natural language query (NLQ); the user may also have a different concept in mind than the one implemented. This can range from a small deviation in the vocabulary, the use of abbreviations or incomplete names, and ambiguous formulations, to using relationships that are not in the model or fusing different entities into one. Therefore, an NLIDB should be flexible enough to allow the user to operate on her concepts, not on those of the implementer.

This approach consists of two major parts: the evolutionary agent framework, loosely based on the work of Turk [11] and Hoverd et al. [2], and the NLQ-to-SPARQL application of this framework, which uses NLP data preprocessed by Stanford's CoreNLP [3] and ontology-based methods to translate an NLQ into a SPARQL query against a given RDF database that is described by an ontology. According to the definition by Vikhar [12], the framework can be categorized as evolutionary programming: other than in genetic algorithms, the structure of the subprograms is fixed and only their execution order can differ, and other than in evolution strategies, the data type of the solution is not limited to a numeric vector. Due to the nature of the NLQ, the task can easily be decomposed into several layers of sub-objectives; therefore a multiobjective evolutionary algorithms (MOEA) approach, as extensively investigated by Li et al. [13], was used. If the system is not able to provide the correct solution, this makes it possible to train it with further example queries of this kind together with corresponding SPARQL queries; the framework then extends the model via the evolutionary learning algorithm to improve it and to learn this new kind of queries.

2. RELATED WORK
Basically, there are two environments in which NLIDB systems are developed: for Knowledge Graphs (KG) like DBpedia [1], and for smaller data sets like Mondial [7]. For KGs, with huge amounts of data and entities but no reliable or well-defined ontology given, approaches based on predefined graph pattern matching, like the approach by Steinmetz et al. [10], or pattern learning approaches like STF by Hu et al. [9], which rely less on ontologies, have shown the most success. On the other hand, for smaller data sets with well-defined ontologies, which is also the scope of this work, approaches more focused on ontology usage, like Athena by Saha et al. [6], or on schema usage, like Precise by Popescu et al. [4], have shown the better results. Both approaches first analyze the NLQ, then assign values to recognized parts representing how confident those parts are considered, and then try to connect them to a minimal graph that spans all parts considered evident, with edges weighted according to the confidence values, or with high penalties if not found at all.

Structure of the paper: Next, a short overview of the system architecture is given, followed by a description of the learning framework. Then, the NLQ-to-SPARQL application is discussed, with some example queries. If the approach could not answer a question correctly, a brief explanation why is given. Last, a brief conclusion is given.

3. SYSTEM OVERVIEW
This approach is based on evolutionary programming. The central component is an agent whose input is obtained by preprocessing the NLQ with CoreNLP [3] and which outputs a SPARQL query (cf. Figure 1).

[Figure 1: Overall architecture of the approach]

The system is initialized with an ontology that covers the application domain. The ontology, given as an OWL ontology, is analyzed by RDF2SQL [5] and the results are stored in the Semantical Data Dictionary, which is a collection of relational tables stored in an SQL database. When an NLQ is asked, it is first processed with CoreNLP [3] using its part-of-speech module, its entity recognition module, and its grammatical dependencies module. Then the preprocessed NLQ is given to an agent which returns a SPARQL query, which can be stated against an RDF data store or further processed by OBDA applications.

As depicted in Figure 1, at runtime there is a single agent. During the learning phase, there are multiple agents, and the learning phase results in the "fittest" agent for a given learning set, as described in the following section.

4. LEARNING FRAMEWORK
The structure for agents that are subject to evolutionary programming has been developed accordingly: the inner structure of an agent consists of application-specific nodes. There are different node types, and of each type there may be multiple instances. The general idea of the node types is to provide a set of operations which might be useful for solving the task, while it is not known which of them are needed, in which order and in which cases they must be executed, and with which settings, to reach the objectives. Additionally, there are connections between nodes for the data flow inside the agent. The information flow is handled in so-called products, which are just an application-specific predefined encapsulation of arbitrary data types. Which kinds of products, and how many at the same time, are accepted by a node is type-specific. An agent is a network of such nodes (for an example see Figure 2), and the computed solution is returned as a set of products.

4.1 General Notions

4.1.1 Agent Configuration
The configuration, i.e. the concrete internal structure of an agent, implements its functionality as the cooperation of the nodes. It is a directed graph (which may contain cycles) consisting of a set of nodes and a set of connections. There are input nodes, a single output node, and inner, processing nodes. The graph must be connected, i.e. no isolated fragments are allowed.

4.1.2 Nodes
The nodes all have the same general structure: each node n of type t has at least one input or one output conduit, usually one or several of both. The conduits are typed according to which kind of data, called products (cf. Section 5.2), they communicate. The product types are organized in a class hierarchy. The input conduits are enumerated as in_1, in_2, ... with types type(in_i); the output conduits are enumerated as out_1, out_2, ... with types type(out_j). There might be several input conduits with the same product type. Nodes have one or more output conduits for every product type that they can produce (which in turn can be connected to multiple inputs). A node can generate one or more results of one or more product types.

Every node type t implements a certain functionality, which satisfies a certain signature wrt. its inputs in_1, ..., in_c(t) and outputs out_1, ..., out_d(t) (i.e., c(t) and d(t) are the indegree and outdegree of node type t, resp.):

  f_t(type(in_1), ..., type(in_c(t))) → (type(out_1)*, ..., type(out_d(t))*),

where the * means that there might be zero, one, or more elements (e.g., if the node implements a conditional, one out of two outputs will be set; if a node cannot do something useful with the current inputs, no output might be generated; or if a list is split, the (only) output is fed with the sequence of all its elements). From a practical point of view, the output can also be seen as a set of elements of arbitrary product types.

Every output conduit can be connected to multiple input conduits, and every input conduit can have multiple incoming connections from output conduits.

The used product types and the concrete functionality of the node types (f_t, including the number of input and output conduits and their product types) depend on the application. The structure of the agent as a graph of nodes of these node types, with conduits connecting compatible outputs and inputs, is subject to learning. Usually, learning is started with a concrete proposal of a standard agent which is then improved during the learning process.
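The agent configuration and node signature described above can be sketched in code. The following is an illustrative sketch only (class and attribute names are our own, not the authors' implementation): an agent is a directed graph of typed nodes, and a connection is admissible only if the product type emitted by the output conduit is acceptable to the input conduit.

```python
from dataclasses import dataclass, field
from typing import Callable

class Product: pass            # root of the product class hierarchy
class NlpData(Product): pass   # e.g. preprocessed CoreNLP tokens
class Variable(Product): pass


@dataclass
class Node:
    """A node of type t: typed input/output conduits plus a function f_t.

    f_t maps one product per input conduit to zero or more products per
    output conduit (a tuple of lists), matching the signature
    f_t(type(in_1), ..., type(in_c(t))) -> (type(out_1)*, ..., type(out_d(t))*).
    """
    in_types: tuple
    out_types: tuple
    fn: Callable


@dataclass
class Agent:
    nodes: dict = field(default_factory=dict)      # name -> Node
    connections: set = field(default_factory=set)  # (src, out_i, dst, in_j)

    def connect(self, src, out_i, dst, in_j):
        # wire two conduits only if their product types are compatible:
        # the output's type must be a subclass of the input's expected type
        if issubclass(self.nodes[src].out_types[out_i],
                      self.nodes[dst].in_types[in_j]):
            self.connections.add((src, out_i, dst, in_j))
            return True
        return False
```

For example, a reader node emitting NlpData can feed a conduit typed Product (its superclass), but not one typed Variable.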
4.2 The Evolutionary Process
The evolutionary process controls the evolution of agents in order to improve their competences. It starts with a set of mutated standard agents. Then, in an iterative process, the agents have a chance to change their configuration every time they reproduce. The basic idea from [2] is that each solution to each problem is assigned an amount of energy. An agent gets energy for correct solutions of the problems. With a growing population of agents, the energy is divided among more agents, which pressures them to win more energy overall and suppresses unlimited growth in numbers.

4.2.1 Stepwise Evolution
The framework is organized as a sequence of runs. There is a fixed training set T provided by the user, consisting of test pairs t = (p_t, sol_t) of a problem p_t and a corresponding solution. The solutions, and often also their components, are assigned an initial energy (= value). Initial and new agents can be created from a problem-specific standard agent. Each run is done by an agent set, whose population changes by evolution. All agents have to solve the problems, and the produced solutions are evaluated. Then, for each solution (resp. solution component) it is checked to which extent the solution of an agent matches that solution component. The energy assigned to the solution components is distributed to the agents that found it.

Given a threshold e_sus that defines how much energy is required to sustain an agent, the next step is to check which agents collected at least e_sus energy. Those are then added to the agent set of the next round. If an agent earned more than 2·e_sus, it reproduces (i.e., mutates) itself and the offspring is added to the agent set as well. After being unchanged for a certain number of runs, an agent can mutate itself.

4.2.2 Reproduction
If e ≥ 2·e_sus for an agent, it reproduces itself and adds itself and its offspring to the agent set. During reproduction, one of two scenarios can happen:
1.) The offspring is a perfect copy of its parent, without any changes. This means that in the next run there are more agents of this configuration, and during reproduction the likelihood of successful mutations is higher.
2.) The offspring is a mutation of its parent, meaning it makes a random number of changes (based on a normal distribution centered around a value > 0) to its nodes and connections. These changes can be adding a new node/connection, removing a node/connection, or changing the configuration of a node.

[Figure 2: Visual sketch of a learned agent. Each node is represented as a circle and each type has a distinct color and icon. The size of the glow around each node represents its activity and the smaller dots represent the conduits of the nodes.]

5. NLQ-TO-SPARQL TRANSLATION
For every application, the specific node types must be designed and implemented. This requires a profound idea of useful small steps of the process. Then, the learning process consists of combining such local behavior into a smooth global behavior.

For the NLQ-to-SPARQL translation, the task of the agent is to translate the outcome of the CoreNLP analysis into a SPARQL query. So, the solution components mentioned above are query fragments. There are different issues to be handled and combined by the agent: named-entity recognition, translation of class names and property names into the notions of the database (represented by its ontology), the structural generation of a SPARQL query from basic graph patterns (BGPs), logical connectives and conditions, and dealing with the variables.

The training set consists of a set of pairs of NLQs and the corresponding (usually handmade) annotated SPARQL queries that adhere to conventions which give hints for the translation of the sentence structure.

5.1 Ontology Representation and Access
The Semantic Data Dictionary (SDD) [5] gives comprehensive access to the metadata: basically, it contains the same knowledge as an OWL ontology (from which it can be extracted; originally it provides an OBDA RDF-to-SQL mapping), extended with the knowledge which concrete (sub)classes provide which properties, and their ranges. It is used here instead of the OWL ontology because it is easier to access and does not require further reasoning.

The SDD has no information about the instances in the data set. Since identifying instances in NLQs is one of the major tasks for answering them, a data structure for efficient searching is necessary. Therefore the SDD is extended with an identifier mapping IM: string → (class, property)*, e.g. "Monaco" ↦ ((Country, name), (City, name)). To identify which properties are potential identifiers for the mapping, the training set is searched for cases where the SPARQL solution contains a variable whose name is not equal to its class – these denote the named entities ("Great (Britain)" in the example, whose class is Country). For instances of these classes, all string-valued properties are searched for whether their value equals the name of the training set variable (i.e., "Great Britain"). If so, the property is considered an identifying property and generates an entry in the identifier mapping for each instance of this class with this property.

5.2 Products
The products for the NLQ application are divided into two major groups: primitive and compound products. Usually, primitive products can contain complex information, but as a product they are seen as a whole (see Table 1 for the exact data composition of each product; e.g., most products carry, as information, the position in the sentence from which they have been derived). At the primary stage of the processing, there are primitive products of type nlpdata, which is a reduced version of the output of CoreNLP [3]. Nlpdata can be turned either into tripleparts or symbols, which are primitive products towards the SPARQL side. Triplepart is an abstract superclass of the product types variable, constant, and predicate, while symbol is the abstract superclass of the product types operator (e.g., +, ≤, ≥, =, ≠), aggregation, and except. Products of the type variable can be part of the solution set (i.e., of the fragments sol_t,j of the query expression to be generated) and can generate SPARQL statements of the form ?x rdf:type class, where the class information is contained in the information of the variable. Constants are fixed (literal) values from the NLQ, like names or numbers.
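The product hierarchy just described can be pictured as a small class hierarchy. The sketch below is illustrative (field names and the SPARQL rendering are our assumptions, not the paper's code); it shows how a variable product carries enough information to emit a `?x rdf:type Class` statement.

```python
from dataclasses import dataclass

@dataclass
class Product:
    position: int              # sentence position the product was derived from

@dataclass
class NlpData(Product):        # reduced CoreNLP output (primary stage)
    string: str
    lemma: str
    ne_tag: str
    pos_tag: str

class TriplePart(Product): pass   # abstract: variable, constant, predicate
class Symbol(Product): pass       # abstract: operator, aggregation, except

@dataclass
class Variable(TriplePart):
    name: str
    domain: str                # class the variable ranges over
    is_literal: bool

    def to_sparql(self) -> str:
        # a variable product can emit a typing statement: ?x rdf:type Class
        return f"?{self.name} rdf:type :{self.domain}"
```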
Products of the type predicate carry a set of properties (i.e., the properties used in the ontology that may fit the verbal query). Products of type except correspond to negation in the NLQ.

Compound products are either triple, condition, or graph products. Triples always consist of a subject, which must be an object-valued variable, a predicate, and an object which is also a variable (object- or literal-valued). Note that IRI constants cannot occur yet, since they do not occur in NLQs; constant values occur only in comparisons in conditions. Triples can be translated directly to SPARQL. Conditions consist of a left product of type variable, a right product of type variable or constant, and one operator. Products of type graph are basically lists of triples and conditions, but can also contain primitive products that are not yet integrated with the rest of the graph.

For calculating to what extent an agent found a solution component (i.e., a fragment of the query), the partial tasks are valuated as sketched in Table 1.

5.3 Nodes and Operations
Each node type implements an operation that corresponds to a single conceptual step. The node types are grouped into the following four categories: reader, generator, relator, and reducer. Node types also have parameters to configure their concrete instances. Parameter settings can be changed by mutations during evolution. Further, nodes can have a confidence value with one of the values [evident, derived, necessary]. Each node gets a confidence value assigned when it is created or mutated and passes it on to all its products. Some nodes are sensitive to those values and base decisions on them.

The nodes can access the SDD via an SQL database and WordNet via its API [8]. In the following, some of the node types are described. For the generated output products, the components are indexed with their provenance; "im" denotes the identifier mapping from the SDD described in Section 5.1.

5.3.1 Reader Node Types
Reader nodes receive information from the CoreNLP output. Some examples of reader nodes are:

Part of Speech: This is the most essential node type of all. Its only parameter is which part-of-speech tag it handles. A Part-of-Speech node gets the whole set of POS output from CoreNLP, and if the POS tag of an incoming item matches the parameter of the node, it generates an nlpdata product with the content {string_pos, lemma_pos, position_pos, POS_pos, namedEntity_pos}.

Synonym: Such nodes use WordNet to find the terms used in the ontology for a word. For this, the nodes maintain a dictionary built by taking each known term "term" of the ontology and querying WordNet for synonyms "syn" of "term". If an nlpdata{syn, ...} is received, the node replaces it by nlpdata{term, ...}.

Proper name: The idea of this node type is to find a sequence of words in the NLQ which together equal a known identifier in the database, e.g. "Great Britain". For each longest exact match in the input, it combines the input nlpdata products into a single product of type nlpdata.

5.3.2 Generator Node Types
Generators turn one product into another type of product using information from the SDD. Some of the more fundamental ones are:

Class Variable Generator - CVGen: Such nodes generate variables which range over a class. For this, they check the string and the lemma of an nlpdata and try to find a match in the SDD. If a matching class is found, the node generates a var{name_nlp, position_nlp, confidence_node, ClassName_SDD, false, POStag_nlp}.

Identifier Node - IdGen: While the CVGen nodes are responsible for variables ranging over classes, the IdGen nodes generate products for identifying a specific instance of a class. Incoming nlpdata is checked for containing a string or lemma which also occurs as a property value in the identifier mapping for a property name_im. The node then generates subj := var{name_nlp, position_nlp, confidence_node, domain_SDD, false} describing the class, pred := pred{name_im, position_nlp, confidence_node, properties_im} for the identifying property, the literal-valued obj := var{name_im, position_nlp, confidence_node, string, true} for the value, a triple{subj, pred, obj} containing these three triple parts, a val := const{name_im, position_nlp}, and condition{obj, =, val}.

5.3.3 Relator Node Types
Relators take two or more products and relate them into a compound product, usually triples or conditions. Such products are possible fragments of the final query. The modifier nodes and reducer nodes described below in Sections 5.3.4 and 5.3.5 will remove non-helpful fragments later. E.g.:

Triple Generator - TriGen: This node type generates any ontologically possible relationship in the form of corresponding triples. For a subject and either a predicate or an object, a filler for the missing position is generated. Either a var{name_pred, position_pred, confidence_node, ranges(pred)_SDD, isLiteral?_SDD} is created as object, or a pred{name_object, position_object, confidence_node, properties_SDD} is generated, where those properties from the SDD are taken that are defined for the subject's class and whose range contains the object's class. For literal-valued properties this is often the only way to generate the object, since they do not belong to a class of the ontology and cannot be found by a CVGen.

5.3.4 Modifier Node Types
Nodes of modifier types perform context-sensitive tasks and have only one input conduit, which accepts graph products. An example of this kind is the

Reificator: The goal of nodes of this type is to access the literal values of attributed relations, which are usually modeled in RDF through reification. The terminology of the reified classes is normally not used in an NLQ; instead, the direct relation between the entities is used, as in "percentage of Russia located in Asia", where "percentage" seems to be a property of countries, and not, as in a reified modeling, of an "EncompassedInfo" resource. The SDD has information about those classes, and if such a relation is encountered, the Reificator breaks the direct relation down into the detour over the reified class and additionally generates the predicates for the properties of the reification. The output is reifvar := var{name_SDD, min(position_A, position_B), confidence_node, reifiedclass_SDD, false, -}, triple{Variable_A, reifiedProperty_A_SDD, reifvar}, triple{reifvar, reifiedProperty_B_inv_SDD, Variable_B}, and pred{name_SDD, position_view, confidence_node, property_SDD}, which relate Variable_A to a new variable reifvar ranging over the reified class, etc.
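To make the IdGen output concrete: under the assumption that the identifier mapping of Section 5.1 maps "Monaco" to ((Country, name), (City, name)), the emitted products correspond roughly to the following SPARQL fragments. This is a hypothetical rendering with made-up variable names; the system itself manipulates product objects, not query strings.

```python
# Sketch of the products an IdGen node emits for the token "Monaco",
# assuming IM("Monaco") = ((Country, name), (City, name)) as in Sec. 5.1.
IM = {"monaco": [("Country", "name"), ("City", "name")]}

def idgen(token: str) -> list:
    fragments = []
    for cls, prop in IM.get(token.lower(), []):
        subj = f"?{token.lower()}_{cls.lower()}"   # variable over the class
        lit = subj + "_val"                        # literal-valued variable
        fragments.append(
            f"{subj} rdf:type :{cls} . "           # typing triple (subj)
            f"{subj} :{prop} {lit} . "             # identifying property
            f'FILTER ({lit} = "{token}")'          # condition obj = const
        )
    return fragments
```

For "Monaco", this yields one candidate fragment per (class, property) entry; later reducer nodes would be responsible for discarding the reading that does not fit the rest of the query.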
Table 1: Overview of all product types.

Product     | Content                                                    | Superclass | Success calculation (sketch)
nlpdata     | string, lemma, position, named entity tag, POS tag         | product    | none
triplepart  | name, position, confidence                                 | product    | (abstract class)
variable    | domain(s), isLiteralValued (t/f), POS tag                  | triplepart | name + position + domain + all
constant    | -                                                          | triplepart | name
predicate   | properties                                                 | triplepart | name + position + properties + all
symbol      | -                                                          | product    | (abstract class)
operator    | value, position                                            | symbol     | value + position
aggregation | type, variable                                             | symbol     | type + variable + all
except      | position                                                   | symbol     | position
triple      | subject (variable), predicate, object (variable/constant)  | product    | subject + predicate + object + all
condition   | left (triplepart), operator, right (triplepart)            | product    | left + operator + right + all
graph       | list of products                                           | product    | sum of its parts

5.3.5 Reducer Node Types
Reducer nodes reduce the number of products circulating in the agent. Such nodes use the SDD and the context of the products to remove products that are invalid or considered not to be helpful. The following nodes are a selection that demonstrates the general functions of this type.

Fusion Node - Fus: Such nodes reduce the domains or properties if more precise information is available. Relator nodes in particular often generate two triples describing the same fact, but since either the predicate or the object is inferred, the properties resp. the domains are often too general in the inferred triple part. A fusion node checks a graph product for whether it contains triples A and B such that subject_A = subject_B, pred_A ⊆ pred_B and objectClass_A ⊇ objectClass_B, and in this case replaces both triples by triple{subject_A, pred_A, object_B}.

Conflicting literal solver - CSolv: Nodes of this type react to graph products with multiple object-valued variables that refer with the same property to a single literal-valued variable. While this is a valid operation in SPARQL, in an NLQ this is expressed in a way that would trigger the operator generator, e.g. "where the population is equal" or "with the same name". In this case, the node removes all but one of the conflicting triples, based on the grammatical distance.

5.4 Standard Agent
A solid initial basis for the structure of the agents is constructed (automatically) from the information contained in the training set and the application-specific nodes and products. First, for every primitive product type, the set of POS tags and keywords for node parameters to which they can correspond is computed. From this, typical agent substructures for each kind of primitive product, i.e., variables, properties, operators, and excepts, are constructed algorithmically.

Next, substructures are generated that depend on how these primitive products are used by relators (for generating triples and conditions). Their output conduits are directly connected to the output node. This is already a very basic agent. At this point, already more than half of the possible rewarded energy from the used training set is achieved, but only very simple queries are answered sufficiently. For achieving better results, better agents must then evolve from the evolutionary process, where they "learn" to make use of the context-sensitive nodes.

6. EVALUATION
The approach has been tested on the Mondial RDF data set with a set of 51 questions. Only a few of them are simple selections which can be answered with a single SPARQL triple; instead, the focus is on more complex and ambiguous questions. The standard agent can answer 45% correctly, while the best learned agent was able to give the correct answer to 84% of the questions (examples are shown in Table 2). Since the approach is still under development and some key features are still missing, mainly the aggregation functions and the translation from the internal representation into syntactically correct SPARQL, there is no extensive comparison with [6, 4].

The main problems at the moment are the distinction between "and" meaning that both sides should have a certain property, as in query 11 (Table 2), and "and" meaning a union of both sides, as in query 12. Agents so far were only able to answer one or the other correctly. Another big issue is the lexical gap, as already stated by Steinmetz et al. [10]; e.g., query 13 is answered wrongly because the approach is unable to map "inhabitants" to the property population and therefore uses a union over all numeric properties of cities. Further logical concepts are not covered at all, like the population density in query 15 (only the properties population and area exist in the ontology). Further, the approach is not aware of what it is describing as a whole; therefore, for query 16, it does not just list all seas, but tries to find "world" as an instance, does not succeed, and completes the query to a union over several instances with "world" in their name, like the "World Health Organization" and the "World Trading Union", which are not directly relatable with seas, and drifts into complete nonsense. Both are problems mentioned by Saha et al. [6] as well, and to the best of our knowledge, these problems have not yet been solved exhaustively for generic cases.
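The two readings of "and" can be contrasted with hand-written SPARQL sketches over an assumed Mondial-style vocabulary (:River, :locatedIn, :name, ...); these illustrate the ambiguity only and are not output of the system.

```python
# Query 11, "Which rivers are located in Poland and Germany?":
# intersective reading -- the same river must lie in both countries.
INTERSECTIVE = """SELECT ?r WHERE {
  ?r rdf:type :River .
  ?r :locatedIn ?c1 . ?c1 :name "Poland" .
  ?r :locatedIn ?c2 . ?c2 :name "Germany" .
}"""

# Query 12, "Give me the name of all mountains and islands":
# union reading -- results from either class are wanted.
UNION = """SELECT ?n WHERE {
  { ?x rdf:type :Mountain } UNION { ?x rdf:type :Island }
  ?x :name ?n .
}"""
```

A single surface word thus maps to two structurally different query shapes, which is why an agent tuned for one reading fails on the other.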
Table 2: Example queries from the test set.

Nr | NLQ | Correct
1  | Give me all rivers with a length shorter than 100 kilometers. | ✓
2  | List all names except for Deserts. | ✓
3  | Give me everything located in Asia. | ✓
4  | Which cities are in Europe? | ✓
5  | What is the depth of the Sea of Japan? | ✓
6  | How many percent of India are Sikh? | ✓
7  | Give me all cities where the population is greater then the population of the capital of their country. | ✓
8  | Show me all waters with their name | ✓
9  | Is there a city where the latitude and longitude are equal | ✓
10 | Is the percentage of Turkish people greater than the percentage of Croat people in Austria | ✓
11 | Which rivers are located in Poland and Germany? | ✓
12 | Give me the name of all mountains and islands | ✗
13 | Give me all cities that have more than 1000000 inhabitants, and are not located at any river that is more than 1000 km long | ✗
14 | Give me all cities that have a population higher than 1000000, and are not located at any river that is more than 1000 km long | ✓
15 | How high is the population density in Japan? | ✗
16 | How many seas are there in the world? | ✗

7. CONCLUSION
In this paper, we developed an approach that enables agents as used in artificial life to work as a functional NLIDB. To this end, we developed a framework that enables those agents to solve complex problems (other than surviving in their environment) which can be broken down into sub-objectives. The agents, which are based on evolutionary programming, had to be extended and transferred from a linear to a multi-dimensional evaluation system to cope with the complexity of NLQ processing. For this purpose, we devised an evaluation technique which takes into account not only the agents with the highest score, but also those which have specialized in a new direction and thus extend the functionality of the whole approach. The agents have been equipped with operations specialized for their architecture, but also with many common ontological or graph-pattern-based operations, and are able to link them in a meaningful way that turns them into an NLIDB. The intermediate results, although not yet final, are comparable with existing approaches. Since some features are still missing and the evaluation is for the moment not executed in SPARQL but in the internal query language, this is not precisely comparable. But it gives reason for the assumption that this approach might be comparable to other state-of-the-art approaches and might also provide additional flexibility in some cases.

8. REFERENCES
[1] S. Auer, C. Bizer, G. Kobilarov et al. DBpedia: A nucleus for a web of open data. In ISWC, Springer LNCS 4825, pages 722–735, 2007.
[2] T. Hoverd and S. Stepney. Energy as a driver of diversity in open-ended evolution. In ECAL 2011, pages 356–363. ACM, 2011.
[3] C. D. Manning, M. Surdeanu, J. Bauer et al. The Stanford CoreNLP natural language processing toolkit. In ACL, pages 55–60, 2014.
[4] A.-M. Popescu, O. Etzioni, and H. Kautz. Towards a theory of natural language interfaces to databases. In Intelligent User Interfaces, pages 149–157. ACM, 2003.
[5] L. Runge, S. Schrage, and W. May. Systematical representation of RDF-to-relational mappings for ontology-based data access. Technical report, available at https://www.dbis.informatik.uni-goettingen.de/Publics/17/odbase17.html, 2017.
[6] D. Saha, A. Floratou, K. Sankaranarayanan et al. Athena: An ontology-driven system for natural language querying over relational data stores. VLDB, 9:1209–1220, 2016.
[7] The Mondial database. http://dbis.informatik.uni-goettingen.de/Mondial.
[8] C. Fellbaum (ed.). WordNet: An Electronic Lexical Database. MIT Press, 1998.
[9] S. Hu, L. Zou, X. Zhang. A state-transition framework to answer complex questions over knowledge base. In EMNLP, pages 2098–2108, 2018.
[10] N. Steinmetz, A. Arning, K.-U. Sattler. From natural language questions to SPARQL queries: A pattern-based approach. In BTW, LNI, pages 289–308, 2019.
[11] G. Turk. Sticky feet: Evolution in a multi-creature physical simulation. In ALife XII, pages 496–503. MIT Press, 2010.
[12] P. A. Vikhar. Evolutionary algorithms: A critical review and its future prospects. In ICGTSPICC, pages 261–265, 2016.
[13] Y.-L. Li, Y.-R. Zhou, Z.-H. Zhan, J. Zhang. A primary theoretical study on decomposition based multiobjective evolutionary algorithm. IEEE, 20(4):563–576, 2015.