=Paper=
{{Paper
|id=Vol-1080/owled2013_18
|storemode=property
|title=Ontop at Work
|pdfUrl=https://ceur-ws.org/Vol-1080/owled2013_18.pdf
|volume=Vol-1080
|dblpUrl=https://dblp.org/rec/conf/owled/Rodriguez-MuroKZ13
}}
==Ontop at Work==
Mariano Rodríguez-Muro (1), Roman Kontchakov (2) and Michael Zakharyaschev (2)

(1) Faculty of Computer Science, Free University of Bozen-Bolzano, Italy
(2) Department of Computer Science and Information Systems, Birkbeck, University of London, U.K.

Abstract. We describe the architecture of the OBDA system Ontop and analyse its performance in a series of experiments. We demonstrate that, for standard ontologies, queries and data stored in relational databases, Ontop is fast, efficient and produces SQL rewritings of high quality.

===1 Introduction===

In this paper, we report on a series of experiments designed to test the performance of the ontology-based data access (OBDA) system Ontop (http://ontop.inf.unibz.it) implemented at the Free University of Bozen-Bolzano. Our main concern was the quality of the query rewritings produced automatically by Ontop when given some standard queries, ontologies, databases and mappings from the database schemas to the ontologies.

Recall [4] that, in the OBDA paradigm, an ontology defines a high-level global schema of (already existing) data sources and provides a vocabulary for user queries. An OBDA system rewrites such queries into the vocabulary of the data sources and then delegates query evaluation to a relational database management system (RDBMS). The existing query rewriting systems include QuOnto [19], Nyaya [9], Rapid [7], Requiem [17]/Blackout [18], Clipper [8], Prexto [22] and the system of [14] (some of which use datalog engines rather than RDBMSs).

To illustrate how an OBDA system works, we take a simplified IMDb database (www.imdb.com/interfaces), whose schema contains the relations title[m, t, y] with information about movies (ID, title, production year), and castinfo[p, m, r] with information about movie casts (person ID, movie ID, person role). The users are not supposed to know the structure of the database. Instead, they are given an ontology, say MO (www.movieontology.org), describing the application domain in terms of concepts (classes), such as mo:Movie and mo:Person, and roles and attributes (object and datatype properties), such as mo:cast and mo:year:

 mo:Movie ≡ ∃mo:title,    mo:Movie ⊑ ∃mo:year,
 mo:Movie ≡ ∃mo:cast,     ∃mo:cast− ⊑ mo:Person

(we use the description logic parlance of OWL 2 QL). The user can query the data in terms of the concepts and roles of the ontology; for example,

 q(t, y) ← mo:Movie(m), mo:title(m, t), mo:year(m, y), (y > 2010)

is a query asking for the titles of recent movies together with their production years. To rewrite it into an SQL query over the data source, the OBDA system requires a mapping that relates the ontology terms to the database schema; for example:

 mo:Movie(m), mo:title(m, t), mo:year(m, y) ← title(m, t, y),
 mo:cast(m, p), mo:Person(p) ← castinfo(p, m, r).

By evaluating this mapping over a data instance with, say,

 castinfo                 title
  p    m    r              m    t                    y
  n37  728  1              728  'Django Unchained'   2012
  n38  728  1

we obtain the ground atoms

 mo:Movie(728), mo:title(728, 'Django Unchained'), mo:year(728, 2012),
 mo:Person(n37), mo:cast(728, n37), mo:Person(n38), mo:cast(728, n38)

that can be thought of as the ABox over which we can execute the query q(t, y), taking account of the consequences implied by the MO ontology. Such an ABox is not materialised and is called virtual [21]. Thus, the OBDA system is facing three tasks: it has to (i) rewrite the original query into a query over the virtual ABox, (ii) unfold the rewriting, using the mapping, into an SQL query, and then (iii) evaluate it over the data instance using an RDBMS.
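For reference, the integrity constraints of this simplified IMDb fragment that are exploited by the optimisations discussed below (the movie ID is the primary key of title, and castinfo references it via a foreign key) could be declared as in the following sketch; only the table names, column names and key constraints come from the example above, while the column types are our own assumptions.

 -- Illustrative DDL for the simplified IMDb fragment of the running example.
 -- Column types are assumptions; the keys are the constraints used later for
 -- T-mapping simplification and semantic query optimisation.
 CREATE TABLE title (
   m INTEGER PRIMARY KEY,   -- movie ID; uniquely determines t and y
   t VARCHAR(255),          -- title
   y INTEGER                -- production year
 );

 CREATE TABLE castinfo (
   p INTEGER,                        -- person ID
   m INTEGER REFERENCES title (m),   -- movie ID (foreign key into title)
   r INTEGER                         -- person role (e.g. 1 = actor; cf. Section 3.2)
 );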
The idea of OBDA stems from the empirical fact that answering conjunctive queries (CQs) in RDBMSs is very efficient in practice. So one can expect task (iii) to run smoothly provided that the rewriting (i) and the unfolding (ii) are reasonably small and standard. However, the available experimental data (see, e.g., [20, 3]) as well as the recent complexity-theoretic analysis of rewritings show that they can be prohibitively large or complex. First, there exist CQs and ontologies for which any (first-order or datalog) rewriting results in an exponential blowup [12]; the polynomial datalog rewriting of [10] hides this blowup behind the existential quantification over special constants. Second, even for simple and natural ontologies and CQs, rewritings (i) become exponential when presented as unions of CQs (UCQs), the form most suitable for RDBMSs, because they must include all sub-concepts/roles of each atom in the query induced by the ontology.

In Ontop, this bottleneck is tackled by making use of
* the tree-witness rewriting [13] that separates the topology of the CQ from the taxonomy defined by the ontology;
* an extended mapping (called a T-mapping [21]) that takes account of the taxonomy and can be optimised using database integrity constraints and SQL features;
* an unfolding algorithm that employs the semantic query optimisation technique with database integrity constraints to produce small and efficient SQL queries.

For example, a rewriting of the query q(t, y) above can be split into the CQ

 q′(t, y) ← ext:Movie(m), mo:title(m, t), mo:year(m, y), (y > 2010)

and the datalog rules for the ext:Movie predicate:

 ext:Movie(m) ← mo:Movie(m),     (1)
 ext:Movie(m) ← mo:cast(m, p),   (2)
 ext:Movie(m) ← mo:title(m, t).  (3)

The former inherits the topology of the original CQ, while the latter represents the taxonomy defined by the ontology. In theory, the topological part can contain exponentially many rules (reflecting possible matches in the canonical models) [12], but this never happens in practice, and usually there are very few of them (see the experiments below). The taxonomical component is independent of the CQ and combines with the mapping into a T-mapping [21], which can then be drastically simplified using the database integrity constraints. For example, since castinfo has a foreign key (its movie ID attribute references the ID in title), every virtual ABox of IMDb will satisfy the axiom ∃mo:cast ⊑ mo:Movie, making (2) redundant; moreover, (3) and (1) give rise to the same rule, resulting in a T-mapping with a single rule for mo:Movie. Thus, a rewriting over IMDb ABoxes will be a single CQ. In contrast, any UCQ rewriting over arbitrary ABoxes contains three CQs which simply duplicate the answers, because the data respects the integrity constraints (a query with a few more atoms may give rise to a UCQ rewriting with thousands of CQs).

By straightforwardly applying the unfolding algorithm to q′ and the T-mapping above, we obtain the query

 q″_0(t, y) ← title(m, t0, y0), title(m, t, y1), title(m, t2, y), (y > 2010),

which requires two (potentially) expensive Join operations. However, if we use the fact that the ID attribute is a primary key of title (uniquely determining the title and production year), then q′ can be unfolded into the much simpler

 q″(t, y) ← title(m, t, y), (y > 2010).

In fact, such multiple Joins are very typical in OBDA because the n-ary relations of data sources are reified by ontologies into binary roles and attributes.
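To make the effect of this optimisation concrete, the two unfoldings q″_0 and q″ could correspond to SQL queries of roughly the following shape over the title relation; this is only a sketch, not the literal output of Ontop.

 -- Sketch of the naive unfolding q''_0: one occurrence of title per reified
 -- attribute, i.e. two self-Joins on the movie ID m.
 SELECT t2.t, t3.y
 FROM title t1, title t2, title t3
 WHERE t1.m = t2.m AND t2.m = t3.m AND t3.y > 2010;

 -- Sketch of the optimised unfolding q'': since m is the primary key of title,
 -- the three occurrences of title collapse into one.
 SELECT t, y
 FROM title
 WHERE y > 2010;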
The aim of this paper is to (i) present the rewriting and optimisation techniques that allow Ontop to produce optimised queries as discussed above, and (ii) evaluate the performance of Ontop in three use cases. We demonstrate that, at least in these cases, Ontop produces query rewritings of reasonably high quality and its performance is comparable to that of traditional RDBMSs.

===2 OWL 2 QL and Databases===

The language of OWL 2 QL contains individual names ai, concept names Ai, and role names Pi (i ≥ 1). Roles R and basic concepts B are defined by

 R ::= Pi | Pi−,     B ::= ⊥ | Ai | ∃R.

A TBox (or an ontology), T, is a finite set of inclusions of the form

 B1 ⊑ B2,   B1 ⊑ ∃R.B2,   B1 ⊓ B2 ⊑ ⊥,   R1 ⊑ R2,   R1 ⊓ R2 ⊑ ⊥.

An ABox, A, is a set of atoms of the form Ak(ai) or Pk(ai, aj). The semantics of OWL 2 QL is defined in the usual way based on interpretations I = (Δ^I, ·^I) [2]. The set of individual names in A is denoted by ind(A).

A conjunctive query q(x) is a first-order formula ∃y φ(x, y), where φ is a conjunction of atoms of the form Ak(t1) or Pk(t1, t2), and each ti is a term (an individual or a variable in x or y). We use the datalog notation for CQs, writing q(x) ← φ(x, y) (without existential quantifiers), and call q the head and φ the body of the rule. A tuple a ⊆ ind(A) is a certain answer to q(x) over (T, A) if I |= q(a) for all models I of (T, A); in this case we write (T, A) |= q(a).

We assume that the data comes from a relational database rather than an ABox. We view databases [1] as triples (R, Σ, I), where R is a database schema containing predicate symbols for both stored database relations and views (together with their definitions in terms of stored relations), Σ is a set of integrity constraints over R (in the form of inclusion and functional dependencies), and I is a data instance over R (satisfying Σ). The vocabularies of R and T are linked together by means of mappings. A mapping, M, from R to T is a set of (GAV) rules of the form S(x) ← φ(x, z), where S is a concept or role name in T and φ(x, z) is a conjunction of atoms with stored relations and views from R and a filter, that is, a Boolean combination of built-in predicates such as = and <. (Note that, by including views in the schema, we can express any SQL query in mappings.) Given a mapping M, the atoms S(a), for S(x) ← φ(x, z) in M and I |= ∃z φ(a, z), comprise the ABox A_{I,M}, which is called the virtual ABox for M over I. We can now define certain answers to a CQ q over a TBox T linked by a mapping M to a database (R, Σ, I) as the certain answers to q over (T, A_{I,M}).

===3 The Architecture of Ontop===

We now briefly describe the main ingredients of Ontop: the tree-witness rewriting over complete ABoxes, T-mappings and the unfolding algorithm. Suppose we are given a CQ q over an ontology T and a mapping M from a database schema R to T. The tree-witness rewriting of q and T, denoted q^tw, presupposes that the underlying ABox A is H-complete with respect to T in the sense that S(a) ∈ A whenever S′(a) ∈ A and T |= S′ ⊑ S, for all concept names S and basic concepts S′, and for all role names S and roles S′ (we identify P−(b, a) and P(a, b) in ABoxes and assume ∃R(a) ∈ A if R(a, b) ∈ A, for some b). An obvious way to define H-complete ABoxes is to take the composition M_T of M and the inclusions in T given by

 A(x) ← φ(x, z),        if A′(x) ← φ(x, z) ∈ M and T |= A′ ⊑ A,
 A(x) ← φ(x, y, z),     if R(x, y) ← φ(x, y, z) ∈ M and T |= ∃R ⊑ A,
 P(x, y) ← φ(x, y, z),  if R(x, y) ← φ(x, y, z) ∈ M and T |= R ⊑ P.
(We identify P−(y, x) with P(x, y) in the heads of the mapping rules.) Thus, to compute the answers to q over T with M and a database instance I, it suffices to evaluate the rewriting q^tw over A_{I,M_T}:

 (T, A_{I,M}) |= q(a)   iff   A_{I,M_T} |= q^tw(a),   for any I and a ⊆ ind(A_{I,M}).

[Diagram: two routes from the mapping and the database instance, via the virtual ABox, to query answers. Dashed route: standard rewritings over arbitrary ABoxes, unfolded into a UCQ. Solid route: completion of the mapping into a T-mapping, yielding an H-complete ABox over which the tree-witness rewriting is evaluated.]

OBDA systems such as QuOnto [19] and Prexto [22] first construct rewritings over arbitrary ABoxes and only then unfold them, using mappings, into UCQs which are evaluated by an RDBMS (the dashed route above). The same result can be obtained by unfolding rewritings over H-complete ABoxes with the help of the composition M_T (the solid route above). However, in practice the resulting UCQs very often turn out to be too large [20]. In Ontop, we also start with M_T. But before applying it to unfold q^tw, we first simplify and reduce the size of the mapping by exploiting the database integrity constraints. Following [21], a mapping M from R to T is called a T-mapping over integrity constraints Σ if the virtual ABox A_{I,M} is H-complete w.r.t. T for any data instance I satisfying Σ. (The composition M_T is a T-mapping over any Σ.) Ontop transforms M_T into a much simpler T-mapping by taking account of database integrity constraints (dependencies) and SQL features such as disjunctions in filter conditions.

====3.1 Tree-Witness Rewriting====

We explain the essence of the tree-witness rewriting using an example. Consider an ontology T with the axioms

 RA ⊑ ∃worksOn.Project,   Project ⊑ ∃isManagedBy.Prof,   (4)
 worksOn− ⊑ involves,     isManagedBy ⊑ involves          (5)

and the CQ asking to find those who work with professors:

 q(x) ← worksOn(x, y), involves(y, z), Prof(z).

Observe that if a model I of (T, A), for some A, contains individuals a ∈ RA^I and b ∈ Project^I, then I must also contain the following fragments:

[Diagram: from a ∈ RA^I, an edge labelled worksOn and involves− leads to a point u ∈ Project^I, from which an edge labelled isManagedBy and involves leads to a point v ∈ Prof^I; from b ∈ Project^I, an edge labelled isManagedBy and involves leads to a point w ∈ Prof^I.]

Here the points u, v and w are not necessarily named individuals from the ABox, but can be generated by the axioms (4) as (anonymous) witnesses for the existential quantifiers. It follows that a is an answer to q(x) if a ∈ RA^I, in which case the atoms of q are mapped to the fragment generated by a with x ↦ a, y ↦ u and z ↦ v. Alternatively, if a is in both RA^I and Prof^I, then we obtain a match with x ↦ a, y ↦ u and z ↦ a. Another option is to map x and y to ABox individuals a and b, respectively; if b is in Project^I, then the last two atoms of q can be mapped to the anonymous part with z ↦ w. Finally, all the atoms of q can be mapped to ABox individuals. The possible ways of mapping parts of the CQ to the anonymous part of the model are called tree witnesses. The tree witnesses for q found above give the following tree-witness rewriting q^tw of q and T over H-complete ABoxes:

 q^tw(x) ← worksOn(x, y), involves(y, z), Prof(z),   (6)
 q^tw(x) ← RA(x),                                    (7)
 q^tw(x) ← RA(x), Prof(x),                           (8)
 q^tw(x) ← worksOn(x, y), Project(y).                (9)

(Note that q^tw is not a rewriting over arbitrary ABoxes.)
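For illustration only: if each concept and role of this example were exposed as a (virtual) table RA(x), Prof(x), Project(x), worksOn(x, y), involves(x, y) of an H-complete ABox, the rewriting (6)–(9) would amount to a union of Select-Project-Join queries of roughly the following shape. The table and column names here are our own assumptions for the sketch and are not part of the paper's mappings.

 -- Sketch of the UCQ corresponding to rules (6)-(9) over hypothetical views
 -- of an H-complete ABox; (8) is kept for completeness, although it is
 -- subsumed by (7), as noted below.
 SELECT w.x FROM worksOn w
   JOIN involves i ON w.y = i.x
   JOIN Prof p ON i.y = p.x                        -- (6)
 UNION
 SELECT x FROM RA                                  -- (7)
 UNION
 SELECT r.x FROM RA r JOIN Prof p ON r.x = p.x     -- (8)
 UNION
 SELECT w.x FROM worksOn w
   JOIN Project pr ON w.y = pr.x;                  -- (9)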
In theory, the size of the rewriting q^tw can be large [12]: there exists a sequence of CQs q_n and ontologies T_n generating exponentially many (in |q_n|) tree witnesses, and any rewriting of q_n and T_n is of exponential size (unless it employs |q_n|-many additional existentially quantified variables [10]). Our experiments (see Section 4) demonstrate, however, that in practice real-world ontologies and CQs generate small and simple tree-witness rewritings.

There are two ways to simplify tree-witness rewritings further. First, we can use a subsumption algorithm to remove redundant CQs from the union: for example, (7) subsumes (8), which can be safely removed. Second, we can reduce the size of the individual CQs in the union using the following observation: for any CQ q (viewed as a set of atoms),

 q ∪ {A(x), A′(x)} ≡c q ∪ {A(x)},          if T |= A ⊑ A′,
 q ∪ {A(x), R(x, y)} ≡c q ∪ {R(x, y)},     if T |= ∃R ⊑ A,
 q ∪ {P(x, y), R(x, y)} ≡c q ∪ {R(x, y)},  if T |= R ⊑ P,

where ≡c reads 'has the same certain answers over H-complete ABoxes' (we again identify P−(y, x) with P(x, y)). Surprisingly, such a simple optimisation, especially for the domain/range constraints, makes rewritings substantially shorter [23, 9].

====3.2 Optimising T-mappings====

Suppose M ∪ {S(x) ← ψ1(x, z)} is a T-mapping over Σ. If there is a more specific rule than S(x) ← ψ1(x, z) in M, then M itself is also a T-mapping. To discover such 'more specific' rules, we run the standard query containment check (see, e.g., [1]), but taking account of the inclusion dependencies. For example, since T |= ∃mo:cast ⊑ mo:Movie, the composition M_MO of the mapping in the introduction and MO contains the following rules for mo:Movie:

 mo:Movie(m) ← title(m, t, y),
 mo:Movie(m) ← castinfo(p, m, r).

The latter rule is redundant since IMDb contains the foreign key

 ∀m (∃p, r castinfo(p, m, r) → ∃t, y title(m, t, y)).

Another way to reduce the size of a T-mapping is to identify pairs of rules whose bodies are equivalent up to filters on constant values. For example, the mapping M for IMDb and MO contains 6 rules for the sub-concepts of mo:Person:

 mo:Actor(p) ← castinfo(c, p, m, r), (r = 1),
 ···
 mo:Editor(p) ← castinfo(c, p, m, r), (r = 6).

So, the composition M_MO contains six rules for mo:Person that differ only in the last condition (r = k), for 1 ≤ k ≤ 6. These can be reduced to a single rule:

 mo:Person(p) ← castinfo(c, p, m, r), (r = 1) ∨ ··· ∨ (r = 6).

Note that such disjunctions lend themselves to efficient evaluation by RDBMSs.

====3.3 Unfolding with Semantic Query Optimisation (SQO)====

The unfolding procedure [19] applies SLD-resolution to q^tw and the T-mapping, and returns those rules whose bodies contain only database atoms (cf. partial evaluation in [15]). Ontop applies SQO [6] to the rules obtained at the intermediate steps of unfolding. In particular, this eliminates redundant Join operations caused by the reification of database relations by means of concepts and roles. We saw in the introduction that the primary key m of title, that is, the following two functional dependencies with determinant m:

 ∀m ((∃y title(m, t1, y)) ∧ (∃y title(m, t2, y)) → (t1 = t2)),
 ∀m ((∃t title(m, t, y1)) ∧ (∃t title(m, t, y2)) → (y1 = y2)),

allows us to remove the two Join operations in title(m, t0, y0), title(m, t, y1), title(m, t2, y), resulting in the single atom title(m, t, y). Note that these two Join operations were introduced to reconstruct the ternary relation from its reification by means of the roles mo:title and mo:year.
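Returning to the T-mapping optimisation of Section 3.2, the single rule for mo:Person could be realised by a mapping whose SQL source query uses a disjunctive filter over castinfo (written here with the four-column castinfo(c, p, m, r) of that example). This is a sketch of the idea rather than Ontop's actual output.

 -- Sketch: SQL source query of the single mo:Person mapping rule obtained by
 -- merging the six rules that differ only in the filter (r = k), 1 <= k <= 6.
 SELECT p
 FROM castinfo
 WHERE r = 1 OR r = 2 OR r = 3 OR r = 4 OR r = 5 OR r = 6;
 -- equivalently: WHERE r IN (1, 2, 3, 4, 5, 6)

Compared with a union of six nearly identical subqueries, a single scan of castinfo with such a filter is typically much cheaper for an RDBMS to evaluate.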
The role of SQO in OBDA systems appears to be much more prominent than in conventional RDBMSs, where it was initially proposed to optimise SQL queries. Although some SQO techniques have reached industrial RDBMSs, SQO never had a strong impact on the database community, because it is costly compared to statistics- and heuristics-based methods, and because most SQL queries are written by highly skilled experts (and so are nearly optimal anyway). In OBDA scenarios, in contrast, SQL queries are generated automatically, and so SQO becomes the only tool for avoiding redundancy.

===4 Experiments===

We illustrate the performance of Ontop by three use cases. All experiments were run on Ubuntu 12.04 64-bit with an Intel Core i5 650 (4 cores @ 3.20 GHz), 16 GB RAM and a 1 TB 7200 rpm HD. We used a Java 7 virtual machine for Ontop, with MySQL 5.5 for Cases 1 and 3 and PostgreSQL 9.1 for Case 2. Full details of the experiments are available at obda.inf.unibz.it/data/owled13.

Case 1 is a simulation of a railway network for cargo delivery developed by the University of Genoa with the industrial partner Intermodal Logistics [5]. The ILog ontology, mapping and queries are used to monitor the status of the network. The case includes an ontology with 70 concepts and roles, a mapping with 43 rules and 11 queries (www.mind-lab.it/~gcicala/isf2012). For our experiments, we generated data for 30 days.

Case 2 uses the Movie Ontology (MO) over the real data of the Internet Movie Database (IMDb), with a mapping created by the Ontop development team. We use nine complex yet natural queries, e.g.,

 SELECT DISTINCT ?x ?title ?actor_name ?prod_year ?rating
 WHERE {
   ?m a mo:Movie ;
      mo:title ?title ;
      mo:imdbrating ?rating ;
      dbpedia:productionStartYear ?prod_year ;
      mo:hasActor ?x ;
      mo:hasDirector ?x .
   ?x dbpedia:birthName ?actor_name .
   FILTER ( ?rating > '7.0' && ?prod_year >= 2000 && ?prod_year <= 2010 )
 }
 ORDER BY DESC(?rating) ?prod_year
 LIMIT 25

(full details are available at the URL above). Most queries are highly selective and go beyond CQs, using inequalities, ORDER BY/LIMIT and the DISTINCT operator. Both the SQL database and the ontology were developed independently by third parties (IMDb and the University of Zurich) for purposes different from benchmarking.

Case 3 is based on the Lehigh University Benchmark (LUBM, swat.cse.lehigh.edu/projects/lubm), which comes with an OWL ontology, 14 simple CQs of varying degrees of selectivity and a data generator. We approximated the ontology in OWL 2 QL and created a database schema to store the data for 200 universities (1 university ≈ 130K assertions). Note that although the data has some degree of randomness, it is not arbitrary and follows what can be regarded as a natural pattern: each university has 15–25 departments, each department has 7–10 full professors, every person has a name, etc. These considerations were taken into account to produce a normalised database schema with relations of appropriate arity, together with primary and foreign keys (instead of the standard universal tables storing RDF triples).

 case       ontology                   rules in    assertions in
            predicates   inclusions    mapping     virtual ABox
 ILog           45           70           43            1m
 IMDb-MO       137          157          271           42m
 LUBM           75           93           79           26m

The tables below give the results of the experiments with ILog and IMDb-MO.
 ILog (times in seconds)
            no optimisations              with optimisations
         rewriting     unfolding       rewriting     unfolding        SQL execution
 query   CQs  time     CQs   time      CQs  time     CQs   time       time    answer size
 Q1       1   0.005      1   0.004      1   0.000      1   0.004      0.008        582
 Q2       1   0.001      4   0.004      1   0.000      1   0.001      0.900     41,594
 Q3       1   0.003      1   0.002      1   0.000      1   0.002      0.015      6,710
 Q4       1   0.004      1   0.004      1   0.000      1   0.003      0.660     41,000
 Q5       1   0.001      2   0.004      1   0.000      2   0.004      0.008          7
 Q6       1   0.007      1   0.002      1   0.000      1   0.001      0.009        170
 Q7       2   0.001      2   0.002      2   0.000      2   0.001      0.080        269
 Q8       1   0.000      1   0.001      1   0.000      1   0.001      0.008        621
 Q9       2   0.001      2   0.007      2   0.000      2   0.007      0.009         94
 Q10      1   0.004      1   0.001      1   0.000      1   0.002      1.000     22,666

 IMDb-MO (times in seconds)
            no optimisations              with optimisations
         rewriting     unfolding       rewriting     unfolding        SQL execution
 query   CQs  time     CQs   time      CQs  time     CQs   time       time    answer size
 Q1       1   0.008      2   0.003      1   0.010      1   0.003      1.652         25
 Q2       1   0.001      2   0.004      1   0.001      1   0.003      3.371     53,400
 Q3       1   0.004      7   0.012      1   0.004      1   0.000      1.316     10,681
 Q4       1   0.000      2   0.002      1   0.001      1   0.001      0.219         25
 Q5       2   0.010    126   0.037      2   0.016      3   0.006      1.335        327
 Q6       1   0.001      2   0.000      1   0.000      1   0.000      0.085         69
 Q7       1   0.001      4   0.001      1   0.000      1   0.001      0.070          4
 Q8       1   0.000    156   0.026      1   0.000      1   0.003      4.144          3
 Q9       1   0.001      1   0.001      1   0.001      1   0.001      0.365         37

We start with a few observations on the performance of query rewriting and unfolding. The size of the produced SQL queries is given in the column 'CQs' under 'unfolding'. We can see that the Ontop optimisations produce very few Select-Project-Join queries in the union, quite often only one. In contrast, when the optimisations are disabled, the size of the union is in line with QuOnto, Rapid and Requiem and can grow considerably, in particular when large hierarchies are involved (cf. Q5 and Q8 in IMDb-MO, where the union contains more than 100 queries).

To explain this behaviour, we observe first that almost all of the queries in ILog and IMDb-MO have no tree witnesses, and so the rewriting returns the original query (note that Q7 and Q9 of ILog have a union in the original query). The only exception is Q5 in IMDb-MO, which has one tree witness and thus generates two CQs in the rewriting. Second, the ratio of mapping rules per concept/role in both scenarios is very low when our optimisations are applied: most concepts and roles have at most one rule (even in the case of large hierarchies with many domain/range axioms). So, unfolding with such a mapping produces a small number of Select-Project-Join queries in the union. These observations support our claim that, in practice, there are few tree witnesses and that our T-mapping optimisations can efficiently handle concept and role hierarchies as well as domain and range constraints.

The time required for query rewriting and optimisation is negligible and stays within 4ms. In contrast, the time required to generate queries without the optimisations is higher, especially for queries involving large hierarchies (≥ 25ms): in particular, for Q5 and Q8 in IMDb-MO our optimisations reduce the unfolding times from 37ms and 26ms to 6ms and 3ms, respectively. Similarly to other systems, Ontop applies CQ containment (CQC) checks to reduce the number of Select-Project-Join queries, and these checks prove to be costly on the large unfoldings obtained without the optimisations. Although a few milliseconds might seem negligible, the performance of systems such as RDF triple stores and databases is measured in queries per second and is usually expected to be in the thousands. With such requirements, an overhead of 20–30ms per query is not acceptable.

The execution time of the SQL queries produced by Ontop, in MySQL or PostgreSQL, is within 100ms for simple, highly selective queries (with few results).
Although Q2, Q4 and Q10 in ILog and Q1, Q2, Q3, Q5 and Q8 in IMDb-MO take up to 4s to execute, their SQL rewritings are optimal (in the sense that they coincide with hand-crafted queries), and their relatively long execution times are due to DISTINCT/ORDER BY over large relations.

 LUBM (times in seconds)
            no optimisations              with optimisations
         rewriting     unfolding       rewriting     unfolding        SQL execution
 query   CQs  time     CQs   time      CQs  time     CQs   time       time    answer size
 Q1       1   0.014      1   0.003      1   0.021      1   0.003       0.009          4
 Q2       1   0.001     13   0.018      1   0.001      1   0.002       0.012        499
 Q3       1   0.000      1   0.001      1   0.000      1   0.001       0.009          6
 Q4       1   0.001     14   0.017      1   0.001     10   0.015       0.010        306
 Q5       1   0.000      7   0.001      1   0.000      7   0.001       0.011        720
 Q6       1   0.000      2   0.000      1   0.000      2   0.000      10.805  2,088,195
 Q7       1   0.000      8   0.006      1   0.000      2   0.001       0.008         67
 Q8       1   0.000      2   0.003      1   0.000      2   0.003       0.086      7,790
 Q9       1   0.001    128   0.115      1   0.000      6   0.006      17.419     54,285
 Q10      1   0.000      1   0.000      1   0.000      1   0.000       0.008          4
 Q11      1   0.002      1   0.001      1   0.001      1   0.000       0.010        224
 Q12      1   0.000      2   0.001      1   0.001      2   0.001       0.007         12
 Q13      1   0.001     31   0.015      1   0.000     13   0.001       0.009        917
 Q14      1   0.000      1   0.000      1   0.000      1   0.000       4.494  1,584,743

In the LUBM case, the queries have no tree witnesses, which results in tree-witness rewritings that coincide with the original queries. LUBM is, however, the only case where Ontop generated unions with hundreds of Select-Project-Join queries, which is due to a higher ratio of mapping rules per concept/role. This is a consequence of the database structure and of the way in which the mappings construct object URIs from integers and strings in the database (known as the impedance mismatch [19]). The generated SQL queries are still optimal in the sense that they correspond to human-generated queries for the given database schema.

Query execution appears to be optimal for all queries (except Q6, Q9 and Q14), with response times under 12ms even for queries with Join and filter operations over large tables. This corresponds to the expected performance of an optimised RDBMS, in which most operations can be performed using in-memory indexes (provided that the SQL queries have the right structure for the query planner). Q6, Q9 and Q14 have low selectivity (a large number of results), and their execution time is dominated by disk access.

It is to be noted that although we used an OWL 2 QL approximation of LUBM, most queries return the same results as for the original LUBM ontology. The only exceptions are Q11 and Q12: all answers to Q11 are recovered with extra mappings simulating transitivity (up to a predefined depth) by means of self-Joins on the transitive property; similarly, for all answers to Q12, we include an extra mapping rule expressing ∃R.B ⊑ A on the elements of the virtual ABox. The execution times in the table are given for the extensions described above, which ensure completeness (w.r.t. the original LUBM) of the returned answers.

Finally, by comparing the performance of Ontop (see ontop.inf.unibz.it) with that of other open-source or commercial systems [11, 16], we see that Ontop is much faster than Sesame or Jena (open-source) and similar to OWLIM (commercial), but it does not pay the heavy price of inference materialisation, which can take hours or days.

===5 Conclusions===

To conclude, we believe this paper shows that, despite the negative theoretical results on worst-case OWL 2 QL query rewriting and the sometimes disappointing experiences with the first OBDA systems, high-performance OBDA is achievable in practice when applied to standard ontologies, queries and data stored in relational databases.
In such cases, query rewriting together with SQO and SQL optimisations is fast and efficient, and produces SQL queries of high quality.

Acknowledgements. We thank G. Cicala and A. Tacchella for their help with the ILog experiments, and the Ontop development team (J. Hardi, T. Bagosi and M. Slusnys) for their help with the experiments. This work was supported by the EU FP7 project Optique (grant 318338) and the UK EPSRC project ExODA (EP/H05099X).

===References===

1. S. Abiteboul, R. Hull, and V. Vianu. Foundations of Databases. Addison-Wesley, 1995.
2. F. Baader, D. Calvanese, D. L. McGuinness, D. Nardi, and P. F. Patel-Schneider, editors. The Description Logic Handbook: Theory, Implementation, and Applications. Cambridge University Press, 2003.
3. D. Calvanese, G. De Giacomo, D. Lembo, M. Lenzerini, A. Poggi, M. Rodríguez-Muro, R. Rosati, M. Ruzzi, and D. F. Savo. The MASTRO system for ontology-based data access. Semantic Web, 2(1):43–53, 2011.
4. D. Calvanese, G. De Giacomo, D. Lembo, M. Lenzerini, and R. Rosati. Tractable reasoning and efficient query answering in description logics: The DL-Lite family. J. Autom. Reasoning, 39(3):385–429, 2007.
5. M. Casu, G. Cicala, and A. Tacchella. Ontology-based data access: An application to intermodal logistics. Information Systems Frontiers, pages 1–23, 2012.
6. U. S. Chakravarthy, D. H. Fishman, and J. Minker. Semantic query optimization in expert systems and database systems. Benjamin-Cummings Publishing Co., Inc., 1986.
7. A. Chortaras, D. Trivela, and G. Stamou. Optimized query rewriting for OWL 2 QL. In Proc. of CADE-23, volume 6803 of LNCS, pages 192–206. Springer, 2011.
8. T. Eiter, M. Ortiz, M. Šimkus, T.-K. Tran, and G. Xiao. Query rewriting for Horn-SHIQ plus rules. In Proc. of AAAI 2012. AAAI Press, 2012.
9. G. Gottlob, G. Orsi, and A. Pieris. Ontological queries: Rewriting and optimization. In Proc. of ICDE 2011, pages 2–13. IEEE Computer Society, 2011.
10. G. Gottlob and T. Schwentick. Rewriting ontological queries into small nonrecursive datalog programs. In Proc. of KR 2012. AAAI Press, 2012.
11. V. Khadilkar, M. Kantarcioglu, B. M. Thuraisingham, and P. Castagna. Jena-HBase: A distributed, scalable and efficient RDF triple store. In Proc. of ISWC 2012, volume 914 of CEUR-WS, 2012.
12. S. Kikot, R. Kontchakov, V. Podolskii, and M. Zakharyaschev. Exponential lower bounds and separation for query rewriting. In Proc. of ICALP 2012, Part II, volume 7392 of LNCS, pages 263–274. Springer, 2012.
13. S. Kikot, R. Kontchakov, and M. Zakharyaschev. Conjunctive query answering with OWL 2 QL. In Proc. of KR 2012. AAAI Press, 2012.
14. M. König, M. Leclère, M.-L. Mugnier, and M. Thomazo. A sound and complete backward chaining algorithm for existential rules. In Proc. of RR 2012, volume 7497 of LNCS, pages 122–138. Springer, 2012.
15. J. W. Lloyd and J. C. Shepherdson. Partial evaluation in logic programming. The Journal of Logic Programming, 11(3–4):217–242, 1991.
16. Ontotext. OWLIM performance with Jena, 2011. http://www.ontotext.com/owlim/benchmark-results/owlim-jena-performance.
17. H. Pérez-Urbina, B. Motik, and I. Horrocks. A comparison of query rewriting techniques for DL-Lite. In Proc. of DL 2009, volume 477 of CEUR-WS, 2009.
18. H. Pérez-Urbina, E. Rodríguez-Díaz, M. Grove, G. Konstantinidis, and E. Sirin. Evaluation of query rewriting approaches for OWL 2. In Proc. of SSWS+HPCSW 2012, volume 943 of CEUR-WS, 2012.
19. A. Poggi, D. Lembo, D. Calvanese, G. De Giacomo, M. Lenzerini, and R. Rosati. Linking data to ontologies. J. Data Semantics, 10:133–173, 2008.
20. M. Rodríguez-Muro. Tools and Techniques for Ontology Based Data Access in Lightweight Description Logics. PhD thesis, KRDB Research Centre for Knowledge and Data, Free University of Bozen-Bolzano, 2010.
21. M. Rodríguez-Muro and D. Calvanese. Dependencies: Making ontology based data access work. In Proc. of AMW 2011, volume 749 of CEUR-WS, 2011.
22. R. Rosati. Prexto: Query rewriting under extensional constraints in DL-Lite. In Proc. of ESWC 2012, volume 7295 of LNCS, pages 360–374. Springer, 2012.
23. R. Rosati and A. Almatelli. Improving query answering over DL-Lite ontologies. In Proc. of KR 2010. AAAI Press, 2010.