<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Mariano Rodríguez-Muro1, Roman Kontchakov2 and Michael Zakharyaschev2</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Faculty of Computer Science</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Free University of Bozen-Bolzano</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Italy</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Department of Computer Science</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Information Systems</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Birkbeck, University of London</institution>
          ,
          <country country="UK">U.K</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>We describe the architecture of the OBDA system Ontop and analyse its performance in a series of experiments. We demonstrate that, for standard ontologies, queries and data stored in relational databases, Ontop is fast, efficient and produces SQL rewritings of high quality. In this paper, we report on a series of experiments designed to test the performance of the ontology-based data access (OBDA) system Ontop<sup>1</sup> implemented at the Free University of Bozen-Bolzano. Our main concern was the quality of the query rewritings produced automatically by Ontop when given some standard queries, ontologies, databases and mappings from the database schemas to the ontologies. Recall [4] that, in the OBDA paradigm, an ontology defines a high-level global schema of (already existing) data sources and provides a vocabulary for user queries. An OBDA system rewrites such queries into the vocabulary of the data sources and then delegates query evaluation to a relational database management system (RDBMS). The existing query rewriting systems include QuOnto [19], Nyaya [9], Rapid [7], Requiem [17]/Blackout [18], Clipper [8], Prexto [22] and the system of [14] (some of which use datalog engines rather than RDBMSs). To illustrate how an OBDA system works, we take a simplified IMDb database (www.imdb.com/interfaces), whose schema contains relations title[m, t, y] with information about movies (ID, title, production year), and castinfo[p, m, r] with information about movie casts (person ID, movie ID, person role). The users are not supposed to know the structure of the database. Instead, they are given an ontology, say MO (www.movieontology.org), describing the application domain in terms of concepts (classes), such as mo:Movie and mo:Person, and roles and attributes (object and datatype properties), such as mo:cast and mo:year:</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p><disp-formula>mo:Movie ≡ ∃mo:title, mo:Movie ⊑ ∃mo:year,</disp-formula>
<disp-formula>mo:Movie ≡ ∃mo:cast, ∃mo:cast<sup>−</sup> ⊑ mo:Person</disp-formula>
(we use the description logic parlance of OWL 2 QL). The user can query the
data in terms of concepts and roles of the ontology; for example,
<disp-formula>q(t, y) ← mo:Movie(m), mo:title(m, t), mo:year(m, y), (y &gt; 2010)</disp-formula>
is a query asking for the titles of recent movies with their production year. To
rewrite it to an SQL query over the data source, the OBDA system requires a
mapping that relates the ontology terms to the database schema; for example:
<disp-formula>mo:Movie(m), mo:title(m, t), mo:year(m, y) ← title(m, t, y),</disp-formula>
<disp-formula>mo:cast(m, p), mo:Person(p) ← castinfo(p, m, r).</disp-formula>
By evaluating this mapping over a data instance with, say, the tuple
(728, 'Django Unchained', 2012) in title and two tuples in castinfo linking
persons n37 and n38 to movie 728, we obtain ground atoms such as mo:Movie(728),
mo:title(728, 'Django Unchained'), mo:year(728, 2012), mo:cast(728, n37) and mo:Person(n37)
that can be thought of as the ABox over which we can execute the query q(t, y)
taking account of the consequences implied by the MO ontology. Such an ABox
is not materialised and is called virtual [21].</p>
      <p><sup>1</sup> http://ontop.inf.unibz.it</p>
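      <p>The evaluation of the mapping over a data instance can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the relations are held in memory as lists of tuples, the role codes in castinfo are placeholders, and the function virtual_abox is a hypothetical name rather than part of Ontop, which never materialises the ABox.</p>
      <preformat>
```python
# Hypothetical in-memory stand-in for the simplified IMDb fragment.
title = [(728, "Django Unchained", 2012)]
castinfo = [("n37", 728, 1), ("n38", 728, 1)]  # role codes are placeholders


def virtual_abox(title_rows, castinfo_rows):
    """Apply the two mapping rules and return the resulting ground atoms."""
    abox = set()
    for m, t, y in title_rows:
        # mo:Movie(m), mo:title(m, t), mo:year(m, y) from title(m, t, y)
        abox.add(("mo:Movie", m))
        abox.add(("mo:title", m, t))
        abox.add(("mo:year", m, y))
    for p, m, _role in castinfo_rows:
        # mo:cast(m, p), mo:Person(p) from castinfo(p, m, r)
        abox.add(("mo:cast", m, p))
        abox.add(("mo:Person", p))
    return abox


abox = virtual_abox(title, castinfo)
```
      </preformat>
      <p>Each database tuple contributes the atoms listed in the head of the corresponding mapping rule; the set abox plays the role of the virtual ABox in this toy setting.</p>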
      <p>Thus, the OBDA system is facing three tasks: it has to (i) rewrite the original
query to a query over the virtual ABox, (ii) unfold the rewriting, using the
mapping, into an SQL query, and then (iii) evaluate it over the data instance using
an RDBMS. The idea of OBDA stems from the empirical fact that answering
conjunctive queries (CQs) in RDBMSs is very efficient in practice. So one can
expect task (iii) to be smooth provided that the rewriting (i) and unfolding (ii)
are reasonably small and standard.</p>
      <p>However, the available experimental data (see, e.g., [20, 3]) as well as the
recent complexity-theoretic analysis of rewritings show that they can be
prohibitively large or complex. First, there exist CQs and ontologies for which
any (first-order or datalog) rewriting results in an exponential blowup [12]; the
polynomial datalog rewriting of [10] hides this blowup behind the existential
quantification over special constants. Second, even for simple and natural
ontologies and CQs, rewritings (i) become exponential when presented as
unions of CQs (UCQs), the form most suitable for RDBMSs, because they must include all
sub-concepts/roles of each atom in the query induced by the ontology.</p>
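      <p>The growth of UCQ rewritings mentioned above can be illustrated with a small sketch. It assumes a hypothetical hierarchy in which each of four query predicates has two sub-predicates; the names A, B, C, D and the helper ucq_rewriting are illustrative only. Variables are omitted, so each CQ is represented by its tuple of predicates.</p>
      <preformat>
```python
from itertools import product


def ucq_rewriting(atoms, subsumees):
    """Expand every atom into its sub-predicates and take all combinations."""
    choices = [subsumees.get(pred, [pred]) for pred in atoms]
    return [tuple(cq) for cq in product(*choices)]


# Hypothetical taxonomy: every predicate has itself and two sub-predicates.
subsumees = {p: [p, p + "_sub1", p + "_sub2"] for p in ("A", "B", "C", "D")}
ucq = ucq_rewriting(["A", "B", "C", "D"], subsumees)
number_of_cqs = len(ucq)
```
      </preformat>
      <p>With three choices for each of the four atoms the union already contains 3 to the power 4 CQs; the number grows exponentially with the number of atoms.</p>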
      <p>In Ontop, this bottleneck is tackled by making use of
<list list-type="bullet">
<list-item><p>the tree-witness rewriting [13] that separates the topology of the CQ from
the taxonomy defined by the ontology;</p></list-item>
<list-item><p>an extended mapping (called a T-mapping [21]) that takes account of the
taxonomy and can be optimised using database integrity constraints and
SQL features;</p></list-item>
<list-item><p>an unfolding algorithm that employs the semantic query optimisation
technique with database integrity constraints to produce small and efficient SQL
queries.</p></list-item>
</list></p>
      <p>For example, a rewriting of the query q(t, y) above can be split into the CQ
<disp-formula>q′(t, y) ← ext:Movie(m), mo:title(m, t), mo:year(m, y), (y &gt; 2010)</disp-formula>
and the datalog rules for the ext:Movie predicate:
<disp-formula><label>(1)</label>ext:Movie(m) ← mo:Movie(m),</disp-formula>
<disp-formula><label>(2)</label>ext:Movie(m) ← mo:cast(m, p),</disp-formula>
<disp-formula><label>(3)</label>ext:Movie(m) ← mo:title(m, t).</disp-formula>
The former inherits the topology of the original CQ, while the latter
represents the taxonomy defined by the ontology. In theory, the topological part can
contain exponentially many rules (reflecting possible matches in the canonical
models) [12], but this never happens in practice, and usually there are very few
of them (see the experiments below). The taxonomical component is
independent from the CQ and combines with the mapping into a T-mapping [21], which
can then be drastically simplified using the database integrity constraints. For
example, since castinfo has a foreign key (its movie ID attribute references ID in
title), every virtual ABox of IMDb will satisfy the axiom ∃mo:cast ⊑ mo:Movie,
making (2) redundant; moreover, (3) and (1) give rise to the same rule,
resulting in a T-mapping with a single rule for mo:Movie. Thus, a rewriting over
IMDb ABoxes will be a single CQ. In contrast, any UCQ rewriting over
arbitrary ABoxes contains three CQs which simply duplicate the answers because
the data respects the integrity constraints (a query with a few more atoms may
give rise to a UCQ rewriting with thousands of CQs).</p>
      <p>By straightforwardly applying the unfolding algorithm to q′ and the T-mapping
M above, we obtain the query
<disp-formula>q′′<sub>0</sub>(t, y) ← title(m, t0, y0), title(m, t, y1), title(m, t2, y), (y &gt; 2010),</disp-formula>
which requires two (potentially) expensive Join operations. However, if we use
the fact that the ID attribute is a primary key of title (uniquely defining the
title and production year), then q′ can be unfolded into a much simpler
<disp-formula>q′′(t, y) ← title(m, t, y), (y &gt; 2010).</disp-formula>
In fact, such multiple Joins are very typical in OBDA because n-ary relations
of data sources are reified by ontologies into binary roles and attributes.</p>
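      <p>The equivalence of the two unfoldings under the primary key can be checked on a toy database. The sketch below uses Python's standard sqlite3 module; the second row of title is a made-up example, and the SQL strings are hand-written counterparts of the two datalog queries, not output produced by Ontop.</p>
      <preformat>
```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE title (m INTEGER PRIMARY KEY, t TEXT, y INTEGER)")
conn.executemany(
    "INSERT INTO title VALUES (?, ?, ?)",
    [(728, "Django Unchained", 2012), (100, "An Older Film", 1999)],
)

# Naive unfolding: one copy of title per ontology atom, joined on the key m.
naive = """
    SELECT t2.t, t3.y
    FROM title t1
    JOIN title t2 ON t2.m = t1.m
    JOIN title t3 ON t3.m = t1.m
    WHERE t3.y > 2010
"""
# With m as primary key, the three copies collapse into a single scan.
optimised = "SELECT t, y FROM title WHERE y > 2010"

naive_rows = sorted(conn.execute(naive).fetchall())
optimised_rows = sorted(conn.execute(optimised).fetchall())
```
      </preformat>
      <p>Both queries return the same rows on any instance that satisfies the key constraint; only the second avoids the two self-Joins.</p>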
      <p>The aim of this paper is to (i) present the rewriting and optimisation
techniques that allow Ontop to produce optimised queries as discussed above, and
(ii) evaluate the performance of Ontop using three use cases. We demonstrate
that, at least in these cases, Ontop produces query rewritings of reasonably
high quality and its performance is comparable to that of traditional RDBMSs.</p>
    </sec>
    <sec id="sec-2">
      <title>OWL 2 QL and Databases</title>
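      <p>The language of OWL 2 QL contains individual names a<sub>i</sub>, concept names A<sub>i</sub>,
and role names P<sub>i</sub> (i ≥ 1). Roles R and basic concepts B are defined by
<disp-formula>R ::= P<sub>i</sub> | P<sub>i</sub><sup>−</sup>, B ::= A<sub>i</sub> | ∃R.</disp-formula>
A TBox (or an ontology), T, is a finite set of inclusions of the form
<disp-formula>B<sub>1</sub> ⊑ B<sub>2</sub>, B<sub>1</sub> ⊑ ∃R.B<sub>2</sub>, B<sub>1</sub> ⊓ B<sub>2</sub> ⊑ ⊥, R<sub>1</sub> ⊑ R<sub>2</sub>, R<sub>1</sub> ⊓ R<sub>2</sub> ⊑ ⊥.</disp-formula>
An ABox, A, is a set of atoms of the form A<sub>k</sub>(a<sub>i</sub>) or P<sub>k</sub>(a<sub>i</sub>, a<sub>j</sub>). The semantics for
OWL 2 QL is defined in the usual way based on interpretations I = (Δ<sup>I</sup>, ·<sup>I</sup>) [2].
The set of individual names in A is denoted by ind(A).</p>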
      <p>A conjunctive query q(x) is a first-order formula ∃y φ(x, y), where φ is a
conjunction of atoms of the form A<sub>k</sub>(t<sub>1</sub>) or P<sub>k</sub>(t<sub>1</sub>, t<sub>2</sub>), and each t<sub>i</sub> is a term (an
individual or a variable in x or y). We use the datalog notation for CQs, writing
q(x) ← φ(x, y) (without existential quantifiers), and call q the head and φ the
body of the rule. A tuple a ⊆ ind(A) is a certain answer to q(x) over (T, A) if
I ⊨ q(a) for all models I of (T, A); in this case we write (T, A) ⊨ q(a).</p>
      <p>We assume that the data comes from a relational database rather than an
ABox. We view databases [1] as triples (R, Σ, I), where R is a database schema,
containing predicate symbols for both stored database relations and views
(together with their definitions in terms of stored relations), Σ is a set of integrity
constraints over R (in the form of inclusion and functional dependencies), and
I is a data instance over R (satisfying Σ). The vocabularies of R and T are
linked together by means of mappings. A mapping, M, from R to T is a set of
(GAV) rules of the form
<disp-formula>S(x) ← φ(x, z),</disp-formula>
where S is a concept or role name in T and φ(x, z) is a conjunction of atoms with
stored relations and views from R and a filter, that is, a Boolean combination
of built-in predicates such as = and &lt;. (Note that, by including views in the
schema, we can express any SQL query in mappings.) Given a mapping M,
the atoms S(a), for S(x) ← φ(x, z) in M and I ⊨ ∃z φ(a, z), comprise the
ABox, A<sub>I,M</sub>, which is called the virtual ABox for M over I. We can now define
certain answers to a CQ q over a TBox T linked by a mapping M to a database
(R, Σ, I) as certain answers to q over (T, A<sub>I,M</sub>).</p>
    </sec>
    <sec id="sec-3">
      <title>The Architecture of Ontop</title>
      <p>We now briefly describe the main ingredients of Ontop: the tree-witness rewriting
over complete ABoxes, T-mappings and the unfolding algorithm. Suppose we are
given a CQ q over an ontology T and a mapping M from a database schema R
to T. The tree-witness rewriting of q and T, denoted q<sub>tw</sub>, presupposes that the
underlying ABox A is H-complete with respect to T in the sense that
<disp-formula>S(a) ∈ A whenever S′(a) ∈ A and T ⊨ S′ ⊑ S,</disp-formula>
for all concept names S and basic concepts S′ and for all role names S and
roles S′ (we identify P<sup>−</sup>(b, a) and P(a, b) in ABoxes and assume ∃R(a) ∈ A if
R(a, b) ∈ A, for some b). An obvious way to define H-complete ABoxes is to
take the composition M<sup>T</sup> of M and the inclusions in T given by
<disp-formula>A(x) ← φ(x, z) if A′(x) ← φ(x, z) ∈ M and T ⊨ A′ ⊑ A,</disp-formula>
<disp-formula>A(x) ← φ(x, y, z) if R(x, y) ← φ(x, y, z) ∈ M and T ⊨ ∃R ⊑ A,</disp-formula>
<disp-formula>P(x, y) ← φ(x, y, z) if R(x, y) ← φ(x, y, z) ∈ M and T ⊨ R ⊑ P.</disp-formula>
(We identify P<sup>−</sup>(y, x) with P(x, y) in the heads of the mapping rules.) Thus, to
compute answers to q over T with M and a database instance I, it suffices to
evaluate the rewriting q<sub>tw</sub> over A<sub>I,M<sup>T</sup></sub>:
<disp-formula>(T, A<sub>I,M</sub>) ⊨ q(a) iff A<sub>I,M<sup>T</sup></sub> ⊨ q<sub>tw</sub>(a), for any I and a ⊆ ind(A<sub>I,M</sub>).</disp-formula></p>
      <p>[Figure: routes from the CQ to the UCQ evaluated over the database instance. Labels: standard rewritings, tw-rewriting, virtual ABox, completion, H-complete ABox, mapping, T-mapping, database instance, UCQ. Dashed and solid lines mark the two routes.]</p>
      <sec id="sec-3-1">
        <p>OBDA systems such as QuOnto [19] and Prexto [22] first construct rewritings
over arbitrary ABoxes and only then unfold them, using mappings, into UCQs
which are evaluated by an RDBMS (dashed lines above). The same result can be
obtained by unfolding rewritings over H-complete ABoxes with the help of the
composition M<sup>T</sup> (solid lines above). However, in practice the resulting UCQs
very often turn out to be too large [20].</p>
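        <p>The composition of a mapping with the inclusions of the ontology can be sketched as follows. The sketch makes several assumptions: mapping bodies are opaque strings, the entailed inclusions are listed by hand rather than computed by a reasoner, inverse roles are not handled, and the function compose is a hypothetical helper, not Ontop's API.</p>
        <preformat>
```python
# Mapping rules as (head predicate, head variables, body).
mapping = [
    ("mo:Movie", ("m",), "title(m, t, y)"),
    ("mo:title", ("m", "t"), "title(m, t, y)"),
    ("mo:cast", ("m", "p"), "castinfo(p, m, r)"),
    ("mo:Person", ("p",), "castinfo(p, m, r)"),
]

# Entailed inclusions, listed by hand; ("exists", R) stands for the domain of R.
entailed = [
    (("exists", "mo:title"), "mo:Movie"),
    (("exists", "mo:cast"), "mo:Movie"),
]


def compose(mapping, entailed):
    """Add a rule for the super-predicate whenever a rule for the sub-predicate exists."""
    result = list(mapping)
    for sub, sup in entailed:
        for head, args, body in mapping:
            if isinstance(sub, tuple) and sub[1] == head:
                rule = (sup, (args[0],), body)  # domain of a role: keep first argument
            elif sub == head:
                rule = (sup, args, body)        # concept or role inclusion
            else:
                continue
            if rule not in result:
                result.append(rule)
    return result


composed = compose(mapping, entailed)
movie_rules = [rule for rule in composed if rule[0] == "mo:Movie"]
```
        </preformat>
        <p>In this toy case the composition contains two rules for mo:Movie, one over title and one over castinfo; the rule induced by the domain of mo:title coincides with the original one and is not duplicated.</p>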
        <p>In Ontop, we also start with M<sup>T</sup>. But before applying it to unfold q<sub>tw</sub>, we first
simplify and reduce the size of the mapping by exploiting the database integrity
constraints. Following [21], a mapping M from R to T is called a T-mapping
over integrity constraints Σ if the virtual ABox A<sub>I,M</sub> is H-complete w.r.t. T,
for any data instance I satisfying Σ. (The composition M<sup>T</sup> is a T-mapping
over any Σ.) Ontop transforms M<sup>T</sup> to a much simpler T-mapping by taking
account of database integrity constraints (dependencies), and SQL features such
as disjunctions in filter conditions.</p>
        <p><bold>Tree-Witness Rewriting</bold></p>
        <p>We explain the essence of the tree-witness rewriting using an example. Consider
an ontology T with the axioms
<disp-formula><label>(4)</label>RA ⊑ ∃worksOn.Project, Project ⊑ ∃isManagedBy.Prof,</disp-formula>
<disp-formula><label>(5)</label>worksOn<sup>−</sup> ⊑ involves, isManagedBy ⊑ involves</disp-formula>
and the CQ asking to find those who work with professors:
<disp-formula>q(x) ← worksOn(x, y), involves(y, z), Prof(z).</disp-formula>
Observe that if a model I of (T, A), for some A, contains individuals a ∈ RA<sup>I</sup>
and b ∈ Project<sup>I</sup>, then I must also contain the following fragments:</p>
        <p>[Figure: two fragments of the model. The individual a (in RA) is connected by worksOn to a point u (in Project), which is connected by isManagedBy to a point v (in Prof); the individual b (in Project) is connected by isManagedBy to a point w (in Prof). The involves edges follow from (5).]</p>
        <sec id="sec-3-1-3">
          <p>Here the points u, v and w are not necessarily named individuals from the ABox, but
can be generated by the axioms (4) as (anonymous) witnesses for the existential
quantifiers. It follows then that a is an answer to q(x) if a ∈ RA<sup>I</sup>, in which case
the atoms of q are mapped to the fragment generated by a as follows:</p>
          <p>[Figure: match of q(x) in which x is mapped to a, y to u and z to v.]</p>
          <p>Alternatively, if a is in both RA<sup>I</sup> and Prof<sup>I</sup>, then we obtain the following match:</p>
          <p>[Figure: match of q(x) in which x and z are mapped to a, and y to u.]</p>
          <p>Another option is to map x and y to ABox individuals, a and b, and if b is in
Project<sup>I</sup>, then the last two atoms of q can be mapped to the anonymous part:</p>
          <p>[Figure: match of q(x) in which x is mapped to a, y to b and z to w.]</p>
          <p>Finally, all the atoms of q can be mapped to ABox individuals. The possible ways
of mapping parts of the CQ to the anonymous part of the model are called tree
witnesses. The tree witnesses for q found above give the following tree-witness
rewriting q<sub>tw</sub> of q and T over H-complete ABoxes:</p>
        </sec>
        <sec id="sec-3-1-10">
          <p><disp-formula><label>(6)</label>q<sub>tw</sub>(x) ← worksOn(x, y), involves(y, z), Prof(z),</disp-formula>
<disp-formula><label>(7)</label>q<sub>tw</sub>(x) ← RA(x),</disp-formula>
<disp-formula><label>(8)</label>q<sub>tw</sub>(x) ← RA(x), Prof(x),</disp-formula>
<disp-formula><label>(9)</label>q<sub>tw</sub>(x) ← worksOn(x, y), Project(y).</disp-formula>
(Note that q<sub>tw</sub> is not a rewriting over arbitrary ABoxes.)</p>
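          <p>To see how the rules (6) to (9) behave on data, the following sketch evaluates them over a tiny H-complete ABox. The individuals ann, bob, carl and p1 are invented for the illustration, the ABox is stored as plain Python sets, and the function q_tw is a hand-coded evaluation of the union, not something generated by Ontop.</p>
          <preformat>
```python
# A small ABox with hypothetical individuals. It is H-complete for the example
# ontology: the pair required in `involves` by the role inclusions is present.
RA = {"ann"}
Prof = {"bob"}
Project = {"p1"}
worksOn = {("carl", "p1")}
involves = {("p1", "carl")}


def q_tw():
    """Union of the CQs (6), (7) and (9); (8) is skipped because (7) subsumes it."""
    answers = set()
    # (6): worksOn(x, y), involves(y, z), Prof(z)
    for x, y in worksOn:
        for y2, z in involves:
            if y == y2 and z in Prof:
                answers.add(x)
    # (7): RA(x)
    answers |= RA
    # (9): worksOn(x, y), Project(y)
    answers |= {x for x, y in worksOn if y in Project}
    return answers


answers = q_tw()
```
          </preformat>
          <p>Here ann is returned because every research assistant has an anonymous project managed by an anonymous professor, and carl because the named project p1 has an anonymous manager; neither answer would be found by evaluating the original CQ directly over the data.</p>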
          <p>In theory, the size of the rewriting q<sub>tw</sub> can be large [12]: there exists a
sequence of q<sub>n</sub> and T<sub>n</sub> generating exponentially many (in |q<sub>n</sub>|) tree witnesses, and
any rewriting of q<sub>n</sub> and T<sub>n</sub> is of exponential size (unless it employs |q<sub>n</sub>|-many
additional existentially quantified variables [10]). Our experiments (see Section 4)
demonstrate, however, that in practice, real-world ontologies and CQs generate
small and simple tree-witness rewritings.</p>
          <p>There are two ways to simplify tree-witness rewritings further. First, we can
use a subsumption algorithm to remove redundant CQs from the union: for
example, (7) subsumes (8), which can be safely removed. Second, we can reduce
the size of the individual CQs in the union using the following observation: for
any CQ q (viewed as a set of atoms),
<disp-formula>q ∪ {A(x), A′(x)} ≡<sub>c</sub> q ∪ {A(x)}, if T ⊨ A ⊑ A′,</disp-formula>
<disp-formula>q ∪ {A(x), R(x, y)} ≡<sub>c</sub> q ∪ {R(x, y)}, if T ⊨ ∃R ⊑ A,</disp-formula>
<disp-formula>q ∪ {P(x, y), R(x, y)} ≡<sub>c</sub> q ∪ {R(x, y)}, if T ⊨ R ⊑ P,</disp-formula>
where ≡<sub>c</sub> reads 'has the same certain answers over H-complete ABoxes' (we
again identify P<sup>−</sup>(y, x) with P(x, y)). Surprisingly, such a simple
optimisation, especially for the domain/range constraints, makes rewritings substantially
shorter [23, 9].</p>
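          <p>The three reductions of a CQ over H-complete ABoxes can be sketched as a small function. The representation is an assumption made for the example: concept atoms are pairs, role atoms are triples, the entailed inclusions are passed in explicitly as sets of pairs, and inverse roles are not treated. The name reduce_cq is hypothetical.</p>
          <preformat>
```python
def reduce_cq(atoms, concept_incl, domain_incl, role_incl):
    """Drop atoms that are implied by other atoms still kept in the CQ.

    concept_incl: pairs (A, A2) with A subsumed by A2
    domain_incl:  pairs (R, A) with the domain of R subsumed by A
    role_incl:    pairs (R, P) with R subsumed by P
    """
    kept = set(atoms)
    for atom in sorted(atoms):
        others = [b for b in kept if b != atom]
        if len(atom) == 2:  # concept atom A2(x)
            sup, x = atom
            implied = any(
                (len(b) == 2 and b[1] == x and (b[0], sup) in concept_incl)
                or (len(b) == 3 and b[1] == x and (b[0], sup) in domain_incl)
                for b in others
            )
        else:  # role atom P(x, y)
            sup, x, y = atom
            implied = any(
                len(b) == 3 and b[1:] == (x, y) and (b[0], sup) in role_incl
                for b in others
            )
        if implied:
            kept.discard(atom)
    return kept


query = {
    ("mo:Movie", "m"),
    ("mo:cast", "m", "p"),
    ("isManagedBy", "y", "z"),
    ("involves", "y", "z"),
}
reduced = reduce_cq(
    query,
    concept_incl=set(),
    domain_incl={("mo:cast", "mo:Movie")},
    role_incl={("isManagedBy", "involves")},
)
```
          </preformat>
          <p>An atom is only removed when the atom implying it is still kept, so two equivalent atoms are never both dropped. The example mixes atoms from the two ontologies used in the text purely to exercise the domain and role cases.</p>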
          <p><bold>Optimising T-mappings</bold></p>
          <p>Suppose M ∪ {S(x) ← ψ<sub>1</sub>(x, z)} is a T-mapping over Σ. If M contains a more
general rule than S(x) ← ψ<sub>1</sub>(x, z), that is, a rule for S whose body is implied by ψ<sub>1</sub> under Σ, then M itself is also a T-mapping. To
discover such 'more general' rules, we run the standard query containment check
(see, e.g., [1]), but taking account of the inclusion dependencies. For example,
since T ⊨ ∃mo:cast ⊑ mo:Movie, the composition M<sup>MO</sup> of the mapping in the
introduction and MO contains the following rules for mo:Movie:
<disp-formula>mo:Movie(m) ← title(m, t, y),</disp-formula>
<disp-formula>mo:Movie(m) ← castinfo(p, m, r).</disp-formula>
The latter rule is redundant since IMDb contains the foreign key
<disp-formula>∀m (∃p, r castinfo(p, m, r) → ∃t, y title(m, t, y)).</disp-formula></p>
          <p>Another way to reduce the size of a T-mapping is to identify pairs of rules
whose bodies are equivalent up to filters w.r.t. constant values. For example, the
mapping M for IMDb and MO contains 6 rules for sub-concepts of mo:Person:
<disp-formula>mo:Actor(p) ← castinfo(c, p, m, r), (r = 1),</disp-formula>
<disp-formula>⋯</disp-formula>
<disp-formula>mo:Editor(p) ← castinfo(c, p, m, r), (r = 6).</disp-formula>
So, the composition M<sup>MO</sup> contains six rules for mo:Person that differ only in
the last condition (r = k), for 1 ≤ k ≤ 6. These can be reduced to a single rule:
<disp-formula>mo:Person(p) ← castinfo(c, p, m, r), (r = 1) ∨ ⋯ ∨ (r = 6).</disp-formula>
Note that such disjunctions lend themselves to efficient evaluation by RDBMSs.</p>
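          <p>Merging rules that differ only in a constant filter can be sketched as a grouping step. The rule representation (head, body, filter column, constant) and the function merge_rules are assumptions made for this illustration; the text names only the first and the last of the six role codes, and the sketch simply enumerates 1 to 6.</p>
          <preformat>
```python
from collections import defaultdict

# Six rules for mo:Person obtained by composition, one per role code.
rules = [("mo:Person(p)", "castinfo(c, p, m, r)", "r", k) for k in range(1, 7)]


def merge_rules(rules):
    """Group rules with the same head and body and join their filters with OR."""
    groups = defaultdict(list)
    for head, body, column, constant in rules:
        groups[(head, body, column)].append(constant)
    merged = []
    for (head, body, column), constants in groups.items():
        condition = " OR ".join(f"({column} = {c})" for c in sorted(constants))
        merged.append(f"{head} :- {body}, {condition}")
    return merged


merged = merge_rules(rules)
```
          </preformat>
          <p>The six rules collapse into one rule with a six-way disjunction, which an RDBMS can evaluate with a single scan of castinfo instead of a union of six queries.</p>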
          <p><bold>Unfolding with Semantic Query Optimisation (SQO)</bold></p>
          <p>The unfolding procedure [19] applies SLD-resolution to q<sub>tw</sub> and the T-mapping,
and returns those rules whose bodies contain only database atoms (cf. partial
evaluation in [15]). Ontop applies SQO [6] to rules obtained at the
intermediate steps of unfolding. In particular, this eliminates redundant Join operations
caused by reification of database relations by means of concepts and roles. We
saw in the introduction that the primary key m of title, i.e., the following two
functional dependencies with determinant m:
<disp-formula>∀m (∃y title(m, t1, y) ∧ ∃y title(m, t2, y) → (t1 = t2)),</disp-formula>
<disp-formula>∀m (∃t title(m, t, y1) ∧ ∃t title(m, t, y2) → (y1 = y2)),</disp-formula>
removes the two Join operations in title(m, t0, y0), title(m, t, y1), title(m, t2, y),
resulting in a single atom title(m, t, y). Note that these two Join operations were
introduced to reconstruct the ternary relation from its reification by means of
roles mo:title and mo:year.</p>
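          <p>The effect of the functional dependencies on the unfolded query can be sketched as a merge of atoms that share the key term. The function merge_on_key is a hypothetical helper, not Ontop's implementation; it assumes that all terms are variables, that the key is a single column, and that at most one of two unified terms is an answer variable.</p>
          <preformat>
```python
def merge_on_key(atoms, key_index, answer_vars):
    """Merge atoms of one relation that agree on the key, unifying other terms."""
    merged = {}        # key term -> representative atom (list of terms)
    substitution = {}  # eliminated variable -> variable kept in its place
    for atom in atoms:
        key = atom[key_index]
        if key not in merged:
            merged[key] = list(atom)
            continue
        rep = merged[key]
        for i, term in enumerate(atom):
            if term == rep[i]:
                continue
            # Keep the answer variable if the incoming term is one.
            if term in answer_vars:
                keep, drop = term, rep[i]
            else:
                keep, drop = rep[i], term
            substitution[drop] = keep
            rep[i] = keep
    return [tuple(a) for a in merged.values()], substitution


# Body of the naive unfolding: three title atoms sharing the key m.
atoms = [("m", "t0", "y0"), ("m", "t", "y1"), ("m", "t2", "y")]
result, subst = merge_on_key(atoms, key_index=0, answer_vars={"t", "y"})
```
          </preformat>
          <p>The three atoms are replaced by the single atom title(m, t, y), and the recorded substitution shows which auxiliary variables were identified with the answer variables.</p>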
          <p>The role of SQO in OBDA systems appears to be much more prominent
than in conventional RDBMSs, where it was initially proposed to optimise SQL
queries. While some SQO techniques reached industrial RDBMSs, SQO never
had a strong impact on the database community because it is costly compared
to statistics- and heuristics-based methods, and because most SQL queries are
written by highly-skilled experts (and so are nearly optimal anyway). In OBDA
scenarios, in contrast, SQL queries are generated automatically, and so SQO
becomes the only tool to avoid redundancy.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Experiments</title>
      <p>We illustrate the performance of Ontop by three use cases. All experiments were
run on Ubuntu 12.04 64-bit with an Intel Core i5 650, 4 cores@3.20GHz, 16 GB
RAM and 1 TB@7200 rpm HD. We used a Java 7 virtual machine for Ontop
with MySQL 5.5 for Cases 1 and 3, and with PostgreSQL 9.1 for Case 2. Full
details of the experiments are available at obda.inf.unibz.it/data/owled13.</p>
      <p>Case 1 is a simulation of a railway network for cargo delivery developed
by the University of Genoa with the industrial partner Intermodal Logistics [5].
The ILog ontology, mapping and queries are used to monitor the status of the
network. The case includes an ontology with 70 concepts and roles, a mapping
with 43 rules and 11 queries (www.mind-lab.it/~gcicala/isf2012). For our
experiments, we generated data for 30 days.</p>
      <p>Case 2 uses the Movie Ontology (MO) over the real data from the Internet
Movie Database (IMDb) with a mapping created by the Ontop development
team. We use nine complex, yet natural queries, e.g.,</p>
      <preformat>SELECT DISTINCT ?x ?title ?actor_name ?prod_year ?rating
WHERE {
  ?m a mo:Movie;
     mo:title ?title;
     mo:imdbrating ?rating;
     dbpedia:productionStartYear ?prod_year;
     mo:hasActor ?x;
     mo:hasDirector ?x .
  ?x dbpedia:birthName ?actor_name .
  FILTER ( ?rating &gt; '7.0' &amp;&amp; ?prod_year &gt;= 2000 &amp;&amp; ?prod_year &lt;= 2010 )
}
ORDER BY desc(?rating) ?prod_year
LIMIT 25</preformat>
      <p>(full details are available at the URL above). Most queries are of high selectivity
and go beyond CQs, using inequalities, ORDER BY/LIMIT and DISTINCT
operators. Both the SQL database and the ontology were developed independently
by third parties (IMDb and the University of Zurich) for purposes different from
benchmarking.</p>
      <p>Case 3 is based on the Lehigh University Benchmark (LUBM,
swat.cse.lehigh.edu/projects/lubm), which comes with an OWL ontology, 14 simple CQs
of varying degree of selectivity and a data generator. We approximated the
ontology in OWL 2 QL and created a database schema to store the data for 200
universities (1 university ≈ 130K assertions). Note that although the data has
some degree of randomness, it is not arbitrary and follows what can be regarded
as a natural pattern: each university has 15–25 departments, each department
has 7–10 full professors, every person has a name, etc. These considerations were
taken into account to produce a normalised database schema with relations of
appropriate arity together with primary and foreign keys (instead of the standard
universal tables storing RDF triples).</p>
      <p>[Table: statistics per case (ILog, IMDb-MO, LUBM).]</p>
      <p>First, the queries
in ILog and IMDb-MO have no tree witnesses, and so the rewriting returns the
original query (note that Q7, Q9 of ILog have a union in the original query).
The only exception is Q5 in IMDb-MO with one tree witness, which generates
two CQs in the rewriting. Second, the ratio of the number of rules in a
mapping per concept/role in both scenarios is very low when our optimisations are
applied: most have at most one rule (even in the case of large hierarchies with
many domain/range axioms). So, the unfolding with such a mapping produces
a small number of Select-Project-Join queries in the union. These
observations support our claim that, in practice, there are few tree witnesses and that
our T-mapping optimisations can efficiently handle concept and role hierarchies,
domain and range constraints.</p>
      <p>The time required for query rewriting and optimisation is negligible and stays
within 4ms. In contrast, the time required to generate queries without
optimisations is higher, especially for queries involving large hierarchies (around 25ms): in
particular, for Q5 and Q8 in IMDb-MO, our optimisations reduce the time
of unfolding from 37/26ms to 3/6ms. Similarly to other systems, Ontop applies
CQ containment (CQC) checks to reduce the number of Select-Project-Join
queries, and these checks prove to be costly on large unfoldings without
optimisations. Although a few milliseconds might seem negligible, the performance of
such systems as RDF triple stores and DBs is measured in queries per second
and is usually expected to be in the thousands. With such requirements, an overhead
of 20–30ms per query is not acceptable.</p>
      <p>The execution time for SQL queries produced by Ontop, in MySQL or
Postgres, is within 100ms for simple, high selectivity queries (with few results).
Although Q2, Q4 and Q10 in ILog and Q1, Q2, Q3, Q5 and Q8 in IMDb-MO take
up to 4s to execute, their SQL rewritings are optimal (in the sense that they
coincide with hand-crafted queries), and their relatively long execution time is
due to DISTINCT/ORDER BY over large relations.</p>
      <sec id="sec-4-1">
        <title>LUBM</title>
        <p>[Table: LUBM results per query, Q1 to Q14.]</p>
        <p>In the LUBM case, the queries have no tree witnesses, which results in
tree-witness rewritings that coincide with the original queries. LUBM is, however, the
only case where Ontop generated unions with hundreds of Select-Project-Join
queries, which is due to a higher ratio of mappings per concept/role. This is a
consequence of the database structure and the way in which mappings construct
object URIs from integers and strings in the database (known as impedance
mismatch [19]). The generated SQL queries are still optimal in the sense that
they correspond to human-generated queries for the given database schema.</p>
        <p>Query execution appears to be optimal for all queries (but Q6, Q9 and Q14),
with response times under 12ms even for queries with Join and filter operations
over large tables. This corresponds to the expected performance of an optimised
RDBMS, in which most operations can be performed using in-memory indexes
(provided that SQL queries have the right structure for the query planner). Q6,
Q9 and Q14 have low selectivity (large number of results) and the execution
time is dominated by disk access.</p>
        <p>Note that although we used an OWL 2 QL approximation of
LUBM, most queries return the same results as for the original LUBM ontology.
The only exceptions are Q11 and Q12: all answers to Q11 are recovered with
extra mappings simulating transitivity (up to a predefined depth) by means of
self-Joins on the transitive property; similarly, to recover all answers to Q12, we include
an extra mapping rule expressing ∃R.B ⊑ A on the elements of the virtual ABox.
The execution times in the table are given for the extensions described above,
which ensure completeness (w.r.t. the original LUBM) of the returned answers.</p>
        <p>Finally, by comparing the performance of Ontop (see ontop.inf.unibz.it)
with that of other open-source or commercial systems [11, 16], we see that
Ontop is much faster than Sesame or Jena (open-source), and similar to OWLIM
(commercial), but does not pay the heavy price for inference materialisation,
which can take hours or days.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Conclusions</title>
      <p>To conclude, we believe this paper shows that, despite the negative theoretical
results on the worst-case OWL 2 QL query rewriting and sometimes
disappointing experiences of the first OBDA systems, high-performance OBDA is
achievable in practice when applied to standard ontologies, queries and data stored in
relational databases. In such cases, query rewriting together with SQO and SQL
optimisations is fast and efficient, and produces SQL queries of high quality.</p>
      <p>Acknowledgements. We thank G. Cicala and A. Tacchella for their help on the
ILog experiments and the Ontop development team (J. Hardi, T. Bagosi and
M. Slusnys) for their help with the experiments. This work was supported by
the EU FP7 project Optique (grant 318338) and UK EPSRC project ExODA
(EP/H05099X).</p>
      <p>3. D. Calvanese, G. De Giacomo, D. Lembo, M. Lenzerini, A. Poggi, M. Rodriguez-Muro, R. Rosati, M. Ruzzi, and D. F. Savo. The MASTRO system for ontology-based data access. Semantic Web, 2(1):43–53, 2011.</p>
      <p>4. D. Calvanese, G. De Giacomo, D. Lembo, M. Lenzerini, and R. Rosati. Tractable reasoning and efficient query answering in description logics: The DL-Lite family. J. Autom. Reasoning, 39(3):385–429, 2007.</p>
      <p>5. M. Casu, G. Cicala, and A. Tacchella. Ontology-based data access: An application to intermodal logistics. Information Systems Frontiers, pages 1–23, 2012.</p>
      <p>6. U. S. Chakravarthy, D. H. Fishman, and J. Minker. Semantic query optimization in expert systems and database systems. Benjamin-Cummings Publishing Co., Inc., 1986.</p>
      <p>7. A. Chortaras, D. Trivela, and G. Stamou. Optimized query rewriting for OWL 2 QL. In Proc. of CADE-23, volume 6803 of LNCS, pages 192–206. Springer, 2011.</p>
      <p>8. T. Eiter, M. Ortiz, M. Šimkus, T.-K. Tran, and G. Xiao. Query rewriting for Horn-SHIQ plus rules. In Proc. of AAAI 2012. AAAI Press, 2012.</p>
      <p>9. G. Gottlob, G. Orsi, and A. Pieris. Ontological queries: Rewriting and optimization. In Proc. of ICDE 2011, pages 2–13. IEEE Computer Society, 2011.</p>
      <p>10. G. Gottlob and T. Schwentick. Rewriting ontological queries into small nonrecursive datalog programs. In Proc. of KR 2012. AAAI Press, 2012.</p>
      <p>11. V. Khadilkar, M. Kantarcioglu, B. M. Thuraisingham, and P. Castagna. Jena-HBase: A distributed, scalable and efficient RDF triple store. In Proc. of ISWC, volume 914 of CEUR-WS, 2012.</p>
      <p>12. S. Kikot, R. Kontchakov, V. Podolskii, and M. Zakharyaschev. Exponential lower bounds and separation for query rewriting. In Proc. of ICALP 2012, Part II, volume 7392 of LNCS, pages 263–274. Springer, 2012.</p>
      <p>13. S. Kikot, R. Kontchakov, and M. Zakharyaschev. Conjunctive query answering with OWL 2 QL. In Proc. of KR 2012. AAAI Press, 2012.</p>
      <p>14. M. König, M. Leclère, M.-L. Mugnier, and M. Thomazo. A sound and complete backward chaining algorithm for existential rules. In Proc. of RR 2012, volume 7497 of LNCS, pages 122–138. Springer, 2012.</p>
      <p>15. J. W. Lloyd and J. C. Shepherdson. Partial Evaluation in Logic Programming. The Journal of Logic Programming, 11(3-4):217–242, October 1991.</p>
      <p>16. Ontotext. OWLIM performance with Jena, 2011. http://www.ontotext.com/owlim/benchmark-results/owlim-jena-performance.</p>
      <p>17. H. Pérez-Urbina, B. Motik, and I. Horrocks. A comparison of query rewriting techniques for DL-lite. In Proc. of DL 2009, volume 477 of CEUR-WS, 2009.</p>
      <p>18. H. Pérez-Urbina, E. Rodríguez-Díaz, M. Grove, G. Konstantinidis, and E. Sirin. Evaluation of query rewriting approaches for OWL 2. In Proc. of SSWS+HPCSW 2012, volume 943 of CEUR-WS, 2012.</p>
      <p>19. A. Poggi, D. Lembo, D. Calvanese, G. De Giacomo, M. Lenzerini, and R. Rosati. Linking data to ontologies. J. Data Semantics, 10:133–173, 2008.</p>
      <p>20. M. Rodríguez-Muro. Tools and Techniques for Ontology Based Data Access in Lightweight Description Logics. PhD thesis, KRDB Research Centre for Knowledge and Data, Free Univ. of Bozen-Bolzano, 2010.</p>
      <p>21. M. Rodríguez-Muro and D. Calvanese. Dependencies: Making ontology based data access work. In Proc. of AMW 2011, volume 749. CEUR-WS.org, 2011.</p>
      <p>22. R. Rosati. Prexto: Query rewriting under extensional constraints in DL-Lite. In Proc. of ESWC 2012, volume 7295 of LNCS, pages 360–374. Springer, 2012.</p>
      <p>23. R. Rosati and A. Almatelli. Improving query answering over DL-Lite ontologies. In Proc. of KR 2010. AAAI Press, 2010.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>S.</given-names>
            <surname>Abiteboul</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Hull</surname>
          </string-name>
          , and
          <string-name>
            <given-names>V.</given-names>
            <surname>Vianu</surname>
          </string-name>
          .
          <source>Foundations of Databases</source>
          . <publisher-name>Addison-Wesley</publisher-name>,
          <year>1995</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>F.</given-names>
            <surname>Baader</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Calvanese</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. L.</given-names>
            <surname>McGuinness</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Nardi</surname>
          </string-name>
          , and
          <string-name>
            <given-names>P. F.</given-names>
            <surname>Patel-Schneider</surname>
          </string-name>
          , editors.
          <source>The Description Logic Handbook: Theory, Implementation, and Applications</source>
          . <publisher-name>Cambridge University Press</publisher-name>,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>