<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Mastro: A Reasoner for Effective Ontology-Based Data Access</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Giuseppe De Giacomo</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Domenico Lembo</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Maurizio Lenzerini</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Antonella Poggi</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Riccardo Rosati</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Marco Ruzzi</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Domenico Fabio Savo</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Dip. di Ing. Informatica, Automatica e Sistemistica, Sapienza Università di Roma</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>In this paper we present Mastro, a Java tool for ontology-based data access (OBDA) developed at Sapienza Università di Roma. Mastro manages OBDA systems in which the ontology is specified in a logic of the DL-Lite family of Description Logics specifically tailored to ontology-based data access, and is connected to external data management systems through semantic mappings that associate SQL queries over the external data with the elements of the ontology. Advanced forms of integrity constraints, which have turned out to be very useful in practical applications, are also supported over the ontologies. Optimized algorithms for answering expressive queries are provided, as well as features for intensional reasoning and consistency checking. Mastro has been successfully used in several projects carried out in collaboration with important organizations, on which we briefly comment in this paper.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>In this paper we present the current version of Mastro, a system for
ontology-based data access (OBDA) developed at Sapienza Università di Roma. Mastro
allows users to access external data sources through an ontology expressed
in a fragment of the W3C Web Ontology Language (OWL).</p>
      <p>As in data integration systems [11], mappings are used in OBDA to specify
the semantic correspondence between a unified view of the domain (called the global
schema in data integration terminology) and the data stored at the sources.
The distinguishing feature of the OBDA approach, however, is that the unified
view is specified using an ontology language, which typically allows one to provide
a rich conceptualization of the domain of interest that is independent of the
representation adopted for the data stored at the sources. This choice provides
several advantages: it allows for a declarative approach to data access and
integration, and provides a specification of the domain that is independent of the
data layer; it realizes the logical/physical independence of the information system,
which is therefore more accessible to users who are not experts in the underlying
databases; the conceptual approach to data access does not require the data sources
to be fully integrated at once, as often happens in mediator-based data integration
systems, so the design can be carried out incrementally; finally, the conceptual
model available on top of the system provides a common ground for documenting
the data stores and can be seen as a formal specification for mediator design.</p>
      <p>
        Mastro has a solid theoretical basis [
        <xref ref-type="bibr" rid="ref3 ref4">3, 4</xref>
        ]. In the current version of
Mastro, ontologies are specified in DL-Lite<sub>A,id,den</sub>, a logic of the DL-Lite family of
tractable Description Logics (DLs), which are specifically tailored to the
management and querying of ontologies in which the extensional level, i.e., the data,
largely dominates the intensional level. In terms of expressive power,
DL-Lite<sub>A,id,den</sub> captures the main modeling features of a variety of
representation languages, such as basic ontology languages and conceptual data
models. Furthermore, it allows for specifying advanced forms of identification
constraints [5] and denials [10], which are not part of OWL 2, the current W3C
standard language for specifying ontologies.
      </p>
      <p>
        Answering unions of conjunctive queries in OBDA systems managed by
Mastro can be done through a very efficient technique that reduces this task to
standard SQL query evaluation. Indeed, conjunctive query answering has been
shown to be in LogSpace (in fact, in AC<sup>0</sup>) w.r.t. data complexity [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], i.e., the
complexity measured only w.r.t. the extensional level, which is the same
complexity as evaluating SQL queries over plain relational databases. One key feature
of the current version of Mastro, w.r.t. previous ones [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], is that it adopts the
Presto algorithm [15] for first-order query rewriting.
      </p>
      <p>Mastro is developed in Java and can be connected to any data management
system that supports JDBC connections, e.g., a relational DBMS. When several,
possibly non-relational, sources need to be accessed, Mastro can be coupled with
a relational data federation tool<sup>1</sup>, which wraps the sources and presents
them as a single (virtual) relational database.</p>
      <p>The rest of the paper is organized as follows. In Section 2, we briefly describe
the framework of ontology-based data access. In Section 3, we describe the query
answering algorithm of the Mastro system. In Section 4, we report on some real-world
information integration applications in which Mastro has been successfully
trialed. In Section 5, we conclude the paper by discussing related work.</p>
    </sec>
    <sec id="sec-2">
      <title>Ontology-based data access</title>
      <p>
        In OBDA, the aim is to give users access to a data source, or a collection thereof,
by means of a high-level conceptual view specified as an ontology. The ontology
is usually formalized in Description Logics (DLs) [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], which are at the basis of
OWL. These logics allow one to represent the domain of interest in terms of
concepts, denoting sets of objects (corresponding to OWL classes), roles,
denoting binary relations between objects (OWL object properties), and attributes,
denoting relations between objects and values from predefined domains (OWL
data properties).
      </p>
      <p><sup>1</sup> E.g., IBM WebSphere Application Server (http://www.ibm.com/software/webservers/appserv/was/), Oracle Data Service Integrator (http://www.oracle.com/us/products/middleware/data-integration/).</p>
      <p>
        A DL ontology is a pair ⟨T, A⟩ [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], where T, called the TBox, is a finite set of
intensional assertions, and A, called the ABox, is a finite set of instance assertions,
i.e., assertions on individuals. Different DLs allow for different kinds of TBox
and/or ABox assertions.
      </p>
      <p>
        The semantics of an ontology is given in terms of first-order
interpretations [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. An interpretation I is a model of an ontology O = ⟨T, A⟩ if it satisfies
all assertions in T ∪ A, where the notion of satisfaction depends on the constructs
and axioms allowed by the specific DL in which O is expressed.
      </p>
      <p>Among the extensional reasoning tasks w.r.t. a given ontology ⟨T, A⟩, the
most relevant ones are ontology satisfiability and query answering.</p>
      <p>In particular, we are interested in the class of conjunctive queries (CQs). A
CQ q over an ontology O (resp. a TBox T) is an expression of the form
q(x) ← ∃y.conj(x, y), where x are the so-called distinguished variables, y are existentially
quantified variables called the non-distinguished variables, and conj(x, y) is a
conjunction of atoms of the form A(z), P(z, z′), U(z, z′), where A is a concept
name, P is a role name, U is an attribute name, and z, z′ are either variables
in x or in y, or constants. The arity of q is the arity of x. A CQ of arity 0 is
called a Boolean conjunctive query. A union of conjunctive queries (UCQ) is a
query of the form q(x) ← ⋁<sub>i</sub> ∃y<sub>i</sub>.conj<sub>i</sub>(x, y<sub>i</sub>).</p>
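      <p>To make these definitions concrete, the following Python sketch evaluates a CQ and a UCQ over a single finite set of ground facts by searching for homomorphisms from the query atoms into the facts. The representation (atoms as (predicate, arguments) pairs, variables prefixed with “?”) and the predicate names Flight and departsFrom are assumptions made for illustration only. Note that it computes answers over one database, not the certain answers over all models of an ontology.</p>
      <preformat>
```python
# Sketch: CQ/UCQ evaluation over one finite set of facts (not Mastro code).
# An atom is (predicate, (term, ...)); terms starting with "?" are variables.


def is_var(term):
    return term.startswith("?")


def unify(args, values, binding):
    """Extend binding so that args map onto values, or return None."""
    new = dict(binding)
    for t, v in zip(args, values):
        if is_var(t):
            if new.setdefault(t, v) != v:
                return None
        elif t != v:
            return None
    return new


def matches(atoms, facts, binding):
    """Yield every extension of binding that maps all atoms onto facts."""
    if not atoms:
        yield binding
        return
    pred, args = atoms[0]
    for fact_pred, values in facts:
        if fact_pred == pred and len(values) == len(args):
            new = unify(args, values, binding)
            if new is not None:
                yield from matches(atoms[1:], facts, new)


def eval_cq(head_vars, atoms, facts):
    """Answers of q(x) ← ∃y.conj(x, y): project every match onto x."""
    return {tuple(b[v] for v in head_vars) for b in matches(atoms, facts, {})}


def eval_ucq(head_vars, disjuncts, facts):
    """Answers of a UCQ: the union of the answers of its disjuncts."""
    answers = set()
    for atoms in disjuncts:
        answers |= eval_cq(head_vars, atoms, facts)
    return answers


facts = {("Flight", ("az1",)), ("Flight", ("az2",)), ("departsFrom", ("az1", "FCO"))}

# q(x) ← ∃y. Flight(x) ∧ departsFrom(x, y)
print(eval_cq(["?x"], [("Flight", ("?x",)), ("departsFrom", ("?x", "?y"))], facts))  # {('az1',)}
# Boolean CQs: {()} reads as true, set() as false
print(eval_cq([], [("Flight", ("az1",))], facts))  # {()}
print(eval_cq([], [("Flight", ("az3",))], facts))  # set()
```
      </preformat>
      <p>Repeated variables are handled by unify, which refuses to bind the same variable to two different values.</p>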
      <p>Given a query q(x) (either a conjunctive query or a union of
conjunctive queries) and an ontology O, the certain answers to q(x) over O are the
set cert(q, O) of all tuples t of constants appearing in O such that, when t is
substituted for the variables x in q(x), we have O ⊨ q(t), meaning that t<sup>I</sup> ∈ q<sup>I</sup>
for every I ∈ Mod(O). Notice that the answer to a Boolean query is either the set
containing only the empty tuple, considered as true, or the empty set, considered as false.</p>
      <p>In OBDA, the extensional level is not represented directly by an ABox, but
rather by a database that is connected to the TBox by means of suitable mapping
assertions<sup>2</sup>. Such mapping assertions have the form Φ ⇝ Ψ, where Φ, called the
body of the assertion, is an arbitrary SQL query over the underlying database,
and Ψ, called the head, is a CQ over the TBox T. Intuitively, a mapping assertion
specifies that the tuples returned by the SQL query Φ are used to generate the
facts that instantiate the concepts, roles, and attributes in Ψ.</p>
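      <p>The following Python sketch illustrates this intuition with the standard sqlite3 module: the body Φ is run as an ordinary SQL query, and every returned tuple instantiates every atom of the head Ψ. The table emp, the predicates Employee and worksIn, and the encoding of Ψ as atoms over column names are hypothetical; for brevity, the head uses raw column values, whereas Mastro builds object terms through function symbols, as discussed later in this section.</p>
      <preformat>
```python
import sqlite3

# Hypothetical source table standing in for the external database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (id TEXT, dept TEXT)")
conn.executemany("INSERT INTO emp VALUES (?, ?)", [("e1", "sales"), ("e2", "hr")])

PHI = "SELECT id, dept FROM emp"                           # body: arbitrary SQL
PSI = [("Employee", ["id"]), ("worksIn", ["id", "dept"])]  # head: atoms over Φ's columns


def apply_mapping(conn, phi, psi):
    """Every tuple returned by Φ instantiates every atom of Ψ."""
    cur = conn.execute(phi)
    names = [d[0] for d in cur.description]
    facts = set()
    for row in cur:
        value = dict(zip(names, row))
        for pred, cols in psi:
            facts.add((pred, tuple(value[c] for c in cols)))
    return facts


for fact in sorted(apply_mapping(conn, PHI, PSI)):
    print(fact)
```
      </preformat>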
      <p>All the notions given above can be easily generalized to OBDA systems,
where a TBox T is connected to an external database D through mappings M,
denoted ⟨T, M, D⟩. In particular, the models of ⟨T, M, D⟩ are those
interpretations of T that satisfy the assertions in T and that are consistent with the tuples
retrieved by M from D (see [13] for the formal details). Satisfiability amounts to
checking whether ⟨T, M, D⟩ admits at least one model, while answering a query
Q amounts to computing the tuples that are in the evaluation of Q in every
model of ⟨T, M, D⟩.</p>
      <p>
        Mastro is able to deal with DL TBoxes expressed in
DL-Lite<sub>A,id,den</sub>, a member of the DL-Lite family of lightweight DLs [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. In such
DLs, a good trade-off is achieved between the expressive power of the TBox
language used to capture the domain semantics and the computational complexity
of inference, in particular when such complexity is measured w.r.t. the size of
the data.
      </p>
      <p><sup>2</sup> Note that, in the following, with some abuse of terminology, when we use the term “ontology” in the context of OBDA, we implicitly refer to the TBox only.</p>
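      <p>Basic DL-Lite<sub>A,id,den</sub> expressions are defined as follows:</p>
      <disp-formula>
        <tex-math>\begin{array}{lcl}
B &amp; \rightarrow &amp; A \mid \exists Q \mid \delta(U)\\
C &amp; \rightarrow &amp; B \mid \neg B\\
Q &amp; \rightarrow &amp; P \mid P^{-}\\
R &amp; \rightarrow &amp; Q \mid \neg Q\\
V &amp; \rightarrow &amp; U \mid \neg U\\
E &amp; \rightarrow &amp; \rho(U)\\
F &amp; \rightarrow &amp; T_1 \mid \cdots \mid T_n
\end{array}</tex-math>
      </disp-formula>
      <p>where A, P, and P<sup>−</sup> denote an atomic concept, an atomic role, and the inverse
of an atomic role, respectively; δ(U) (resp. ρ(U)) denotes the domain (resp. the
range) of an attribute U, i.e., the set of objects (resp. values) that U relates
to values (resp. objects); T<sub>1</sub>, …, T<sub>n</sub> are unbounded, pairwise disjoint, predefined
value-domains; and B is called a basic concept.</p>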
      <p>A DL-Lite<sub>A,id,den</sub> TBox is a finite set of the following assertions:</p>
      <disp-formula>
        <tex-math>\begin{array}{ll}
B \sqsubseteq C &amp; \text{(concept inclusion assertion)}\\
Q \sqsubseteq R &amp; \text{(role inclusion assertion)}\\
U \sqsubseteq V &amp; \text{(attribute inclusion assertion)}\\
E \sqsubseteq F &amp; \text{(value-domain inclusion assertion)}\\
(\mathsf{funct}\ Q) &amp; \text{(role functionality assertion)}\\
(\mathsf{funct}\ U) &amp; \text{(attribute functionality assertion)}\\
(\mathsf{id}\ B\ \pi_1, \ldots, \pi_n) &amp; \text{(identification assertion)}\\
\forall y.\,\mathit{conj}(y) \rightarrow \bot &amp; \text{(denial assertion)}
\end{array}</tex-math>
      </disp-formula>
      <p>In identification assertions [5], each π<sub>i</sub> is a path, i.e., an expression built according
to the syntax π → S | D? | π<sub>1</sub> ∘ π<sub>2</sub>, where S denotes an atomic
role, the inverse of an atomic role, an attribute, or the inverse of an attribute;
π<sub>1</sub> ∘ π<sub>2</sub> denotes the composition of the paths π<sub>1</sub> and π<sub>2</sub>; and D?, called a test relation,
represents the identity relation on instances of D, which can be a basic concept or
a value-domain. Test relations are used to impose that a path involves instances
of a certain concept or value-domain. In DL-Lite<sub>A,id,den</sub>, identification assertions
are local, i.e., at least one π<sub>i</sub> ∈ {π<sub>1</sub>, …, π<sub>n</sub>} has length 1, i.e., it is an atomic
role, the inverse of an atomic role, or an attribute. Intuitively, an identification
assertion of the above form asserts that for any two different instances o, o′ of
B, there is at least one π<sub>i</sub> such that o and o′ differ in the set of their π<sub>i</sub>-fillers,
that is, the set of objects that are reachable from o by means of π<sub>i</sub>.</p>
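      <p>Path fillers and a check for a local identification assertion can be sketched in Python over a toy interpretation given as finite relations. Path steps are encoded (hypothetically) as ("R", name) for a role or attribute, ("INV", name) for its inverse, and ("TEST", D) for D?; composition is simply the sequence of steps. The check flags two distinct instances of B when, for every π<sub>i</sub>, they share at least one π<sub>i</sub>-filler; this reading of “differ” is our assumption (see [5] for the formal semantics), and it coincides with comparing filler sets whenever every object has exactly one filler per path. The data and the names hasNumber, operatedBy, Flight, and Airline are illustrative.</p>
      <preformat>
```python
from itertools import combinations

# Toy interpretation (hypothetical data): binary relations and concept extensions.
REL = {
    "hasNumber": {("f1", "AZ1"), ("f2", "AZ1"), ("f3", "AZ2")},
    "operatedBy": {("f1", "alitalia"), ("f2", "alitalia"), ("f3", "alitalia")},
}
CONCEPT = {"Flight": {"f1", "f2", "f3"}, "Airline": {"alitalia"}}


def step(objs, s):
    """Apply one path step to a set of objects."""
    kind, name = s
    if kind == "R":
        return {b for (a, b) in REL[name] if a in objs}
    if kind == "INV":
        return {a for (a, b) in REL[name] if b in objs}
    if kind == "TEST":  # D? keeps only the instances of D
        return objs.intersection(CONCEPT[name])
    raise ValueError(f"unknown step kind: {kind}")


def fillers(o, path):
    """The π-fillers of o: objects reachable from o along the composed path."""
    objs = {o}
    for s in path:
        objs = step(objs, s)
    return objs


def id_violations(concept, paths):
    """Pairs of distinct instances that share a filler on every path."""
    return [
        (o1, o2)
        for o1, o2 in combinations(sorted(CONCEPT[concept]), 2)
        if all(fillers(o1, p).intersection(fillers(o2, p)) for p in paths)
    ]


# (id Flight hasNumber, operatedBy ∘ Airline?): local, since hasNumber has length 1.
PATHS = [[("R", "hasNumber")], [("R", "operatedBy"), ("TEST", "Airline")]]
print(id_violations("Flight", PATHS))  # [('f1', 'f2')]
```
      </preformat>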
      <p>In denial assertions [10], conj(y) is defined as for Boolean CQs. Intuitively, a
denial assertion of the above form states that there must not exist any tuple y
satisfying conj(y), i.e., that the answer to the Boolean query q() ← ∃y.conj(y)
must be empty.</p>
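      <p>Since a denial assertion is violated exactly when its Boolean query is true, checking it amounts to evaluating that query. The Python sketch below does so with sqlite3 on hypothetical tables for the denial ∀x, y. departsFrom(x, y) ∧ arrivesAt(x, y) → ⊥ (no flight arrives where it departs), returning witnesses rather than just true or false. The schema and the hand-written SQL translation are assumptions and do not reproduce how Mastro generates SQL.</p>
      <preformat>
```python
import sqlite3

# Hypothetical data for the predicates departsFrom and arrivesAt.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE departsFrom (flight TEXT, airport TEXT);
CREATE TABLE arrivesAt (flight TEXT, airport TEXT);
INSERT INTO departsFrom VALUES ('f1', 'FCO'), ('f2', 'MXP');
INSERT INTO arrivesAt VALUES ('f1', 'CDG'), ('f2', 'MXP');
""")

# Boolean query q() ← ∃x, y. departsFrom(x, y) ∧ arrivesAt(x, y), written as SQL
# that returns the witnesses x instead of just true/false.
DENIAL_QUERY = """
SELECT d.flight
FROM departsFrom d JOIN arrivesAt a
  ON d.flight = a.flight AND d.airport = a.airport
"""


def denial_witnesses(conn):
    """An empty result means the denial assertion is satisfied."""
    return [row[0] for row in conn.execute(DENIAL_QUERY)]


print(denial_witnesses(conn))  # ['f2']
```
      </preformat>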
      <p>Finally, in a DL-Lite<sub>A,id,den</sub> TBox T, the following condition must hold: each
role or attribute that either is functional in T or appears (in either direct or
inverse direction) in a path of an identification assertion in T is not specialized,
i.e., it does not appear in the right-hand side of assertions of the form Q ⊑ Q′
or U ⊑ U′.</p>
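      <p>The condition above is purely syntactic, so it can be checked in a single pass over the TBox. In the following Python sketch, assertions are encoded as hypothetical tagged tuples: role_incl and attr_incl stand for positive inclusions Q ⊑ Q′ and U ⊑ U′ between atomic names (inverses and negated right-hand sides are left out for brevity), and identification paths use the step encoding ("R", name), ("INV", name), ("TEST", D). All predicate names are illustrative.</p>
      <preformat>
```python
# Hypothetical encoding of a few DL-Lite_{A,id,den} TBox assertions.
TBOX = [
    ("role_incl", "hasCaptain", "hasPilot"),   # hasCaptain ⊑ hasPilot
    ("funct_role", "hasPilot"),                # (funct hasPilot)
    ("attr_incl", "localCode", "flightCode"),  # localCode ⊑ flightCode
    ("id", "Flight", [[("R", "flightCode")], [("R", "operatedBy"), ("TEST", "Airline")]]),
]


def specialized_but_constrained(tbox):
    """Roles/attributes that are functional or used in an id path, yet specialized."""
    constrained, specialized = set(), set()
    for axiom in tbox:
        kind = axiom[0]
        if kind in ("funct_role", "funct_attr"):
            constrained.add(axiom[1])
        elif kind == "id":
            for path in axiom[2]:
                # direct and inverse steps both count; test relations do not
                constrained.update(name for k, name in path if k in ("R", "INV"))
        elif kind in ("role_incl", "attr_incl"):
            specialized.add(axiom[2])  # the right-hand side Q′ or U′
    return sorted(constrained.intersection(specialized))


print(specialized_but_constrained(TBOX))  # ['flightCode', 'hasPilot']
```
      </preformat>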
      <p>Mapping assertions handled by Mastro are assertions of the form Φ ⇝ Ψ,
where Φ is an arbitrary SQL query over the underlying database, and Ψ is a
conjunction of atoms whose predicates are the concepts, roles, and attributes of the
TBox. Notice that, since Ψ is a conjunction of atoms (as opposed
to a query, possibly with existentially quantified variables), such mappings can
be considered a special form of global-as-view (GAV) mappings [11].</p>
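      <p>Because the head Ψ has no existentially quantified variables, every atom of Ψ is fully determined by a tuple returned by Φ, so a mapping with a conjunctive head can be split into one assertion per atom, each projecting Φ onto the columns that atom uses (as recalled in Section 3, Mastro performs such a split before mapping unfolding). The Python sketch below, using sqlite3, is a minimal illustration under the assumption that head arguments are plain column names without function symbols; the table fl and the predicates are hypothetical.</p>
      <preformat>
```python
import sqlite3


def split_mapping(phi, psi):
    """Split Φ ⇝ A1 ∧ ... ∧ An into n assertions with a single head atom each.

    psi is a list of (predicate, [column, ...]); each new body projects Φ
    onto the columns used by its atom.
    """
    return [
        (f"SELECT DISTINCT {', '.join(cols)} FROM ({phi}) AS t", (pred, cols))
        for pred, cols in psi
    ]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fl (fl_num TEXT, departure TEXT)")
conn.executemany("INSERT INTO fl VALUES (?, ?)", [("AZ1", "FCO"), ("AZ2", "FCO")])

PHI = "SELECT fl_num, departure FROM fl"
PSI = [("Flight", ["fl_num"]), ("departsFrom", ["fl_num", "departure"])]

SPLIT = split_mapping(PHI, PSI)
results = {pred: sorted(conn.execute(sql).fetchall()) for sql, (pred, _) in SPLIT}
for sql, (pred, _) in SPLIT:
    print(sql, "⇝", pred)
print(results["Flight"])  # [('AZ1',), ('AZ2',)]
```
      </preformat>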
      <p>In order to overcome the so-called impedance mismatch between the
database, which stores values, and the TBox, which is interpreted over a domain of
objects, Mastro uses the mapping assertions to specify how to construct
abstract objects from the tuples of values retrieved from the database. This is
done by allowing function symbols in the atoms of Ψ: applied to
the values retrieved by Φ, such function symbols generate so-called object terms,
which serve as object identifiers for individuals in the ontology. We notice that
the semantics we adopt in Mastro establishes that different terms denote different
objects (unique name assumption), so that different terms never need to be
equated during reasoning, which is coherent with the absence of
existentially quantified variables in the head of mappings.</p>
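      <p>The role of object terms can be illustrated in a few lines of Python. In this sketch, an object term is modeled as a pair (function symbol, values), so that, under the unique name assumption, two terms denote the same object exactly when they are syntactically equal; facts produced by different mappings about the same term therefore describe the same individual. The function symbols flight and airport, the attribute names, and the rows are hypothetical.</p>
      <preformat>
```python
def object_term(fsym, *values):
    """Object term fsym(v1, ..., vn); equality is syntactic (unique names)."""
    return (fsym, values)


def show(term):
    fsym, values = term
    return f"{fsym}({', '.join(map(str, values))})"


# Rows retrieved by two hypothetical mappings over different source tables.
schedule_rows = [("AZ1", "FCO")]
carrier_rows = [("AZ1", "Alitalia")]

facts = set()
for num, dep in schedule_rows:
    facts.add(("departureCode", object_term("flight", num), dep))
for num, name in carrier_rows:
    facts.add(("carrierName", object_term("flight", num), name))

objects = {subject for _, subject, _ in facts}
print([show(o) for o in objects])  # ['flight(AZ1)']: one individual, two facts
print(object_term("flight", "AZ1") == object_term("airport", "AZ1"))  # False
```
      </preformat>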
      <p>
        For the logics of the DL-Lite family, it has been shown that, for unions of
conjunctive queries (UCQs) and under the unique name assumption, query
answering can be carried out efficiently in the size of the data, by reducing it to SQL
query evaluation over the ABox seen as a database [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. Satisfiability, which
is easily reducible to query answering, can also be solved through the same
mechanism. Such techniques are implemented in Mastro; we refer to [
        <xref ref-type="bibr" rid="ref4">4, 13</xref>
        ] for a more
complete treatment.
      </p>
      <p>As an example, consider the OBDA system ⟨T, M, D⟩, where the TBox T
consists of the intensional assertions {NationalFlight ⊑ Flight,
InternationalFlight ⊑ Flight}, D is a database consisting of relations
with the following signature:</p>
      <preformat>FL_TB[fl_num:string, departure:integer, arrival:integer]
AIRPORT_TB[airpt_code:integer, name:string, country:string]</preformat>
      <p>and M contains the following mapping assertions:</p>
      <preformat>SELECT fl_num
FROM FL_TB, AIRPORT_TB A1, AIRPORT_TB A2
WHERE departure = A1.airpt_code and
      arrival = A2.airpt_code and
      A1.country = 'IT' and A2.country = 'IT'
    ⇝ NationalFlight(f(fl_num))

SELECT fl_num
FROM FL_TB, AIRPORT_TB A1, AIRPORT_TB A2
WHERE departure = A1.airpt_code and
      arrival = A2.airpt_code and
      (A1.country != 'IT' or A2.country != 'IT')
    ⇝ InternationalFlight(f(fl_num))</preformat>
      <p>which specify how to construct instances of the ontology concepts
NationalFlight and InternationalFlight, starting from the database relations
FL_TB and AIRPORT_TB, where f is a function symbol building object terms from flight numbers.</p>
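      <p>The example can be replayed with Python's sqlite3 module on a few invented rows. The sketch runs the two mapping queries, builds object terms with a function symbol written f (a hypothetical name), and answers q(x) ← Flight(x) by taking the union over Flight and its two subconcepts, which is the rewriting this particular TBox induces; the hand-coded SUBCONCEPTS table stands in for a real rewriting algorithm.</p>
      <preformat>
```python
import sqlite3

# Invented sample rows for the FL_TB / AIRPORT_TB schema of the example.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE FL_TB (fl_num TEXT, departure INTEGER, arrival INTEGER);
CREATE TABLE AIRPORT_TB (airpt_code INTEGER, name TEXT, country TEXT);
INSERT INTO AIRPORT_TB VALUES (1, 'Fiumicino', 'IT'), (2, 'Linate', 'IT'), (3, 'Charles de Gaulle', 'FR');
INSERT INTO FL_TB VALUES ('AZ100', 1, 2), ('AZ200', 1, 3);
""")

# The two mapping bodies of the example, keyed by the concept in their head.
MAPPINGS = {
    "NationalFlight": """SELECT fl_num FROM FL_TB, AIRPORT_TB A1, AIRPORT_TB A2
        WHERE departure = A1.airpt_code AND arrival = A2.airpt_code
          AND A1.country = 'IT' AND A2.country = 'IT'""",
    "InternationalFlight": """SELECT fl_num FROM FL_TB, AIRPORT_TB A1, AIRPORT_TB A2
        WHERE departure = A1.airpt_code AND arrival = A2.airpt_code
          AND (A1.country != 'IT' OR A2.country != 'IT')""",
}
# Hand-coded rewriting for this TBox: Flight(x) follows from either subconcept.
SUBCONCEPTS = {"Flight": ["Flight", "NationalFlight", "InternationalFlight"]}


def extension(concept):
    """Object terms f(fl_num) retrieved for a concept; empty if it is unmapped."""
    sql = MAPPINGS.get(concept)
    if sql is None:
        return set()
    return {f"f({num})" for (num,) in conn.execute(sql)}


def certain_answers(concept):
    """Answers to q(x) ← concept(x): union over the concept and its subconcepts."""
    answers = set()
    for c in SUBCONCEPTS.get(concept, [concept]):
        answers |= extension(c)
    return answers


print(sorted(certain_answers("Flight")))  # ['f(AZ100)', 'f(AZ200)']
```
      </preformat>
      <p>No mapping populates Flight directly, so all of its answers come from the two subconcepts.</p>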
    </sec>
    <sec id="sec-3">
      <title>Query Answering</title>
      <p>In this section we describe the query rewriting process of the Mastro system.
The technique is purely intensional and is performed in three steps (see Figure 1):</p>
      <list list-type="order">
        <list-item>
          <p>TBox rewriting: the first step rewrites the input UCQ according to the
knowledge expressed by the TBox. The rewriting, performed using the Presto
algorithm [15], produces as output a non-recursive Datalog program, which
encodes the knowledge expressed by the TBox and the user query. The
output Datalog program contains the definition of auxiliary predicates, not
belonging to the alphabet of the ontology.</p>
        </list-item>
        <list-item>
          <p>Datalog unfolding: the output of the first step is then unfolded into a new
UCQ by means of the Datalog Unfolding algorithm. It consists of a classic
rule unfolding technique, which eliminates all the auxiliary predicate symbols
introduced by the Presto algorithm and produces a final UCQ expressed in
terms of ontology concepts, roles, and attributes.</p>
        </list-item>
        <list-item>
          <p>Mapping unfolding: the last step takes the unfolded UCQ and the mapping
assertions as input and produces an SQL query that can be directly
evaluated over the data sources. In particular, the mapping assertions are first
split into assertions of a simpler form, in which the head of every mapping
assertion contains only a single ontology predicate; then, the final
reformulation is produced through a mapping unfolding step, as described in [13].</p>
        </list-item>
      </list>
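      <p>The three steps can be sketched end to end in Python for a deliberately tiny fragment: a TBox with only atomic concept inclusions, a CQ made of unary atoms over a single variable, and mappings that give each mapped concept a one-column SQL query. Step 1 here is a naive PerfectRef-style expansion that directly yields a UCQ (so it also stands in for step 2); it is not the Presto algorithm. Step 3 replaces each atom with its SQL query, joins the atoms of each CQ, drops disjuncts containing unmapped atoms, and combines the rest with UNION. Tables, concepts, and SQL are hypothetical, and sqlite3 plays the role of the data source.</p>
      <preformat>
```python
import itertools
import sqlite3

# Hypothetical TBox (atomic inclusions sub ⊑ sup) and single-atom mappings.
TBOX = [("NationalFlight", "Flight"), ("InternationalFlight", "Flight")]
MAPPING = {
    "NationalFlight": "SELECT code AS id FROM fl WHERE intl = 0",
    "InternationalFlight": "SELECT code AS id FROM fl WHERE intl = 1",
    "Delayed": "SELECT code AS id FROM delay",
}


def subsumees(concept):
    """The concept together with every concept transitively included in it."""
    found, todo = {concept}, [concept]
    while todo:
        current = todo.pop()
        for sub, sup in TBOX:
            if sup == current and sub not in found:
                found.add(sub)
                todo.append(sub)
    return found


def rewrite_tbox(cq):
    """Step 1 (naive): replace each atom by each of its subsumees, in all combinations."""
    return [list(c) for c in itertools.product(*(sorted(subsumees(a)) for a in cq))]


def unfold_mappings(ucq):
    """Step 3: one SQL join per CQ whose atoms are all mapped, glued with UNION."""
    parts = []
    for cq in ucq:
        if not all(a in MAPPING for a in cq):
            continue  # an unmapped atom retrieves no tuples, so the disjunct is empty
        sql = f"SELECT t0.id FROM ({MAPPING[cq[0]]}) t0"
        for i, atom in enumerate(cq[1:], start=1):
            sql += f" JOIN ({MAPPING[atom]}) t{i} ON t{i}.id = t0.id"
        parts.append(sql)
    return " UNION ".join(parts)


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE fl (code TEXT, intl INTEGER);
CREATE TABLE delay (code TEXT);
INSERT INTO fl VALUES ('AZ1', 0), ('AZ2', 1), ('AZ3', 0);
INSERT INTO delay VALUES ('AZ2'), ('AZ3');
""")

ucq = rewrite_tbox(["Flight", "Delayed"])  # q(x) ← Flight(x) ∧ Delayed(x)
sql = unfold_mappings(ucq)
answers = sorted(row[0] for row in conn.execute(sql))
print(len(ucq), answers)  # 3 ['AZ2', 'AZ3']
```
      </preformat>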
      <p>
        More specifically, the Presto algorithm is an optimization of the well-known
PerfectRef algorithm [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. Depending on the particular TBox being used, the latter may
lead to huge UCQs, consisting of many possibly redundant queries that could be
eliminated from the final result. Presto tries to overcome this issue by rewriting the
user query into a Datalog program whose rules encode only the necessary expansion
steps, thus preventing the generation of useless queries. It is important to note
that after Datalog unfolding the number of queries can again be exponential,
but Mastro's experience on real-world applications showed a
dramatic performance improvement w.r.t. PerfectRef.
      </p>
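      <p>Datalog unfolding can be sketched in Python for a non-recursive program whose rules have a single variable x and unary atoms only. The program below is hypothetical and is not Presto's actual output format: it only mimics the idea that auxiliary predicates (aux1, aux2) group alternative expansions compactly, while unfolding multiplies these alternatives out into a UCQ over ontology predicates. This multiplication is also why the final UCQ can again be exponential.</p>
      <preformat>
```python
from itertools import product

# Hypothetical non-recursive program: each head predicate maps to its rule
# bodies, and each body is a conjunction of unary atoms on the same variable x.
# Predicates without rules (e.g. NationalFlight, Delayed) belong to the ontology.
PROGRAM = {
    "q": [["aux1", "Delayed"]],
    "aux1": [["NationalFlight"], ["InternationalFlight"], ["aux2"]],
    "aux2": [["CharterFlight"]],
}


def unfold_datalog(pred):
    """Expand pred into the list of conjunctions of ontology predicates it denotes."""
    if pred not in PROGRAM:  # ontology predicate: nothing left to unfold
        return [[pred]]
    ucq = []
    for body in PROGRAM[pred]:
        # pick one expansion per body atom, in every possible way
        for choice in product(*(unfold_datalog(atom) for atom in body)):
            ucq.append([a for conj in choice for a in conj])
    return ucq


for cq in unfold_datalog("q"):
    print(" ∧ ".join(f"{a}(x)" for a in cq))
```
      </preformat>
      <p>The recursion terminates only because the program is non-recursive, which matches the output of the first step.</p>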
    </sec>
    <sec id="sec-4">
      <title>The system at work: experiences on real cases</title>
      <p>
        The usefulness of OBDA and the efficiency of the Mastro system have been demonstrated
in several real-world applications in which the system has been tested. In the
following, we report on the experiments carried out with Banca Monte dei Paschi
di Siena (MPS), the Italian Ministry of Economy and Finance (MEF), and
Telecom Italia, the main Italian telephone company. Other experiments have
recently been carried out with SELEX Sistemi Integrati (SELEX-SI) and
Accenture [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
      </p>
      <p>Monte dei Paschi di Siena. Within a joint project among Banca Monte dei Paschi
di Siena (MPS)<sup>3</sup>, the Free University of Bozen-Bolzano, and Sapienza Università di
Roma, we used Mastro for accessing a set of data sources from the actual MPS
data repository by means of an ontology [16]. In particular, we focused on the
data exploited by MPS personnel for risk estimation in the process of granting
credit to bank customers. A 15-million-tuple database, stored in 12 relational
tables managed by the IBM DB2 RDBMS, was used as the data source collection
in the experimentation. These source data are managed by a dedicated
application, which is in charge of guaranteeing data integrity (in fact, the underlying
database does not enforce constraints on the data). Not only does the application perform
various updates, but the data is also updated on a daily basis to identify connections
between customers that are relevant for credit rating estimation.</p>
      <p>The main challenge that we tackled within the experimentation was the design
of the ontology and of the mappings. This was a seven-man-month process that required
both inspecting the data source and interviewing domain experts, and was complicated
by the fact that the source was managed by a specific application. The resulting
OBDA system is defined in terms of approximately 600 DL-Lite<sub>A,id</sub> assertions
over 79 concepts, 33 roles, and 37 attributes, and 200 mapping assertions.</p>
      <p>The experimentation showed that the usefulness of the Mastro system goes
beyond data integration applications and embraces data quality management. In
particular, it confirmed the importance of several distinguishing features of our
system, namely, identification constraints and denial constraints, which have
been used extensively to model important business rules. Notably, checking whether
such rules were satisfied by the data retrieved from the sources through the mappings
highlighted unexpected incompleteness and inconsistency in the data sources.</p>
      <p>Our work has also pointed out the importance of the ontology itself as a
precious documentation tool for the organization. Indeed, the ontology developed
in our project has been adopted in MPS as a specification of the relevant concepts in
the organization. At present we are still working with MPS in order to extend
the work to cover the core domain of the MPS information system, with the idea
that the ontology-based approach could be a basic step in the future evolution
of the IT architecture.</p>
      <p><sup>3</sup> MPS is one of the main banks, and the head company of the third banking group, in Italy (see http://english.mps.it/).</p>
      <p>Italian Ministry of Economy and Finance. Mastro has been used within a
joint project between Sapienza Università di Roma and the Italian Ministry of
Economy and Finance (MEF). The main objectives of the project have been: the
design and specification in DL-Lite<sub>A</sub> of an ontology for the domain of the Italian
public debt; the realization of the mapping between the ontology and the relational
data sources that are part of the management accounting system currently in
use at the ministry; and the definition and execution of queries over the ontology
aimed at extracting data of core interest for MEF users. In particular, the
information returned by such queries relates to sales of bonds issued by the Italian
government, maturities of bonds, monitoring of various financial products, etc.,
and forms the basis of various reports on the overall trend of the national public
debt.</p>
      <p>The Italian public debt ontology is defined over an alphabet containing 164 atomic
concepts, 47 atomic roles, and 86 attributes, and comprises around 1440 DL-Lite<sub>A</sub>
assertions. The 300 mapping assertions involve around 60 relational tables
managed by Microsoft SQL Server. We tested a large number of queries and
produced through Mastro several reports of interest for the ministry. We point
out that around 80% of the queries we tested could be executed only thanks to
a series of further optimizations introduced in the system that, due to lack of
space, we cannot describe here.</p>
      <p>Telecom Italia. We finally describe a project we are carrying out in the domain
of network inventory systems, together with Telecom Italia, the main Italian
company for telecommunication services, which is also a world-leading company
in this field. The main objectives of the project are (i) the specification of an
ontology that formalizes the entire telecommunication network owned by
Telecom Italia and (ii) the analysis, through the ontology, of the information systems
that are currently used for network management. The ontology we are going to
develop can be partitioned into four layers: the infrastructures and territory layer,
which represents the main infrastructures used to realize the network, the way in
which network elements (e.g., cables, apparatus, connection points) are localized
into such infrastructures, and how both infrastructures and network elements are
localized with respect to the territory; the network topology layer, which represents
how connections are realized in the network, essentially representing it as a
graph in which edges represent elementary connections among apparatus, and
nodes represent apparatus that realize signal permutations between
elementary connections; the service layer, which represents all telecommunication services
that are deployed on the network and offered to customers (e.g., voice
communication, ADSL, VoIP); and the data layer, which represents the actual data exchanged on
the network (e.g., data on telephone calls, internet access). In each such layer,
the ontology provides the means to precisely represent the current state of the
world and, when considered of interest, also captures past situations, for
example to keep track of all the changes that certain network elements have
undergone. The use of identification assertions and epistemic constraints turned
out to be crucial for a faithful representation of such aspects.</p>
    </sec>
    <sec id="sec-5">
      <title>Discussion</title>
      <p>Accessing (possibly dispersed) data through a virtual global schema has been
deeply investigated in the last two decades in the field of data integration [11,
8]. From the modeling perspective, however, the main systems produced by this
research suffer from some weaknesses, mainly due to the limited expressive power
of the languages provided to model the global schema of the integration
system. In this respect, Mastro aims at overcoming this limitation by providing
the best expressive power allowed while preserving the tractability of conjunctive
query answering and of the integration tasks. As for the mappings, Mastro
adopts a powerful form of the so-called Global-As-View (GAV) mappings [11],
and provides optimized algorithms for rewriting global queries with respect to their
specification.</p>
      <p>To the best of our knowledge, the only existing system designed for the same
aims as Mastro is Quest [14], which indeed shares common roots with our tool.
Quest is a system for query answering over DL-Lite<sub>A</sub> ontologies, which can work
both in “classical” mode (i.e., with a local ABox) and in “virtual” mode (i.e., as an OBDA
system). Quest implements specific optimizations for query answering, which in
particular exploit the completeness of the ABox with respect to the TBox. Although
first experiments show the effectiveness of Quest in the classical scenario [14], its
usage in virtual mode is still at a preliminary stage. In particular, we tried
to compare Quest with Mastro in the OBDA scenario of the Italian Ministry
of Economy and Finance described in the previous section.
Unfortunately, we have not been able to perform such experiments, for two reasons: (i)
the data source of this application is an SQL Server database; since Quest does
not support this DBMS, we could not compare query answering in the two
systems; (ii) Quest was not able to compute the TBox rewriting of the 23
queries used in our experiments, which are very long conjunctions of atoms, so
we could not even compare the query rewriting performance of the two systems.</p>
      <p>
        Nyaya [6] is a novel system that allows for query answering over
ontologies specified in linear Datalog±, a language that essentially corresponds to
DLR-Lite [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] (i.e., to the extension of DL-Lite with n-ary predicates), and
allows for FOL-rewritable query answering of UCQs. In Nyaya, Datalog±
ontologies are mapped through plain Datalog rules to a specific centralized storage
system, which maintains both data and meta-data according to the Nyaya
metamodel. As in Mastro, answering a query posed over such an ontology is done
by first rewriting the query according to the ontology, using an algorithm that
can be seen as a variation of the PerfectRef algorithm of [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], and then rewriting it
according to the mapping, which is done in Nyaya through standard unfolding.
Nyaya does not present particular optimizations for either of the rewriting steps,
whereas it concentrates on optimizing centralized data storage. In this respect,
it is not specifically tailored to data integration and cannot be directly applied
in an efficient way to this setting.
      </p>
      <p>Other DL-Lite-based approaches and reasoners have been developed, which,
however, are not able to deal with full OBDA scenarios. In [9], an alternative
approach to query answering is presented. Besides a (less complex) query
reformulation step, this approach requires suitably “extending” the ABox (managed
by an RDBMS) with the aim of reducing the number of rewritten queries produced
by the reformulation step. The experimental results support this approach well
(notice that in Mastro the size of the reformulation may be exponential in the
size of the input query). However, the ABox manipulation that it requires makes
it extremely difficult to apply this approach in an OBDA scenario.</p>
      <p>The Requiem reasoner [12] implements a rewriting algorithm that reduces
the number of queries in the final reformulation, while remaining purely intensional
like Mastro. However, it currently supports none of Mastro's advanced
features, such as identification or EQL constraint management, nor mappings to
external databases.</p>
      <p>The OWLGres prototype [19], which allows for TBox specification in
DL-Lite, uses the PostgreSQL DBMS for the storage of the ABox, and provides
conjunctive query processing. The algorithm for query answering implemented
in OWLGres, however, is not complete with respect to the computation of the
certain answers to user queries.</p>
      <p>
        Mastro can also be compared with ontology reasoners that support DLs
different from DL-Lite, in particular with respect to their query answering
capabilities. In this respect, well-known DL reasoners such as RacerPro [7], Pellet [18],
FaCT++ [20], and HermiT [17] provide only limited forms of query answering, i.e.,
instance checking/retrieval or grounded conjunctive query answering (cf. [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]),
since they are essentially focused on standard DL reasoning services. Although
some optimizations have been implemented, such systems are not able to deal
with very large ABoxes (e.g., with several millions of membership assertions) such as
the ones we considered in our experiments. This is mainly due to the inherent
computational complexity of answering queries in the expressive DL languages
supported by the above-mentioned systems.
      </p>
      <p>Acknowledgments. This research has been partially supported by the EU under FP7 project ACSI – Artifact-Centric Service Interoperation (grant no. FP7-257593), and by Regione Lazio under the project “Integrazione semantica di dati e servizi per le aziende in rete”.</p>
      <p>5. D. Calvanese, G. De Giacomo, D. Lembo, M. Lenzerini, and R. Rosati. Path-based identification constraints in description logics. In Proc. of KR 2008, pages 231–241, 2008.</p>
      <p>6. R. de Virgilio, G. Orsi, L. Tanca, and R. Torlone. Semantic data markets: a flexible environment for knowledge management. In Proc. of CIKM 2011, pages 1559–1564, 2011.</p>
      <p>7. V. Haarslev, R. Möller, and M. Wessel. Description logic inference technology: Lessons learned in the trenches. In Proc. of DL 2005, volume 147 of CEUR, ceur-ws.org, 2005.</p>
      <p>8. A. Y. Halevy, A. Rajaraman, and J. Ordille. Data integration: The teenage years. In Proc. of VLDB 2006, pages 9–16, 2006.</p>
      <p>9. R. Kontchakov, C. Lutz, D. Toman, F. Wolter, and M. Zakharyaschev. The combined approach to query answering in DL-Lite. In Proc. of KR 2010, pages 247–257, 2010.</p>
      <p>10. D. Lembo, M. Lenzerini, R. Rosati, M. Ruzzi, and D. F. Savo. Inconsistency-tolerant first-order rewritability of DL-Lite with identification and denial assertions. In Proc. of DL 2012, volume 846 of CEUR, ceur-ws.org, 2012.</p>
      <p>11. M. Lenzerini. Data integration: A theoretical perspective. In Proc. of PODS 2002, pages 233–246, 2002.</p>
      <p>12. H. Pérez-Urbina, B. Motik, and I. Horrocks. A comparison of query rewriting techniques for DL-Lite. In Proc. of DL 2009, volume 477 of CEUR, ceur-ws.org, 2009.</p>
      <p>13. A. Poggi, D. Lembo, D. Calvanese, G. De Giacomo, M. Lenzerini, and R. Rosati. Linking data to ontologies. J. on Data Semantics, X:133–173, 2008.</p>
      <p>14. M. Rodríguez-Muro and D. Calvanese. High performance query answering over DL-Lite ontologies. In Proc. of KR 2012, 2012. To appear.</p>
      <p>15. R. Rosati and A. Almatelli. Improving query answering over DL-Lite ontologies. In Proc. of KR 2010, pages 290–300, 2010.</p>
      <p>16. D. F. Savo, D. Lembo, M. Lenzerini, A. Poggi, M. Rodríguez-Muro, V. Romagnoli, M. Ruzzi, and G. Stella. Mastro at work: Experiences on ontology-based data access. In Proc. of DL 2010, volume 573 of CEUR, ceur-ws.org, pages 20–31, 2010.</p>
      <p>17. R. Shearer, B. Motik, and I. Horrocks. HermiT: A highly-efficient OWL reasoner. In Proc. of OWLED 2008, volume 432 of CEUR, ceur-ws.org, 2008.</p>
      <p>18. E. Sirin and B. Parsia. Pellet: An OWL DL reasoner. In Proc. of DL 2004, volume 104 of CEUR, ceur-ws.org, 2004.</p>
      <p>19. M. Stocker and M. Smith. Owlgres: A scalable OWL reasoner. In Proc. of OWLED 2008, volume 432 of CEUR, ceur-ws.org, 2008.</p>
      <p>20. D. Tsarkov and I. Horrocks. FaCT++ description logic reasoner: System description. In Proc. of IJCAR 2006, pages 292–297, 2006.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>F.</given-names>
            <surname>Baader</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Calvanese</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>McGuinness</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Nardi</surname>
          </string-name>
          , and
          <string-name>
            <given-names>P. F.</given-names>
            <surname>Patel-Schneider</surname>
          </string-name>
          , editors.
          <source>The Description Logic Handbook: Theory, Implementation and Applications</source>
          . Cambridge University Press,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>D.</given-names>
            <surname>Calvanese</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>De Giacomo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Lembo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Lenzerini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Poggi</surname>
          </string-name>
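          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Rodríguez-Muro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Rosati</surname>
          </string-name>
          ,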
          <string-name>
            <given-names>M.</given-names>
            <surname>Ruzzi</surname>
          </string-name>
          , and
          <string-name>
            <given-names>D. F.</given-names>
            <surname>Savo</surname>
          </string-name>
          .
          <article-title>The Mastro system for ontology-based data access</article-title>
          .
          <source>Semantic Web J.</source>
          ,
          <volume>2</volume>
          (
          <issue>1</issue>
          ):
          <fpage>43</fpage>
          –
          <lpage>53</lpage>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>D.</given-names>
            <surname>Calvanese</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>De Giacomo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Lembo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Lenzerini</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R.</given-names>
            <surname>Rosati</surname>
          </string-name>
          .
          <article-title>Data complexity of query answering in description logics</article-title>
          .
          <source>In Proc. of KR 2006</source>
          , pages
          <fpage>260</fpage>
          –
          <lpage>270</lpage>
          ,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>D.</given-names>
            <surname>Calvanese</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>De Giacomo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Lembo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Lenzerini</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R.</given-names>
            <surname>Rosati</surname>
          </string-name>
          .
          <article-title>Tractable reasoning and efficient query answering in description logics: The DL-Lite family</article-title>
          .
          <source>J. of Automated Reasoning</source>
          ,
          <volume>39</volume>
          (
          <issue>3</issue>
          ):
          <fpage>385</fpage>
          –
          <lpage>429</lpage>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>