<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Synthesizing Extensional Constraints in Ontology-based Data Access</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Marco Console</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Maurizio Lenzerini</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Riccardo Mancini</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Riccardo Rosati</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Marco Ruzzi</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Dipartimento di Ing. Informatica, Automatica e Gestionale “Antonio Ruberti” S</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>Several recent techniques and tools for Ontology-based Data Access (OBDA) make use of the so-called extensional constraints (a.k.a. ABox dependencies). So far, extensional constraints have been mainly considered in a setting where data are represented in an ABox, instead of external data sources connected to the ontology through declarative mappings. Moreover, the issue of how to generate extensional constraints in practice has not been addressed yet. In this paper we first provide a formal account of the notion of extensional constraints in a full-fledged OBDA setting, where an ontology is connected to the data sources of the information system by means of mappings, and then present an approach to the automatic generation of extensional constraints in such a setting. The technique we propose is based on the use of a first-order theorem prover that checks validity of relevant formulas built over the mapping views. The experiments we have carried out in real-world OBDA projects show the effectiveness of our approach in discovering large collections of extensional constraints entailed by the OBDA specification.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>
        Ontology-based data management [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] is a novel paradigm for managing,
governing and querying the data sources of an organization by means of an ontology. One
of the key aspects of ontology-based data management is ontology-based data access
(OBDA), where the ontology provides a formal representation of the domain of interest,
and is used as a tool provided to the user to express queries. A suitable system is able to
answering such queries by reasoning on the ontology and by extracting the appropriate
data from the data sources.
      </p>
      <p>
        Query answering in OBDA is currently a hot research topic (see e.g., [
        <xref ref-type="bibr" rid="ref12 ref3 ref4 ref5 ref8 ref9">5, 12, 4, 9, 3,
8</xref>
        ]). One of the outcomes of this research is a detailed study of the complexity of query
answering. In particular, the study of OBDA has focused on understanding which
ontology languages allow query answering to be performed with reasonable computational
complexity with respect to the size of the data. It is now well-known that efficiency of
query answering cannot be guaranteed if the ontology is expressed in a logic whose
expressive power exceeds the one of lightweight language [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. With such languages, in
particular with the languages of the DL-Lite family, conjunctive queries are first-order
rewritable, i.e., answering (unions of) conjunctive queries, (U)CQs, expressed over the
ontology can be reduced to the evaluation of a suitable first-order query (called the
perfect rewriting of the original query) expressed over the data sources.
      </p>
      <p>
        Recent works on this subject go beyond the mere rewritability property, and study
optimization techniques aiming at minimizing the size of the rewritten query. Many
proposals of this type assume that the query answering algorithm make use of additional
knowledge with respect to the usual input, which is constituted by the TBox
representing the ontology and the data at the sources. In [
        <xref ref-type="bibr" rid="ref13 ref14 ref15">13, 15, 14</xref>
        ] such additional knowledge
is expressed in terms of the so-called extensional constraints, where an extensional
constraint is an axiom that has the same form of a TBox assertion, but is interpreted as an
integrity constraint for the ABox (i.e., the set of asserted instances). An example of an
extensional constraint is an inclusion assertion A1 vext A2 specifying that the instances
that have been asserted for one concept A1 in the ABox form a subset of the instances
that have been asserted for the concept A2. By using such assertions, the query
answering algorithm is able to prune the space of query rewriting, for example by removing
those queries that are contained into other queries under the extensional constraints [
        <xref ref-type="bibr" rid="ref13 ref15">13,
15</xref>
        ]. For a very simple example, if we know that the extension of the concept
ForeignStudent is a subset of the extension of the concept Student, and if the query asks for all
instances of Student, then the query answering algorithm can simply avoid accessing
the instances of ForeignStudent.
      </p>
      <p>
        So far, extensional constraints have been mainly considered in the simplified setting
where data in the OBDA system are represented in an ABox, i.e., they are stored in
the form of a set of facts (possibly managed by a Database Management System
DBMS) in the alphabet of the ontology. Conversely, in a full-fledged OBDA system, the
ABox does not really exist. Rather, the TBox is mapped (through mapping assertions)
to existing data sources that are tipically managed by external systems (often, DBMSs).
In our experience, full-fledged OBDA is a very effective paradigm in practice, because
it allows mapping the ontology to the existing data sources of the organization (which,
therefore, need not to be modified). Since the data sources and the ontology are not
necessarily mutually coherent, the mapping mechanisms should be able to connect any
query over the data sources to the elements of the ontology. In this sense, OBDA can
be seen as a form of information integration, where data are not duplicated or moved
to an ABox, but remain in the data sources of the organization, continuing to serve
their purposes in the applications. Once we accept that data remain at the sources, we
also accept that they are subject to updates, according to the applications using them.
It follows that deriving extensional constraints by inspecting the current state of data
might be too expensive, or even unfeasible at all. For this reason, we follow the approach
alread implicit in [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ], based on static analysis of the OBDA specification. In other
words, we aim at deriving the extensional constraints from the OBDA specification,
independently of the specific state (i.e., on the specific data) of the current database.
      </p>
      <p>
        We also observe that the issue of how to generate extensional constraints has not
been addressed extensively. For example, while [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] discusses the idea of inducing
extensional constraints from mapping assertions, it does not present any systematic
techniques for doing so.
      </p>
      <p>Based on the above considerations, in this paper we provide the following
contributions.</p>
      <p>– We present (Section 3) a formal account of the notion of extensional constraints in
a full-fledged OBDA setting, where an ontology is connected to the data sources of
the information system by means of mappings. In particular, we define the notion
of valid extensional constraint in an OBDA specification.
– We observe that, due to the high expressive power of the mapping language,
deriving all the extensional constraints that are valid in an OBDA specification is
undecidable. Therefore, we propose (Section 4) a “best effort” approach to the automatic
generation of extensional constraints in such setting. The technique we present is
based on the use of a first-order theorem prover that checks validity of relevant
formulas built over the mapping views.
– We report (Section 5) on a first set of experiments we have carried out in real-world
OBDA projects. Such experiments demonstrate the effectiveness of our approach
in discovering large collections of extensional constraints entailed by the OBDA
specification, and prove that the use of theorem provers for this kind of task is very
promising.
2</p>
    </sec>
    <sec id="sec-2">
      <title>Preliminaries</title>
      <p>An OBDA system specification B = hT ; M; Si is characterized by three components:
the ontology T , the source schema S and the mapping M between the source schema
and the ontology.</p>
      <p>
        In this work, we assume that T is specified as a TBox in DL-LiteA [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ], which
is a logic of the DL-Lite family1 allowing for UCQ-rewritability of query answering,
i.e., answering unions of conjunctive queries (UCQ) posed over it can be reduced to
evaluation of a UCQ over the underlying data. DL-LiteA allows for specifying
concepts, representing sets of objects, roles, representing binary relations between objects,
and attributes, representing binary relations between objects and values. The syntax of
concept, role and attribute expressions in DL-LiteA is as follows:
      </p>
      <p>B
C
! A j 9Q j (U )
! B j :B</p>
      <p>R
Q
! Q j :Q
! P j P</p>
      <p>E ! (U )
F ! T1 j
V ! U j :U
j Tn</p>
      <p>In such rules, A, P , and U denote a concept name, a role name, and an attribute
name, respectively. P denotes the inverse of a role. B and R are called basic concept
and basic role, respectively. :B (resp. :Q, :U ) denotes the negation of a basic concept
B (resp. basic role, or attribute). The concept 9Q, also called unqualified existential
restriction, denotes the domain of a role Q, i.e., the set of objects that Q relates to some
object. Similarly, the concept (U ) denotes the domain of an attribute U , i.e., the set of
objects that U relates to some value. Conversely, (U ) denotes the range of an attribute
U , i.e., the set of values to which U relates some object. T1; : : : ; Tn are unbounded
pairwise disjoint predefined value-domains.</p>
      <p>A DL-LiteA TBox T is a finite set of assertions of the form</p>
      <p>B v C</p>
      <p>Q v R</p>
      <p>E v F</p>
      <p>
        U v V
(funct Q)
(funct U )
1 Not to be confused with the set of DLs studied in [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], which form the extended DL-Lite family,
also called the DL-Litebool.
      </p>
      <p>From left to right, assertions denote inclusions between concepts, roles, value-domains,
attributes, and functionality on roles and on attributes, respectively. Notice that in
DL-LiteA TBoxes we impose that roles and attributes occurring in functionality
assertions cannot be specialized (i.e., they cannot occur in the right-hand side of inclusions).</p>
      <p>
        The source schema S is a relational database schema, specifying both the structure
of the relational tables in the database, and the integrity constraints that the tables should
obey. A database D for S is called legal if it satisfies all the integrity constraints in S [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
To express queries over the source schema, we use SQL. Given an n-tuple of variables
x = x1; : : : ; xn, we use the notation QDB (x) for an SQL query of arity n, where xi
denotes the i-th attribute in the target list of the query. To simplify notation, we assume
that query attribute names are obtained by renaming attributes in the target list of the
SQL query with variable symbols. When the specification of the attributes x is not
necessary, we simply refer to the symbol QDB .
      </p>
      <p>
        As for the mapping M between the ontology and the source schema, we will use
a form of mapping assertion that slightly generalizes the well-known GAV mappings
[
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. An extended GAV mapping assertion m is an expression of the form
QDB (x)
cq (x)
where QDB (x) is an SQL query over S (called the view associated to m), and cq (x)
is a conjunctive query over T whose variables are those in x. The intuitive meaning of
a mapping assertion m of the above form is that every tuple satisfying view QDB also
satisfies the ontology query cq . Note that the extended GAV mapping language has the
same expressive power as ordinary GAV. However, in practice, the possibility of using
more than one atom in the head of a mepping assertion greatly simplifies the task of the
ontology engineer.
      </p>
      <p>
        As for the semantics of the ontology we adopt the standard semantics based on
firstorder interpretations, and we refer the reader to [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] for a formal, detailed presentation
of the semantics of the whole OBDA specification.
      </p>
      <p>Example 1. We now present an OBDA system specification which is an excerpt of the
one we used in a real-world experimental project carried out in collaboration with the
Ministry of Economy and Finance.</p>
      <sec id="sec-2-1">
        <title>PublicOrg v Organization</title>
      </sec>
      <sec id="sec-2-2">
        <title>9worksWith v Organization</title>
      </sec>
      <sec id="sec-2-3">
        <title>PublicDep v PublicOrg</title>
      </sec>
      <sec id="sec-2-4">
        <title>9worksWith v Organization</title>
        <p>(funct name)
(funct address)
The concepts in the this fragment of the ontology are Organization (denoting
organizations), PublicOrg (denoting public organizations), and PublicDep (denoting public
departments). The inclusion assertions in the first row state that public organizations are
particular organizations, and public departments are particular public organizations. The
role worksWith relates organizations that work together (second row). The attributes of
the ontology are name, address, prjName. The axioms in the third column specify
that name and address are functional.</p>
        <p>The source schema S contains four relational tables.</p>
        <sec id="sec-2-4-1">
          <title>Dept MinistryA(dep id,dep name) and Dept MinistryB(dep id,dep addr)</title>
          <p>store data about departments belonging to MinistryA and MinistryB respectively.
Works On(dep id,proj name) store data about projects carried out by departments
and Cooperate(dept1,dept2) specify pairs of cooperating departments.</p>
          <p>M1 SELECT dep idASx; dep nameASy</p>
          <p>F ROM Dept M inistryA
M2 SELECT dep idASx; dep addrASy</p>
          <p>F ROM Dept M inistryB
M3 SELECT w1:dep idasx; w2:dep idasy;
w2:proj nameaszF ROM W orks Onw1;
W orks Onw2; Dept M inistryAd1;
Dept M inistryAd2
W HEREd1:dep id = w1:dep id
AN Dd2:dep id = w2:dep id
AN Dw1:proj = w2:proj</p>
          <p>AN Dw1:dep id &lt;&gt; w2:dep id
M3 SELECT d1:dep idasx; d2:dep idasy</p>
          <p>F ROM Cooperatec; Dept M inistryBd1;
Dept M inistryBd2
W HEREc:dept1 = d1:dep id
AN Dc:dept2 = d2:dep id
fx; y j PublicDep(x)^
name(x; y)g
fx; y j PublicDep(x)^
address(x; y)g
fx; y; z j worksWith(x; y)^
prjName(x; z)^
prjName(y; z)g
fx; y j worksWith(x; y)g</p>
          <p>The mapping M between the sources and the ontology contains the mapping
assertions illustrated in Figure 1. In words, M1 relates table Dept MinistryA to the
instances of concept PublicDep and their names. M2 relates table Dept MinistryB
to the instances of PublicDep and their addresses; M3 specifies that departments from
MinistryA that work on the same project actually work together, and therefore are
mapped to worksWith, together with the indication of the project they work for
(attribute prjName). Finally, M4 maps the notion of cooperation represented in table
Cooperate to the role worksWith in the ontology.</p>
          <p>
            Query answering in current OBDA systems [
            <xref ref-type="bibr" rid="ref4">4</xref>
            ] is performed by means of a
twophase query rewriting technique: the user query is first rewritten taking into account the
ontology, and then considering the mappings. The first phase, called ontology rewriting
phase, encodes part of the knowledge expressed by the TBox in the rewriting. More
precisely, given an OBDA specification B = hT ; M; Si and a query Q over T , an
ontology rewriting of Q under B is a query Q0 over T such that the certain answers
to Q0 with respect to h;; M; Si are exactly the certain answers to Q with respect to
B. Once ontology rewriting has been performed, the TBox can be disregarded and the
second phase, called mapping rewriting phase, can consider only mapping assertions.
The outcome of this second step produces an SQL query that is evaluated directly over
the sources: given and OBDA specification B = hT ; M; Si and a query Q0 over T , a
mapping rewriting is a query Q00 over S such that, for every legal database D for S, the
answers to Q00 over D are exactly the certain answers of Q0 with respect to h;; M; Si.
For further details on OBDA query rewriting we refer the reader to e.g. [
            <xref ref-type="bibr" rid="ref7">7</xref>
            ].
3
          </p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Extensional constraints</title>
      <p>In this section we formally define the notion of extensional constraint in an OBDA
specification.</p>
      <p>Definition 1. Let B = hT ; M; Si be an OBDA specification. An extensional constraint
for B is a statement of one of the following forms</p>
      <p>B1 vext B2
U1 vext U2</p>
      <p>B1 vext :B2
U1 vext :U2</p>
      <p>Q1 vext Q2
(functext Q)</p>
      <p>Q1 vext :Q2
(functext U )
where B1; B2 are basic concepts, Q; Q1; Q2 are roles, and U; U1; U2 are attributes.</p>
      <p>Notice that, by virtue of the characteristics of DL-LiteA, the set of all possible
extensional constraints for an OBDA specification is finite. Notice also that the syntax
of extensional constraints is essentially the same as the syntax of TBox assertions in
DL-LiteA. Intuitively, the difference between the two types of assertion is that, while
a TBox assertion states a general condition that is to be satisfied in all the models of
the TBox, an extensional constraint states a condition that the instances that have been
explicitly asserted in the knowledge base must satisfy. For example, an extensional
inclusion assertion between concepts (first type of assertions in the above list) specifies
that the instances that have been asserted for the left-hand side concept form a subset of
the instances that have been asserted for the right-hand side concept. In other words, an
extensional constraint is a sort of meta-level constraint regarding the instances explicitly
asserted in the knowledge base.</p>
      <p>The question now is to specify what does it mean for an instance to be explicitly
asserted in the knowledge base. Recent papers dealing with extensional constraints
assume that the knowledge base is constituted by a TBox and an ABox. In that context, the
explicitly asserted instances of an element of the knowledge base are those appearing in
the ABox. In this paper, the notion of OBDA specification is such that the knowledge
base is constituted by a TBox, a set of mappings and a database schema, and this has to
be taken into account in specifying the semantics of extensional constraints.</p>
      <p>To formally define the semantics of an extensional constraint, we make use of
several notions, that we now introduce. First, let us remind the reader about the meaning of
an extensional constraint in the case where we have a DL knowledge base hT ; Ai
constituted by a TBox T and an ABox A. If is an extensional constraint, let us indicate
with int the TBox axiom obtained by dropping the subscript ext from . For example,
if is A1 vext A2, then int is simply A1 v A2. We define the A-interpetation IA
for T as the interpretation for T such that a fact is true in IA if and only if 2 A.
It is now immediate to define the notion of satisfaction for extensional constraints: an
extensional constraint is said to hold in an ABox A if int is satisfied by IA.</p>
      <p>Let us now turn our attention to extensional constraints in an OBDA specification.
Since an extensional constraint says something about the extension of predicates, the
notion of satisfaction requires to fix such extensions. In an OBDA system, the extension
of predicates depends upon the data at the sources. It follows that in order to check
whether an extensional constraint is satisfied or not, we should resort to a database D
for S, and then verify whether the condition corresponding to the extensional constraint
is satisfied by the instances that the mapping retrieves from the data sources. To make
this idea precise, we introduce the notion of “ABox retrieved from a database through a
mapping”.</p>
      <p>Given an OBDA specification B = hT ; M; Si, and a database D that is legal for S,
a mapping m : QDB (x) cq (x) in M is said to generate a fact P (t) (where P is a
predicate in T and t is a tuple of constants of the same arity as P ) from D if (i) t is a
tuple in QDDB , (ii) cq (t) is the set of facts obtained by substituting t for x in cq(x), (iii)
P (t) is an atom in cq (t).</p>
      <p>The retrieved ABox associated to B and D, denoted M(D), is the set of ABox
assertions defined as fP (t) j there is m 2M that generates P (t) from Dg. With the notion
of retrieved ABox, we can now specify when an extensional constraint is satisfied by
an OBDA specification and a database.</p>
      <p>Definition 2. Let B = hT ; M; Si be an OBDA specification, D a legal database for
S, and an extensional constraint for B. Then is satisfied by (B; D) if holds in
M(D).</p>
      <p>The notion of satisfaction of an extensional constraint that we have just illustrated is
relative to a specific database D. In OBDA, the underlying database may be subject to
frequent changes. Therefore, basing our knowledge about the extensional constraints on
the specific database D at hand would imply to compute the set of relevant constraints
every time the database is modified. Since this is unfeasible, our approach is based on
the idea of considering the problem of deriving extensional constraints as a static
analysis problem, i.e., as a reasoning task over the OBDA specification. More specifically,
the extensional constraints that we will use for the optimization of query rewriting will
be those that are satisfied “independently” of the specific database at hand. To capture
this intuition, we introduce the notion of valid extensional constraints.
Definition 3. Let B = hT ; M; Si be an OBDA specification, and an extensional
constraint for B. Then is valid in B (or entailed by B) if for every legal database D
for S, we have that is satisfied by (B; D).</p>
      <p>The main problem addressed in this paper is defined as follows.</p>
      <p>Definition 4. The problem of computing the extensional closure of an OBDA
specification is the following: given an OBDA specification B = hT ; M; Si, compute the set
of all extensional constraints for B that are valid in B.</p>
      <p>Obviously, at the basis of the problem of computing the extensional closure of an
OBDA specification, there is an associated decision problem, defined as follows.
Definition 5. Extensional constraint entailment is the following decision problem:
given an OBDA specification B = hT ; M; Si, and an extensional constraint for
B, check whether is valid in B.</p>
      <p>Based on the fact the deciding containment between SQL queries is undecidable, it
is immediate to verify that the extensional constraint entailment problem is undecidable.
Theorem 1. Extensional constraint entailment is undecidable.</p>
      <p>So, on the one hand we aim at computing all the extensional constraints that are
valid in an OBDA specification; on the other hand, this task is undecidable. To address
this problem, we could in principle decide to resort to a less powerful query language
in the mapping M. For example, if we use conjunctive queries in M, then it is easy
to see that computing the closure of extensional constraints becomes feasible (since
query containment becomes decidable). Unfortunately, real world OBDA projects show
that one needs the full expressive power of SQL (or, at least, the fragment of SQL
corresponding to first-order logic queries) in modeling the mapping between the data
sources and the ontology. In order to provide a solution to this problem, we propose
to use a “best effort” approach, with the goal of deriving as many valid extensional
constraints as possible. In the rest of this paper, we show an approach based on the use
of a first-order theorem prover.
4</p>
    </sec>
    <sec id="sec-4">
      <title>A method for synthesizing extensional constraints</title>
      <p>We now describe our method for synthesizing valid extensional constraints from an
OBDA specification.</p>
      <p>
        Translation of source schema constraints Let S be a relational schema with integrity
constraints. We translate (a subset of) the integrity constraints of S into FO-sentences.
For ease of exposition, we consider the most used integrity constraints in relational
databases, i.e., key constraints and foreign key constraints [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] (other kinds of constraints
that can be captured by first-order sentences could be considered as well):
– every key constraint key(r) = 1; : : : ; k is translated into the sentence
8x1; : : : ; xn; yk+1; : : : ; yn
      </p>
      <p>(r(x1; : : : ; xn) ^ r(x1; : : : ; xk; yk+1; : : : ; yn) ! xk+1 = yk+1 ^ : : : ^ xn = yn)
– every foreign key constraint r1[a1; : : : ; ak] r2[b1; : : : ; bk] is translated into the
sentence 8x1; : : : ; xn(r1(x1; : : : ; xn) ! 9y1; : : : ; yn k(r2(t1; : : : ; tm))), where,
for every j 2 f1; : : : ; kg, tbj = xaj (and ti = yi for the other positions).
Let ic(S) be the first-order sentence corresponding to the conjunction of all the above
sentences.</p>
      <p>Translation of extensional constraints In the following, and without loss of
generality, we will consider a normalized, “pure GAV” version of the set of mappings, in which
the head of every mapping assertion is constituted by a single atom (and the SQL query
is projected onto the attributes mentioned in such an atom). In fact, given an extended
GAV mapping M, it is easy to compute (in polynomial time) a pure GAV mapping
split (M) that is equivalent to M.</p>
      <p>Given a first-order formula whose free variables are x1; : : : ; xn, and given an
ntuple of terms t, we denote by (t) the formula obtained from by replacing the free
variables x1; : : : ; xn with the terms t.</p>
      <p>Given an SQL query QDB (x) used in the mapping m 2 M, there are two cases: (i)
QDB (x) corresponds to a first-order query, i.e., a first-order formula with free variables
x. In this case, let fol (QDB (x)) be such a first-order query; (ii) QDB (x) does not
correspond to any first-order query (e.g., because of aggregation operations). In this
case, we make use of a new relation name rm (i.e., a relation name not used in S)
and define fol (QDB (x)) = rm(x). The intuition behind this forced translation of
nonfirst-order SQL queries is to encode them with new, generic predicates (using a different
predicate for every such query) that have no special property and that are used to encode
queries whose semantics is “unknown”. Then, given a mapping assertion of the form
QDB (x) A(x), we define fol ( ) as fol (QDB (x)).</p>
      <p>Given a set of mappings M and a concept name A, the set of relevant mappings for
A (denoted by RelMap(A; M)) is the set of mapping assertions 2 M such that the
head of has the form A(x). Analogously, given a role name P , RelMap(P; M)) is the
set of mapping assertions 2 M such that the head of has the form P (x1; x2), and
given an attribute name U , RelMap(U; M)) is the set of mapping assertions 2 M
such that the head of has the form U (x1; x2).</p>
      <p>Given a concept A, the FOL-unfolding unf ( ; M) of a basic concept atom, basic
role atom, or attribute atom with respect to a set of mappings M is defined as follows:
unf (A(x); M) = W
unf (9P (x); M) = W 2RelMap(A;M) fol ( )(x)
unf (9P (x); M) = W 2RelMap(P;M) 9y( fol ( )(x; y))
unf (P (x1; x2); M) = W 2RelMap(P;M) 9y( fol ( )(y; x))
unf (P (x1; x2); M) = W 2RelMap(P;M) fol ( )(x1; x2)
unf (U (x1; x2); M) = W 2RelMap(P;M) fol ( )(x2; x1)</p>
      <p>2RelMap(U;M) fol ( )(x1; x2)</p>
      <p>Then, we define the function ec that translates DL-LiteA extensional constraints
with respect to a set of mappings M as follows.</p>
      <p>– inclusion axioms between basic concepts: let be the axiom B1 v B2. Then,
ec ( ; M) = 8x( unf (B1(x); M) ! unf (B2(x); M));
– disjointness axioms between basic concepts: let be the axiom B1 v :B2. Then,
ec ( ; M) = 8x( unf (B1(x); M) ! : unf (B2(x); M));
– inclusion axioms between basic roles (and attributes): let be the axiom R1 v R2.</p>
      <p>Then, ec ( ; M) = 8x; y( unf (R1(x; y); M) ! unf (R2(x; y); M))
(analogously for inclusions axioms between attributes);
– disjointness axioms between basic roles (and attributes): let be the axiom R1 v
:R2. Then, ec ( ; M) = 8x; y( unf (R1(x; y); M) ! : unf (R2(x; y); M))
(analogously for disjointness axioms between attributes);
– role (and attribute) functionality axioms: let be the axiom (funct R). Then,
ec ( ; M) = 8x; y; z( unf (R(x; y); M) ^ unf (R(x; z); M) ! y = z)
(analogously for attribute functionality axioms).</p>
      <p>Verification of extensional constraints The above first-order translation of both
database integrity constraints and extensional constraints is related to the entailment
of extensional constraints by the following theorem.</p>
      <p>Theorem 2. Let B = hT ; M; Si be an OBDA specification and let be an extensional
constraint defined over the alphabet of T . If ic(S) ! ec ( ; M) is a valid first-order
sentence, then is valid in B.</p>
      <p>Notice that the converse of above theorem holds in the case when: (i) all constraints
on the database schema S are keys or foreign key constraints; (ii) all the SQL queries
used in the mapping M correspond to first-order queries.</p>
      <p>An (incomplete) algorithm for synthesizing extensional constraints Based on
Theorem 2, we are now able to define a technique for automatically deriving a set of
extensional constraints entailed by an OBDA specification B = hT ; M; Si.</p>
      <p>The algorithm for the automatic processing of the OBDA specification and
generation of the extensional constraints is the following.</p>
      <sec id="sec-4-1">
        <title>Algorithm computeObdaEC</title>
        <p>Input: an OBDA specification B = hT ; M; Si
Output: a set of extensional constraints entailed by B
begin</p>
        <p>= ;; compute M0 = split (M); compute ic(S);
for each extensional constraint over the alphabet of T do
begin
compute ec( ; M0);
ask a FO theorem prover whether the sentence ic(S) ! ec( ; M0) is valid;
if the FO theorem prover answers YES before a fixed timeout
then = [ f g
end;
return
end</p>
        <p>The algorithm computeObdaEC starts by computing the pure GAV version M0 of
the mapping Mough the function Split . Then, it generates all the possible extensional
constraints that can be formulated in DL-LiteA over the alphabet of the TBox T ; such
a set is finite and can be computed in polynomial time with respect to the size of T .
For every such extensional constraint , the algorithm builds the first-order sentence
ic(S) ! ec( ; M0) that encodes the entailment of with respect to B, and verifies
the validity of such a formula. To solve the first-order validity problems mentioned in
Theorem 2, the algorithm resorts to an external first-order theorem prover.</p>
        <p>Of course, Theorem 1 implies that we cannot hope to actually derive the whole
extensional closure of the OBDA specification: however, we can compute a sound
approximation (i.e., a subset) of such an extensional closure. This incomplete solution is
certainly acceptable, since extensional constraints are used to optimize reasoning about
the OBDA, and even an incomplete set of such constraints can effectively be used for
such a goal.
5</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Experiments</title>
      <p>We have implemented the algorithm computeObdaEC in a tool for discovering
extensional constraints entailed by an OBDA specification. This tool provides a
multithreading implementation which takes advantage of the modern multi-core
architectures, performing several entailment tests at the same time. With respect to the algorithm
presented in Section 4, our implementation only considers extensional inclusions.</p>
      <p>We experimented the tool within a joint project between Sapienza University of
Rome and the Italian Ministry of Economy and Finance. The project includes an
ontology defined over an alphabet containing 164 concept names, 47 role names, 86
attribute names and a set of 263 mapping assertions involving around 60 relational
tables. The full list of mapping assertions used for the experiments, along with the
source database schema, are available at http://hyper.dis.uniroma1.it/
˜quonto/dl2013/. In the same page it is possible to find the whole set of newly
discovered extensional constraints (expressed in OWL functional-style syntax). We
remark that no constraints are asserted on the original database. Therefore the discovery
of extensional constraints is based only on mapping assertions.</p>
      <p>We ran the tests over an 8 Core CPU machine equipped with 8 GB RAM: such
an architecture allowed us to run 8 threads simultaneously. The theorem prover used
was E-Prover2, configured with 1024 MB of maximum memory and a time-out of 30
seconds. We used the computeObdaEC algorithm to generate all the possible
entailment problems for inclusion dependencies and submitted them to the theorem prover.
The algorithm ran 798336 tests in total, and only in the 5,45% of the cases the proof
exceeded the 30 seconds time out and was terminated, whereas in the 0.019% of cases
the presence of non-FOL operators (e.g., use of the group by clause) prevented the
formulation of the containment problem.</p>
      <p>The experimental results show the effectiveness of the proposed methodology with
respect to the goal of discovering extensional constraints: 177 extensional inclusion
dependencies were discovered. The whole analysis took 46h 47m 51s. We consider this
amount of time reasonable, although considerably high, because this formal analysis of
mappings is conducted off-line; moreover, the set of derived extensional constraints can
be easily adapted to changes in the set of mappings, by adding incrementally new
entailed constraints when new mapping assertions are added. Similarly, extensional
constraints can be deleted when mappings that generated it are removed from the set of
mappings.
6</p>
    </sec>
    <sec id="sec-6">
      <title>Conclusions</title>
      <p>We have introduced the notion of extensional constraints in a full-fledged OBDA
setting, and we have illustrated a technique for synthesizing a set of extensional constraints
that are valid in an OBDA specification. Our technique is based on the use of a
firstorder theorem prover, and the experiments we have carried out in real-world projects
show that the approach is very promising.</p>
      <p>
        In the future, we aim at continuing our work along several directions. First, we
want to extend our approach to deal with the integrity contraints defined in the database
schema, based on the observation that such constraints may help in deriving new
extensional constraints. We also plan to check how the extensional constraints derived
by our method affect the whole query answering process (using, e.g., the Prexto query
rewriting algorithm [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ]). The first experiments in this respect are promising. For
example the rewriting of the query q(x) : senza underlying(x); ISIN (x; y) computed
without taking into account extensional constraints contains 186 disjuncts, while such
number drops to 18 if we consider the derived extensional constraints (the two
rewritings are shown in http://hyper.dis.uniroma1.it/˜quonto/dl2013/).
Finally, we want to consider even more features of SQL, and study mechanisms for
using first-order theorem provers for approximating the reasoning tasks needed to derive
extensional constraints.
      </p>
      <p>Acknowledgments. This research has been partially supported by the EU under FP7
project Optique – Scalable End-user Access to Big Data (grant n. FP7-318338).
2 http://www.eprover.org</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>S.</given-names>
            <surname>Abiteboul</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Hull</surname>
          </string-name>
          , and
          <string-name>
            <given-names>V.</given-names>
            <surname>Vianu</surname>
          </string-name>
          .
          <source>Foundations of Databases. Addison Wesley Publ. Co.</source>
          ,
          <year>1995</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>A.</given-names>
            <surname>Artale</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Calvanese</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Kontchakov</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Zakharyaschev</surname>
          </string-name>
          .
          <article-title>The DL-Lite family and relations</article-title>
          .
          <source>J. of Artificial Intelligence Research</source>
          ,
          <volume>36</volume>
          :
          <fpage>1</fpage>
          -
          <lpage>69</lpage>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>A.</given-names>
            <surname>Cal</surname>
          </string-name>
          <article-title>`ı, G. Gottlob, and</article-title>
          <string-name>
            <given-names>A.</given-names>
            <surname>Pieris</surname>
          </string-name>
          .
          <article-title>New expressive languages for ontological query answering</article-title>
          .
          <source>In Proc. of AAAI</source>
          <year>2011</year>
          , pages
          <fpage>1541</fpage>
          -
          <lpage>1546</lpage>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>D.</given-names>
            <surname>Calvanese</surname>
          </string-name>
          , G. De Giacomo,
          <string-name>
            <given-names>D.</given-names>
            <surname>Lembo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Lenzerini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Poggi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Rodriguez-Muro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Rosati</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ruzzi</surname>
          </string-name>
          , and
          <string-name>
            <given-names>D. F.</given-names>
            <surname>Savo</surname>
          </string-name>
          .
          <article-title>The Mastro system for ontology-based data access</article-title>
          .
          <source>Semantic Web J.</source>
          ,
          <volume>2</volume>
          (
          <issue>1</issue>
          ):
          <fpage>43</fpage>
          -
          <lpage>53</lpage>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>D.</given-names>
            <surname>Calvanese</surname>
          </string-name>
          , G. De Giacomo,
          <string-name>
            <given-names>D.</given-names>
            <surname>Lembo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Lenzerini</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R.</given-names>
            <surname>Rosati</surname>
          </string-name>
          .
          <article-title>Tractable reasoning and efficient query answering in description logics: The DL-Lite family</article-title>
          .
          <source>J. of Automated Reasoning</source>
          ,
          <volume>39</volume>
          (
          <issue>3</issue>
          ):
          <fpage>385</fpage>
          -
          <lpage>429</lpage>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>D.</given-names>
            <surname>Calvanese</surname>
          </string-name>
          , G. De Giacomo,
          <string-name>
            <given-names>D.</given-names>
            <surname>Lembo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Lenzerini</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R.</given-names>
            <surname>Rosati</surname>
          </string-name>
          .
          <article-title>Conceptual modeling for data integration</article-title>
          . In A. T.
          <string-name>
            <surname>Borgida</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          <string-name>
            <surname>Chaudhri</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          <string-name>
            <surname>Giorgini</surname>
          </string-name>
          , and E. Yu, editors,
          <source>Conceptual Modeling: Foundations and Applications - Essays in Honor of John Mylopoulos</source>
          , volume
          <volume>5600</volume>
          <source>of LNCS</source>
          , pages
          <fpage>173</fpage>
          -
          <lpage>197</lpage>
          . Springer,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>F.</given-names>
            <surname>Di Pinto</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Lembo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Lenzerini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Mancini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Poggi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Rosati</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ruzzi</surname>
          </string-name>
          , and
          <string-name>
            <given-names>D. F.</given-names>
            <surname>Savo</surname>
          </string-name>
          .
          <article-title>Optimizing query rewriting in ontology-based data access</article-title>
          .
          <source>In Proc. of EDBT</source>
          <year>2013</year>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <given-names>G.</given-names>
            <surname>Gottlob</surname>
          </string-name>
          ,
          <string-name>
            <surname>G.</surname>
          </string-name>
          <article-title>Orsi, and</article-title>
          <string-name>
            <given-names>A.</given-names>
            <surname>Pieris</surname>
          </string-name>
          .
          <article-title>Ontological queries: Rewriting and optimization</article-title>
          .
          <source>In Proc. of ICDE</source>
          <year>2011</year>
          , pages
          <fpage>2</fpage>
          -
          <lpage>13</lpage>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <given-names>R.</given-names>
            <surname>Kontchakov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Lutz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Toman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Wolter</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Zakharyaschev</surname>
          </string-name>
          .
          <article-title>The combined approach to query answering in DL-Lite</article-title>
          .
          <source>In Proc. of KR</source>
          <year>2010</year>
          , pages
          <fpage>247</fpage>
          -
          <lpage>257</lpage>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <given-names>M.</given-names>
            <surname>Lenzerini</surname>
          </string-name>
          .
          <article-title>Data integration: A theoretical perspective</article-title>
          .
          <source>In Proc. of PODS</source>
          <year>2002</year>
          , pages
          <fpage>233</fpage>
          -
          <lpage>246</lpage>
          ,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <given-names>M.</given-names>
            <surname>Lenzerini</surname>
          </string-name>
          .
          <article-title>Ontology-based data management</article-title>
          .
          <source>In Proc. of CIKM</source>
          <year>2011</year>
          , pages
          <fpage>5</fpage>
          -
          <lpage>6</lpage>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <given-names>A.</given-names>
            <surname>Poggi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Lembo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Calvanese</surname>
          </string-name>
          , G. De Giacomo,
          <string-name>
            <given-names>M.</given-names>
            <surname>Lenzerini</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R.</given-names>
            <surname>Rosati</surname>
          </string-name>
          .
          <article-title>Linking data to ontologies</article-title>
          .
          <source>J. on Data Semantics</source>
          , X:
          <fpage>133</fpage>
          -
          <lpage>173</lpage>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13. M.
          <article-title>Rodr´ıguez-</article-title>
          <string-name>
            <surname>Muro</surname>
            and
            <given-names>D.</given-names>
          </string-name>
          <string-name>
            <surname>Calvanese</surname>
          </string-name>
          . Dependencies:
          <article-title>Making ontology based data access work in practice</article-title>
          .
          <source>In Proc. of AMW</source>
          <year>2011</year>
          , volume
          <volume>749</volume>
          <source>of CEUR, ceur-ws.org</source>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14. M.
          <string-name>
            <surname>Rodriguez-Muro</surname>
            and
            <given-names>D. Calvanese.</given-names>
          </string-name>
          <article-title>High performance query answering over DL-Lite ontologies</article-title>
          .
          <source>In Proc. of KR</source>
          <year>2012</year>
          , pages
          <fpage>308</fpage>
          -
          <lpage>318</lpage>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <given-names>R.</given-names>
            <surname>Rosati</surname>
          </string-name>
          . Prexto:
          <article-title>Query rewriting under extensional constraints in DL-Lite</article-title>
          .
          <source>In Proc. of ESWC</source>
          <year>2012</year>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>